Description
Hi,
I wanted to ask whether batched inference is supported for Ming-Lite-Omni-1.5. All the usage examples in the repo (examples/) and in cookbook.ipynb only demonstrate inference on a single input (multi-turn conversations are still a single conversation, hence only one sample per forward pass).
I tried batching inputs (text-only, passing None for all other input modalities in the processor), but I kept getting errors about past_token_values and rope_deltas. When I passed fully batched inputs to the other modalities as well (List[None]), I instead got an error along the lines of "processor cannot convert list of None to list of images".
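For reference, this is roughly the pattern I was trying. It is only a sketch assuming the standard Hugging Face AutoProcessor/chat-template API; build_batch is a hypothetical helper, and the model id and processor keyword names are assumptions that may not match Ming-Lite-Omni-1.5 exactly:

```python
# Hypothetical helper: wrap plain strings into chat-style message lists,
# one text-only conversation per batch element (no images/audio/video).
def build_batch(texts):
    return [
        [{"role": "user", "content": [{"type": "text", "text": t}]}]
        for t in texts
    ]

batch = build_batch(["What is 2 + 2?", "Name a prime number."])

# The batched call I attempted (commented out because it needs the actual
# weights; model id and argument names are assumptions, not confirmed API):
#
# from transformers import AutoProcessor, AutoModelForCausalLM
# processor = AutoProcessor.from_pretrained(
#     "inclusionAI/Ming-Lite-Omni", trust_remote_code=True
# )
# prompts = [
#     processor.apply_chat_template(m, add_generation_prompt=True)
#     for m in batch
# ]
# inputs = processor(text=prompts, return_tensors="pt", padding=True)
# out = model.generate(**inputs, max_new_tokens=64)
# # ^ this is where the past_token_values / rope_deltas errors appeared
```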
Even the vLLM inference examples only cover single input samples. Is there really no way to do batched inference?