Add VLM inference infrastructure: engine, protocol, and CLI support by stikves · Pull Request #65 · apple/coreai-models

stikves · 2026-06-25T17:52:21Z

Runtime:

- MultimodalInferenceEngine protocol with encodeImage() and generate()
- CoreAISequentialVLMEngine: vision encoder + projector + embed_tokens + LLM decoder with scatter-merge of image embeddings at placeholder positions
- EmbeddedInput type wrapping NDArray embeddings with position metadata
- VisionConfig in LanguageConfig for image_size, patch_size, token count/id
- LanguageBundle parses top-level "vision" block from metadata.json
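
To make the scatter-merge step concrete, here is a minimal sketch in plain Python. It is not the PR's code: the function name `scatter_merge`, the list-of-lists embedding representation, and the placeholder id `99` are assumptions chosen for illustration. The idea is that the prompt contains one placeholder token per image embedding, and each placeholder row produced by embed_tokens is overwritten, in order, by the corresponding row from the vision encoder and projector.

```python
from typing import List, Sequence

Vector = List[float]


def scatter_merge(token_ids: Sequence[int],
                  text_embeds: Sequence[Vector],
                  image_embeds: Sequence[Vector],
                  image_token_id: int) -> List[Vector]:
    """Overwrite each placeholder row with the next image embedding, in order."""
    if len(token_ids) != len(text_embeds):
        raise ValueError("one text embedding is required per token")

    # Positions in the prompt that hold the image placeholder token.
    slots = [i for i, tid in enumerate(token_ids) if tid == image_token_id]
    if len(slots) != len(image_embeds):
        raise ValueError(
            f"{len(slots)} placeholder tokens but {len(image_embeds)} image embeddings")

    merged = [list(row) for row in text_embeds]  # copy; inputs stay untouched
    for slot, image_row in zip(slots, image_embeds):
        merged[slot] = list(image_row)
    return merged


IMAGE_TOKEN_ID = 99  # hypothetical placeholder id, for illustration only
token_ids = [1, IMAGE_TOKEN_ID, IMAGE_TOKEN_ID, 7]
text_embeds = [[0.1, 0.1], [0.0, 0.0], [0.0, 0.0], [0.7, 0.7]]
image_embeds = [[5.0, 5.0], [6.0, 6.0]]

merged = scatter_merge(token_ids, text_embeds, image_embeds, IMAGE_TOKEN_ID)
print(merged)
```

Raising on a count mismatch is a design choice of this sketch: a prompt with the wrong number of placeholders would otherwise silently misalign image and text positions.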

CLI (llm-runner):

- --image flag routes through VLM engine when bundle kind is .vlm
- Chat template detection with generic fallback for prompt construction
- Accumulated token decode for correct spacing
- Stop sequence support in VLM path
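
The accumulated-decode and stop-sequence items can be illustrated with a small sketch. Everything here is hypothetical: the toy vocabulary, `decode`, and `stream_text` are not taken from llm-runner. The sketch assumes a SentencePiece-style tokenizer whose decoder drops the leading space of whatever it is given, which is why decoding tokens one at a time loses the spaces between words, while decoding the whole accumulated sequence and emitting only the new suffix preserves them.

```python
from typing import Iterator, List, Sequence

# Toy vocabulary (assumption): "▁" marks a piece that starts a new word.
VOCAB = {0: "▁A", 1: "▁cat", 2: "▁sat", 3: ".", 4: "▁END", 5: "▁ignored"}


def decode(ids: Sequence[int]) -> str:
    """Join pieces, turn "▁" into spaces, and drop the leading space."""
    return "".join(VOCAB[i] for i in ids).replace("▁", " ").lstrip(" ")


def stream_text(token_ids: Sequence[int],
                stop_sequences: Sequence[str] = ()) -> Iterator[str]:
    """Yield text deltas by re-decoding all tokens seen so far."""
    seen: List[int] = []
    emitted = ""
    for tid in token_ids:
        seen.append(tid)
        text = decode(seen)

        # Earliest position of any stop sequence in the full text, or -1.
        hits = [text.find(s) for s in stop_sequences]
        cut = min((p for p in hits if p != -1), default=-1)
        if cut != -1:
            if cut > len(emitted):
                yield text[len(emitted):cut]  # text before the stop sequence
            return

        yield text[len(emitted):]  # only the newly produced suffix
        emitted = text


tokens = [0, 1, 2, 3, 4, 5]
naive = "".join(decode([t]) for t in tokens[:4])
streamed = "".join(stream_text(tokens, stop_sequences=(" END",)))
print(repr(naive))
print(repr(streamed))
```

This sketch does not hold back text that might be the beginning of a stop sequence spanning several tokens, so part of such a sequence could already have been emitted before the match is detected. Re-decoding the full sequence on every step is also quadratic in output length, which is usually acceptable for a CLI but worth knowing.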

Supports any VLM that exports 3 components (vision.aimodel, embed.aimodel, model.aimodel) with a vision config block in metadata.json. Model-family-specific export code lives in internal/python.
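
As a rough illustration of what a vision config block might look like and how a loader could read it, here is a sketch in Python. The description only names image_size, patch_size, and "token count/id", so the JSON keys `image_token_count` and `image_token_id`, the `kind` field, and all values below are assumptions, not the bundle's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VisionConfig:
    image_size: int         # input resolution expected by the vision encoder
    patch_size: int         # side length of one patch
    image_token_count: int  # placeholder tokens per image (assumed key name)
    image_token_id: int     # id of the placeholder token (assumed key name)


def parse_vision_config(metadata_text: str) -> Optional[VisionConfig]:
    """Return the top-level "vision" block, or None for text-only bundles."""
    metadata = json.loads(metadata_text)
    vision = metadata.get("vision")
    if vision is None:
        return None
    return VisionConfig(
        image_size=int(vision["image_size"]),
        patch_size=int(vision["patch_size"]),
        image_token_count=int(vision["image_token_count"]),
        image_token_id=int(vision["image_token_id"]),
    )


# Hypothetical metadata.json contents; every value is illustrative only.
EXAMPLE_METADATA = """
{
  "kind": "vlm",
  "vision": {
    "image_size": 224,
    "patch_size": 14,
    "image_token_count": 256,
    "image_token_id": 32000
  }
}
"""

config = parse_vision_config(EXAMPLE_METADATA)
print(config)
```

Returning `None` when the block is absent lets the same loader serve text-only bundles, with the caller deciding whether a missing block is an error.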


stikves wants to merge 1 commit into apple:main from stikves:sukru/vlm-infra
