Providers

Microcode ships two Provider implementations: anthropic (Claude Messages API) and openai (any OpenAI-compatible Chat Completions endpoint).

Anthropic (default)

[provider]
kind = "anthropic"
# Optional overrides:
# base_url    = "https://api.anthropic.com"
# api_key_env = "ANTHROPIC_API_KEY"

model = "claude-sonnet-4-6"

Set ANTHROPIC_API_KEY in your environment. Get a key at https://console.anthropic.com/settings/keys.

OpenAI

[provider]
kind = "openai"
base_url = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"

model = "gpt-4o"

OpenRouter

[provider]
kind = "openai"
base_url = "https://openrouter.ai/api/v1"
api_key_env = "OPENROUTER_API_KEY"

model = "anthropic/claude-sonnet-4"

Together / Fireworks / DeepSeek

[provider]
kind = "openai"
base_url = "https://api.together.xyz/v1"
api_key_env = "TOGETHER_API_KEY"

model = "Qwen/Qwen2.5-Coder-32B-Instruct"

[provider]
kind = "openai"
base_url = "https://api.deepseek.com/v1"
api_key_env = "DEEPSEEK_API_KEY"

model = "deepseek-coder"

Local: Ollama

Ollama exposes an OpenAI-compatible API at /v1:

[provider]
kind = "openai"
base_url = "http://localhost:11434/v1"
# No api_key_env — Ollama doesn't require auth.

model = "qwen3-coder:30b"

Make sure the model supports tool calling. Pure base models without tool-call training will not be useful.

Local: vLLM

[provider]
kind = "openai"
base_url = "http://localhost:8000/v1"
# api_key_env = "VLLM_API_KEY"   # only if you launched vllm with an API key

model = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

Local: LM Studio

[provider]
kind = "openai"
base_url = "http://localhost:1234/v1"

model = "qwen-3-coder-30b"

Custom pricing

Most local and aggregator endpoints aren't in Microcode's built-in price table. For /cost to be accurate, set per-million-token prices for each model manually:

[pricing."qwen3-coder:30b"]
input_per_mtok  = 0.0
output_per_mtok = 0.0

[pricing."deepseek-coder"]
input_per_mtok  = 0.14
output_per_mtok = 0.28

/cost and /tokens will use these when computing the running total.
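As a rough illustration of how per-million-token prices turn into a running total, here is a minimal sketch. The `Pricing` struct and `request_cost` function are hypothetical names chosen for this example, not Microcode's internals; only the `input_per_mtok` and `output_per_mtok` keys come from the config above.

```rust
/// Prices for one model, mirroring the `[pricing."<model>"]` keys.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
struct Pricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Cost of a single request: each token count is scaled to millions
/// of tokens and multiplied by the matching per-million price.
fn request_cost(p: Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    const MTOK: f64 = 1_000_000.0;
    (input_tokens as f64 / MTOK) * p.input_per_mtok
        + (output_tokens as f64 / MTOK) * p.output_per_mtok
}
```

With the deepseek-coder prices above, 2,000,000 input tokens and 500,000 output tokens come to 2 × 0.14 + 0.5 × 0.28 = 0.42. A local model priced at 0.0 always contributes nothing to the total.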

Tips

  • Tool calling support is required. Microcode depends on streaming tool calls (tool_use for Anthropic, tool_calls for OpenAI). Models without tool support won't work.
  • Streaming behaviour differs. Anthropic emits typed events (content_block_start / content_block_delta). OpenAI emits untyped JSON chunks in which a tool call's arguments field arrives as fragments of a JSON string. Microcode handles both internally; you don't have to think about it.
  • Latency varies widely across endpoints. Cloud Claude/GPT typically start responding in 100–500 ms. Local Ollama on an M3 Mac with a 30B model takes more like 1–3 s to first token. Be patient, especially with /verify runs.
  • Microcode never sends tool output to the model wholesale. Each tool result is bounded (200 KB for bash, 1.5 MB for read). Long-running commands should have explicit timeouts.
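To make the fragmented-arguments point concrete, the sketch below reassembles OpenAI-style streamed tool calls by concatenating fragments per call index. It is not Microcode's implementation: `ToolCallDelta`, `ToolCall`, and `assemble` are hypothetical names, and the input is assumed to be already decoded from the JSON chunks.

```rust
use std::collections::BTreeMap;

/// One streamed fragment of a tool call (hypothetical type).
/// `index` identifies which call the fragment belongs to; `name` is
/// usually present only on the first fragment of a call.
struct ToolCallDelta<'a> {
    index: usize,
    name: Option<&'a str>,
    arguments: &'a str,
}

/// A fully assembled tool call; `arguments` is still an unparsed JSON string.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Concatenate fragments per index, returning calls in index order.
fn assemble<'a>(deltas: impl IntoIterator<Item = ToolCallDelta<'a>>) -> Vec<ToolCall> {
    let mut calls: BTreeMap<usize, ToolCall> = BTreeMap::new();
    for d in deltas {
        let call = calls.entry(d.index).or_default();
        if let Some(n) = d.name {
            call.name.push_str(n);
        }
        call.arguments.push_str(d.arguments);
    }
    calls.into_values().collect()
}
```

The assembled `arguments` string should only be parsed as JSON once the stream has finished, since any individual fragment may end mid-token.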

Adding a new provider

If you need a third style (e.g. Cohere, Vertex AI), implement the Provider trait in a new file under src/provider/:

#[async_trait]
pub trait Provider: Send + Sync {
    async fn complete(&self, req: CompletionRequest<'_>) -> Result<EventStream<'_>>;
}

Translate the request and response formats and emit CompletionEvents. Then extend ProviderConfig and provider_from() in src/run.rs. Expect roughly 150 lines of code for a typical streaming API.
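The dispatch step might look something like the sketch below, which maps the `kind` string from the config onto a provider variant. Everything here is an assumption for illustration: `ProviderKind`, `parse_kind`, and the `"cohere"` kind are hypothetical, and the real ProviderConfig and provider_from() in src/run.rs may be shaped quite differently.

```rust
/// Hypothetical stand-in for the set of supported provider styles.
#[derive(Debug, PartialEq)]
enum ProviderKind {
    Anthropic,
    OpenAi,
    Cohere, // hypothetical third style added by the new provider
}

/// Map the `kind` value from `[provider]` onto a variant,
/// rejecting anything that is not recognised.
fn parse_kind(kind: &str) -> Result<ProviderKind, String> {
    match kind {
        "anthropic" => Ok(ProviderKind::Anthropic),
        "openai" => Ok(ProviderKind::OpenAi),
        "cohere" => Ok(ProviderKind::Cohere),
        other => Err(format!("unknown provider kind: {other}")),
    }
}
```

Returning an error for unknown kinds, rather than falling back to a default, surfaces a typo in the config file immediately.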