Microcode ships two Provider implementations: anthropic (Claude Messages API) and openai (any OpenAI-compatible Chat Completions endpoint).
[provider]
kind = "anthropic"
# Optional overrides:
# base_url = "https://api.anthropic.com"
# api_key_env = "ANTHROPIC_API_KEY"
model = "claude-sonnet-4-6"

Set ANTHROPIC_API_KEY in your environment. Get a key at https://console.anthropic.com/settings/keys.

[provider]
kind = "openai"
base_url = "https://api.openai.com/v1"
api_key_env = "OPENAI_API_KEY"
model = "gpt-4o"

[provider]
kind = "openai"
base_url = "https://openrouter.ai/api/v1"
api_key_env = "OPENROUTER_API_KEY"
model = "anthropic/claude-sonnet-4"

[provider]
kind = "openai"
base_url = "https://api.together.xyz/v1"
api_key_env = "TOGETHER_API_KEY"
model = "Qwen/Qwen2.5-Coder-32B-Instruct"

[provider]
kind = "openai"
base_url = "https://api.deepseek.com/v1"
api_key_env = "DEEPSEEK_API_KEY"
model = "deepseek-coder"

Ollama exposes an OpenAI-compatible API at /v1:

[provider]
kind = "openai"
base_url = "http://localhost:11434/v1"
# No api_key_env — Ollama doesn't require auth.
model = "qwen3-coder:30b"

Make sure the model supports tool calling. Pure base models without tool-call training will not be useful.

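The configs above name an environment variable in api_key_env, or omit it for local endpoints that need no auth. A minimal sketch of how such a lookup could behave, assuming a missing api_key_env means "send no key" and an unset or empty variable is an error; `resolve_api_key` is a hypothetical helper written for illustration, not microcode's actual code:

```rust
use std::env;

/// Hypothetical helper: resolve the API key for a provider config.
/// `api_key_env` is the variable *name* from the config, if one was given.
/// Returns Ok(None) when no variable is configured (e.g. a local Ollama endpoint).
fn resolve_api_key(api_key_env: Option<&str>) -> Result<Option<String>, String> {
    match api_key_env {
        // No variable configured: the endpoint is assumed to need no auth.
        None => Ok(None),
        Some(name) => match env::var(name) {
            // Variable present and non-empty: use it as the key.
            Ok(key) if !key.trim().is_empty() => Ok(Some(key)),
            // Variable present but blank: almost certainly a mistake.
            Ok(_) => Err(format!("{name} is set but empty")),
            // Variable missing (or not valid Unicode).
            Err(_) => Err(format!("{name} is not set; export it before starting")),
        },
    }
}
```

Failing early on an unset variable gives a clearer message than a 401 from the remote endpoint halfway through a session.
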
[provider]
kind = "openai"
base_url = "http://localhost:8000/v1"
# api_key_env = "VLLM_API_KEY" # only if you launched vllm with an API key
model = "Qwen/Qwen3-Coder-30B-A3B-Instruct"

[provider]
kind = "openai"
base_url = "http://localhost:1234/v1"
model = "qwen-3-coder-30b"

Most local and aggregator endpoints aren't in microcode's built-in price table. For accurate /cost, configure prices manually:

[pricing."qwen3-coder:30b"]
input_per_mtok = 0.0
output_per_mtok = 0.0
[pricing."deepseek-coder"]
input_per_mtok = 0.14
output_per_mtok = 0.28

/cost and /tokens will use these when computing the running total.

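To make the arithmetic behind a pricing entry concrete, here is a minimal sketch. It assumes that "mtok" means one million tokens and that prices are in USD; the `Pricing` struct and `request_cost_usd` function are hypothetical names for illustration, not microcode's internals:

```rust
/// Hypothetical mirror of one pricing table entry:
/// prices per million tokens (assumed to be USD).
#[derive(Debug, Clone, Copy)]
struct Pricing {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Cost of a single request given its token counts.
fn request_cost_usd(p: Pricing, input_tokens: u64, output_tokens: u64) -> f64 {
    const TOKENS_PER_MTOK: f64 = 1_000_000.0;
    let input_cost = input_tokens as f64 * p.input_per_mtok;
    let output_cost = output_tokens as f64 * p.output_per_mtok;
    (input_cost + output_cost) / TOKENS_PER_MTOK
}
```

With the deepseek-coder prices above, a request with 1,000,000 input tokens and 500,000 output tokens would cost about 0.14 + 0.14 = 0.28 under these assumptions; a local model priced at 0.0 always totals zero.
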
- Tool calling support is required. Microcode depends on streaming tool calls (`tool_use` for Anthropic, `tool_calls` for OpenAI). Models without tool support won't work.
- Streaming behaviour differs. Anthropic emits typed events (`content_block_start` / `content_block_delta`). OpenAI emits opaque JSON chunks where `arguments` is a fragmented JSON string. Microcode handles both internally; you don't have to think about it.
- Latency varies wildly across endpoints. Cloud Claude/GPT typically respond in 100–500 ms. Local Ollama on a Mac M3 with 30B models is more like 1–3 s to first token. Be patient with `/verify` runs especially.
- Microcode never sends tool output to the model wholesale. Each tool result is bounded (200 KB for bash, 1.5 MB for read). Long-running commands should have explicit timeouts.

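The "fragmented JSON string" point can be illustrated with a small accumulator. In OpenAI-style streams, each tool-call delta carries an index, and the id and function name usually arrive only on the first fragment, so the pieces of `arguments` have to be concatenated per index before the JSON can be parsed. The sketch below assumes the chunk has already been decoded into a plain struct; `ToolCallDelta`, `ToolCall`, and `ToolCallAccumulator` are hypothetical names, and this is not microcode's actual implementation:

```rust
use std::collections::BTreeMap;

/// Hypothetical, already-decoded view of one streamed tool-call delta.
struct ToolCallDelta<'a> {
    index: usize,          // which tool call this fragment belongs to
    id: Option<&'a str>,   // typically present only on the first fragment
    name: Option<&'a str>, // typically present only on the first fragment
    arguments: &'a str,    // a piece of the JSON arguments string
}

/// A fully assembled tool call.
#[derive(Debug, Default, PartialEq)]
struct ToolCall {
    id: String,
    name: String,
    arguments: String, // complete JSON text once the stream has ended
}

/// Collects fragments keyed by index; BTreeMap keeps calls in index order.
#[derive(Default)]
struct ToolCallAccumulator {
    calls: BTreeMap<usize, ToolCall>,
}

impl ToolCallAccumulator {
    /// Merge one fragment into the call it belongs to.
    fn push(&mut self, delta: &ToolCallDelta<'_>) {
        let call = self.calls.entry(delta.index).or_default();
        if let Some(id) = delta.id {
            call.id = id.to_string();
        }
        if let Some(name) = delta.name {
            call.name = name.to_string();
        }
        call.arguments.push_str(delta.arguments);
    }

    /// Consume the accumulator once the stream is finished.
    fn finish(self) -> Vec<ToolCall> {
        self.calls.into_values().collect()
    }
}
```

The arguments string is only valid JSON after the final fragment, so parsing is deferred until `finish` has been called.
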
If you need a third style (e.g. Cohere, Vertex AI), implement the Provider trait in a new file under src/provider/:
#[async_trait]
pub trait Provider: Send + Sync {
async fn complete(&self, req: CompletionRequest<'_>) -> Result<EventStream<'_>>;
}

Translate the request and response formats and emit CompletionEvents. Then extend ProviderConfig and provider_from() in src/run.rs. Expect roughly 150 lines of code for a typical streaming API.

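The translation step is mostly a mapping from the vendor's stream events to microcode's events. The sketch below shows the shape of that mapping in synchronous, stdlib-only Rust. Everything in it is an assumption made for illustration: the real `CompletionEvent` variants are not shown in this document, `VendorEvent` describes an imaginary third API, and the real trait is async and returns an `EventStream`.

```rust
/// Hypothetical stand-in for microcode's CompletionEvent; the real variants may differ.
#[derive(Debug, PartialEq)]
enum CompletionEvent {
    Text(String),
    ToolCallArgs { index: usize, fragment: String },
    Done,
}

/// Hypothetical stream events from an imaginary third provider.
enum VendorEvent {
    TextGeneration { text: String },
    ToolCallDelta { index: usize, partial_json: String },
    StreamEnd,
    Ping, // keep-alive with no content
}

/// Map one vendor event to at most one completion event.
fn translate(ev: VendorEvent) -> Option<CompletionEvent> {
    match ev {
        VendorEvent::TextGeneration { text } => Some(CompletionEvent::Text(text)),
        VendorEvent::ToolCallDelta { index, partial_json } => {
            Some(CompletionEvent::ToolCallArgs { index, fragment: partial_json })
        }
        VendorEvent::StreamEnd => Some(CompletionEvent::Done),
        // Keep-alives carry nothing the caller needs, so drop them.
        VendorEvent::Ping => None,
    }
}

/// Lazily translate a whole stream, skipping events with no counterpart.
fn translate_stream<I>(events: I) -> impl Iterator<Item = CompletionEvent>
where
    I: IntoIterator<Item = VendorEvent>,
{
    events.into_iter().filter_map(translate)
}
```

In a real provider the same `match` would sit inside the async `complete` implementation, applied to each chunk as it arrives from the HTTP response.
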