Run Limit with local models using Ollama, LM Studio, vLLM, or any OpenAI-compatible server.
```toml
# ~/.limit/config.toml
provider = "local"

[providers.local]
model = "llama3.2"  # Your model name
base_url = "http://localhost:11434/v1/chat/completions"
```

That's it! The local provider requires no API key and uses sensible defaults for local servers.
| Provider | Default Port | Status |
|---|---|---|
| Ollama | 11434 | Full support |
| LM Studio | 1234 | Full support |
| vLLM | 8000 | Full support |
| Other | Varies | OpenAI-compatible |
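Before configuring Limit, you can confirm that a server actually speaks this protocol by calling the chat-completions endpoint directly. A minimal check, assuming Ollama's default port and a `llama3.2` model (adjust both for your setup):

```bash
# Send a minimal OpenAI-style chat request to the local server
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hi"}]}'
```

A JSON response containing a `choices` array means the server is OpenAI-compatible.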
Limit accepts these provider names (all use the same OpenAI-compatible protocol):
- `local` - Generic local provider (recommended)
- `ollama` - Ollama-specific alias
- `lmstudio` - LM Studio-specific alias
- `vllm` - vLLM-specific alias
```toml
# All equivalent:
provider = "local"
provider = "ollama"
provider = "lmstudio"
provider = "vllm"
```

Ollama is the most popular way to run LLMs locally.
```bash
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Or via Homebrew
brew install ollama
```

Start the server:

```bash
ollama serve
```

Pull a model:

```bash
ollama pull llama3.2
ollama pull qwen2.5-coder:7b
ollama pull deepseek-coder:6.7b
```

Then configure Limit:

```toml
provider = "ollama"

[providers.ollama]
model = "llama3.2"
base_url = "http://localhost:11434/v1/chat/completions"
# api_key not required for Ollama
```

Check what's installed with `ollama list`. Recommended models:

| Model | Size | Best For |
|---|---|---|
| `qwen2.5-coder:7b` | 7B | General coding, fast |
| `deepseek-coder:6.7b` | 6.7B | Code generation |
| `codellama:7b` | 7B | Code completion |
| `llama3.2:3b` | 3B | Lightweight, fast responses |
| `llama3.1:8b` | 8B | General purpose |
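Before wiring one of these into Limit, it can help to smoke-test the model directly in Ollama (`ollama run` pulls the model first if it isn't installed):

```bash
# Interactive chat with the model, outside of Limit
ollama run qwen2.5-coder:7b
```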
LM Studio provides a GUI to run local models.
- Download from lmstudio.ai
- Open LM Studio
- Go to the "Local Server" tab
- Start the server (default: `http://localhost:1234`)
- Load a model
provider = "lmstudio"
[providers.lmstudio]
model = "local-model" # Model name shown in LM Studio
base_url = "http://localhost:1234/v1/chat/completions"- LM Studio must be running with a model loaded
- The model name in config should match what's shown in LM Studio
- Supports GGUF format models from Hugging Face
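If you're unsure what to put in `model`, LM Studio's local server implements the standard OpenAI model-listing route, so you can ask it for the exact identifier:

```bash
# List the model identifiers the running server exposes
curl -s http://localhost:1234/v1/models
```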
vLLM is a high-performance inference server.
```bash
pip install vllm
vllm serve meta-llama/Llama-3.2-3B-Instruct --port 8000
```

```toml
provider = "vllm"

[providers.vllm]
model = "meta-llama/Llama-3.2-3B-Instruct"
base_url = "http://localhost:8000/v1/chat/completions"
```

If the server requires authentication, add an API key:

```toml
[providers.vllm]
model = "meta-llama/Llama-3.2-3B-Instruct"
base_url = "http://localhost:8000/v1/chat/completions"
api_key = "hf_xxx"  # If server requires auth
```
provider = "local"
[providers.local]
model = "your-model-name"
base_url = "http://your-server:port/v1/chat/completions"
api_key = "" # Optional, if server requires auth
max_tokens = 4096
timeout = 120| Server | Typical Endpoint |
|---|---|
| Ollama | /v1/chat/completions |
| LM Studio | /v1/chat/completions |
| vLLM | /v1/chat/completions |
| text-generation-webui | /v1/chat/completions |
| LocalAI | /v1/chat/completions |
Important: Always include the full endpoint path in `base_url`. Limit does not auto-append paths.
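As a concrete illustration, using Ollama's default address:

```toml
# Wrong: bare host and port - Limit will not append /v1/chat/completions
# base_url = "http://localhost:11434"

# Right: full endpoint path included
base_url = "http://localhost:11434/v1/chat/completions"
```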
The full set of options for the local provider:

```toml
[providers.local]
model = "llama3.2"       # Required: model identifier
base_url = "http://..."  # Required: full API endpoint
api_key = ""             # Optional: auth key if needed
max_tokens = 4096        # Optional: max output tokens (default: 4096)
timeout = 120            # Optional: request timeout in seconds (default: 60)
max_iterations = 100     # Optional: agent loop limit (default: 100)
```

You can also set the API key via an environment variable:
```bash
export LOCAL_API_KEY=""  # Not needed for most local servers
```

Then start Limit with `lim`.

If Limit cannot connect to the server:

- Ensure your local server is running
- Check the port matches your server
- Verify `base_url` includes the full path
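A quick way to check the first two items, assuming an OpenAI-compatible server on Ollama's default port:

```bash
# 200 means the server is up on that port; "connection refused"
# means nothing is listening there
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:11434/v1/models
```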
If requests fail with a 404 or similar endpoint error:

- Verify the `base_url` path is correct
- Check server logs for the correct endpoint
- Some servers use `/v1/api/completions` instead of `/v1/chat/completions`
If responses are slow:

- Try a smaller model (e.g., `llama3.2:3b` instead of `llama3.1:8b`)
- Increase `timeout` if the model is slow to generate
- Check GPU/CPU utilization (see below)
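With Ollama, `ollama ps` shows which models are loaded and whether they are running on GPU or CPU (on NVIDIA systems, `nvidia-smi` gives live utilization):

```bash
# Show loaded models and their GPU/CPU placement
ollama ps
```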
If the server runs out of memory:

- Use a quantized model (GGUF Q4_K_M or similar; example below)
- Reduce model size (fewer parameters)
- Close other applications
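With Ollama, many models publish more heavily quantized tags; the exact tag names vary by model, so check the model's page in the Ollama library:

```bash
# Pull a 4-bit quantized variant instead of the default tag
ollama pull llama3.1:8b-instruct-q4_K_M
```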
To verify the setup end to end:

```
# Start Limit
lim

# Check current model
lim> /model

# Simple test
lim> hello, can you help me with code?
```

See also:

- Configuration Guide - Full configuration reference
- OpenAI Setup - Using OpenAI or compatible APIs
- Development Guide - Contributing to Limit