EvalView provides dedicated testing adapters for the most popular AI agent frameworks. Each adapter handles framework-specific API formats, tool call extraction, and response parsing automatically.
| Framework | Adapter | Auto-Detect | Default Port | Endpoint |
|---|---|---|---|---|
| LangGraph | langgraph | ✅ | 8000 | /api/chat or /invoke |
| LangServe | http or streaming | ✅ | 8000 | /agent or /agent/stream |
| CrewAI | crewai | ✅ | 8000 | /crew |
| OpenAI Assistants | openai-assistants | N/A | N/A | Uses OpenAI API |
| TapeScope | streaming | ✅ | 3000 | /api/unifiedchat |
| Generic REST | http | ✅ | Any | Any |
| Generic Streaming | streaming | ✅ | Any | Any |
```bash
# Start your agent server first
# Then let EvalView detect it automatically
evalview connect
```

The `connect` command will:
- Try common endpoints
- Detect which framework is running
- Configure the correct adapter automatically
- Update `.evalview/config.yaml`
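Conceptually, the detection step probes a list of known endpoints until one answers. A simplified sketch of that loop (illustrative only, not EvalView's actual implementation; `probe` is a stand-in for a real HTTP check):

```python
# Simplified sketch of adapter auto-detection (illustrative, not EvalView's code).
# `probe(path)` stands in for a real HTTP request that returns True on success.
CANDIDATES = [
    ("langgraph", "/api/chat"),
    ("crewai", "/crew"),
    ("streaming", "/api/unifiedchat"),
    ("http", "/invoke"),
]

def detect(probe):
    """Return the first (adapter, path) pair whose endpoint responds."""
    for adapter, path in CANDIDATES:
        if probe(path):
            return adapter, path
    return None

# Simulate a server that only answers on /crew
print(detect(lambda path: path == "/crew"))  # ('crewai', '/crew')
```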
Edit `.evalview/config.yaml`:

```yaml
adapter: langgraph  # or crewai, http, streaming, etc.
endpoint: http://localhost:8000/api/chat
timeout: 30.0
```

What it supports:
- Standard invoke endpoint
- Streaming responses
- Message-based APIs
- Thread tracking
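With message-based APIs, the final answer is simply the last assistant message in the response's message list. A minimal extraction sketch (hypothetical helper, not the adapter's actual code):

```python
def final_assistant_message(messages):
    """Return the content of the last assistant message, or '' if none."""
    for msg in reversed(messages):
        if msg.get("role") == "assistant":
            return msg.get("content", "")
    return ""

history = [
    {"role": "user", "content": "What is the weather in SF?"},
    {"role": "assistant", "content": "Currently foggy in San Francisco."},
]
print(final_assistant_message(history))  # Currently foggy in San Francisco.
```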
Setup:
```bash
# Start LangGraph agent
cd /path/to/langgraph-agent
python main.py
# or
uvicorn main:app --reload --port 8000

# Connect EvalView
evalview connect
```

Config:

```yaml
adapter: langgraph
endpoint: http://localhost:8000/api/chat
streaming: false  # Set to true for streaming endpoints
timeout: 30.0
model:
  name: gpt-4o-mini
```

Test Case Example:
```yaml
name: "LangGraph Test"
input:
  query: "What is the weather in SF?"
  context: {}
expected:
  tools: [tavily_search]  # Update with your actual tools
  output:
    contains: ["San Francisco", "weather"]
thresholds:
  min_score: 70
  max_cost: 0.50
  max_latency: 10000
```

Response Format Expected:
```json
{
  "messages": [
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}
  ],
  "thread_id": "...",
  "intermediate_steps": [...]
}
```

What it supports:
- Task-based execution
- Multi-agent crews
- Usage metrics
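The task list and usage metrics in a CrewAI-style response (see Response Format Expected below) can be flattened into a quick summary. A hypothetical helper sketch:

```python
def summarize_crew(result):
    """Flatten a CrewAI-style response into task and usage totals."""
    tasks = result.get("tasks", [])
    metrics = result.get("usage_metrics", {})
    return {
        "completed_tasks": sum(1 for t in tasks if t.get("status") == "completed"),
        "total_tokens": metrics.get("total_tokens", 0),
        "total_cost": metrics.get("total_cost", 0.0),
    }

resp = {
    "result": "Final crew output",
    "tasks": [{"id": "task-1", "status": "completed", "output": "..."}],
    "usage_metrics": {"total_tokens": 1500, "total_cost": 0.045},
}
print(summarize_crew(resp))  # {'completed_tasks': 1, 'total_tokens': 1500, 'total_cost': 0.045}
```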
Setup:
```bash
# Start CrewAI API
cd /path/to/crewai-agent
python api.py  # or however you serve it

# Connect
evalview connect
```

Config:

```yaml
adapter: crewai
endpoint: http://localhost:8000/crew
timeout: 120.0  # CrewAI can be slow
```

Test Case Example:
```yaml
name: "CrewAI Research Test"
input:
  query: "Research AI trends in 2025"
  context: {}
expected:
  tools: []  # CrewAI uses agents, not direct tools
  output:
    contains: ["AI", "trends", "2025"]
thresholds:
  min_score: 75
  max_cost: 2.00
  max_latency: 60000  # 60 seconds
```

Response Format Expected:
```json
{
  "result": "Final crew output",
  "tasks": [
    {
      "id": "task-1",
      "description": "Research task",
      "output": "...",
      "status": "completed"
    }
  ],
  "usage_metrics": {
    "total_tokens": 1500,
    "total_cost": 0.045
  }
}
```

What it supports:
- OpenAI Assistants API
- Function calling
- Code interpreter
- File search/retrieval
Setup:
```bash
# Set your OpenAI API key
export OPENAI_API_KEY=sk-...

# No server needed - uses the OpenAI API directly
```

Config:

```yaml
adapter: openai-assistants
assistant_id: asst_xxxxxxxxxxxxx  # Your assistant ID
timeout: 120.0
```

Test Case Example:
```yaml
name: "OpenAI Assistant Test"
input:
  query: "Calculate the fibonacci sequence up to 10"
  context:
    assistant_id: asst_xxxxxxxxxxxxx  # Can override here too
expected:
  tools: [code_interpreter]
  output:
    contains: ["fibonacci", "0, 1, 1, 2, 3, 5, 8"]
thresholds:
  min_score: 80
  max_cost: 0.50
  max_latency: 30000
```

Notes:
- Requires the `openai` Python package: `pip install openai`
- Uses threads and runs under the hood
- Automatically polls for completion
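The completion polling is an ordinary poll-until-terminal loop over the run's status. A generic sketch (hypothetical helper; the real adapter polls the OpenAI runs API via the `openai` client):

```python
import time

def poll_until_done(get_status, interval=1.0, timeout=120.0):
    """Call get_status() until it returns a terminal run state or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("completed", "failed", "cancelled", "expired"):
            return status
        time.sleep(interval)
    raise TimeoutError("run did not finish in time")

# Simulate a run that completes on the third poll
states = iter(["queued", "in_progress", "completed"])
print(poll_until_done(lambda: next(states), interval=0.01))  # completed
```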
What it supports:
- Standard REST endpoints
- Streaming via Server-Sent Events
- Batch processing
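Streaming responses arrive as Server-Sent Events: lines prefixed with `data: `, separated by blank lines. A minimal parsing sketch (hypothetical helper, not EvalView's actual parser):

```python
def sse_data_lines(raw):
    """Collect the data payloads from a raw SSE stream."""
    return [line[len("data: "):] for line in raw.splitlines()
            if line.startswith("data: ")]

stream = "data: The answer\n\ndata:  is 42.\n\n"
print("".join(sse_data_lines(stream)))  # The answer is 42.
```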
Setup:
```bash
# Start LangServe
cd /path/to/langserve-app
python server.py

# Connect
evalview connect
```

Config (non-streaming):

```yaml
adapter: http
endpoint: http://localhost:8000/agent/invoke
timeout: 30.0
```

Config (streaming):

```yaml
adapter: streaming
endpoint: http://localhost:8000/agent/stream
timeout: 60.0
```

For any custom REST API.
Config:
```yaml
adapter: http
endpoint: http://localhost:YOUR_PORT/YOUR_PATH
timeout: 30.0
headers:
  Authorization: Bearer YOUR_TOKEN
  Content-Type: application/json
```

Expected Request Format:
```json
{
  "query": "User query here",
  "context": {}
}
```

Expected Response Format:
```json
{
  "session_id": "...",
  "output": "Final response",
  "steps": [
    {
      "id": "step-1",
      "name": "Step name",
      "tool": "tool_name",
      "parameters": {...},
      "output": {...},
      "latency": 123,
      "cost": 0.001
    }
  ],
  "cost": 0.05,
  "tokens": 1000
}
```

If your framework isn't supported, create a custom adapter:
```python
# evalview/adapters/my_adapter.py
from evalview.adapters.base import AgentAdapter
from evalview.core.types import ExecutionTrace, StepTrace, StepMetrics, ExecutionMetrics
from datetime import datetime


class MyAdapter(AgentAdapter):
    @property
    def name(self) -> str:
        return "my-adapter"

    async def execute(self, query: str, context=None) -> ExecutionTrace:
        # 1. Call your agent API
        # 2. Parse the response
        # 3. Extract steps and output
        # 4. Return an ExecutionTrace
        pass
```

Register it in `cli.py`:

```python
from evalview.adapters.my_adapter import MyAdapter

# In _run_async():
elif adapter_type == "my-adapter":
    adapter = MyAdapter(...)
```

See ADAPTERS.md for the full guide.
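As a starting point for the parsing step inside `execute()`, here is a standalone sketch that pulls out the fields a trace needs from the generic response format above (hypothetical helper name; the real method must still map these into an `ExecutionTrace`):

```python
def parse_generic_response(data):
    """Extract output, tools used, and cost totals from a generic REST response."""
    steps = data.get("steps", [])
    return {
        "output": data.get("output", ""),
        "tools_used": [s["tool"] for s in steps if s.get("tool")],
        "total_step_cost": sum(s.get("cost", 0.0) for s in steps),
        "total_cost": data.get("cost", 0.0),
    }

resp = {
    "session_id": "abc",
    "output": "Final response",
    "steps": [{"id": "step-1", "name": "Search", "tool": "tool_name",
               "latency": 123, "cost": 0.001}],
    "cost": 0.05,
    "tokens": 1000,
}
print(parse_generic_response(resp)["tools_used"])  # ['tool_name']
```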
```bash
# Test endpoint manually
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"query": "test"}'

# Check if server is running
lsof -i :8000

# Try auto-detect
evalview connect
```

Manually set the adapter in `.evalview/config.yaml`:

```yaml
adapter: langgraph  # Override auto-detection
```

Run with verbose to see the actual response:

```bash
evalview run --verbose
```

Then adjust your test case or create a custom adapter.

Increase the timeout:

```yaml
timeout: 120.0  # 2 minutes
```

| Feature | LangGraph | CrewAI | OpenAI | LangServe |
|---|---|---|---|---|
| Streaming | ✅ | ❌ | ❌ | ✅ |
| Multi-step | ✅ | ✅ | ✅ | ✅ |
| Self-hosted | ✅ | ✅ | ❌ | ✅ |
| Tool tracking | ✅ | Partial | ✅ | ✅ |
| Cost tracking | Manual | ✅ | ✅ | Manual |
- Always use `evalview connect` first - let it auto-detect the framework
- Start with verbose mode - understand the API responses
- Check framework docs - verify endpoint paths
- Use framework-specific adapters - better parsing and metrics
- Monitor timeouts - some agents can be slow
- Check QUICKSTART_LANGGRAPH.md for LangGraph
- Check SETUP_LANGGRAPH_EXAMPLE.md for detailed setup
- Check ADAPTERS.md for custom adapters
- Open an issue: https://github.com/hidai25/eval-view/issues
- Adapters Guide — How to build custom adapters
- Getting Started — Install and run your first test
- Quick Start: LangGraph — LangGraph-specific setup
- Quick Start: HuggingFace — Free, open-source testing
- Backend Requirements — What your agent backend needs to expose
- Troubleshooting — Common issues and solutions