Add CrewAI + EvalView integration — regression testing for crews#352

Open
hidai25 wants to merge 1 commit into crewAIInc:main from hidai25:feat/evalview-integration
Conversation

hidai25 commented Mar 25, 2026

Summary

Adds a self-contained integration example showing how to use EvalView for regression testing CrewAI crews.

EvalView complements crewai test. Where crewai test runs a crew N times and reports scores, EvalView snapshots the full execution trace (which agent called which tool, with which parameters, in what order) and diffs it against a golden baseline on every change.

This addresses the use case described in issue #4174 — deterministic CI regression checks for tool-using agents.
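
To make the snapshot-and-diff idea concrete, here is a minimal sketch of comparing a current trace against a golden one, step by step. It is not EvalView's implementation: `ToolCall`, `call`, and `diff_traces` are hypothetical names, and a trace is assumed to be an ordered list of (agent, tool, parameters) records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step of a trace (hypothetical record shape, not EvalView's)."""
    agent: str
    tool: str
    params: tuple  # sorted (key, value) pairs, so two calls compare by value


def call(agent, tool, **params):
    """Convenience constructor that normalizes parameter order."""
    return ToolCall(agent, tool, tuple(sorted(params.items())))


def diff_traces(golden, current):
    """Compare two traces position by position and describe each mismatch."""
    diffs = []
    for i in range(max(len(golden), len(current))):
        g = golden[i] if i < len(golden) else None
        c = current[i] if i < len(current) else None
        if g == c:
            continue
        if g is None:
            diffs.append(f"step {i}: unexpected extra call {c.agent}.{c.tool}")
        elif c is None:
            diffs.append(f"step {i}: missing call {g.agent}.{g.tool}")
        elif (g.agent, g.tool) != (c.agent, c.tool):
            diffs.append(
                f"step {i}: expected {g.agent}.{g.tool}, got {c.agent}.{c.tool}"
            )
        else:
            diffs.append(f"step {i}: {g.tool} parameters changed")
    return diffs
```

An empty result means the run matches the baseline. A positional comparison like this is deliberately strict: reordering two tool calls is reported as a difference, which is the property a regression check on call order relies on.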

What's included

  • integrations/CrewAI-EvalView/
    • README.md — setup guide, test examples, CI config, watch mode
    • crew.py — example research + writing crew with tools
    • main.py — runnable entry point using EvalView's Python API
    • tests/research-report.yaml — tool-calling regression test
    • tests/safety-check.yaml — forbidden tool safety test
    • requirements.txt — crewai + evalview dependencies
    • .env.example
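
The forbidden tool safety test boils down to asserting that certain tools never appear in a run. The sketch below shows that check in isolation; it does not reproduce the schema of tests/safety-check.yaml, and `find_forbidden_calls` and the tool names are hypothetical.

```python
def find_forbidden_calls(tool_names, forbidden_tools):
    """Return (step_index, tool_name) for every call to a forbidden tool.

    tool_names is the ordered list of tools invoked during one run;
    forbidden_tools is any iterable of names that must never be called.
    """
    forbidden = set(forbidden_tools)
    return [
        (index, name)
        for index, name in enumerate(tool_names)
        if name in forbidden
    ]


# Hypothetical run: the crew searched, then tried to delete a file.
violations = find_forbidden_calls(
    ["web_search", "delete_file", "write_report"],
    forbidden_tools=["delete_file", "send_email"],
)
```

A safety test would pass only when the returned list is empty. Reporting the step index alongside the name makes it easier to locate the offending call in the trace.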

How it works

The native adapter calls crew.kickoff() directly (no HTTP server), captures tool calls via CrewAI's event bus (ToolUsageFinishedEvent), and returns structured traces for diffing.

from evalview.adapters.crewai_native_adapter import CrewAINativeAdapter

adapter = CrewAINativeAdapter(crew=crew)
# Then: evalview snapshot → evalview check → CI gate
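
The capture pattern described above (subscribe to tool-finished events, run the crew in-process, collect what was emitted) can be illustrated without CrewAI installed. The sketch below uses a tiny stand-in event bus: `EventBus`, `ToolFinished`, and `capture_tool_calls` are hypothetical and only mimic the shape of the idea, not CrewAI's event bus or the adapter's actual code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class ToolFinished:
    """Hypothetical stand-in for an event such as ToolUsageFinishedEvent."""
    agent: str
    tool: str
    args: dict


class EventBus:
    """Minimal publish/subscribe bus keyed by event type (illustration only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def unsubscribe(self, event_type, handler):
        self._handlers[event_type].remove(handler)

    def emit(self, event):
        # Copy the list so handlers may unsubscribe while being called.
        for handler in list(self._handlers[type(event)]):
            handler(event)


def capture_tool_calls(bus, run):
    """Run a callable and return the ordered tool calls emitted while it ran."""
    trace = []

    def record(event):
        trace.append({"agent": event.agent, "tool": event.tool, "args": event.args})

    bus.subscribe(ToolFinished, record)
    try:
        run()  # in the real adapter this would be the in-process crew run
    finally:
        # Detach the recorder so later runs do not leak into this trace.
        bus.unsubscribe(ToolFinished, record)
    return trace
```

Detaching the handler in a `finally` block keeps each captured trace scoped to a single run, even when the run raises.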

Test plan

  • Example runs with python main.py
  • evalview snapshot captures baseline
  • evalview check detects regressions after changes
  • CI workflow blocks PRs on regression
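
For the last item, a CI job only needs the check step to fail when a regression is found. The sketch below wraps a command and propagates its exit status. It assumes, without confirmation from this PR, that `evalview check` exits with a nonzero status on regression; `run_gate` is a hypothetical helper.

```python
import subprocess
import sys


def run_gate(command):
    """Run a check command and return its exit code.

    A nonzero code is treated as a failed gate. The command is passed as a
    list of arguments, so no shell is involved.
    """
    result = subprocess.run(command, check=False)
    return result.returncode


# Intended use in a CI step (assumes the evalview CLI is on PATH and
# signals regressions through its exit status):
#
#     sys.exit(run_gate(["evalview", "check"]))
```

Most CI systems already fail a step on a nonzero exit code, so a wrapper like this is only needed when extra logic (logging, retries, reporting) has to run around the check.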

Self-contained example showing how to use EvalView to regression-test
CrewAI crews. Uses the native adapter (crew.kickoff() in-process,
tool call capture via event bus) — no HTTP server needed.

Includes: example crew, test YAMLs, CI config, safety test with
forbidden_tools, and watch mode for prompt iteration.