kensa is an open source eval harness for agent codebases. It gives coding agents an opinionated CLI and bundled skills to generate scenarios, run them in subprocesses, judge results, and report failures.
```bash
npx skills add satyaborg/kensa
uv add kensa
```

Works for Claude Code, Codex, Cursor, OpenCode, Gemini CLI, and similar coding agents.
If you primarily use Claude Code, you can install it as a plugin:
```text
/plugin marketplace add satyaborg/kensa
/plugin install kensa
```
Tell your coding agent:
```text
evaluate this agent
```
That gives you the basic loop:
- your coding agent inspects the repo, sets up instrumentation, and writes evals
- it runs `kensa` to execute scenarios and capture traces
- deterministic checks run first
- the LLM judge only runs when those pass
- reports show what failed and why
- you review changes, approve fixes and iterate
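The gating in this loop, where deterministic checks run first and the LLM judge only runs when they all pass, can be sketched in plain Python. This is not kensa's implementation: `Check`, `Result`, `evaluate`, and the `judge` callable are hypothetical names, and the judge is a stub so the sketch runs without API keys.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    """Hypothetical deterministic check: a name plus a predicate on the agent's output."""
    name: str
    predicate: Callable[[str], bool]


@dataclass
class Result:
    passed: bool
    failed_checks: list[str]
    judge_verdict: Optional[str]  # None means the judge was never called


def evaluate(output: str, checks: list[Check], judge: Callable[[str], str]) -> Result:
    # 1. Deterministic checks run first: cheap, reproducible, no API key needed.
    failed = [c.name for c in checks if not c.predicate(output)]
    if failed:
        # 2. Any failure short-circuits, so the LLM judge is skipped.
        return Result(passed=False, failed_checks=failed, judge_verdict=None)
    # 3. Only outputs that pass every check reach the judge.
    verdict = judge(output)
    return Result(passed=(verdict == "pass"), failed_checks=[], judge_verdict=verdict)


if __name__ == "__main__":
    def stub_judge(output: str) -> str:
        # Stand-in for an LLM call; always approves.
        return "pass"

    checks = [Check("output_matches", lambda out: re.search(r"^P[123]$", out) is not None)]
    print(evaluate("P1", checks, stub_judge))
    print(evaluate("urgent", checks, stub_judge))
```

Short-circuiting this way keeps judge calls (and their cost) limited to outputs that are already well-formed.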
Add `instrument()` before importing your LLM SDK:

```python
from kensa import instrument

instrument()  # call this before importing your LLM SDK
```

If you use the bundled skills, your coding agent will usually add this for you.
Provider extras
```bash
uv add "kensa[anthropic]"
uv add "kensa[openai]"
uv add "kensa[langchain]"
uv add "kensa[all]"
```

| Command | What it does |
|---|---|
| `kensa init --blank` | Scaffold `.kensa/` without example content |
| `kensa doctor` | Check instrumentation, config, and environment readiness |
| `kensa eval` | Run + judge + report in one command |
| `kensa report` | Show the latest results in terminal, Markdown, JSON, or HTML |
| `kensa analyze` | Flag slow, expensive, flaky, or error-prone traces |
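`kensa analyze` flags slow, expensive, flaky, or error-prone traces. The rules below only illustrate what such flags could mean; the `Trace` fields (`scenario_id`, `duration_s`, `cost_usd`, `passed`, `error`), the `flag_traces` function, and the thresholds are all assumptions, not kensa's trace schema or defaults.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trace:
    # Hypothetical trace record; kensa's real trace format may differ.
    scenario_id: str
    duration_s: float
    cost_usd: float
    passed: bool
    error: Optional[str] = None


def flag_traces(
    traces: list[Trace], max_duration_s: float = 30.0, max_cost_usd: float = 0.50
) -> dict[str, set[str]]:
    """Map scenario_id -> flags, using illustrative thresholds."""
    flags: dict[str, set[str]] = defaultdict(set)
    outcomes: dict[str, set[bool]] = defaultdict(set)
    for t in traces:
        if t.duration_s > max_duration_s:
            flags[t.scenario_id].add("slow")
        if t.cost_usd > max_cost_usd:
            flags[t.scenario_id].add("expensive")
        if t.error is not None:
            flags[t.scenario_id].add("error")
        outcomes[t.scenario_id].add(t.passed)
    # Here "flaky" means repeated runs of one scenario both passed and failed.
    for sid, seen in outcomes.items():
        if len(seen) > 1:
            flags[sid].add("flaky")
    return dict(flags)


if __name__ == "__main__":
    runs = [
        Trace("classify_ticket", 2.1, 0.01, True),
        Trace("classify_ticket", 2.4, 0.01, False),
        Trace("summarize", 45.0, 0.80, False, error="timeout"),
    ]
    print(flag_traces(runs))
```

Scenarios with no flags are left out of the result, so an empty dict means nothing stood out.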
If you want to author evals yourself:
```bash
kensa init --blank
kensa doctor
```

Scenarios live in `.kensa/scenarios/*.yaml` and point at your agent entrypoint with `run_command`.
```yaml
id: classify_ticket
input: "Our entire team can't log in. SSO has returned 502 since 7am."
run_command: python agent.py {{input}}
checks:
  - type: output_matches
    params: { pattern: "^P[123]$" }
criteria: |
  P1 is for outages or data loss affecting multiple users.
```

For complete examples, see `examples/`.
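To make the scenario fields concrete, here is a sketch of what executing one scenario could involve: substituting `input` into `run_command`, running it as a subprocess, and applying the `output_matches` pattern to stdout. It is not kensa's runner, and `render_command`, `output_matches`, and `run_scenario` are hypothetical helpers. Two behaviors are assumptions: that `{{input}}` is shell-quoted before substitution (the example input contains an apostrophe, which would otherwise break the command), and that the pattern is applied with `re.search` to stdout with surrounding whitespace stripped.

```python
import re
import shlex
import subprocess


def render_command(template: str, user_input: str) -> list[str]:
    # Assumption: the input is shell-quoted so apostrophes and spaces survive intact.
    command = template.replace("{{input}}", shlex.quote(user_input))
    return shlex.split(command)


def output_matches(output: str, pattern: str) -> bool:
    # Assumption: the regex is searched against stdout with whitespace stripped.
    return re.search(pattern, output.strip()) is not None


def run_scenario(template: str, user_input: str, pattern: str, timeout_s: float = 60.0) -> bool:
    argv = render_command(template, user_input)
    # Each scenario runs in its own subprocess; stdout is captured for checking.
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
    return proc.returncode == 0 and output_matches(proc.stdout, pattern)


if __name__ == "__main__":
    argv = render_command(
        "python agent.py {{input}}",
        "Our entire team can't log in. SSO has returned 502 since 7am.",
    )
    print(argv)
```

Passing an argument list to `subprocess.run` (rather than a shell string) means the quoted input reaches `agent.py` as a single `sys.argv` entry.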
To run evals in CI, add a step such as:

```yaml
- name: Run evals
  run: uv run kensa eval --format markdown
```

If you only use deterministic checks, you do not need API keys. If you use `criteria` or a judge, add your judge provider's API keys as CI secrets.
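Whether CI needs judge credentials can be checked mechanically. This sketch assumes scenarios have already been parsed into dicts (for example with a YAML library) and treats a scenario as using the LLM judge if it has a `criteria` or `judge` key; `scenarios_using_judge` and `JUDGE_KEYS` are hypothetical, and how kensa actually configures the judge may differ.

```python
from typing import Any

# Assumption: these scenario keys are what route a scenario to the LLM judge.
JUDGE_KEYS = ("criteria", "judge")


def scenarios_using_judge(scenarios: list[dict[str, Any]]) -> list[str]:
    """Return the ids of scenarios that would call the LLM judge."""
    return [s.get("id", "<unnamed>") for s in scenarios if any(k in s for k in JUDGE_KEYS)]


if __name__ == "__main__":
    parsed = [
        {"id": "format_only", "checks": [{"type": "output_matches", "params": {"pattern": "^P[123]$"}}]},
        {"id": "classify_ticket", "checks": [], "criteria": "P1 is for outages or data loss affecting multiple users."},
    ]
    needing_keys = scenarios_using_judge(parsed)
    if needing_keys:
        print("Judge API keys required for:", ", ".join(needing_keys))
    else:
        print("Deterministic checks only; no API keys needed.")
```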
- Docs
- `examples/` has sample agents and scenarios
- `CONTRIBUTING.md` covers local development
- Homepage
- Issues
- MIT License
