A goal-directed cryptographic bug finder driven by Claude Code or OpenAI Codex. Both agents see an identical containerized workspace and toolchain — swap between them with a flag.
- Goal-directed, not procedural. The agent picks the approach (pattern matching, differential testing, formal modeling, exploit reproduction); methodology is a means, not a playbook.
- Two cooperating modes.
formalizegrows a cumulative Lean model;huntattacks adversarially. Both share a per-target state directory; can run concurrently in isolated state snapshots. - Agent-agnostic. Identical Docker workspace for Claude Code and Codex; reasoning effort and model are flags.
- Three-knob budget.
--cycles×--cycle-budget×--timeout. No SDK iteration or spend caps layered on top.
# 1. Build (15-25 min one-time, Mathlib cache).
./scripts/build-image
# 2. Set Claude credentials (or codex login).
claude setup-token # interactive; prints a token
export CLAUDE_CODE_OAUTH_TOKEN=<paste it>
# 3. Verify the install end-to-end (one cycle on a smoke target).
./scripts/test-install
# 4. Run on any target.
./scripts/hunt targets/smoke/smoke-01Findings land at runs/<run-id>/artifacts/findings.json. Before running
applied-tier targets, run ./scripts/generate-fixtures once.
Get started
- Setup — hardware, build, auth (Claude / Codex)
- Running — flags, modes, budgets, snapshot pipeline, output
Reference
- Targets — tiers, anonymization,
audit.mdconventions - Troubleshooting — common issues
Internals
- Design — architecture, decisions, trade-offs
- Roadmap — bug-class roadmap
- Agent system prompt — methodology the agent follows
- Mode prompts — per-cycle prompts for
huntandformalize
runner/ per-cycle loop + adapter protocol (claude.py, codex.py, base.py)
prompts/ per-cycle agent prompts (hunt.md, formalize.md)
instructions/ agent's always-loaded system prompt (AGENTS.md)
scripts/ hunt, hunt-all, hunt-local, republish, build-image, generate-fixtures, scrub-secrets
env/ Docker build context — Dockerfile, lakefile, requirements.txt, vendored Lean skills, MCP servers (mcp/lean, mcp/rocq)
targets/ smoke / applied / blind / production / private (private + production gitignored)
docs/ design notes, install / running / troubleshooting guides
runs/ per-run output (gitignored except runs/published/ for demo evidence)
MIT.
Research / experimental. Findings are agent-surfaced candidates with line citations and runnable repros; the operator verifies.