Summary
The multi-agent orchestration workflow (discovery → library analysis → application analysis → synthesis) currently has no structured telemetry or real-time visibility into its execution. There is a nascent observability/ directory with a static session_profiler.html and a live_monitor.sh script, but these are post-hoc tools that require manual wiring. This issue tracks adding first-class, always-on observability so any run can be profiled and monitored in a dashboard without extra setup.
Problem
When a full analysis run executes (e.g., 24 libraries + 10 applications across 3 language stacks), there is no way to:
- See which subagents are currently active vs. queued vs. complete in real time
- Measure wall-clock time per phase (depth-0 libraries, depth-N libraries, application analysis, synthesis)
- Measure per-subagent token consumption, tool-call counts, and latency
- Detect stalled or failed subagents without tailing raw log files
- Compare performance across runs (e.g., a re-analysis after a diff vs. a full cold run)
The only current signals are logs/latest/tool_calls.jsonl and transcript.txt, both of which require manual parsing.
Proposed Work
1. Structured Telemetry Emission
Instrument the lead agent and subagent lifecycle with structured span events written to a telemetry.jsonl file alongside the existing tool_calls.jsonl.
Spans to instrument:
- Full run (root span)
- Each phase (depth-0 libs, depth-N libs, application analysis, synthesis)
- Each subagent invocation (library / application / external-service / architecture-documenter)
- Discovery engine execution
- Manifest write
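A minimal sketch of what span emission could look like, assuming one JSON object per line and the span_start / span_end event names used in the acceptance criteria. The TelemetryWriter class and the field names (span_id, parent_id, kind, ts, wall_ms) are hypothetical, as is the "ok" status value; only "status": "error" appears elsewhere in this issue.

```python
import json
import time
import uuid
from contextlib import contextmanager
from pathlib import Path


class TelemetryWriter:
    """Appends one JSON object per line to a telemetry.jsonl file (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def emit(self, event):
        # Re-open in append mode for every event so each line reaches the
        # file as soon as it is emitted and a tailing reader can pick it up.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")

    @contextmanager
    def span(self, name, kind, parent_id=None):
        """Emit span_start on entry and span_end (with status) on exit."""
        span_id = uuid.uuid4().hex[:12]
        start = time.time()
        self.emit({"event": "span_start", "span_id": span_id,
                   "parent_id": parent_id, "name": name, "kind": kind,
                   "ts": start})
        status = "ok"  # assumed success value
        try:
            yield span_id
        except Exception:
            status = "error"
            raise
        finally:
            end = time.time()
            self.emit({"event": "span_end", "span_id": span_id,
                       "name": name, "kind": kind, "status": status,
                       "ts": end, "wall_ms": round((end - start) * 1000)})
```

Wrapping the run, each phase, and each subagent invocation in nested span(...) blocks, passing the outer span id as parent_id, would yield the hierarchy listed above. Because span_end is written in a finally clause, a subagent that raises still produces a closing event with "status": "error".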
2. Live Dashboard (upgrade session_profiler.html)
Upgrade observability/session_profiler.html into a proper live dashboard that:
- Auto-refreshes by polling telemetry.jsonl every ~2s (or by subscribing to a small SSE/WebSocket endpoint from serve_logs.py)
- Gantt / swimlane view: one row per subagent, colored by phase, with wall-clock time on the x-axis; shows in-progress spans with an animated fill
- Phase summary bar at the top: total elapsed, current phase, % complete, active agent count
- Per-agent cards: name, kind, status (queued / running / done / error), elapsed, tokens, tool calls
- Token burn rate chart: tokens/minute over a rolling 30s window, summed across all active subagents
- Error/warning panel: surfaces any "status": "error" spans immediately
Tech: keep it as a single-file HTML + vanilla JS + D3 (already imported); serve via the existing serve_logs.py.
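The burn-rate arithmetic is small enough to sketch. The dashboard itself would do this in JS; Python is used here only to match serve_logs.py. This assumes telemetry carries timestamped token deltas, which is not yet specified anywhere; the BurnRate class and its method names are hypothetical.

```python
from collections import deque


class BurnRate:
    """Rolling tokens-per-minute over a fixed window (30 s by default)."""

    def __init__(self, window_s=30.0):
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, tokens) pairs, oldest first
        self.total = 0          # sum of tokens currently inside the window

    def add(self, ts, tokens):
        """Record `tokens` consumed at time `ts` (seconds)."""
        self.samples.append((ts, tokens))
        self.total += tokens

    def tokens_per_minute(self, now):
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= now - self.window_s:
            _, old = self.samples.popleft()
            self.total -= old
        # Scale the in-window total up to a per-minute rate.
        return self.total * 60.0 / self.window_s
```

Feeding every active subagent's deltas into one BurnRate instance gives the aggregate line; one instance per subagent would give per-agent series for the cards.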
3. serve_logs.py SSE endpoint
Add a /events Server-Sent Events endpoint to serve_logs.py that tails telemetry.jsonl and pushes new lines to connected browsers. This removes the need for the dashboard to poll a file and enables sub-second latency updates.
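One way the /events endpoint could be sketched with only the standard library. How serve_logs.py is actually structured is not shown in this issue, so the handler class, the TELEMETRY_PATH location, and the 0.25 s tail interval are all assumptions.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path

TELEMETRY_PATH = Path("logs/latest/telemetry.jsonl")  # assumed location


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset)."""
    if not path.exists():
        return [], offset
    with path.open("rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so a half-written line is
    # left in place and retried on the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return [ln for ln in lines if ln.strip()], offset + end


def format_sse(data):
    """Frame a single-line payload as one Server-Sent Events message."""
    return f"data: {data}\n\n".encode("utf-8")


class EventsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        offset = 0  # start at 0 so late joiners replay earlier spans
        try:
            while True:
                lines, offset = read_new_lines(TELEMETRY_PATH, offset)
                for line in lines:
                    self.wfile.write(format_sse(line))
                self.wfile.flush()
                time.sleep(0.25)  # assumed tail interval
        except (BrokenPipeError, ConnectionResetError):
            pass  # browser disconnected


# To try it standalone (port is arbitrary):
# ThreadingHTTPServer(("127.0.0.1", 8000), EventsHandler).serve_forever()
```

On the browser side, new EventSource("/events") with an onmessage handler that calls JSON.parse(event.data) would replace the polling loop. A threading server is used so that one long-lived stream does not block other requests.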
4. Run Summary Report
After synthesis completes, write logs/latest/run_summary.json with:

    {
      "run_id": "eigenda-20260408-abc123",
      "source_repo": "https://github.com/Layr-Labs/eigenda",
      "source_commit": "61019b4",
      "total_wall_ms": 312500,
      "phases": {
        "discovery": { "wall_ms": 1200 },
        "library_depth_0": { "wall_ms": 68400, "agents": 9 },
        "library_depth_n": { "wall_ms": 112000, "agents": 15 },
        "application_analysis": { "wall_ms": 98000, "agents": 10 },
        "synthesis": { "wall_ms": 32900, "agents": 1 }
      },
      "totals": {
        "agents_spawned": 35,
        "total_tokens": 1842000,
        "total_tool_uses": 847,
        "analyses_written": 45
      }
    }

This enables cross-run benchmarking and regression detection.
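A rough sketch of how the summary could be derived from telemetry and then compared across runs. It assumes span_end events that carry kind, name, and wall_ms, and that subagent spans also carry phase, tokens, and tool_uses. Those field names, the function names, and the 1.25x regression threshold are hypothetical. source_repo, source_commit, and analyses_written are left out because they would come from the manifest rather than from spans.

```python
import json
from pathlib import Path

# Subagent kinds as listed under "Spans to instrument".
AGENT_KINDS = {"library", "application", "external-service",
               "architecture-documenter"}


def summarize(events, run_id):
    """Fold span_end events into a run_summary-shaped dict."""
    phases = {}
    totals = {"agents_spawned": 0, "total_tokens": 0, "total_tool_uses": 0}
    total_wall_ms = 0
    for ev in events:
        if ev.get("event") != "span_end":
            continue
        kind = ev.get("kind")
        if kind == "run":
            total_wall_ms = ev["wall_ms"]
        elif kind == "phase":
            phases.setdefault(ev["name"], {"agents": 0})["wall_ms"] = ev["wall_ms"]
        elif kind in AGENT_KINDS:
            # Count the agent against the phase it ran in.
            phases.setdefault(ev["phase"], {"agents": 0})["agents"] += 1
            totals["agents_spawned"] += 1
            totals["total_tokens"] += ev.get("tokens", 0)
            totals["total_tool_uses"] += ev.get("tool_uses", 0)
    return {"run_id": run_id, "total_wall_ms": total_wall_ms,
            "phases": phases, "totals": totals}


def write_summary(summary, path):
    """Write the summary as pretty-printed JSON."""
    Path(path).write_text(json.dumps(summary, indent=2), encoding="utf-8")


def phase_regressions(baseline, current, threshold=1.25):
    """Names of phases whose wall_ms exceeds `threshold` x the baseline."""
    slow = []
    for name, cur in current["phases"].items():
        base = baseline["phases"].get(name)
        if base and base.get("wall_ms") and cur.get("wall_ms", 0) > base["wall_ms"] * threshold:
            slow.append(name)
    return slow
```

Unlike the sample above, this version records "agents": 0 for phases that spawn no subagents (such as discovery) rather than omitting the key; either convention works as long as it stays stable across runs.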
Acceptance Criteria
- telemetry.jsonl is written automatically on every run with span_start / span_end / phase events
- observability/session_profiler.html shows a live Gantt view that updates without page refresh during an active run
- serve_logs.py exposes a /events SSE endpoint
- logs/latest/run_summary.json is written at the end of every run
Context
The observability/ directory already has scaffolding (session_profiler.html, live_monitor.sh, serve_logs.py). This issue is about making that scaffolding production-quality and always-on rather than opt-in.