Observability: real-time lifecycle dashboard + telemetry instrumentation for analysis runs #5

## Summary

The multi-agent orchestration workflow (discovery → library analysis → application analysis → synthesis) currently has no structured telemetry or real-time visibility into its execution. There is a nascent `observability/` directory with a static `session_profiler.html` and a `live_monitor.sh` script, but these are post-hoc tools that require manual wiring. This issue tracks adding first-class, always-on observability so any run can be profiled and monitored in a dashboard without extra setup.

---

## Problem

When a full analysis run executes (e.g., 24 libraries + 10 applications across 3 language stacks), there is no way to:
- See which subagents are currently active vs. queued vs. complete in real time
- Measure wall-clock time per phase (depth-0 libraries, depth-N libraries, application analysis, synthesis)
- Measure per-subagent token consumption, tool-call counts, and latency
- Detect stalled or failed subagents without tailing raw log files
- Compare performance across runs (e.g., a re-analysis after a diff vs. a full cold run)

The only signals available today are `logs/latest/tool_calls.jsonl` and `transcript.txt`, both of which have to be parsed by hand.

---

## Proposed Work

### 1. Structured Telemetry Emission

Instrument the lead agent and subagent lifecycle with structured span events written to a `telemetry.jsonl` file alongside the existing `tool_calls.jsonl`:

```jsonc
// Span open
{ "event": "span_start", "span_id": "lib-encoding", "parent": "phase-depth2", "component": "encoding", "kind": "library-analysis", "ts": 1712534400.123 }

// Span close
{ "event": "span_end", "span_id": "lib-encoding", "status": "ok", "duration_ms": 14878, "tokens": 42100, "tool_uses": 31, "ts": 1712534415.001 }

// Phase boundary
{ "event": "phase", "name": "library-depth-0-complete", "libraries": 9, "wall_ms": 68400, "ts": 1712534400.999 }
```

Spans to instrument:
- Full run (root span)
- Each phase (depth-0 libs, depth-N libs, application analysis, synthesis)
- Each subagent invocation (library / application / external-service / architecture-documenter)
- Discovery engine execution
- Manifest write
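The span events above could be emitted with a small context manager. The following is a minimal sketch, not the project's actual instrumentation: `TelemetryWriter`, its `span()` method, and the `stats` dictionary are hypothetical names, and it assumes the agent code can report its token and tool-call counts before the span closes.

```python
import json
import time
from contextlib import contextmanager
from pathlib import Path


class TelemetryWriter:
    """Appends one JSON object per line to a telemetry file (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def emit(self, event, **fields):
        record = {"event": event, **fields, "ts": round(time.time(), 3)}
        # Open, append, and close per event so each record reaches disk promptly.
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    @contextmanager
    def span(self, span_id, kind, parent=None, component=None):
        # Use a monotonic clock for the duration; wall-clock time only for "ts".
        start = time.monotonic()
        self.emit("span_start", span_id=span_id, parent=parent,
                  component=component, kind=kind)
        stats = {"tokens": 0, "tool_uses": 0}  # the caller fills these in
        status = "ok"
        try:
            yield stats
        except Exception:
            status = "error"
            raise
        finally:
            duration_ms = int((time.monotonic() - start) * 1000)
            self.emit("span_end", span_id=span_id, status=status,
                      duration_ms=duration_ms, **stats)
```

A caller would wrap each subagent invocation in `with writer.span("lib-encoding", kind="library-analysis", parent="phase-depth2", component="encoding") as stats:` and set `stats["tokens"]` and `stats["tool_uses"]` before leaving the block. An exception inside the block still produces a `span_end` record, with `"status": "error"`.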

### 2. Live Dashboard (upgrade `session_profiler.html`)

Upgrade `observability/session_profiler.html` into a live dashboard with:

- **Auto-refresh**: polls `telemetry.jsonl` every ~2 s, or subscribes to a small SSE/WebSocket endpoint exposed by `serve_logs.py`
- **Gantt / swimlane view**: one row per subagent, colored by phase, with wall-clock time on the x-axis; in-progress spans are shown with an animated fill
- **Phase summary bar**: pinned to the top, showing total elapsed time, current phase, % complete, and active agent count
- **Per-agent cards**: name, kind, status (queued / running / done / error), elapsed time, tokens, tool calls
- **Token burn rate chart**: tokens per minute across all active subagents, computed over a rolling 30 s window
- **Error/warning panel**: surfaces any span with `"status": "error"` as soon as it arrives

Tech: keep it as a single-file HTML + vanilla JS + D3 (already imported); serve via the existing `serve_logs.py`.
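The per-agent cards boil down to folding the event stream into one status record per span. The dashboard itself would do this in JavaScript; the logic is sketched here in Python for consistency with `serve_logs.py`. `reduce_agent_cards` and `expected_ids` are hypothetical names. The telemetry schema has no "queued" event, so this sketch assumes the list of planned agents is known up front (for example from the manifest).

```python
def reduce_agent_cards(events, expected_ids=()):
    """Fold telemetry events into {span_id: card} for the per-agent cards."""
    # Agents that are planned but have not started yet are shown as queued.
    cards = {sid: {"status": "queued"} for sid in expected_ids}
    for ev in events:
        kind = ev.get("event")
        sid = ev.get("span_id")
        if kind == "span_start":
            cards[sid] = {
                "status": "running",
                "kind": ev.get("kind"),
                "started": ev.get("ts"),
            }
        elif kind == "span_end":
            card = cards.setdefault(sid, {})
            # Any status other than "ok" is surfaced as an error.
            card["status"] = "done" if ev.get("status") == "ok" else "error"
            card["duration_ms"] = ev.get("duration_ms")
            card["tokens"] = ev.get("tokens")
            card["tool_uses"] = ev.get("tool_uses")
        # "phase" events carry no span_id and are ignored here.
    return cards
```

With the schema as proposed, `tokens` arrives only on `span_end`, so a live burn-rate chart would either attribute tokens when a span closes or need an additional progress event; the issue leaves that choice open.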

### 3. `serve_logs.py` SSE endpoint

Add a `/events` Server-Sent Events endpoint to `serve_logs.py` that tails `telemetry.jsonl` and pushes new lines to connected browsers. This removes the need for the dashboard to poll a file and enables sub-second latency updates.
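One way the tailing endpoint could look, using only the standard library. This is a sketch under assumptions: the internals of `serve_logs.py` are not described in this issue, so `EventsHandler`, `read_new_lines`, `sse_frame`, the 0.25 s poll interval, and the telemetry path are all illustrative choices rather than existing code.

```python
import time
from http.server import BaseHTTPRequestHandler

TELEMETRY_PATH = "logs/latest/telemetry.jsonl"  # assumed location


def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for data appended after `offset`."""
    try:
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read()
    except FileNotFoundError:
        return [], offset
    # Consume only up to the last newline so a half-written record is retried later.
    end = chunk.rfind(b"\n")
    if end == -1:
        return [], offset
    complete = chunk[:end].decode("utf-8")
    lines = [ln for ln in complete.split("\n") if ln.strip()]
    return lines, offset + end + 1


def sse_frame(data, event=None):
    """Encode one single-line payload as a Server-Sent Events frame."""
    head = f"event: {event}\n" if event else ""
    return f"{head}data: {data}\n\n".encode("utf-8")


class EventsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/events":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        offset = 0  # start from the top so late joiners get the full history
        try:
            while True:
                lines, offset = read_new_lines(TELEMETRY_PATH, offset)
                for line in lines:
                    self.wfile.write(sse_frame(line))
                self.wfile.flush()
                time.sleep(0.25)
        except (BrokenPipeError, ConnectionResetError):
            pass  # the browser disconnected
```

Because each connection loops until the client goes away, the handler needs a threaded server, for example `http.server.ThreadingHTTPServer(("127.0.0.1", 8000), EventsHandler).serve_forever()` (the port is arbitrary). On the browser side, `new EventSource("/events")` with an `onmessage` handler that calls `JSON.parse(e.data)` would receive one telemetry record per message.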

### 4. Run Summary Report

After synthesis completes, write `logs/latest/run_summary.json` with:

```jsonc
{
  "run_id": "eigenda-20260408-abc123",
  "source_repo": "https://github.com/Layr-Labs/eigenda",
  "source_commit": "61019b4",
  "total_wall_ms": 312500,
  "phases": {
    "discovery": { "wall_ms": 1200 },
    "library_depth_0": { "wall_ms": 68400, "agents": 9 },
    "library_depth_n": { "wall_ms": 112000, "agents": 15 },
    "application_analysis": { "wall_ms": 98000, "agents": 10 },
    "synthesis": { "wall_ms": 32900, "agents": 1 }
  },
  "totals": {
    "agents_spawned": 35,
    "total_tokens": 1842000,
    "total_tool_uses": 847,
    "analyses_written": 45
  }
}
```

This enables cross-run benchmarking and regression detection.
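As an illustration of the regression check, two `run_summary.json` files in the shape shown above can be compared phase by phase. `phase_regressions` and the 10% default tolerance are assumptions made for this sketch, not part of the proposal.

```python
import json


def load_summary(path):
    """Read a run_summary.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def phase_regressions(baseline, current, tolerance=0.10):
    """Return {phase: fractional_slowdown} for phases slower than `tolerance`."""
    slower = {}
    for name, base in baseline["phases"].items():
        cur = current["phases"].get(name)
        # Skip phases missing from the current run or with no usable baseline.
        if cur is None or base["wall_ms"] <= 0:
            continue
        change = (cur["wall_ms"] - base["wall_ms"]) / base["wall_ms"]
        if change > tolerance:
            slower[name] = round(change, 3)
    return slower
```

For example, `phase_regressions(load_summary("runs/a/run_summary.json"), load_summary("logs/latest/run_summary.json"))` would list the phases whose wall-clock time grew by more than 10% (the `runs/a/` path is hypothetical). The same pattern extends to `totals.total_tokens` or `totals.total_tool_uses`.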

---

## Acceptance Criteria

- [ ] `telemetry.jsonl` is written automatically on every run with span_start / span_end / phase events
- [ ] `observability/session_profiler.html` shows a live Gantt view that updates without page refresh during an active run
- [ ] `serve_logs.py` exposes a `/events` SSE endpoint
- [ ] `logs/latest/run_summary.json` is written at the end of every run
- [ ] Dashboard correctly reflects the EigenDA run profile (9 parallel depth-0 agents, 15 sequential depth-N agents, 10 parallel app agents, 1 synthesizer)

---

## Context

The `observability/` directory already has scaffolding (`session_profiler.html`, `live_monitor.sh`, `serve_logs.py`) — this issue is about making that scaffolding production-quality and always-on rather than opt-in.
