Cross-repo adoption audit — mirror of agent-eval audit (6 consumers incl. blueprint-agent + substrate catalog + per-repo specs) #27

@tangletools

Why this audit

The 2026-05-22 agent-eval cross-repo audit (synthesis) catalogued @tangle-network/agent-eval adoption across consumers and produced 8 issues. The substrate has changed substantially since then, so this audit body has been re-baselined to match the current state.

Substrate snapshot as of 2026-05-23 16:00 UTC:

| Package | Then (audit ref) | Now |
| --- | --- | --- |
| @tangle-network/agent-eval | 0.31.1 | 0.34.1 (#86 shipped AGENT_PROFILE_KINDS, toAgentProfileJson, buildSandboxAgentProfileCell; published today) |
| @tangle-network/agent-runtime | (not pinned) | 0.17.1; significant surface change via #38 (yanked chat-turn, intent-router, model-resolution, profile-conformance, run, trace-bridge: 1737 LOC of unused exports) and #39 (sandbox 0.2 peer-range fix) |
| @tangle-network/sandbox | (not pinned) | 0.0.3 (Blueprint) / 0.2.1 (this runtime dev dep); peer range now >=0.1.2 <0.3.0 |
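
To make the peer-range row concrete, here is a minimal sketch of checking a version against >=0.1.2 <0.3.0. It assumes plain major.minor.patch versions with no pre-release tags; parseVersion, compareVersions and inSandboxPeerRange are illustrative names, not part of any package listed above. Under this check 0.2.1 is in range and 0.0.3 is not, which may be worth confirming during the Blueprint audit if that pin refers to the same package.

```typescript
// Minimal comparison for plain "major.minor.patch" versions.
// Assumption: no pre-release or build metadata.
type Version = [number, number, number];

function parseVersion(v: string): Version {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`not a plain major.minor.patch version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Returns -1, 0 or 1, comparing major, then minor, then patch.
function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Peer range taken from the snapshot: >=0.1.2 <0.3.0
function inSandboxPeerRange(version: string): boolean {
  const v = parseVersion(version);
  return (
    compareVersions(v, parseVersion("0.1.2")) >= 0 &&
    compareVersions(v, parseVersion("0.3.0")) < 0
  );
}

console.log(inSandboxPeerRange("0.2.1")); // true
console.log(inSandboxPeerRange("0.0.3")); // false
```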

Consumer list as of 2026-05-23

Six consumers now, not five (Blueprint joined today via tangle-network/blueprint-agent#1758 — merged 2026-05-23 13:13 UTC):

  1. tax-agent
  2. legal-agent
  3. creative-agent
  4. gtm-agent
  5. agent-builder
  6. blueprint-agent ← new; uses createSandboxPromptBackend + runAgentTaskStream + RuntimeStreamEvent (canonical event flow). Pinned at 0.16.1; bump to 0.17.1 is mechanically clean now that Blueprint's dead createBlueprintTraceBridge wrapper is being deleted (filed against Blueprint as a follow-up after the false-alarm #40 closed).

Audit shape — unchanged

Spawn seven parallel sub-agents (one for the substrate catalog, six for the consumers):

  1. Substrate catalog — enumerate every public export from @tangle-network/agent-runtime@0.17.1 at /Users/drew/webb/agent-runtime (or /home/drew/code/agent-runtime), group by capability area, flag post-0.15.0 additions, identify deprecations, identify surface still kept that has zero consumers (the #38 yank cut 1737 LOC of unused exports — repeat the same exercise on the 0.17.1 surface to catch the next round of dead code, e.g. analyst-loop if no consumer wires it).
  2-7. Per-consumer integration audit: tax-agent, legal-agent, creative-agent, gtm-agent, agent-builder, blueprint-agent. Each audit inventories imports, traces the integration shape, identifies gaps against the current surface, flags drift and staleness, and produces a verdict plus the 5 highest-leverage upgrades.

Each report writes to /tmp/audit/agent-runtime/<repo>-integration.md and the catalog writes to /tmp/audit/agent-runtime/catalog.md.
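
The fan-out above can be sketched as a plan of one catalog task plus one task per consumer, each with its report path. This is a sketch only: AuditTask, buildAuditPlan and runAll are hypothetical names, and runSubAgent stands in for whatever mechanism actually spawns a sub-agent.

```typescript
const AUDIT_DIR = "/tmp/audit/agent-runtime";

const CONSUMERS = [
  "tax-agent",
  "legal-agent",
  "creative-agent",
  "gtm-agent",
  "agent-builder",
  "blueprint-agent",
] as const;

interface AuditTask {
  id: number;
  kind: "catalog" | "consumer";
  target: string;
  reportPath: string;
}

// Task 1 is the substrate catalog; tasks 2-7 are the per-consumer audits.
function buildAuditPlan(): AuditTask[] {
  const catalog: AuditTask = {
    id: 1,
    kind: "catalog",
    target: "agent-runtime",
    reportPath: `${AUDIT_DIR}/catalog.md`,
  };
  const consumers = CONSUMERS.map(
    (repo, i): AuditTask => ({
      id: i + 2,
      kind: "consumer",
      target: repo,
      reportPath: `${AUDIT_DIR}/${repo}-integration.md`,
    }),
  );
  return [catalog, ...consumers];
}

// Hypothetical runner: starts every task at once and waits for all of them.
async function runAll(
  tasks: AuditTask[],
  runSubAgent: (task: AuditTask) => Promise<void>,
): Promise<void> {
  await Promise.all(tasks.map((task) => runSubAgent(task)));
}

for (const task of buildAuditPlan()) {
  console.log(`${task.id}. ${task.target} -> ${task.reportPath}`);
}
```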

Lessons baked in from the false-alarm cycle on #40

The #40 false-alarm proved that the audit's value isn't just "find dead exports to yank" but also "distinguish dead wrappers in consumers from active wrappers." Blueprint had a createBlueprintTraceBridge wrapper that imported the yanked createTraceBridge — looked like a regression, was actually dead scaffolding the consumer should also delete.

Every per-consumer audit must explicitly call out:

  • Wrapper-around-runtime classes that are themselves never called. A consumer wrapping createXxx and exposing createConsumerXxx is only a real consumer if createConsumerXxx has live callsites. Otherwise the wrapper is dead, the runtime export it wraps is dead, and the right call is to delete BOTH (consumer follow-up issue) rather than restore the runtime export.

Update the spec template's §2 to include a "live callsites of the wrapper" check, not just "exported by the consumer".
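
One way to approximate the live-callsites check is a line-based text scan, sketched below. It is a heuristic rather than an AST analysis, so it can miss aliased or re-exported calls and should only be treated as a first pass. findLiveCallsites and WrapperReport are hypothetical names, and the fixture sources are invented to mirror the createBlueprintTraceBridge case described above.

```typescript
interface WrapperReport {
  wrapper: string;
  callsites: string[]; // "file:line" entries
  live: boolean;
}

// Counts lines that call `wrapper(...)`, skipping the line that declares it.
// `sources` maps a file path to that file's text.
function findLiveCallsites(
  wrapper: string,
  sources: Record<string, string>,
): WrapperReport {
  const escaped = wrapper.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const call = new RegExp(`\\b${escaped}\\s*\\(`);
  const declaration = new RegExp(`\\b(function|const|let|var)\\s+${escaped}\\b`);
  const callsites: string[] = [];
  for (const [file, text] of Object.entries(sources)) {
    text.split("\n").forEach((line, i) => {
      if (call.test(line) && !declaration.test(line)) {
        callsites.push(`${file}:${i + 1}`);
      }
    });
  }
  return { wrapper, callsites, live: callsites.length > 0 };
}

// Invented fixture: the wrapper is declared and re-exported but never called.
const sources: Record<string, string> = {
  "src/trace.ts": [
    'import { createTraceBridge } from "@tangle-network/agent-runtime";',
    "export function createBlueprintTraceBridge() {",
    "  return createTraceBridge();",
    "}",
  ].join("\n"),
  "src/index.ts": 'export { createBlueprintTraceBridge } from "./trace";',
};

const report = findLiveCallsites("createBlueprintTraceBridge", sources);
console.log(report.live); // false: exported, but no live callsites
```

A wrapper that comes back with live set to false is a candidate for the delete-both follow-up, not evidence that the runtime export should be restored.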

Synthesis output

Synthesize the catalog and the 6 consumer reports into a CTO-level cross-repo report following the exact shape of the agent-eval audit:

  • Adoption matrix (rows = primitives, columns = 6 consumers, cells = ✅/⚠/❌)
  • Hand-rolled patterns universally duplicated (lift candidates)
  • Execution gaps in the substrate itself (shipped + unused — second-round yank candidates)
  • Scaffold-template gaps in agent-builder
  • Wrapper-deletion candidates per consumer (the lesson from #40)
  • Verdict on substrate usefulness
  • Ranked concrete actions
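
As an illustration of the adoption matrix, the sketch below models rows as primitives and columns as the six consumers. The type names and renderMatrix are hypothetical, and rendering a missing cell as "?" (not yet audited) is an assumption added here so that unaudited cells are not mistaken for ❌.

```typescript
type Adoption = "✅" | "⚠" | "❌";

const CONSUMERS = [
  "tax-agent",
  "legal-agent",
  "creative-agent",
  "gtm-agent",
  "agent-builder",
  "blueprint-agent",
] as const;
type Consumer = (typeof CONSUMERS)[number];

// primitive name -> per-consumer verdict; cells may be absent until audited
type AdoptionMatrix = Record<string, Partial<Record<Consumer, Adoption>>>;

function renderMatrix(matrix: AdoptionMatrix): string {
  const header = ["primitive", ...CONSUMERS].join(" | ");
  const rows = Object.entries(matrix).map(([primitive, cells]) =>
    [primitive, ...CONSUMERS.map((c) => cells[c] ?? "?")].join(" | "),
  );
  return [header, ...rows].join("\n");
}

// Only the Blueprint cells are taken from the consumer list above;
// every other cell is left unaudited.
console.log(
  renderMatrix({
    createSandboxPromptBackend: { "blueprint-agent": "✅" },
    runAgentTaskStream: { "blueprint-agent": "✅" },
  }),
);
```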

Per-repo CTO specs

Produce seven execution specs (one per consumer plus one for agent-runtime) at /tmp/audit/agent-runtime/spec-<repo>.md, following the exact 10-section shape used by the agent-eval audit:

  • §0 Read-first context · §1 Executive summary · §2 Current state inventory (incl. live-callsites check on wrappers) · §3 Target architecture · §4 File-by-file migration tasks (T0X with file:line / current / target / why / test impact / completion check) · §5 Completion checklist (25-50 boxes) · §6 Test plan · §7 Rollout · §8 Risks + non-goals · §9 Citations · §10 Coordination

File the synthesis + per-repo specs to a new branch in agent-runtime: chore/cross-repo-runtime-audit-2026q2, mirroring the agent-eval audit branch structure.

File the resulting issues

Once specs land, file one issue per repo:

  • agent-runtime/[N+1.0] — substrate spec (second-round yank, absorb hand-rolled patterns, etc.)
  • agent-builder/[meta-spec] — scaffold updates relevant to agent-runtime adoption
  • tax-agent, legal-agent, creative-agent, gtm-agent, blueprint-agent — per-consumer execution specs

Issue body shape: executive summary + completion checklist + cross-spec coordination + raw link to the canonical spec on the audit branch. The full spec lives in the branch because 1200-1800-line specs do not fit within GitHub's 65 KB issue body limit.
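
The issue body shape can be sketched as a small assembler that refuses to produce an oversized body. The 65536 figure is an assumption about how the limit mentioned above is counted, and String.length counts UTF-16 code units, so the check is approximate. IssueParts, buildIssueBody and the example URL are hypothetical.

```typescript
// Assumption: the issue body limit is 65536 characters.
const ISSUE_BODY_LIMIT = 65536;

interface IssueParts {
  executiveSummary: string;
  completionChecklist: string;
  coordination: string;
  specUrl: string; // raw link to the canonical spec on the audit branch
}

function buildIssueBody(parts: IssueParts): string {
  const body = [
    parts.executiveSummary,
    parts.completionChecklist,
    parts.coordination,
    `Canonical spec: ${parts.specUrl}`,
  ].join("\n\n");
  if (body.length > ISSUE_BODY_LIMIT) {
    throw new Error(
      `issue body is ${body.length} characters, over the ${ISSUE_BODY_LIMIT} limit`,
    );
  }
  return body;
}

const body = buildIssueBody({
  executiveSummary: "Executive summary goes here.",
  completionChecklist: "- [ ] T01",
  coordination: "Coordinates with the agent-runtime substrate spec.",
  specUrl: "https://example.com/spec-tax-agent.md", // placeholder URL
});
console.log(body.length <= ISSUE_BODY_LIMIT); // true
```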

Optionally file an [N+2.0] triage issue for unused agent-runtime surface, mirroring agent-eval#77.

Read-first context for the sub-agent

The agent-eval audit's outputs are the exact template — re-read them before starting:

Acceptance criteria

  • Branch chore/cross-repo-runtime-audit-2026q2 exists on tangle-network/agent-runtime with docs/audits/2026-MM-DD-cross-repo/ carrying the synthesis + catalog + 6 consumer audits + 7 specs
  • One issue per repo filed: agent-runtime (substrate), agent-builder (meta), tax-agent, legal-agent, creative-agent, gtm-agent, blueprint-agent
  • Optional triage issue if speculative surface exists
  • Each spec follows the 10-section CTO shape: 800-1800 lines, real file:line citations, real code snippets, 25-50 completion boxes, every task carries a test impact statement
  • Every consumer audit explicitly checks live-callsites of any wrappers around runtime exports (lesson from #40)
  • No source files modified in any repo during the audit — only spec docs landed
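
The mechanical parts of the spec criterion (sections §0 through §10, 800-1800 lines, 25-50 completion boxes) can be checked with a short script. The sketch below assumes sections are markdown headings such as "## §3 Target architecture" and completion boxes are "- [ ]" items. Both formats are assumptions, and checkSpec is a hypothetical helper.

```typescript
interface SpecCheck {
  missingSections: string[];
  lineCount: number;
  checkboxCount: number;
  ok: boolean;
}

function checkSpec(markdown: string): SpecCheck {
  const lines = markdown.split("\n");

  // Assumed heading format: one or more "#" followed by "§N".
  const missingSections: string[] = [];
  for (let n = 0; n <= 10; n++) {
    const heading = new RegExp(`^#+\\s*§${n}\\b`);
    if (!lines.some((line) => heading.test(line))) {
      missingSections.push(`§${n}`);
    }
  }

  // Assumed completion-box format: "- [ ]" or "- [x]" list items.
  const checkboxCount = lines.filter((line) =>
    /^\s*[-*] \[[ xX]\]/.test(line),
  ).length;

  const lineCount = lines.length;
  const ok =
    missingSections.length === 0 &&
    lineCount >= 800 &&
    lineCount <= 1800 &&
    checkboxCount >= 25 &&
    checkboxCount <= 50;

  return { missingSections, lineCount, checkboxCount, ok };
}

const draft = checkSpec("## §0 Read-first context\n- [ ] T01 placeholder");
console.log(draft.ok, draft.missingSections.length); // false 10
```

The remaining criteria (real file:line citations, real code snippets, a test impact statement on every task) still need a human or sub-agent review; this only covers what can be counted.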

Why this is worth doing

Layered substrate work compounds. #38 shaved 1737 LOC of unused exports based on a single-pass inspection; a structured audit will catch what that pass missed (e.g. analyst-loop is still 100+ LOC of public surface: has it ever shipped?) and also the converse: runtime exports whose only justification is a wrapper in a consumer that nothing calls. Without this audit, agent-runtime drift accumulates silently and we pay the same audit cost again in a year.


Edit history: Body refreshed 2026-05-23 16:45 UTC to update the substrate version baseline (agent-eval 0.31.1 → 0.34.1; agent-runtime → 0.17.1; consumer count 5 → 6 incl. Blueprint), and to bake in the wrapper-callsites lesson from the #40 false-alarm cycle.
