feat(0.19.0): driven-loop kernel (sandbox-SDK-based) + coderProfile #44
Merged
Phase 0 of the driven-loop substrate. Ships:
- `@tangle-network/agent-runtime/loops` — `runLoop` kernel + Refine and
FanoutVote drivers, built on the sandbox SDK's `AgentProfile` +
`streamPrompt` contract. The kernel orchestrates around the sandbox
SDK; it does not invent its own notion of "what an agent is".
- `@tangle-network/agent-runtime/profiles` — `coderProfile` +
`multiHarnessCoderFanout`. Bundle an `AgentProfile`, task-to-prompt
formatter, output adapter, and per-task validator (forbidden paths,
diff cap, tests + typecheck) into a runLoop-ready unit.
Layering:
- sandbox SDK: `AgentProfile` + `Sandbox` + `streamPrompt`
- agent-runtime/loops: `runLoop` kernel + drivers
- agent-runtime/profiles: presets (coder; researcher in Phase 1)
- agent-runtime (existing): untouched (`runAgentTask`, `RuntimeRunHandle`, etc.)
Kernel responsibilities: iteration accounting, parallel execution
bounded by `maxConcurrency`, abort propagation, cost aggregation from
sandbox `llm_call`-shaped events (with optional `runHandle.observe`
forwarding), and trace emission via `LoopTraceEmitter`.
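The cost-aggregation responsibility can be pictured with a short sketch. It assumes an event carries `type: "llm_call"` and a numeric `data.costUsd`; that shape, along with the `SandboxEvent`, `eventCostUsd`, and `aggregateCostUsd` names, is hypothetical and not taken from the sandbox SDK.

```typescript
// Hypothetical event shape: the real sandbox SDK event schema is not shown in this PR.
interface SandboxEvent {
  type: string;
  data?: Record<string, unknown>;
}

// Returns the USD cost carried by an event, or 0 if it is not llm_call-shaped.
// Assumption: cost lives at `data.costUsd` as a finite number.
function eventCostUsd(event: SandboxEvent): number {
  if (event.type !== "llm_call") return 0;
  const cost = event.data?.costUsd;
  return typeof cost === "number" && Number.isFinite(cost) ? cost : 0;
}

// Sums cost over an event array and optionally forwards each costed event
// to an observer, loosely mirroring the optional `runHandle.observe` forwarding.
function aggregateCostUsd(
  sandboxEvents: readonly SandboxEvent[],
  observe?: (event: SandboxEvent) => void,
): number {
  let total = 0;
  for (const event of sandboxEvents) {
    const cost = eventCostUsd(event);
    if (cost > 0) {
      total += cost;
      observe?.(event);
    }
  }
  return total;
}

const sampleEvents: SandboxEvent[] = [
  { type: "llm_call", data: { costUsd: 0.25 } },
  { type: "tool_call", data: { name: "shell" } },
  { type: "llm_call", data: { costUsd: 0.5 } },
  { type: "llm_call", data: {} }, // no cost field: contributes nothing
];
const forwarded: SandboxEvent[] = [];
const totalCostUsd = aggregateCostUsd(sampleEvents, (e) => {
  forwarded.push(e);
});
console.log(totalCostUsd, forwarded.length);
```

Keeping the per-event extraction separate from the summation makes the "shaped" check easy to unit test without a sandbox.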
Driver responsibilities: topology only. Refine returns `[task]` until
the validator passes; FanoutVote returns N copies on iteration 0 then
selects the highest-scoring valid output. Drivers receive a read-only
history and a typed decision channel; the kernel terminates on
`'stop' | 'pick-winner' | 'fail' | 'done'`.
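To make the "topology only" split concrete, here is a minimal sketch of the two drivers. The `Driver` and `Iteration` shapes, the non-terminal `'continue'` decision, and the `pickWinner` helper are assumptions made for illustration; each history entry is treated as one task run. The package's real types may differ.

```typescript
// Hypothetical types: names echo the PR text, shapes are assumed.
type Decision = "continue" | "stop" | "pick-winner" | "fail" | "done";

interface Iteration<Output> {
  index: number;
  output?: Output;
  verdict?: { valid: boolean; score: number };
}

interface Driver<Task, Output> {
  plan(task: Task, past: readonly Iteration<Output>[]): Task[];
  decide(past: readonly Iteration<Output>[]): Decision;
}

// Refine: keep re-planning the same task until the latest verdict is valid.
function createRefineDriver<Task, Output>(): Driver<Task, Output> {
  const lastValid = (past: readonly Iteration<Output>[]): boolean =>
    past[past.length - 1]?.verdict?.valid === true;
  return {
    plan: (task, past) => (lastValid(past) ? [] : [task]),
    decide: (past) => (lastValid(past) ? "done" : "continue"),
  };
}

// FanoutVote: N copies on the first plan, nothing afterwards.
function createFanoutVoteDriver<Task, Output>(n: number): Driver<Task, Output> {
  return {
    plan: (task, past) => (past.length === 0 ? Array.from({ length: n }, () => task) : []),
    decide: (past) =>
      past.some((it) => it.verdict?.valid === true) ? "pick-winner" : "fail",
  };
}

// Selects the highest-scoring valid iteration, or undefined if none is valid.
function pickWinner<Output>(past: readonly Iteration<Output>[]): Iteration<Output> | undefined {
  let best: Iteration<Output> | undefined;
  let bestScore = Number.NEGATIVE_INFINITY;
  for (const it of past) {
    const verdict = it.verdict;
    if (verdict === undefined || !verdict.valid) continue;
    if (verdict.score > bestScore) {
      best = it;
      bestScore = verdict.score;
    }
  }
  return best;
}

const fanoutDriver = createFanoutVoteDriver<string, string>(3);
const plannedCopies = fanoutDriver.plan("add a hello function", []);
const sampleHistory: Iteration<string>[] = [
  { index: 0, output: "a", verdict: { valid: true, score: 0.6 } },
  { index: 1, output: "b", verdict: { valid: false, score: 0.9 } },
  { index: 2, output: "c", verdict: { valid: true, score: 0.8 } },
];
const fanoutDecision = fanoutDriver.decide(sampleHistory);
const winner = pickWinner(sampleHistory);
console.log(plannedCopies.length, fanoutDecision, winner?.output);
```

Note that the invalid entry with the highest raw score is skipped: validity gates selection, score only ranks the valid candidates.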
Output adapter parses an event array → typed Output. Validator scores
the typed Output → DefaultVerdict. Both are pure functions; tests
exercise them without a real sandbox.
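Because both pieces are pure, they can be sketched and tested in isolation. The example below is an illustration under assumptions: the `AgentEvent`, `CoderOutput`, `CoderTask`, and `Verdict` field names are hypothetical, prefix matching for forbidden paths is a guess, and the score (fraction of checks passed) is a stand-in for whatever score math `coderProfile` actually uses.

```typescript
// Hypothetical shapes: the PR names these concepts but does not show their fields.
interface AgentEvent { type: string; text?: string; result?: unknown }
interface CoderOutput {
  changedFiles: string[];
  diffLines: number;
  testsPassed: boolean;
  typecheckPassed: boolean;
}
interface CoderTask { forbiddenPaths: string[]; maxDiffLines: number }
interface Verdict { valid: boolean; score: number; reasons: string[] }

const FENCE = "`".repeat(3); // built at runtime to avoid a literal fence in this example
const FENCED_JSON = new RegExp(FENCE + "json\\s*([\\s\\S]*?)" + FENCE);

function isCoderOutput(value: unknown): value is CoderOutput {
  if (typeof value !== "object" || value === null) return false;
  const o = value as Record<string, unknown>;
  return (
    Array.isArray(o.changedFiles) &&
    o.changedFiles.every((f) => typeof f === "string") &&
    typeof o.diffLines === "number" &&
    typeof o.testsPassed === "boolean" &&
    typeof o.typecheckPassed === "boolean"
  );
}

// Output adapter: prefer a structured result event, else fall back to fenced JSON in text.
function parseOutput(agentEvents: readonly AgentEvent[]): CoderOutput | undefined {
  for (const ev of [...agentEvents].reverse()) {
    if (ev.type === "result" && isCoderOutput(ev.result)) return ev.result;
  }
  const text = agentEvents.map((e) => e.text ?? "").join("");
  const match = FENCED_JSON.exec(text);
  if (match === null) return undefined;
  try {
    const parsed: unknown = JSON.parse(match[1] ?? "");
    return isCoderOutput(parsed) ? parsed : undefined;
  } catch {
    return undefined; // malformed JSON inside the fence
  }
}

// Validator: four task-bound checks; score is the fraction that passed (assumed formula).
function validateOutput(task: CoderTask, out: CoderOutput): Verdict {
  const reasons: string[] = [];
  const forbidden = out.changedFiles.filter((f) => task.forbiddenPaths.some((p) => f.startsWith(p)));
  if (forbidden.length > 0) reasons.push("forbidden paths: " + forbidden.join(", "));
  if (out.diffLines > task.maxDiffLines) reasons.push("diff exceeds cap");
  if (!out.testsPassed) reasons.push("tests failed");
  if (!out.typecheckPassed) reasons.push("typecheck failed");
  const checks = 4;
  return { valid: reasons.length === 0, score: (checks - reasons.length) / checks, reasons };
}

const reply =
  "Done.\n" + FENCE + "json\n" +
  JSON.stringify({ changedFiles: ["src/hello.ts"], diffLines: 12, testsPassed: true, typecheckPassed: false }) +
  "\n" + FENCE;
const parsedOutput = parseOutput([{ type: "message", text: reply }]);
const coderTask: CoderTask = { forbiddenPaths: [".github/"], maxDiffLines: 200 };
const verdict = parsedOutput ? validateOutput(coderTask, parsedOutput) : undefined;
console.log(parsedOutput?.changedFiles, verdict);
```

Neither function touches the network or the filesystem, which is what lets the unit tests run without a real sandbox.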
Heterogeneous fanout is built in: pass `agentRuns: AgentRunSpec[]` and
the kernel round-robins through them when the driver plans N tasks.
`multiHarnessCoderFanout` ships a 3-harness default (claude-code,
codex, opencode/zai-coding-plan/glm-5.1).
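Round-robin assignment is easiest to see with a small sketch. The harness names come from the text above; using them as `profile.name` values, the minimal `AgentRunSpec` shape, and the `assignRoundRobin` helper are assumptions for illustration only.

```typescript
// Hypothetical minimal shape: the real AgentRunSpec carries a full AgentProfile.
interface AgentRunSpec { profile: { name: string } }
interface Assignment<Task> { task: Task; run: AgentRunSpec }

// Pairs the i-th planned task with agentRuns[i mod agentRuns.length].
function assignRoundRobin<Task>(
  tasks: readonly Task[],
  agentRuns: readonly AgentRunSpec[],
): Assignment<Task>[] {
  if (agentRuns.length === 0) throw new Error("agentRuns must contain at least one spec");
  return tasks.map((task, i) => {
    const run = agentRuns[i % agentRuns.length];
    if (run === undefined) throw new Error("unreachable: index is always in range");
    return { task, run };
  });
}

const harnessRuns: AgentRunSpec[] = [
  { profile: { name: "claude-code" } },
  { profile: { name: "codex" } },
  { profile: { name: "opencode/zai-coding-plan/glm-5.1" } },
];
const plannedTasks = ["t0", "t1", "t2", "t3", "t4"];
const assignedNames = assignRoundRobin(plannedTasks, harnessRuns).map((a) => a.run.profile.name);
console.log(assignedNames);
```

With five planned tasks and three specs, the first two harnesses each receive a second task and the third receives one.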
Tests (25 new, all 154 pass):
- tests/loops/refine.test.ts (7) — refine-until-valid, maxIter cap,
error capture, trace event ordering, cost aggregation
- tests/loops/fanout-vote.test.ts (6) — winner selection, fail mode,
`maxConcurrency` enforcement, heterogeneous agentRuns, error
handling on missing options
- tests/loops/composition.test.ts (2) — recursive runLoop in
Driver.plan; static typecheck of nested kernel calls
- tests/profiles/coder.test.ts (10) — task-bound validator
(forbidden-path, diff cap, tests, typecheck), score math, output
adapter (structured result + fenced-JSON fallback), multi-harness
fanout shape
Build, typecheck, lint clean. Existing 129 tests untouched.
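The trace-ordering case can be pictured as a comparison against an expected sequence. This is a minimal sketch using the event names that appear in this PR's test plan; `expectedTrace` and `matchesExpectedOrder` are hypothetical helpers, not part of the package, and one decision per iteration is assumed.

```typescript
// Builds the expected event-name sequence for a loop with the given iteration count.
function expectedTrace(iterations: number): string[] {
  const perIteration = ["loop.iteration.started", "loop.iteration.ended", "loop.decision"];
  const trace = ["loop.started"];
  for (let i = 0; i < iterations; i++) trace.push(...perIteration);
  trace.push("loop.ended");
  return trace;
}

// True when the emitted names match the expected sequence exactly.
function matchesExpectedOrder(emitted: readonly string[], iterations: number): boolean {
  const expected = expectedTrace(iterations);
  return emitted.length === expected.length && emitted.every((name, i) => name === expected[i]);
}

const emittedNames = [
  "loop.started",
  "loop.iteration.started", "loop.iteration.ended", "loop.decision",
  "loop.iteration.started", "loop.iteration.ended", "loop.decision",
  "loop.ended",
];
console.log(matchesExpectedOrder(emittedNames, 2));
```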
Smoke test (manual; requires sandbox credentials):
    cd /home/drew/code/agent-runtime && pnpm build
    TANGLE_SANDBOX_API_KEY=... TANGLE_ORCHESTRATOR_URL=... node --input-type=module -e "
    // ESM input is required for import statements and top-level await
    import { Sandbox } from '@tangle-network/sandbox'
    import { runLoop, createFanoutVoteDriver } from './dist/loops.js'
    import { multiHarnessCoderFanout } from './dist/profiles.js'
    const client = new Sandbox({
      apiKey: process.env.TANGLE_SANDBOX_API_KEY,
      baseUrl: process.env.TANGLE_ORCHESTRATOR_URL,
    })
    const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()
    const result = await runLoop({
      driver, agentRuns, output, validator,
      task: { goal: 'add a hello function', repoRoot: '/work/repo' },
      ctx: { sandboxClient: client },
    })
    console.log(result.decision, result.winner?.iterationIndex, result.costUsd)
    "
Out of scope (Phase 1+): researcherProfile, sandboxedDriver helper,
MCP wrapper, Council/Decompose/Pipeline topologies, agent-eval refactor.
Summary
Phase 0 of the driven-loop substrate. Ships `runLoop` (topology kernel) + `Refine` / `FanoutVote` drivers + `coderProfile` preset. Built on top of the sandbox SDK: the kernel composes around `AgentProfile` + `box.streamPrompt` rather than inventing its own notion of "what an agent is" (the bug class that killed the prior Phase 0 sketch).
Layering
No cycles. agent-runtime keeps its `@tangle-network/sandbox` peer. `loops` and `profiles` are new parallel sub-exports; every existing import path keeps working.
Kernel signature
Each iteration:
1. `driver.plan(task, history)` → N tasks (1 = refine, N = fanout, 0 = stop)
2. For each planned task (in parallel, bounded by `maxConcurrency`): `client.create({ backend: { profile } })` + iterate `box.streamPrompt(taskToPrompt(task))`
3. `output.parse(events)` → typed `Output`
4. `validator?.validate(output)` → `DefaultVerdict`
5. Append `Iteration` to history; emit `loop.iteration.ended` trace
6. `driver.decide(history)` → if terminal, return result + winner

Kernel owns: iteration accounting, parallel execution, abort propagation, cost aggregation, trace emission. Kernel does not own: what the agent runs, how outputs are decoded, how outputs are scored, or topology.
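The iteration steps can be sketched as a small loop with a bounded worker pool. This is an illustration under stated assumptions, not the shipped kernel: `runLoopSketch`, `mapBounded`, `LoopOptions`, the `'continue'` decision, and the choice to return `'stop'` when `maxIterations` is exhausted are all hypothetical. `runAgent` stands in for sandbox creation plus `streamPrompt`, and abort propagation, cost aggregation, and tracing are omitted.

```typescript
type Decision = "continue" | "stop" | "pick-winner" | "fail" | "done";
interface Iteration<O> { index: number; output: O | null; valid: boolean | null; error: string | null }
interface Driver<T, O> {
  plan(task: T, past: readonly Iteration<O>[]): T[];
  decide(past: readonly Iteration<O>[]): Decision;
}
interface LoopOptions<T, O> {
  task: T;
  driver: Driver<T, O>;
  runAgent: (task: T) => Promise<string[]>; // stand-in for create + streamPrompt
  parse: (events: string[]) => O;
  validate?: (output: O) => boolean;
  maxConcurrency: number;
  maxIterations: number;
}

// Runs fn over items with at most `limit` calls in flight; results keep input order.
async function mapBounded<A, B>(items: readonly A[], limit: number, fn: (item: A) => Promise<B>): Promise<B[]> {
  const results = new Array<B>(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as A);
    }
  }
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}

async function runLoopSketch<T, O>(opts: LoopOptions<T, O>): Promise<{ decision: Decision; past: Iteration<O>[] }> {
  const past: Iteration<O>[] = [];
  for (let round = 0; round < opts.maxIterations; round++) {
    const tasks = opts.driver.plan(opts.task, past); // step 1
    if (tasks.length === 0) return { decision: "stop", past };
    const batch = await mapBounded(tasks, opts.maxConcurrency, async (task): Promise<Iteration<O>> => {
      try {
        const output = opts.parse(await opts.runAgent(task)); // steps 2 and 3
        const valid = opts.validate ? opts.validate(output) : null; // step 4
        return { index: 0, output, valid, error: null };
      } catch (err) {
        // A failed run is recorded on the iteration instead of aborting the loop.
        return { index: 0, output: null, valid: null, error: err instanceof Error ? err.message : String(err) };
      }
    });
    for (const it of batch) past.push({ ...it, index: past.length }); // step 5
    const decision = opts.driver.decide(past); // step 6
    if (decision !== "continue") return { decision, past };
  }
  return { decision: "stop", past }; // assumed behaviour when the iteration cap is hit
}

async function demo(): Promise<{ decision: Decision; runs: number; peak: number }> {
  let inFlight = 0;
  let peak = 0;
  const result = await runLoopSketch<string, number>({
    task: "add a hello function",
    driver: {
      plan: (task, past) => (past.length === 0 ? Array.from({ length: 5 }, () => task) : []),
      decide: (past) => (past.some((it) => it.valid === true) ? "pick-winner" : "fail"),
    },
    runAgent: async (task) => {
      inFlight++;
      peak = Math.max(peak, inFlight);
      await Promise.resolve(); // yield so other workers can start
      inFlight--;
      return [task, "done"];
    },
    parse: (events) => events.length,
    validate: (output) => output === 2,
    maxConcurrency: 2,
    maxIterations: 3,
  });
  return { decision: result.decision, runs: result.past.length, peak };
}

void demo().then((summary) => console.log(summary));
```

The worker-pool form keeps at most `maxConcurrency` agent runs in flight without batching, so a slow run does not hold up the tasks queued behind a fast one.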
Example — multi-harness coder fanout
Capability-scoping notes
- `AgentRunSpec.profile.tools` is the only place tools are declared. The coder preset enables `{ git, fs, shell, test_runner }`; adjust per harness. Sandbox-SDK-level capability scoping (network, permissions, confidential exec) flows through `sandboxOverrides`.
- The kernel takes an already-authenticated `sandboxClient` instance.
- Validator checks run after the fact and do not constrain the agent at sandbox-creation time. Use sandbox-SDK `permissions` for prevention; the validator catches what slips through.
Smoke test (manual; requires sandbox creds)
The live smoke is documented, not automated — burning credits in
CI for one variant proves nothing useful. Unit tests are the primary
validation; 25 new vitest cases exercise every kernel boundary
against a stub sandbox client.
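A stub sandbox client of the kind these tests rely on might look like the following. Only the `create({ backend: { profile } })` and `streamPrompt` call shapes come from this PR; the event shape, the `createStubClient` and `collect` helpers, and the per-profile script map are assumptions for illustration.

```typescript
// Hypothetical minimal surface of the sandbox client as the kernel uses it.
interface StubEvent { type: string; data?: Record<string, unknown> }
interface Box { streamPrompt(prompt: string): AsyncIterable<StubEvent> }
interface SandboxClientLike {
  create(opts: { backend: { profile: { name: string } } }): Promise<Box>;
}

// Builds a client that replays scripted events per profile name and records
// which profiles were requested, so tests can assert on both.
function createStubClient(script: Record<string, StubEvent[]>): {
  client: SandboxClientLike;
  seenProfiles: string[];
} {
  const seenProfiles: string[] = [];
  const client: SandboxClientLike = {
    async create(opts) {
      const profileName = opts.backend.profile.name;
      seenProfiles.push(profileName);
      const scripted = script[profileName] ?? [];
      return {
        async *streamPrompt(_prompt: string) {
          for (const ev of scripted) yield ev;
        },
      };
    },
  };
  return { client, seenProfiles };
}

// Drains an async stream into an array.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const items: T[] = [];
  for await (const item of stream) items.push(item);
  return items;
}

async function demoStub(): Promise<{ seen: string[]; types: string[] }> {
  const { client, seenProfiles } = createStubClient({
    "claude-code": [{ type: "llm_call", data: { costUsd: 0.01 } }, { type: "result" }],
  });
  const box = await client.create({ backend: { profile: { name: "claude-code" } } });
  const streamed = await collect(box.streamPrompt("add a hello function"));
  return { seen: seenProfiles, types: streamed.map((e) => e.type) };
}

void demoStub().then((summary) => console.log(summary));
```

Because the stub records requested profile names, a heterogeneous-fanout test can check that each harness was actually asked for, without spending credits.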
Test plan
- `pnpm build` clean
- `pnpm typecheck` clean
- `pnpm lint` clean (biome)
- `pnpm test`: 129 existing + 25 new = 154 total pass
- Trace event order: `loop.started`, N×(`loop.iteration.started`, `loop.iteration.ended`, `loop.decision`), `loop.ended`
- `maxConcurrency` cap enforced under parallel fanout
- Agent errors are captured on `Iteration.error` without aborting the loop
- `llm_call`-shaped events sum into `LoopResult.costUsd` and forward to `runHandle.observe`
- Heterogeneous fanout round-robins through `agentRuns` and the stub client sees each profile by name
- Recursive kernel call inside `Driver.plan` compiles and runs
Out of scope (Phase 1+)
researcherProfile, sandboxedDriver helper, MCP wrapper, Council /
Decompose / Pipeline topologies, agent-eval refactor to compose
runLoop. No product repo is touched.