feat(0.19.0): driven-loop kernel (sandbox-SDK-based) + coderProfile by tangletools · Pull Request #44 · tangle-network/agent-runtime

tangletools · 2026-05-24T17:02:49Z

Summary

Phase 0 of the driven-loop substrate. Ships runLoop (topology kernel) +
Refine / FanoutVote drivers + coderProfile preset. Built on top
of the sandbox SDK — the kernel composes around AgentProfile +
box.streamPrompt rather than inventing its own notion of "what an
agent is" (the bug class that killed the prior Phase 0 sketch).

Layering

sandbox SDK              owns: AgentProfile + Sandbox + streamPrompt — execution
agent-runtime/loops      owns: runLoop kernel + drivers (Refine, FanoutVote)
agent-runtime/profiles   owns: presets bundling AgentProfile + adapters + validators
agent-runtime existing   UNTOUCHED — runAgentTask, RuntimeRunHandle stay

No cycles. agent-runtime keeps its @tangle-network/sandbox peer.
loops and profiles are new parallel sub-exports — every existing
import path keeps working.

Kernel signature

export async function runLoop<Task, Output, Decision>(opts: {
  driver: Driver<Task, Output, Decision>
  // Single profile for every iteration:
  agentRun?: AgentRunSpec<Task>
  // OR multiple profiles — kernel round-robins for heterogeneous fanout:
  agentRuns?: AgentRunSpec<Task>[]
  output: OutputAdapter<Output>
  validator?: Validator<Output>
  task: Task
  ctx: {
    sandboxClient: { create(opts?: CreateSandboxOptions): Promise<SandboxInstance> }
    traceEmitter?: LoopTraceEmitter
    runHandle?: RuntimeRunHandle  // ⇒ kernel forwards synthesized llm_call events
    signal?: AbortSignal
  }
  maxIterations?: number  // default 10
  maxConcurrency?: number // default 4
}): Promise<LoopResult<Task, Output, Decision>>

Each iteration:

1. driver.plan(task, history) → N tasks (1 = refine, N = fanout, 0 = stop)
2. For each task (parallel, bounded by maxConcurrency):
   - client.create({ backend: { profile } }) + iterate box.streamPrompt(taskToPrompt(task))
   - output.parse(events) → typed Output
   - validator?.validate(output) → DefaultVerdict
3. Append Iteration to history; emit loop.iteration.ended trace
4. driver.decide(history) → if terminal, return result + winner

Kernel owns: iteration accounting, parallel execution, abort
propagation, cost aggregation, trace emission. Kernel does not
own: what the agent runs, how outputs are decoded, how outputs are
scored, or topology.
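
The per-iteration steps above can be sketched as a small self-contained loop. This is an illustration under simplifying assumptions, not the kernel's actual source: `miniLoop`, `mapBounded`, `SketchDriver`, `runOne`, and the `kind`-tagged decision shape are hypothetical names, the sandbox call is replaced by a plain async function, and `maxIterations` is assumed to count plan/decide rounds.

```typescript
// Hypothetical, simplified types; the real Driver / Iteration types are richer.
type Verdict = { valid: boolean; score: number }
type Iteration<Task, Output> = { task: Task; output?: Output; verdict?: Verdict; error?: string }
type Decision = { kind: 'continue' } | { kind: 'done'; winnerIndex?: number } | { kind: 'stopped' }

interface SketchDriver<Task, Output> {
  plan(task: Task, history: readonly Iteration<Task, Output>[]): Task[]
  decide(history: readonly Iteration<Task, Output>[]): Decision
}

// Run `fn` over `items` with at most `limit` calls in flight; results keep input order.
async function mapBounded<T, R>(items: T[], limit: number, fn: (item: T) => Promise<R>): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++
      results[i] = await fn(items[i])
    }
  }
  const workers = Math.min(Math.max(1, limit), items.length)
  await Promise.all(Array.from({ length: workers }, () => worker()))
  return results
}

async function miniLoop<Task, Output>(opts: {
  driver: SketchDriver<Task, Output>
  runOne: (task: Task) => Promise<Output> // stands in for create + streamPrompt + output.parse
  validate?: (output: Output) => Verdict
  task: Task
  maxIterations?: number
  maxConcurrency?: number
}): Promise<{ history: Iteration<Task, Output>[]; decision: Decision }> {
  const { driver, runOne, validate, task, maxIterations = 10, maxConcurrency = 4 } = opts
  const history: Iteration<Task, Output>[] = []
  for (let round = 0; round < maxIterations; round++) {
    const planned = driver.plan(task, history)
    if (planned.length === 0) break // 0 planned tasks means stop
    const batch = await mapBounded(planned, maxConcurrency, async (t): Promise<Iteration<Task, Output>> => {
      try {
        const output = await runOne(t)
        return { task: t, output, verdict: validate?.(output) }
      } catch (err) {
        return { task: t, error: String(err) } // record the failure instead of aborting the loop
      }
    })
    history.push(...batch)
    const decision = driver.decide(history)
    if (decision.kind !== 'continue') return { history, decision }
  }
  return { history, decision: { kind: 'stopped' } }
}

// Usage: a refine-style driver that re-plans the same task until the latest verdict is valid.
const refine: SketchDriver<string, number> = {
  plan: (task) => [task],
  decide: (history) => {
    const last = history.length - 1
    return history[last]?.verdict?.valid ? { kind: 'done', winnerIndex: last } : { kind: 'continue' }
  },
}
```

The driver only sees the history and returns tasks or a decision, which is the split the PR describes: topology lives in the driver, while accounting and concurrency live in the loop.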

Example — multi-harness coder fanout

import { Sandbox } from '@tangle-network/sandbox'
import { runLoop } from '@tangle-network/agent-runtime/loops'
import { multiHarnessCoderFanout, type CoderTask } from '@tangle-network/agent-runtime/profiles'

const client = new Sandbox({ apiKey, baseUrl })
const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()

const task: CoderTask = {
  goal: 'add a hello() function',
  repoRoot: '/work/repo',
  forbiddenPaths: ['secrets/', 'dist/'],
  maxDiffLines: 200,
}

const result = await runLoop({
  driver, agentRuns, output, validator, task,
  ctx: { sandboxClient: client },
})

if (result.winner) {
  console.log(`winner: ${result.winner.agentRunName}`,
              `branch: ${result.winner.output.branch}`,
              `score: ${result.winner.verdict?.score}`)
}

Capability-scoping notes

AgentRunSpec.profile.tools is the only place tools are declared.
The coder preset enables { git, fs, shell, test_runner } — adjust
per harness. Sandbox-SDK-level capability scoping (network,
permissions, confidential exec) flows through sandboxOverrides.
The kernel does not inject auth or secrets. Callers pass an
already-authenticated sandboxClient instance.
Forbidden paths are enforced post-hoc by the validator — they do
not constrain the agent at sandbox-creation time. Use sandbox-SDK
permissions for prevention; the validator catches what slips
through.
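
A post-hoc check of the kind described above might look like the following sketch. The shapes are assumptions for illustration: `CoderOutputSketch`, `SketchVerdict`, and `validateCoderOutput` are hypothetical names, and the real coder validator also covers tests and typecheck, which are omitted here. The sketch assumes forbidden entries are directory prefixes ending in `/` and does not resolve `..` segments.

```typescript
// Hypothetical shapes; the real coder output and DefaultVerdict may differ.
interface CoderOutputSketch {
  changedFiles: string[] // repo-relative paths touched by the agent
  diffLines: number // total added + removed lines
}
interface SketchVerdict {
  valid: boolean
  score: number
  reasons: string[]
}

// Normalize separators and strip a leading "./" so prefix checks compare like with like.
const normalizePath = (p: string): string => p.replace(/\\/g, '/').replace(/^\.\//, '')

function validateCoderOutput(
  output: CoderOutputSketch,
  limits: { forbiddenPaths: string[]; maxDiffLines: number },
): SketchVerdict {
  const reasons: string[] = []
  const forbidden = limits.forbiddenPaths.map(normalizePath)
  for (const file of output.changedFiles.map(normalizePath)) {
    const hit = forbidden.find((prefix) => file.startsWith(prefix))
    if (hit) reasons.push(`forbidden path touched: ${file} (matches ${hit})`)
  }
  if (output.diffLines > limits.maxDiffLines) {
    reasons.push(`diff too large: ${output.diffLines} > ${limits.maxDiffLines}`)
  }
  const valid = reasons.length === 0
  // Binary score for the sketch; a real validator would likely grade more finely.
  return { valid, score: valid ? 1 : 0, reasons }
}
```

Because this runs only after the agent has finished, it can reject a result but cannot stop the write from happening inside the sandbox, which is why the notes recommend sandbox-SDK permissions for prevention.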

Smoke test (manual; requires sandbox creds)

cd /home/drew/code/agent-runtime
pnpm build

TANGLE_SANDBOX_API_KEY=... TANGLE_ORCHESTRATOR_URL=... node --input-type=module -e "
  import { Sandbox } from '@tangle-network/sandbox'
  import { runLoop } from './dist/loops.js'
  import { multiHarnessCoderFanout } from './dist/profiles.js'
  const client = new Sandbox({
    apiKey: process.env.TANGLE_SANDBOX_API_KEY,
    baseUrl: process.env.TANGLE_ORCHESTRATOR_URL,
  })
  const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()
  const result = await runLoop({
    driver, agentRuns, output, validator,
    task: {
      goal: 'add a hello function to src/index.ts',
      repoRoot: '/work/repo',
      forbiddenPaths: ['dist/'],
      maxDiffLines: 200,
    },
    ctx: { sandboxClient: client },
  })
  console.log(result.decision, result.winner?.agentRunName, result.costUsd)
"

The live smoke is documented, not automated — burning credits in
CI for one variant proves nothing useful. Unit tests are the primary
validation; 25 new vitest cases exercise every kernel boundary
against a stub sandbox client.
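
For readers unfamiliar with the approach, a stub sandbox client can be as small as the sketch below. It only mirrors the two calls the kernel signature relies on (`create` and `streamPrompt`); the event shape, `createStubClient`, and `collect` are hypothetical names for illustration and are not taken from the repository's test suite.

```typescript
// Hypothetical event and box shapes; the real SandboxInstance has a larger surface.
type StubEvent = { type: string; data?: unknown }
interface StubBox {
  streamPrompt(prompt: string): AsyncIterable<StubEvent>
}

// Build a client whose boxes replay scripted events instead of calling a real sandbox.
function createStubClient(script: (prompt: string) => StubEvent[]) {
  const prompts: string[] = []
  return {
    prompts, // recorded so a test can assert on what was sent
    async create(): Promise<StubBox> {
      return {
        async *streamPrompt(prompt: string) {
          prompts.push(prompt)
          yield* script(prompt)
        },
      }
    },
  }
}

// Drain a stream into an array, the form an output adapter's parse(events) receives.
async function collect<T>(stream: AsyncIterable<T>): Promise<T[]> {
  const events: T[] = []
  for await (const e of stream) events.push(e)
  return events
}
```

A test can then pass the stub as `ctx.sandboxClient` and assert on both the recorded prompts and the parsed output without spending sandbox credits.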

Test plan

Out of scope (Phase 1+)

researcherProfile, sandboxedDriver helper, MCP wrapper, Council /
Decompose / Pipeline topologies, agent-eval refactor to compose
runLoop. No product repo is touched.

Phase 0 of the driven-loop substrate. Ships:

- `@tangle-network/agent-runtime/loops`: `runLoop` kernel + Refine and FanoutVote drivers, built on the sandbox SDK's `AgentProfile` + `streamPrompt` contract. The kernel orchestrates around the sandbox SDK; it does not invent its own notion of "what an agent is".
- `@tangle-network/agent-runtime/profiles`: `coderProfile` + `multiHarnessCoderFanout`. Bundle an `AgentProfile`, task-to-prompt formatter, output adapter, and per-task validator (forbidden paths, diff cap, tests + typecheck) into a runLoop-ready unit.

Layering:

sandbox SDK              AgentProfile + Sandbox + streamPrompt
agent-runtime/loops      runLoop kernel + drivers
agent-runtime/profiles   presets (coder; researcher in Phase 1)
agent-runtime existing   UNTOUCHED: runAgentTask, RuntimeRunHandle etc

Kernel responsibilities: iteration accounting, parallel execution bounded by `maxConcurrency`, abort propagation, cost aggregation from sandbox `llm_call`-shaped events (with optional `runHandle.observe` forwarding), and trace emission via `LoopTraceEmitter`.

Driver responsibilities: topology only. Refine returns `[task]` until the validator passes; FanoutVote returns N copies on iteration 0 then selects the highest-scoring valid output. Drivers receive a read-only history and a typed decision channel; the kernel terminates on `'stop' | 'pick-winner' | 'fail' | 'done'`.

Output adapter parses an event array → typed Output. Validator scores the typed Output → DefaultVerdict. Both are pure functions; tests exercise them without a real sandbox.

Heterogeneous fanout is built in: pass `agentRuns: AgentRunSpec[]` and the kernel round-robins through them when the driver plans N tasks. `multiHarnessCoderFanout` ships a 3-harness default (claude-code, codex, opencode/zai-coding-plan/glm-5.1).
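
The round-robin assignment and highest-score selection described above can be sketched as two small pure functions. The names `assignRoundRobin`, `pickWinner`, and `Candidate` are hypothetical, and the tie-breaking rule (first candidate wins) is an assumption; the commit message does not say how FanoutVote resolves ties.

```typescript
// Hypothetical shape for one finished fanout branch.
interface Candidate<Output> {
  agentRunName: string
  output: Output
  valid: boolean
  score: number
}

// Assign N planned tasks to agent runs by cycling through the list of run names.
function assignRoundRobin<Task>(
  tasks: Task[],
  agentRunNames: string[],
): Array<{ task: Task; agentRunName: string }> {
  if (agentRunNames.length === 0) throw new Error('at least one agent run is required')
  return tasks.map((task, i) => ({ task, agentRunName: agentRunNames[i % agentRunNames.length] }))
}

// Pick the highest-scoring valid candidate; undefined when none is valid (the "fail" case).
// Assumption: on equal scores the earlier candidate is kept.
function pickWinner<Output>(candidates: Candidate<Output>[]): Candidate<Output> | undefined {
  let best: Candidate<Output> | undefined
  for (const c of candidates) {
    if (c.valid && (best === undefined || c.score > best.score)) best = c
  }
  return best
}
```

With three run names and four planned tasks, the fourth task wraps around to the first run, so planning more tasks than harnesses repeats harnesses rather than failing.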
Tests (25 new, all 154 pass):

- tests/loops/refine.test.ts (7): refine-until-valid, maxIter cap, error capture, trace event ordering, cost aggregation
- tests/loops/fanout-vote.test.ts (6): winner selection, fail mode, `maxConcurrency` enforcement, heterogeneous agentRuns, error handling on missing options
- tests/loops/composition.test.ts (2): recursive runLoop in Driver.plan; static typecheck of nested kernel calls
- tests/profiles/coder.test.ts (10): task-bound validator (forbidden-path, diff cap, tests, typecheck), score math, output adapter (structured result + fenced-JSON fallback), multi-harness fanout shape

Build, typecheck, lint clean. Existing 129 tests untouched.

Smoke test (manual; requires sandbox credentials):

cd /home/drew/code/agent-runtime && pnpm build

TANGLE_SANDBOX_API_KEY=... TANGLE_ORCHESTRATOR_URL=... node --input-type=module -e "
  import { Sandbox } from '@tangle-network/sandbox'
  import { runLoop, createFanoutVoteDriver } from './dist/loops.js'
  import { multiHarnessCoderFanout } from './dist/profiles.js'
  const client = new Sandbox({
    apiKey: process.env.TANGLE_SANDBOX_API_KEY,
    baseUrl: process.env.TANGLE_ORCHESTRATOR_URL,
  })
  const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()
  const result = await runLoop({
    driver, agentRuns, output, validator,
    task: { goal: 'add a hello function', repoRoot: '/work/repo' },
    ctx: { sandboxClient: client },
  })
  console.log(result.decision, result.winner?.iterationIndex, result.costUsd)
"

Out of scope (Phase 1+): researcherProfile, sandboxedDriver helper, MCP wrapper, Council/Decompose/Pipeline topologies, agent-eval refactor.

tangletools merged commit 0b97163 into main May 24, 2026
1 check passed

tangletools deleted the feat/loops-kernel branch May 24, 2026 17:05
