
feat(0.19.0): driven-loop kernel (sandbox-SDK-based) + coderProfile #44

Merged
tangletools merged 1 commit into main from feat/loops-kernel
May 24, 2026

Conversation

@tangletools
Contributor

Summary

Phase 0 of the driven-loop substrate. Ships runLoop (topology kernel) +
Refine / FanoutVote drivers + coderProfile preset. Built on top of the
sandbox SDK: the kernel composes around AgentProfile +
box.streamPrompt rather than inventing its own notion of "what an
agent is" (the bug class that killed the prior Phase 0 sketch).

Layering

sandbox SDK              owns: AgentProfile + Sandbox + streamPrompt — execution
agent-runtime/loops      owns: runLoop kernel + drivers (Refine, FanoutVote)
agent-runtime/profiles   owns: presets bundling AgentProfile + adapters + validators
agent-runtime existing   UNTOUCHED — runAgentTask, RuntimeRunHandle stay

No cycles. agent-runtime keeps @tangle-network/sandbox as a peer
dependency. loops and profiles are new parallel sub-exports, so every
existing import path keeps working.

Kernel signature

export async function runLoop<Task, Output, Decision>(opts: {
  driver: Driver<Task, Output, Decision>
  // Single profile for every iteration:
  agentRun?: AgentRunSpec<Task>
  // OR multiple profiles — kernel round-robins for heterogeneous fanout:
  agentRuns?: AgentRunSpec<Task>[]
  output: OutputAdapter<Output>
  validator?: Validator<Output>
  task: Task
  ctx: {
    sandboxClient: { create(opts?: CreateSandboxOptions): Promise<SandboxInstance> }
    traceEmitter?: LoopTraceEmitter
    runHandle?: RuntimeRunHandle  // ⇒ kernel forwards synthesized llm_call events
    signal?: AbortSignal
  }
  maxIterations?: number  // default 10
  maxConcurrency?: number // default 4
}): Promise<LoopResult<Task, Output, Decision>>

Each iteration:

  1. driver.plan(task, history) → N tasks (1 = refine, N = fanout, 0 = stop)
  2. For each task (parallel, bounded by maxConcurrency):
    client.create({ backend: { profile } }) + iterate box.streamPrompt(taskToPrompt(task))
  3. output.parse(events) → typed Output
  4. validator?.validate(output) → DefaultVerdict
  5. Append Iteration to history; emit loop.iteration.ended trace
  6. driver.decide(history) → if terminal, return result + winner
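
The six steps above can be read as the following simplified, sequential loop. This is a sketch under stated assumptions, not the shipped kernel: every type name here (`SketchDriver`, `SketchIteration`, `SketchDecision`), the `'continue'` decision, and the `runAgent` callback that stands in for sandbox creation plus `box.streamPrompt` are hypothetical, and concurrency, abort handling, cost, and tracing are left out.

```typescript
// Hypothetical, simplified shapes; the real types live in agent-runtime/loops.
type SketchVerdict = { valid: boolean; score: number };
type SketchIteration<T, O> = {
  task: T;
  output?: O;
  verdict?: SketchVerdict | undefined;
  error?: string;
};
type SketchDecision = 'continue' | 'stop' | 'done' | 'fail';

interface SketchDriver<T, O> {
  plan(task: T, past: ReadonlyArray<SketchIteration<T, O>>): T[];
  decide(past: ReadonlyArray<SketchIteration<T, O>>): SketchDecision;
}

async function runLoopSketch<T, O>(opts: {
  driver: SketchDriver<T, O>;
  runAgent: (task: T) => Promise<string[]>; // stands in for create() + streamPrompt()
  parse: (events: string[]) => O;
  validate?: (output: O) => SketchVerdict;
  task: T;
  maxIterations?: number;
}): Promise<{ decision: SketchDecision; iterations: SketchIteration<T, O>[] }> {
  const iterations: SketchIteration<T, O>[] = [];
  const max = opts.maxIterations ?? 10;
  for (let round = 0; round < max; round++) {
    const planned = opts.driver.plan(opts.task, iterations); // step 1
    if (planned.length === 0) break; // planning 0 tasks means stop
    for (const task of planned) {
      try {
        const events = await opts.runAgent(task); // step 2
        const output = opts.parse(events); // step 3
        const verdict = opts.validate?.(output); // step 4
        iterations.push({ task, output, verdict }); // step 5
      } catch (err) {
        // A failed run is recorded and the loop keeps going.
        iterations.push({ task, error: String(err) });
      }
    }
    const decision = opts.driver.decide(iterations); // step 6
    if (decision !== 'continue') return { decision, iterations };
  }
  // Assumption: an exhausted budget or an empty plan is reported as 'stop'.
  return { decision: 'stop', iterations };
}
```

Because the agent run, the parser, and the validator are all passed in, the loop itself can be exercised with stubs and no sandbox.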

Kernel owns: iteration accounting, parallel execution, abort
propagation, cost aggregation, trace emission. Kernel does not
own: what the agent runs, how outputs are decoded, how outputs are
scored, or topology.
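
One common way to bound parallel execution by a cap such as maxConcurrency is a small worker pool. The helper below is a hypothetical illustration (`mapBounded` is not a name from this PR) and makes no claim about how the kernel enforces its cap internally.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight.
async function mapBounded<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i] as T, i);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // results keep the input order
}
```

If `fn` rejects, the whole call rejects. A caller that wants per-item error capture would wrap `fn` in its own try/catch.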

Example — multi-harness coder fanout

import { Sandbox } from '@tangle-network/sandbox'
import { runLoop } from '@tangle-network/agent-runtime/loops'
import { multiHarnessCoderFanout, type CoderTask } from '@tangle-network/agent-runtime/profiles'

const client = new Sandbox({ apiKey, baseUrl })
const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()

const task: CoderTask = {
  goal: 'add a hello() function',
  repoRoot: '/work/repo',
  forbiddenPaths: ['secrets/', 'dist/'],
  maxDiffLines: 200,
}

const result = await runLoop({
  driver, agentRuns, output, validator, task,
  ctx: { sandboxClient: client },
})

if (result.winner) {
  console.log(`winner: ${result.winner.agentRunName}`,
              `branch: ${result.winner.output.branch}`,
              `score: ${result.winner.verdict?.score}`)
}

Capability-scoping notes

  • AgentRunSpec.profile.tools is the only place tools are declared.
    The coder preset enables { git, fs, shell, test_runner } — adjust
    per harness. Sandbox-SDK-level capability scoping (network,
    permissions, confidential exec) flows through sandboxOverrides.
  • The kernel does not inject auth or secrets. Callers pass an
    already-authenticated sandboxClient instance.
  • Forbidden paths are enforced post-hoc by the validator — they do
    not constrain the agent at sandbox-creation time. Use sandbox-SDK
    permissions for prevention; the validator catches what slips
    through.
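
To make the post-hoc nature of that check concrete, here is a minimal sketch of a forbidden-path and diff-size check run against a finished diff. The shapes `PathPolicy` and `DiffSummary` and the function `checkDiffPolicy` are assumptions for illustration. The real coder validator also covers tests and typecheck and may match paths differently.

```typescript
// Hypothetical shapes; the real CoderTask and output types may differ.
type PathPolicy = { forbiddenPaths: string[]; maxDiffLines: number };
type DiffSummary = { changedFiles: string[]; diffLines: number };
type PolicyVerdict = { valid: boolean; reasons: string[] };

function checkDiffPolicy(policy: PathPolicy, diff: DiffSummary): PolicyVerdict {
  const reasons: string[] = [];
  for (const file of diff.changedFiles) {
    // Prefix match: 'secrets/' forbids everything under that directory.
    const hit = policy.forbiddenPaths.find((prefix) => file.startsWith(prefix));
    if (hit !== undefined) reasons.push(`forbidden path touched: ${file} (rule ${hit})`);
  }
  if (diff.diffLines > policy.maxDiffLines) {
    reasons.push(`diff too large: ${diff.diffLines} > ${policy.maxDiffLines} lines`);
  }
  return { valid: reasons.length === 0, reasons };
}

const verdict = checkDiffPolicy(
  { forbiddenPaths: ['secrets/', 'dist/'], maxDiffLines: 200 },
  { changedFiles: ['src/index.ts', 'dist/index.js'], diffLines: 42 },
);
console.log(verdict.valid, verdict.reasons);
```

The check only sees the result, which is why it complements sandbox-level permissions rather than replacing them.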

Smoke test (manual; requires sandbox creds)

cd /home/drew/code/agent-runtime
pnpm build

TANGLE_SANDBOX_API_KEY=... TANGLE_ORCHESTRATOR_URL=... node --input-type=module -e "
  import { Sandbox } from '@tangle-network/sandbox'
  import { runLoop } from './dist/loops.js'
  import { multiHarnessCoderFanout } from './dist/profiles.js'
  const client = new Sandbox({
    apiKey: process.env.TANGLE_SANDBOX_API_KEY,
    baseUrl: process.env.TANGLE_ORCHESTRATOR_URL,
  })
  const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()
  const result = await runLoop({
    driver, agentRuns, output, validator,
    task: {
      goal: 'add a hello function to src/index.ts',
      repoRoot: '/work/repo',
      forbiddenPaths: ['dist/'],
      maxDiffLines: 200,
    },
    ctx: { sandboxClient: client },
  })
  console.log(result.decision, result.winner?.agentRunName, result.costUsd)
"

The live smoke is documented, not automated — burning credits in
CI for one variant proves nothing useful. Unit tests are the primary
validation; 25 new vitest cases exercise every kernel boundary
against a stub sandbox client.

Test plan

  • pnpm build clean
  • pnpm typecheck clean
  • pnpm lint clean (biome)
  • pnpm test — 129 existing + 25 new = 154 total pass
  • Trace event order verified: loop.started, N×(loop.iteration.started,
    loop.iteration.ended, loop.decision), loop.ended
  • maxConcurrency cap enforced under parallel fanout
  • Per-iteration errors captured in Iteration.error without
    aborting the loop
  • Cost aggregation: sandbox llm_call-shaped events sum into
    LoopResult.costUsd and forward to runHandle.observe
  • Heterogeneous fanout: kernel round-robins agentRuns and the
    stub client sees each profile by name
  • Nested runLoop in Driver.plan compiles and runs

Out of scope (Phase 1+)

researcherProfile, sandboxedDriver helper, MCP wrapper, Council /
Decompose / Pipeline topologies, agent-eval refactor to compose
runLoop. No product repo is touched.

Phase 0 of the driven-loop substrate. Ships:

- `@tangle-network/agent-runtime/loops` — `runLoop` kernel + Refine and
  FanoutVote drivers, built on the sandbox SDK's `AgentProfile` +
  `streamPrompt` contract. The kernel orchestrates around the sandbox
  SDK; it does not invent its own notion of "what an agent is".
- `@tangle-network/agent-runtime/profiles` — `coderProfile` +
  `multiHarnessCoderFanout`. Bundle an `AgentProfile`, task-to-prompt
  formatter, output adapter, and per-task validator (forbidden paths,
  diff cap, tests + typecheck) into a runLoop-ready unit.

Layering:

  sandbox SDK              AgentProfile + Sandbox + streamPrompt
  agent-runtime/loops      runLoop kernel + drivers
  agent-runtime/profiles   presets (coder; researcher in Phase 1)
  agent-runtime existing   UNTOUCHED — runAgentTask, RuntimeRunHandle etc

Kernel responsibilities: iteration accounting, parallel execution
bounded by `maxConcurrency`, abort propagation, cost aggregation from
sandbox `llm_call`-shaped events (with optional `runHandle.observe`
forwarding), and trace emission via `LoopTraceEmitter`.
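
As a rough illustration of the cost-aggregation responsibility, the sketch below sums a cost field over `llm_call`-typed events and optionally forwards each one. The event shape (`type`, `costUsd`) and the `observe` callback signature are assumptions; the real sandbox events and `runHandle.observe` may look different.

```typescript
// Assumed event shape; field names on real sandbox events may differ.
type SketchEvent = { type: string; costUsd?: number };

function sumLlmCost(
  events: readonly SketchEvent[],
  observe?: (event: SketchEvent) => void, // stands in for runHandle.observe
): number {
  let total = 0;
  for (const ev of events) {
    if (ev.type !== 'llm_call') continue; // ignore non-LLM events
    if (typeof ev.costUsd === 'number' && Number.isFinite(ev.costUsd)) {
      total += ev.costUsd;
    }
    observe?.(ev); // forwarding is optional, mirroring the optional runHandle
  }
  return total;
}

const forwarded: SketchEvent[] = [];
const totalCost = sumLlmCost(
  [
    { type: 'llm_call', costUsd: 0.25 },
    { type: 'tool_call' },
    { type: 'llm_call', costUsd: 0.5 },
  ],
  (ev) => forwarded.push(ev),
);
console.log(totalCost, forwarded.length);
```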

Driver responsibilities: topology only. Refine returns `[task]` until
the validator passes; FanoutVote returns N copies on iteration 0 then
selects the highest-scoring valid output. Drivers receive a read-only
history and a typed decision channel; the kernel terminates on
`'stop' | 'pick-winner' | 'fail' | 'done'`.
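
A minimal sketch of the two topologies as plan/decide pairs, assuming simplified types. The names `DrvDriver`, `refineSketch`, `fanoutVoteSketch`, and `pickWinner`, and the non-terminal `'continue'` decision, are hypothetical and are not the exported drivers.

```typescript
type DrvVerdict = { valid: boolean; score: number };
type DrvIteration<T, O> = { task: T; output?: O; verdict?: DrvVerdict };
type DrvDecision = 'continue' | 'done' | 'pick-winner' | 'fail';

interface DrvDriver<T, O> {
  plan(task: T, past: ReadonlyArray<DrvIteration<T, O>>): T[];
  decide(past: ReadonlyArray<DrvIteration<T, O>>): DrvDecision;
}

// Refine: one task per round until some iteration validates.
function refineSketch<T, O>(): DrvDriver<T, O> {
  return {
    plan: (task, past) => (past.some((it) => it.verdict?.valid) ? [] : [task]),
    decide: (past) => (past.some((it) => it.verdict?.valid) ? 'done' : 'continue'),
  };
}

// FanoutVote: N copies in the first round, then a single vote.
function fanoutVoteSketch<T, O>(n: number): DrvDriver<T, O> {
  return {
    plan: (task, past) => (past.length === 0 ? Array.from({ length: n }, () => task) : []),
    decide: (past) => (past.some((it) => it.verdict?.valid) ? 'pick-winner' : 'fail'),
  };
}

// Highest-scoring valid iteration, or undefined when nothing validated.
function pickWinner<T, O>(
  past: ReadonlyArray<DrvIteration<T, O>>,
): DrvIteration<T, O> | undefined {
  let best: DrvIteration<T, O> | undefined;
  for (const it of past) {
    if (!it.verdict?.valid) continue;
    if (!best || it.verdict.score > (best.verdict?.score ?? -Infinity)) best = it;
  }
  return best;
}
```

Both drivers only read the history they are given, so they can be unit-tested with hand-built iteration arrays.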

Output adapter parses an event array → typed Output. Validator scores
the typed Output → DefaultVerdict. Both are pure functions; tests
exercise them without a real sandbox.

Heterogeneous fanout is built in: pass `agentRuns: AgentRunSpec[]` and
the kernel round-robins through them when the driver plans N tasks.
`multiHarnessCoderFanout` ships a 3-harness default (claude-code,
codex, opencode/zai-coding-plan/glm-5.1).
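
Round-robin assignment can be pictured as indexing the run list modulo its length. The helper `assignRoundRobin` is hypothetical, and whether the kernel uses exactly this indexing is an assumption.

```typescript
// Hypothetical helper: planned task i goes to agentRuns[i % agentRuns.length].
function assignRoundRobin<T, R>(
  tasks: readonly T[],
  agentRuns: readonly R[],
): Array<{ task: T; agentRun: R }> {
  if (agentRuns.length === 0) throw new Error('at least one agentRun is required');
  return tasks.map((task, i) => ({
    task,
    agentRun: agentRuns[i % agentRuns.length] as R,
  }));
}

// Four planned tasks over three harnesses: the fourth wraps to the first.
const assigned = assignRoundRobin(
  ['t0', 't1', 't2', 't3'],
  ['claude-code', 'codex', 'opencode'],
);
console.log(assigned.map((a) => `${a.task}:${a.agentRun}`).join(' '));
```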

Tests (25 new, all 154 pass):
  - tests/loops/refine.test.ts (7) — refine-until-valid, maxIter cap,
    error capture, trace event ordering, cost aggregation
  - tests/loops/fanout-vote.test.ts (6) — winner selection, fail mode,
    `maxConcurrency` enforcement, heterogeneous agentRuns, error
    handling on missing options
  - tests/loops/composition.test.ts (2) — recursive runLoop in
    Driver.plan; static typecheck of nested kernel calls
  - tests/profiles/coder.test.ts (10) — task-bound validator
    (forbidden-path, diff cap, tests, typecheck), score math, output
    adapter (structured result + fenced-JSON fallback), multi-harness
    fanout shape

Build, typecheck, lint clean. Existing 129 tests untouched.

Smoke test (manual; requires sandbox credentials):

  cd /home/drew/code/agent-runtime && pnpm build
  TANGLE_SANDBOX_API_KEY=... TANGLE_ORCHESTRATOR_URL=... node --input-type=module -e "
    import { Sandbox } from '@tangle-network/sandbox'
    import { runLoop } from './dist/loops.js'
    import { multiHarnessCoderFanout } from './dist/profiles.js'
    const client = new Sandbox({ apiKey: process.env.TANGLE_SANDBOX_API_KEY,
                                  baseUrl: process.env.TANGLE_ORCHESTRATOR_URL })
    const { agentRuns, output, validator, driver } = multiHarnessCoderFanout()
    const result = await runLoop({
      driver, agentRuns, output, validator,
      task: { goal: 'add a hello function', repoRoot: '/work/repo' },
      ctx: { sandboxClient: client },
    })
    console.log(result.decision, result.winner?.iterationIndex, result.costUsd)
  "

Out of scope (Phase 1+): researcherProfile, sandboxedDriver helper,
MCP wrapper, Council/Decompose/Pipeline topologies, agent-eval refactor.

tangletools merged commit 0b97163 into main on May 24, 2026
1 check passed
tangletools deleted the feat/loops-kernel branch on May 24, 2026 17:05