Merged
4 changes: 4 additions & 0 deletions README.md
@@ -421,6 +421,10 @@ Runnable in [`examples/`](./examples/). Every example imports from
- [`model-resolution/`](./examples/model-resolution/) — router catalog + fail-closed admission
- [`agent-into-reviewer/`](./examples/agent-into-reviewer/) — pipe one runtime's stream into a reviewer agent
- [`chat-handler/`](./examples/chat-handler/) — `handleChatTurn` (the centerpiece production pattern)
- [`coder-loop/`](./examples/coder-loop/) — `coderProfile` + `runLoop` + `FanoutVote` (driven-loop kernel)
- [`researcher-loop/`](./examples/researcher-loop/) — `researcherProfile` + `runLoop` + `FanoutVote` (peer dep: `@tangle-network/agent-knowledge`)
- [`mcp-delegation/`](./examples/mcp-delegation/) — mount `agent-runtime-mcp` in a product `AgentProfile` + stdio `tools/list` smoke
- [`fleet-delegation/`](./examples/fleet-delegation/) — `TANGLE_FLEET_ID` env flip + `createFleetWorkspaceExecutor` topology

## Tests

43 changes: 40 additions & 3 deletions examples/README.md
@@ -1,8 +1,9 @@
# agent-runtime examples

-Each example is a single runnable `.ts` file plus a short README. All
-ten are synthetic — no credentials required, except `openai-stream-backend`
-which needs an `OPENAI_API_KEY`.
+Each example is a single runnable `.ts` file plus a short README. Most are
+synthetic — no credentials required. `openai-stream-backend` needs an
+`OPENAI_API_KEY`; `mcp-delegation` needs `pnpm build` to have run so the
+local MCP bin exists.

| Example | What it covers |
|---|---|
@@ -16,6 +17,10 @@ which needs an `OPENAI_API_KEY`.
| [`runtime-run/`](./runtime-run/) | `startRuntimeRun` + cost ledger + persistence adapter |
| [`agent-into-reviewer/`](./agent-into-reviewer/) | Pipe one runtime's stream into a reviewer agent (the "2-runtime" pattern) |
| [`chat-handler/`](./chat-handler/) | `handleChatTurn` — the centerpiece production chat handler |
| [`coder-loop/`](./coder-loop/) | `coderProfile` + `runLoop` + `FanoutVote` — minimum end-to-end coder loop |
| [`researcher-loop/`](./researcher-loop/) | `researcherProfile` + `runLoop` + `FanoutVote` (peer dep: `@tangle-network/agent-knowledge`) |
| [`mcp-delegation/`](./mcp-delegation/) | Mount `agent-runtime-mcp` in a product's `AgentProfile` + stdio `tools/list` smoke |
| [`fleet-delegation/`](./fleet-delegation/) | `TANGLE_FLEET_ID` env flip + `createFleetWorkspaceExecutor` — sibling vs fleet topology |

## Conventions

@@ -43,7 +48,39 @@ pnpm tsx examples/sandbox-stream-backend/sandbox-stream-backend.ts
pnpm tsx examples/runtime-run/runtime-run.ts
pnpm tsx examples/agent-into-reviewer/agent-into-reviewer.ts
pnpm tsx examples/chat-handler/chat-handler.ts
pnpm tsx examples/coder-loop/coder-loop.ts
pnpm tsx examples/researcher-loop/researcher-loop.ts
pnpm tsx examples/fleet-delegation/fleet-delegation.ts

# requires `pnpm build` first (uses dist/mcp/bin.js)
pnpm tsx examples/mcp-delegation/mcp-delegation.ts

# requires creds
OPENAI_API_KEY=... pnpm tsx examples/openai-stream-backend/openai-stream-backend.ts
```

## Trace derivation

The driven-loop kernel emits `loop.*` trace events as it runs. Combined with
the per-event sandbox stream and the kernel's cost ledger, these feed the
production observability pipeline:

```
runLoop iteration N
  ↓ driver.plan returns task(s)
  ↓ for each task: sandbox.create(agentRun.profile) OR fleet.dispatchPrompt(...)
  ↓ box.streamPrompt(taskToPrompt(task))
      emits SandboxEvent stream
        ├─ llm_call { model, tokensIn, tokensOut, costUsd }
        ├─ tool_call { toolName, args }
        ├─ tool_result { result }
        └─ result { finalText }
  ↓ output.parse(events) → typed Output
  ↓ validator.validate(output) → verdict
  ↓ kernel auto-emits loop.iteration.ended event into ctx.traceEmitter
      → flows into RuntimeRunHandle telemetry
      → flows into .production-data/traces/events.ndjson (when ingestion mount is wired)
      → analyst loop reads + finds patterns
      → production-loop CI mutates agent surface
      → re-eval + ship if gate passes
```
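
As a rough illustration of how the per-event stream can feed a cost ledger, the sketch below folds one iteration's events into a summary. The `SketchEvent` union is an assumption modelled on the diagram above, not the real `SandboxEvent` type from `@tangle-network/sandbox`, and `summarizeIteration` is a hypothetical helper, not the kernel's aggregator.

```typescript
// Hypothetical event shapes, copied from the diagram above. The real
// `SandboxEvent` type may carry more fields or different names.
type SketchEvent =
  | { type: 'llm_call'; data: { model: string; tokensIn: number; tokensOut: number; costUsd: number } }
  | { type: 'tool_call'; data: { toolName: string; args: unknown } }
  | { type: 'tool_result'; data: { result: unknown } }
  | { type: 'result'; data: { finalText: string } }

interface IterationSummary {
  costUsd: number
  tokensIn: number
  tokensOut: number
  toolCalls: number
}

// Fold one iteration's events into totals. Only `llm_call` events carry
// cost in this sketch; `tool_call` events are merely counted.
function summarizeIteration(events: readonly SketchEvent[]): IterationSummary {
  const summary: IterationSummary = { costUsd: 0, tokensIn: 0, tokensOut: 0, toolCalls: 0 }
  for (const event of events) {
    if (event.type === 'llm_call') {
      summary.costUsd += event.data.costUsd
      summary.tokensIn += event.data.tokensIn
      summary.tokensOut += event.data.tokensOut
    } else if (event.type === 'tool_call') {
      summary.toolCalls += 1
    }
  }
  return summary
}
```

A real pipeline would likely attach the summary to the `loop.iteration.ended` payload. The field names for that payload are not shown here, so the sketch stops at the totals.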
40 changes: 40 additions & 0 deletions examples/coder-loop/README.md
@@ -0,0 +1,40 @@
# coder-loop

`coderProfile()` + `runLoop()` + `createFanoutVoteDriver()` — the smallest
end-to-end coder loop. Two parallel iterations attempt the same goal; the
validator scores test + typecheck + diff size; the kernel picks the
highest-scoring valid winner.

## Run

```bash
pnpm tsx examples/coder-loop/coder-loop.ts
```

## What it shows

- How `coderProfile({ task, harness })` bundles `profile`, `taskToPrompt`,
`output` (event-stream → `CoderOutput`), `validator` (test + typecheck +
diff cap + forbidden-path enforcement), and `agentRunSpec` together.
- How `createFanoutVoteDriver({ n })` makes the kernel plan N parallel
iterations and pick the winning output.
- How the synthetic `sandboxClient` mirrors the production
`@tangle-network/sandbox` `Sandbox` surface — swap it for `new Sandbox(...)`
when you wire to production.
- How `result.winner` carries the typed `CoderOutput`, the verdict, and the
iteration index — everything you need to merge the patch in CI.
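
The "pick the winning output" step can be pictured with the sketch below. It is a simplified stand-in under stated assumptions, not the `FanoutVote` driver's actual code: `SketchVerdict`, `SketchIteration`, and `pickWinner` are hypothetical names, and breaking ties in favour of the earliest iteration is an assumption.

```typescript
// Hypothetical shapes for illustration only.
interface SketchVerdict {
  valid: boolean
  score: number
}

interface SketchIteration<O> {
  iterationIndex: number
  output: O
  verdict: SketchVerdict
}

// Return the highest-scoring iteration whose verdict is valid, or
// undefined when every iteration failed validation. The strict `>` means
// the earliest iteration wins a tie (an assumption of this sketch).
function pickWinner<O>(iterations: readonly SketchIteration<O>[]): SketchIteration<O> | undefined {
  let winner: SketchIteration<O> | undefined
  for (const iteration of iterations) {
    if (!iteration.verdict.valid) continue
    if (winner === undefined || iteration.verdict.score > winner.verdict.score) {
      winner = iteration
    }
  }
  return winner
}
```

Invalid iterations are skipped before scores are compared, so a high-scoring but invalid candidate can never beat a lower-scoring valid one.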

## Wire to production

Swap the synthetic `sandboxClient` for:

```ts
import { Sandbox } from '@tangle-network/sandbox'

const sandboxClient = new Sandbox({ apiKey: process.env.TANGLE_API_KEY! })
```

Then `runLoop` creates a fresh sandbox per iteration via `sandboxClient.create()`
and streams the prompt through `box.streamPrompt(taskToPrompt(task))`. Each
iteration's events feed the same `output.parse` → `validator.validate`
pipeline.
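
For intuition about the `output.parse` step, here is a minimal sketch of a backward scan for the last `result` event. It assumes the result payload sits under `data.result`, as in the synthetic client of this example; `LooseEvent` and `findLastResult` are hypothetical names, and the real `parseCoderEvents` may validate the payload further.

```typescript
// Loosely typed event for illustration; the real `SandboxEvent` is richer.
interface LooseEvent {
  type: string
  data?: Record<string, unknown>
}

// Walk the events from the end and return the payload of the most recent
// `result` event, or undefined when the stream never produced one.
function findLastResult(events: readonly LooseEvent[]): unknown {
  for (let i = events.length - 1; i >= 0; i--) {
    const event = events[i]
    if (event !== undefined && event.type === 'result' && event.data !== undefined && 'result' in event.data) {
      return event.data.result
    }
  }
  return undefined
}
```

Scanning from the end means a later `result` event supersedes an earlier one, and the scan can stop at the first match instead of reading the whole stream.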
131 changes: 131 additions & 0 deletions examples/coder-loop/coder-loop.ts
@@ -0,0 +1,131 @@
/**
 * `coderProfile` + `runLoop` + `FanoutVote` driver — the smallest end-to-end
 * coder loop. Two parallel coder iterations attempt the goal; the validator
 * scores test + typecheck + diff size; the kernel picks the highest-scoring
 * valid winner.
 *
 * No real sandbox SDK or harness is required. The synthetic `sandboxClient`
 * mirrors the production `Sandbox` surface one-for-one (`create()` returns
 * an object with `streamPrompt(message, opts)`), and emits a `result` event
 * whose `data.result` matches the `CoderOutput` shape that `coderProfile`'s
 * `parseCoderEvents` looks for when it walks the events back-to-front.
 *
 * Run with:
 *   pnpm tsx examples/coder-loop/coder-loop.ts
 */

import { createFanoutVoteDriver, runLoop } from '@tangle-network/agent-runtime/loops'
import { type CoderTask, coderProfile } from '@tangle-network/agent-runtime/profiles'
import type { SandboxEvent, SandboxInstance } from '@tangle-network/sandbox'

const task: CoderTask = {
  goal: 'add util.ts that exports add(a,b)',
  repoRoot: '/tmp/coder-loop-example',
  testCmd: 'node -e \'require("./util").add(1,2)===3 || process.exit(1)\'',
  typecheckCmd: 'pnpm typecheck',
  maxDiffLines: 50,
  forbiddenPaths: ['secrets/', 'node_modules/'],
}

// ── Synthetic sandbox client ─────────────────────────────────────────────
// Two iterations: the first emits a valid CoderOutput (tests + typecheck
// pass, small diff); the second emits a near-miss (tests pass, typecheck
// fails). The FanoutVote driver picks #1 as the winner.
const candidateOutputs = [
  {
    branch: 'coder/util-add-A',
    patch: [
      'diff --git a/util.ts b/util.ts',
      'new file mode 100644',
      '--- /dev/null',
      '+++ b/util.ts',
      '@@ -0,0 +1,1 @@',
      '+export const add = (a: number, b: number): number => a + b',
    ].join('\n'),
    testResult: { passed: true, output: '1 test passed' },
    typecheckResult: { passed: true, output: 'no errors' },
    diffStats: { filesChanged: 1, insertions: 1, deletions: 0 },
    reviewerNotes: 'Minimal arrow-function impl. No external deps.',
  },
  {
    branch: 'coder/util-add-B',
    patch: [
      'diff --git a/util.ts b/util.ts',
      'new file mode 100644',
      '--- /dev/null',
      '+++ b/util.ts',
      '@@ -0,0 +1,3 @@',
      '+export function add(a, b) {',
      '+ return a + b',
      '+}',
    ].join('\n'),
    testResult: { passed: true, output: '1 test passed' },
    typecheckResult: { passed: false, output: 'TS7006: Parameter implicitly has any type' },
    diffStats: { filesChanged: 1, insertions: 3, deletions: 0 },
    reviewerNotes: 'Untyped params — typecheck fails.',
  },
]

let dispatchIndex = 0
const sandboxClient = {
  async create(): Promise<SandboxInstance> {
    const index = dispatchIndex++
    const output = candidateOutputs[index % candidateOutputs.length]
    const id = `sandbox-coder-${index + 1}`
    const box = {
      id,
      async *streamPrompt(): AsyncIterable<SandboxEvent> {
        // Mirror cost-bearing events so the kernel's per-iteration costUsd
        // aggregator picks them up.
        yield {
          type: 'llm_call',
          data: { model: 'claude-code/sonnet', tokensIn: 800, tokensOut: 120, costUsd: 0.0036 },
        }
        yield { type: 'result', data: { result: output } }
      },
    } as unknown as SandboxInstance
    return box
  },
}

async function main(): Promise<void> {
  const { output, validator, agentRunSpec } = coderProfile({ task, harness: 'claude-code' })
  const driver = createFanoutVoteDriver<CoderTask, ReturnType<typeof output.parse>>({ n: 2 })

  const result = await runLoop({
    driver,
    agentRun: agentRunSpec,
    output,
    validator,
    task,
    ctx: { sandboxClient },
  })

  console.log(`decision: ${result.decision}`)
  console.log(`iterations: ${result.iterations.length}`)
  console.log(`durationMs: ${result.durationMs}`)
  console.log(`totalCostUsd: ${result.costUsd.toFixed(6)}`)
  if (!result.winner) {
    console.log('no winner — every iteration failed validation')
    return
  }
  console.log(`winner: iteration #${result.winner.iterationIndex} (${result.winner.agentRunName})`)
  console.log(
    ` score: ${result.winner.verdict?.score?.toFixed(3)} valid: ${result.winner.verdict?.valid}`,
  )
  console.log(` branch: ${result.winner.output.branch}`)
  console.log(` diff (${result.winner.output.diffStats.insertions} insertions):`)
  for (const line of result.winner.output.patch.split('\n')) {
    console.log(` ${line}`)
  }
  console.log(` tests passed: ${result.winner.output.testResult.passed}`)
  console.log(` typecheck passed: ${result.winner.output.typecheckResult.passed}`)
  if (result.winner.output.reviewerNotes) {
    console.log(` notes: ${result.winner.output.reviewerNotes}`)
  }
}

main().catch((err) => {
  console.error(err)
  process.exit(1)
})
72 changes: 72 additions & 0 deletions examples/fleet-delegation/README.md
@@ -0,0 +1,72 @@
# fleet-delegation

How `TANGLE_FLEET_ID` flips `agent-runtime-mcp` from sibling-sandbox
dispatch into fleet-workspace dispatch.

## Run

```bash
pnpm tsx examples/fleet-delegation/fleet-delegation.ts
```

## Sibling vs Fleet

```
Sibling                     Fleet
──────                      ─────
parent sandbox              coordinator-0 (excluded)
      │                           │
      │ delegate_*                │ delegate_*
      ▼                           ▼
fresh sibling               worker-a ←─┐
fresh sibling               worker-b ←─┤ round-robin
fresh sibling               worker-c ←─┘
                            └─ all three share the same fleet
                               workspace; diffs land on the
                               coordinator's FS in place
```

- **Sibling** (default): each `delegate_code` / `delegate_research` spawns
a fresh sandbox via `sandboxClient.create()`. Worker output flows back
through the MCP response — there is no shared filesystem.
- **Fleet** (set `TANGLE_FLEET_ID`): each delegation lands on an existing
machine in the parent fleet. The fleet's shared-workspace policy means
the worker sees the caller's filesystem and any diff lands in-place.

## Env wiring

```bash
TANGLE_API_KEY=sk_sb_* # required in both modes
SANDBOX_BASE_URL=https://sandbox.tangle.tools

# Sibling mode (default) — omit TANGLE_FLEET_ID

# Fleet mode
TANGLE_FLEET_ID=<fleet-id-the-parent-sandbox-runs-in>
TANGLE_FLEET_EXCLUDE_MACHINES=coordinator-0 # comma-separated; skip the
# coordinator machine the
# MCP server itself runs on
```

The bin (`src/mcp/bin.ts`) reads these at startup. When `TANGLE_FLEET_ID`
is set, it constructs a `SandboxFleet` handle via the SDK and passes it
into `createFleetWorkspaceExecutor` (see `src/mcp/executor.ts`); otherwise
it wraps the bare `Sandbox` client in `createSiblingSandboxExecutor`. The
selector used to pick the worker machine round-robins across the eligible
machine ids, skipping any in the exclude set.
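
The env-driven switch and the round-robin selection described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `src/mcp/bin.ts` or `src/mcp/executor.ts`: `readFleetEnv` and `createRoundRobinSelector` are hypothetical helpers, and the real selector may order or cache machine ids differently.

```typescript
// Hypothetical parsed view of the two fleet env vars.
interface FleetEnv {
  fleetId: string | undefined // undefined means sibling mode
  excluded: Set<string>
}

// Parse TANGLE_FLEET_ID and the comma-separated exclude list, trimming
// whitespace and dropping empty entries.
function readFleetEnv(env: Record<string, string | undefined>): FleetEnv {
  const rawFleetId = env.TANGLE_FLEET_ID?.trim()
  const excluded = new Set(
    (env.TANGLE_FLEET_EXCLUDE_MACHINES ?? '')
      .split(',')
      .map((id) => id.trim())
      .filter((id) => id.length > 0),
  )
  return { fleetId: rawFleetId ? rawFleetId : undefined, excluded }
}

// Return a function that cycles through the eligible machine ids in
// order, skipping every id in the exclude set.
function createRoundRobinSelector(
  machineIds: readonly string[],
  excluded: ReadonlySet<string>,
): () => string {
  const eligible = machineIds.filter((id) => !excluded.has(id))
  if (eligible.length === 0) {
    throw new Error('no eligible fleet machines after exclusions')
  }
  let next = 0
  return () => {
    const id = eligible[next % eligible.length] as string
    next += 1
    return id
  }
}
```

Failing fast when the exclude list removes every machine is a design choice of this sketch. It surfaces a misconfigured `TANGLE_FLEET_EXCLUDE_MACHINES` at startup instead of at the first delegation.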

## Trace correlation

`loop.iteration.dispatch` events carry the placement tag the executor
reports — `sibling` mode emits `{ placement: 'sibling', sandboxId }`;
fleet mode emits `{ placement: 'fleet', fleetId, machineId, sandboxId }`.
Downstream trace pipelines correlate worker logs back to the dispatch
this way.
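
One way to model the two placement tags on the consuming side is a discriminated union, sketched below. The field names come from the payloads quoted above; the `Placement` type and `correlationKey` helper are hypothetical and not part of the package's exports.

```typescript
// Hypothetical consumer-side model of the placement tag.
type Placement =
  | { placement: 'sibling'; sandboxId: string }
  | { placement: 'fleet'; fleetId: string; machineId: string; sandboxId: string }

// Build a stable key for joining worker logs to the dispatch event.
// Switching on `placement` narrows the union, so the fleet-only fields
// are only read in the fleet branch.
function correlationKey(tag: Placement): string {
  switch (tag.placement) {
    case 'sibling':
      return `sibling:${tag.sandboxId}`
    case 'fleet':
      return `fleet:${tag.fleetId}/${tag.machineId}/${tag.sandboxId}`
  }
}
```

Because both variants carry `sandboxId`, a pipeline that only needs per-sandbox grouping can read that field without narrowing first.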

## See also

- [`mcp-delegation`](../mcp-delegation/) — how a product mounts the MCP
server entry in its AgentProfile + a smoke that exercises tools/list
- `src/mcp/executor.ts` — the production executor factories
- `src/mcp/bin.ts` — the stdio MCP entry point that wires the env above