Reliability improvements: leak fixes, two-tier cleanup, error diagnostics by vsumner · Pull Request #12 · tintinweb/pi-subagents

vsumner · 2026-03-18T18:58:14Z

Summary

Hardens the subagent lifecycle with memory leak fixes, bounded resource cleanup, and improved error reporting.

Fix 11 memory leaks: session disposal, output file subscriptions, abort controllers, and event listener cleanup across
agent-manager, agent-runner, and worktree modules
Two-tier cleanup: Tier 1 (1x retention) disposes sessions; Tier 2 (3x retention) fully removes stale records from the Map
to prevent unbounded growth in long sessions
Async/sync flush race guard: asyncFlush now checks if syncFlush advanced writtenCount during the await, preventing
duplicate JSONL entries in output files
Bounded output file reads: parseOutputFileResult reads only the last 256KB instead of the entire file
Smarter error tips: "Check provider status" tip only shown for likely API/model errors, not user-initiated stops or
timeouts
Error message cleanup: Model info removed from record.error string (already carried in record.modelName)
Type safety: Removed any types from session_switch handler
Debug logging: Added process.env.DEBUG logging for captureStats failures instead of silent swallow
35 new tests covering cleanup tiers, flush races, output file edge cases, group-join, and agent-runner lifecycle

Depends on

Merge #8 first, then rebase this PR onto master before merging.

Test plan

npx vitest run — 242 tests passing (4 pre-existing custom-agents.test.ts failures unrelated to this branch)
Tier 1 cleanup: session disposed, record retained in Map
Tier 2 cleanup: record fully removed after 3x retention
Flush race guard: no duplicate JSONL entries when syncFlush runs during async await
Error tip suppressed for abort/timeout/stopped errors
record.error no longer includes [model: ...] suffix

Resource leaks fixed: - Session shutdown now disposes groupJoin, widget, agentActivity, batch timer - outputCleanup called in dispose() and removeRecord() before session disposal - AbortController nulled after completion/error to prevent retention - DefaultResourceLoader disposed in agent-runner finally block - execFileSync maxBuffer (10MB) added to all git calls in worktree.ts - finishedTurnAge entries pruned when exceeding ERROR_LINGER_TURNS - Stale typeListText cache replaced with inline buildTypeListText() - session_switch now updates currentCtx when provided - waitForAll gets timeout parameter (default 5min) with Promise.race - output-file rewritten to use async writes with batching - resetDefaults() added to agent-runner for global state cleanup Test coverage added: - agent-runner: resetDefaults, get/set for maxTurns and graceTurns - group-join: registration, completion flow, timeout, stragglers, dispose - output-file: JSONL format, async/sync flush, no-duplication, toolResult - agent-manager: outputCleanup, abortController lifecycle, waitForAll

P1 fixes: - Extract releaseRecordResources() helper to deduplicate 4x cleanup pattern - Replace dynamic require("node:fs") with static appendFileSync import - Clear timeout timer in waitForAll when Promise.allSettled wins the race P2 fixes: - Fix asyncFlush/syncFlush race: advance writtenCount only after successful write - Narrow (loader as any) cast to { dispose?(): void } - Replace flaky 50ms sleeps in tests with deterministic file-size polling - Expand inline batchFinalizeTimer cleanup to multi-line P3 fixes: - Remove redundant _INITIAL suffix from constants (DEFAULT_MAX_TURNS, GRACE_TURNS) - Shorten over-verbose widget prune comment to inline - Use queueMicrotask for asyncFlush re-entry instead of direct recursion

- Two-tier cleanup: Tier 1 disposes session but keeps record metadata (stats, modelName, result) so get_subagent_result works after 10+ min. Tier 2 (clearCompleted) does full removal on session switch. - Output file fallback: get_subagent_result recovers from on-disk JSONL transcript when record is fully cleaned up. - Error diagnostics: error messages include model name; get_subagent_result shows actionable tip on errors. - Configurable cleanup retention via /agents Settings menu. - Model name threaded through spawn → notifications → renderer. - Path traversal guard on agent_id in fallback path. - Extracted captureStats() helper (DRY: 3 identical blocks → 1 method). - 20 new tests covering cleanup tiers, stats capture, error diagnostics, parseOutputFileResult, and modelName threading.

…stics - Guard asyncFlush against duplicate JSONL entries when syncFlush runs during the await (output-file.ts) - Bound parseOutputFileResult to last 256KB instead of reading whole file - Stop appending model info to error string (record.modelName suffices) - Show "check provider status" tip only for likely API errors, not user-initiated stops or timeouts - Add Tier 2 cleanup (3x retention) to fully remove stale records and prevent unbounded Map growth - Remove `any` types from session_switch handler - Add debug-level logging for captureStats catch block - Update tests: tier1/tier2 cutoff coverage, model-name assertion, race-condition test stabilization

Ejecting a default agent with no explicit tool list wrote "tools: all" in frontmatter. On reload, "all" was parsed literally as ["all"], which mapped to zero built-in tools since "all" is not in TOOL_FACTORIES. Two-sided fix: - csvList() now treats "all" as equivalent to omitted, returning all defaults - ejectAgent() now writes explicit tool names instead of "all" when builtinToolNames is empty/undefined Test isolation: mock node:os homedir() in custom-agents tests so global agent files in ~/.pi/agent/agents/ do not pollute results. Tests now pass without HOME workaround.

Address Node.js review findings: - csvList: inline parsing logic instead of delegating to parseCsvField, eliminating redundant String(val).trim() - ejectAgent: preserve tools: none round-trip instead of silently expanding to all tools when builtinToolNames is an empty array - loadFromDir: add DEBUG-gated console.debug for catch blocks so silent failures are diagnosable - Tests: add global-dir mock wiring test proving agents load from mocked homedir with source: global, and project-overrides-global test for same-name agents

The csvList "all" fix (returning empty defaults) caused inheritField to silently disable extensions/skills when the frontmatter value was the string "all". csvList returns [] for "all" with empty defaults, which inheritField mapped to false. Fix: check for "all" in inheritField before delegating to csvList, returning true (inherit all) to match the intent.

README: - Remove duplicate isolation row in frontmatter table - Document "all" as valid value for tools, extensions, and skills fields Code: - Add DEBUG-gated logging to bare catch blocks in output-file.ts (parseOutputFileResult), memory.ts (safeReadFile), and agent-manager.ts (worktree cleanup on error) for diagnosability

tintinweb · 2026-03-22T14:48:20Z

thanks @vsumner! I'll look into your prs after merging the next feats ♥️

vsumner added 4 commits March 18, 2026 15:01

vsumner force-pushed the feat/reliability-improvements branch from 3c563a0 to dd79de0 Compare March 18, 2026 19:10

vsumner added 4 commits March 18, 2026 15:34

tintinweb force-pushed the master branch from 455e347 to 39494e1 Compare June 30, 2026 19:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Reliability improvements: leak fixes, two-tier cleanup, error diagnostics#12

Reliability improvements: leak fixes, two-tier cleanup, error diagnostics#12
vsumner wants to merge 8 commits into
tintinweb:masterfrom
vsumner:feat/reliability-improvements

vsumner commented Mar 18, 2026 •

edited

Loading

Uh oh!

tintinweb commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

vsumner commented Mar 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Depends on

Test plan

Uh oh!

tintinweb commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vsumner commented Mar 18, 2026 •

edited

Loading