Reliability improvements: leak fixes, two-tier cleanup, error diagnostics#12
Open
vsumner wants to merge 8 commits into
Open
Reliability improvements: leak fixes, two-tier cleanup, error diagnostics#12vsumner wants to merge 8 commits into
vsumner wants to merge 8 commits into
Conversation
Resource leaks fixed: - Session shutdown now disposes groupJoin, widget, agentActivity, batch timer - outputCleanup called in dispose() and removeRecord() before session disposal - AbortController nulled after completion/error to prevent retention - DefaultResourceLoader disposed in agent-runner finally block - execFileSync maxBuffer (10MB) added to all git calls in worktree.ts - finishedTurnAge entries pruned when exceeding ERROR_LINGER_TURNS - Stale typeListText cache replaced with inline buildTypeListText() - session_switch now updates currentCtx when provided - waitForAll gets timeout parameter (default 5min) with Promise.race - output-file rewritten to use async writes with batching - resetDefaults() added to agent-runner for global state cleanup Test coverage added: - agent-runner: resetDefaults, get/set for maxTurns and graceTurns - group-join: registration, completion flow, timeout, stragglers, dispose - output-file: JSONL format, async/sync flush, no-duplication, toolResult - agent-manager: outputCleanup, abortController lifecycle, waitForAll
P1 fixes:
- Extract releaseRecordResources() helper to deduplicate 4x cleanup pattern
- Replace dynamic require("node:fs") with static appendFileSync import
- Clear timeout timer in waitForAll when Promise.allSettled wins the race
P2 fixes:
- Fix asyncFlush/syncFlush race: advance writtenCount only after successful write
- Narrow (loader as any) cast to { dispose?(): void }
- Replace flaky 50ms sleeps in tests with deterministic file-size polling
- Expand inline batchFinalizeTimer cleanup to multi-line
P3 fixes:
- Remove redundant _INITIAL suffix from constants (DEFAULT_MAX_TURNS, GRACE_TURNS)
- Shorten over-verbose widget prune comment to inline
- Use queueMicrotask for asyncFlush re-entry instead of direct recursion
- Two-tier cleanup: Tier 1 disposes session but keeps record metadata (stats, modelName, result) so get_subagent_result works after 10+ min. Tier 2 (clearCompleted) does full removal on session switch. - Output file fallback: get_subagent_result recovers from on-disk JSONL transcript when record is fully cleaned up. - Error diagnostics: error messages include model name; get_subagent_result shows actionable tip on errors. - Configurable cleanup retention via /agents Settings menu. - Model name threaded through spawn → notifications → renderer. - Path traversal guard on agent_id in fallback path. - Extracted captureStats() helper (DRY: 3 identical blocks → 1 method). - 20 new tests covering cleanup tiers, stats capture, error diagnostics, parseOutputFileResult, and modelName threading.
…stics - Guard asyncFlush against duplicate JSONL entries when syncFlush runs during the await (output-file.ts) - Bound parseOutputFileResult to last 256KB instead of reading whole file - Stop appending model info to error string (record.modelName suffices) - Show "check provider status" tip only for likely API errors, not user-initiated stops or timeouts - Add Tier 2 cleanup (3x retention) to fully remove stale records and prevent unbounded Map growth - Remove `any` types from session_switch handler - Add debug-level logging for captureStats catch block - Update tests: tier1/tier2 cutoff coverage, model-name assertion, race-condition test stabilization
3c563a0 to
dd79de0
Compare
Ejecting a default agent with no explicit tool list wrote "tools: all" in frontmatter. On reload, "all" was parsed literally as ["all"], which mapped to zero built-in tools since "all" is not in TOOL_FACTORIES. Two-sided fix: - csvList() now treats "all" as equivalent to omitted, returning all defaults - ejectAgent() now writes explicit tool names instead of "all" when builtinToolNames is empty/undefined Test isolation: mock node:os homedir() in custom-agents tests so global agent files in ~/.pi/agent/agents/ do not pollute results. Tests now pass without HOME workaround.
Address Node.js review findings: - csvList: inline parsing logic instead of delegating to parseCsvField, eliminating redundant String(val).trim() - ejectAgent: preserve tools: none round-trip instead of silently expanding to all tools when builtinToolNames is an empty array - loadFromDir: add DEBUG-gated console.debug for catch blocks so silent failures are diagnosable - Tests: add global-dir mock wiring test proving agents load from mocked homedir with source: global, and project-overrides-global test for same-name agents
The csvList "all" fix (returning empty defaults) caused inheritField to silently disable extensions/skills when the frontmatter value was the string "all". csvList returns [] for "all" with empty defaults, which inheritField mapped to false. Fix: check for "all" in inheritField before delegating to csvList, returning true (inherit all) to match the intent.
README: - Remove duplicate isolation row in frontmatter table - Document "all" as valid value for tools, extensions, and skills fields Code: - Add DEBUG-gated logging to bare catch blocks in output-file.ts (parseOutputFileResult), memory.ts (safeReadFile), and agent-manager.ts (worktree cleanup on error) for diagnosability
Owner
|
thanks @vsumner! I'll look into your prs after merging the next feats |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hardens the subagent lifecycle with memory leak fixes, bounded resource cleanup, and improved error reporting.
agent-manager, agent-runner, and worktree modules
to prevent unbounded growth in long sessions
asyncFlushnow checks ifsyncFlushadvancedwrittenCountduring theawait, preventingduplicate JSONL entries in output files
parseOutputFileResultreads only the last 256KB instead of the entire filetimeouts
record.errorstring (already carried inrecord.modelName)anytypes fromsession_switchhandlerprocess.env.DEBUGlogging forcaptureStatsfailures instead of silent swallowDepends on
Test plan
npx vitest run— 242 tests passing (4 pre-existingcustom-agents.test.tsfailures unrelated to this branch)record.errorno longer includes[model: ...]suffix