chore: PDD sync for #1086#1087
Conversation
Record models attempted during fallback (not just the model that ultimately succeeded) in the cost CSV. Adds an `attempted_models` field to llm_invoke's return dict and serializes it as a new semicolon-delimited column in track_cost's CSV output. Closes #1086
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Step 7/8: Review Loop Final ReportPR: #1087 SummaryFixer claude could not address codex's findings. Per-Reviewer Status
Findings
Fixer Rationale
Fixes Attempted
|
Step 7/8: Review Loop Final ReportPR: #1087 Summarygenerated-code-only fix refused: pdd/llm_invoke.py is generated from pdd/prompts/llm_invoke_python.prompt. Update the prompt source or run the proper PDD sync path before re-running the review loop. Per-Reviewer Status
Findings
Fixer Rationale
Fixes Attempted
|
- Scope attempted_models to each @track_cost invocation: snapshot, reset, capture this command's chain, then restore the prior shared value so chained / batched runs cannot leak a previous command's fallback chain into the next CSV row (issue #1086 regression). - Document the cloud /llmInvoke response contract requiring attemptedModels and add a multi-attempt cloud-response regression test. - Restore include_deps entries for auth_service_example, token_counter, and core/cloud in .pdd/meta/llm_invoke_python.json. - Sync architecture.json signature for extract_cost_and_model. - Remove transient error_log.txt from repo root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move attempted_models append in llm_invoke.py from immediately after credential check to immediately before each litellm.responses / batch_completion / completion call. Models rejected by pre-call context-window validation no longer pollute the attempted_models chain. Added regression test TestContextWindowValidation::test_context_window_skipped_model_not_in_attempted_models. - Strengthen llm_invoke prompt cloud contract: attemptedModels is now documented as REQUIRED in /llmInvoke success responses; the existing client-side fallback to [modelName] is preserved as a documented backward-compatibility degraded mode for non-conforming servers, and the cloud helper now emits a warning when the field is missing. Added test test_cloud_response_missing_attempted_models_warns_and_degrades. - Update track_cost prompt to encode the snapshot/reset/restore behavior for ctx.obj['attempted_models'] so a future PDD sync cannot regenerate away the cross-command leak fix (#1086). - Refresh .pdd/meta/llm_invoke_python.json (prompt_hash, code_hash, test_hash, test_files entry, timestamp) so drift checks audit the committed dev unit correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoid O(n) re-read of cost.csv on every command. Only load+rewrite rows when the existing header is missing 'attempted_models'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Attach attempted_models_chain to ctx and any pre-candidate-loop exception so track_cost can record cloud attempts even when local setup fails before the candidate loop runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When _propagate_attempted_models_to_ctx is called twice in one invocation and the incoming chain is a prefix-extension of the existing ctx chain, replace rather than append-with-dedup so the cloud prefix isn't recorded twice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- track_cost: capture ctx.obj['attempted_models'] post-command regardless of snapshot state, so commands that create ctx.obj via ensure_object still record the chain. - llm_invoke: helper now called at most once per invocation. Drop prefix-extension shortcut; the per-element dedup loop suffices for cross-invocation composition and no longer collapses repeated chains within one Click command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop per-element dedup in _propagate_attempted_models_to_ctx. Each llm_invoke chain is already pre-deduplicated; collapsing at the invocation boundary loses chronological history for commands that make multiple sibling llm_invoke calls reusing the same model (gpt-4 + gpt-4 must record both, not collapse to one). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The track_cost example was popping PYTEST_CURRENT_TEST at module top-level so it would write its CSV when run as a script. Under pytest that pop runs on import too and disables track_cost's CSV guard for the rest of the session. Move the pop inside main(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove stale 'no duplicate consecutive model names' clause in llm_invoke_python.prompt:116; line 118's pure-concat rule is authoritative. - architecture.json track_cost returns is now Callable to match the prompt's interface block. - CSV migration writes a tempfile in the same directory and os.replaces atomically. Concurrent PDD processes can no longer corrupt a legacy cost.csv during first-run upgrade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker threads' Click ctx is thread-local, so attempted_models propagation from inside _llm_invoke_cloud / candidate-loop didn't reach the main thread's ctx for parallel summarize_directory work. Capture attempted_models from each llm_result and aggregate at the main level via _propagate_attempted_models_to_ctx after the executor completes. Non-threaded code paths follow the same return-and-aggregate pattern for parity. auto_deps_main itself needs no code change — llm_invoke (via insert_includes) already best-effort propagates the chain to ctx on every success path per the round-4 'at most once' contract, so the @track_cost decorator on the auto-deps command already sees the chain. A regression test now pins this contract. Note: sync_orchestration calls auto_deps_main inside a @track_cost-decorated pdd sync, so the chain lands in the sync command's row — that's correct (sync paid for those calls). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloud_fix_errors only parsed modelName/totalCost from the cloud response. The attemptedModels key (per issue #1086 cloud contract) was discarded, so cloud-only pdd fix runs wrote a single-model attempted_models column even when the server attempted fallbacks. Parse attemptedModels (with attempted_models / modelChain aliases and modelName fallback for non-conforming servers). Return as the 8th tuple slot, prefixed with cloud: to distinguish cloud attempts from local. In fix_error_loop's cloud branch, propagate the chain to the active Click ctx via _propagate_attempted_models_to_ctx so the @track_cost decorator on the outer pdd fix command records the full chain to cost.csv. Note: fix_main and commands/fix.py retain their legacy 6-/3-tuple return shapes — the spec's enriched-dict change had too large a blast radius (37+ callsites between sync_orchestration, pin_example_hack, and tests). ctx propagation achieves the same goal — @track_cost reads ctx.obj['attempted_models'] regardless of the result tuple shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- summarize_directory: only aggregate-and-propagate from worker threads, not in the serial path (llm_invoke's own ctx propagation already covers max_workers==1, so the parent saw 'm1;m1' for a single LLM call). - summarize_directory: capture e.attempted_models when a worker's llm_invoke raises, so failed-file rows still record the attempted chain instead of dropping it on the floor. - track_cost: serialize CSV migration + append with fcntl.flock (POSIX best-effort; Windows falls through unchanged). Prevents the lost-update race where two concurrent PDD processes hitting a legacy 6-column cost.csv would each migrate independently and the second os.replace would overwrite the first's appended row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud /fixCode error paths (timeout, 4xx, 5xx) attach attempted_models=['cloud:auto'] to the raised RuntimeError so the caller can recover the chain. fix_error_loop's cloud branch now propagates e.attempted_models to ctx before falling back to local or re-raising. Without this, cost.csv recorded only the later local model (or stayed empty) on cloud failure, contradicting the issue #1086 contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second-pass human review flagged prompt-vs-code drift introduced by the round-9 cloud_fix_errors 8-tuple change and stale meta for summarize_directory. Resync: - fix_error_loop_python.prompt: cloud_fix_errors return is 8-tuple including attempted_models. - architecture.json: cloud_fix_errors signature updated to match the 8-tuple plus protect_tests / failure_classification params; fix_error_loop return updated from None to the real 6-tuple. - .pdd/meta/summarize_directory_python.json: refresh hashes and timestamp so a future `pdd sync` doesn't see false drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the third human review pass:
- .pdd/meta/{llm_invoke,summarize_directory}_python.json now match
PDD's own calculate_prompt_hash (incorporates <include> dep
contents) and calculate_sha256 from sync_determine_operation.
Previous round used raw sha256(prompt_bytes), which doesn't match
PDD's hash when the prompt has <include> tags. Refreshed code/
test/timestamp too.
- pdd/fix_error_loop.py: wrap a scalar attemptedModels response in
a single-element list before the cloud: prefix comprehension.
Without the guard, a non-conforming cloud server returning
attemptedModels="gpt-4" would record ['cloud:g','cloud:p',
'cloud:t','cloud:-','cloud:4'] in the cost CSV.
- pdd/track_cost.py: replace fcntl.flock with a cross-platform
atomic lock file (O_CREAT|O_EXCL with retry and 60s stale-lock
steal). POSIX and Windows now share the same serialization path
for the legacy 6-column cost.csv migration race — no more
unlocked-degraded mode on Windows.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the 4th human review pass on the portable O_CREAT|O_EXCL lock introduced in the 3rd pass: - Finding 1: timeout fell through unlocked after only 5s. Retry budget is now 30s, AND falling through emits a yellow Rich warning so the user knows the write isn't serialized — but only when real contention was observed (saw FileExistsError during retry), not when the filesystem just doesn't support O_EXCL. _acquire_atomic_lock now returns (handle, contended) so the caller can distinguish. - Finding 2: a legitimate slow migration holding the lock past 60s was treated as stale and stolen. Staleness is now determined by PID liveness (os.kill(pid, 0) → ProcessLookupError) read from a nonce we write at acquire. mtime is only a long-threshold safety net (10 minutes) for malformed lock files with no parseable PID; a LIVE PID is never stolen regardless of mtime. - Finding 3: after a stale-steal, the original process's release would unlink the new owner's lock file, letting a third process enter the critical section. Each acquire writes a unique pid-uuid nonce; release reads the file back and only unlinks when its nonce still matches ours. Three new regression tests pin each finding's repro. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-pass review) Three findings from the 5th human review pass on the portable O_CREAT|O_EXCL lock: - Finding 1: 30s timeout fell through to unlocked write WHILE the original holder was still in the migration critical section, reopening the lost-row race. Lock is now only REQUIRED for the migration path (read-modify-write of the whole file); a pure append to an already-migrated file is safe without the lock thanks to POSIX O_APPEND atomicity for sub-PIPE_BUF writes. When migration is needed AND the lock can't be acquired, SKIP the cost row with a loud warning naming the file — better to lose one row than to corrupt the legacy CSV via unlocked migration. - Finding 2: os.open(O_CREAT|O_EXCL) followed by os.write(nonce) could leave a malformed lock file if the write was partial or if the process died between the two syscalls. Future processes would then wait the full 10-minute mtime threshold. Acquire now verifies the full nonce was written; on partial write it closes + unlinks + retries. - Finding 3: _release_atomic_lock returned silently when _read_lock_owner couldn't parse the file (mocked-open test reproduced this), leaving the lock artifact behind. Release now unlinks an unparseable lock that we created via O_EXCL — we hold the path, the malformed content is necessarily ours, and leaving it would block future PDD invocations for the stale window. Three new regression tests pin each finding's repro: - test_migration_skips_write_on_lock_failure - test_acquire_atomic_lock_retries_on_partial_nonce_write - test_release_atomic_lock_unlinks_malformed_lock Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pass review)
Three findings from the 6th human review pass on the pre-lock
header peek and the lock-fail policy:
- Finding 1: pre-lock peek caught only OSError. A malformed first
row (csv.Error, e.g. unterminated quoted field) bubbled up to the
outer except and the cost row was silently dropped. Catch
(OSError, csv.Error, UnicodeDecodeError, StopIteration); treat
peek failure as "migration unknown" and fall through to the
append path so the new row still lands. csv.writer appends
regardless of pre-existing content corruption.
- Finding 2: same peek-narrow-except bug for non-UTF-8 bytes.
UnicodeDecodeError now folded into the broadened except.
- Finding 3: when migration is needed AND the filesystem doesn't
support O_EXCL (saw_contention=False, OSError on os.open), the
previous policy skipped the cost row indefinitely until manual
migration — a known tradeoff that the reviewer asked us to
document or accept explicitly. New policy: split the lock-fail
modes. On real contention (saw_contention=True) still skip with
the loud warning. On unsupported filesystem attempt UNLOCKED
migration with a one-line tradeoff warning ("safe for
single-process use; concurrent PDD writers on this filesystem
can lose rows during legacy migration"). Single-process use is
the dominant case on such filesystems.
Three new regression tests pin each finding's repro:
- test_peek_survives_malformed_csv_first_row
- test_peek_survives_non_utf8_csv
- test_unsupported_fs_lock_attempts_unlocked_migration
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Recommending close as superseded. The
Re-basing this PR on In addition, the current diff has a few issues that would need to be fixed even before considering rebase:
Suggest closing this PR and #1086 as already resolved by #1056 / #897. |
Summary
Adds full attempted-model history to the cost CSV so users can see PDD fallback decision-making, including cloud attempts, local retries, failure paths, and successful GPT-5 Responses API calls.
Closes #1086
Changes Made
llm_invokenow returns and propagatesattempted_modelsfor successful calls, failures, cloud fallback, and Responses API success paths.track_costwrites a newattempted_modelsCSV column, migrates existing 6-column CSVs, and scopes command-level attempted models per invocation.Verification
tests/test_track_cost.py,tests/test_llm_invoke.py,tests/test_llm_invoke_retry_cost.pypassed before the final isolation patch; final focused isolation run passed (5 passed).make cloud-testond46638dd5: completed with 65/77 tasks passing and 12 failing. The failures are cloud backend/credit-dependent integration paths (Insufficient credits,currentBalance: 10, and cloud-only command cases), not issue Model used UI #1086-specific regressions.Review Checklist
Created by pdd change workflow; updated after review fixes