chore: PDD sync for #1086 by prompt-driven-github[bot] · Pull Request #1087 · promptdriven/pdd

prompt-driven-github · 2026-05-19T18:01:01Z

Summary

Adds full attempted-model history to the cost CSV so users can see PDD fallback decision-making, including cloud attempts, local retries, failure paths, and successful GPT-5 Responses API calls.

Closes #1086

Changes Made

llm_invoke now returns and propagates attempted_models for successful calls, failures, cloud fallback, and Responses API success paths.
track_cost writes a new attempted_models CSV column, migrates existing 6-column CSVs, and scopes command-level attempted models per invocation.
Prompt sources, generated code, metadata, architecture docs, README, and examples are aligned with the new contract.
Regression coverage was added for command scoping, CSV migration, fallback chains, cloud pre-auth behavior, and Responses API success reporting.

Verification

GitHub PR checks: passing.
Focused local checks: tests/test_track_cost.py, tests/test_llm_invoke.py, tests/test_llm_invoke_retry_cost.py passed before the final isolation patch; final focused isolation run passed (5 passed).
make cloud-test on d46638dd5: completed with 65/77 tasks passing and 12 failing. The failures are cloud backend/credit-dependent integration paths (Insufficient credits, currentBalance: 10, and cloud-only command cases), not issue Model used UI #1086-specific regressions.

Review Checklist

Prompt syntax is valid
PDD conventions followed
Documentation is up to date
Generated code and metadata are in sync

Created by pdd change workflow; updated after review fixes

Record models attempted during fallback (not just the model that ultimately succeeded) in the cost CSV. Adds an `attempted_models` field to llm_invoke's return dict and serializes it as a new semicolon-delimited column in track_cost's CSV output. Closes #1086

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

prompt-driven-github · 2026-05-19T18:47:18Z

Step 7/8: Review Loop Final Report

PR: #1087
Issue: #1086
issue_aligned: unknown
reviewer-status: codex=findings claude=fixer fresh-final=missing
fresh-final-review: missing
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

Fixer claude could not address codex's findings.

Per-Reviewer Status

Reviewer	Status
codex	findings
claude	fixer
fresh-final	missing

Findings

Severity	Status	Location	Finding	Required fix	Reviewer
critical	open	pdd/llm_invoke.py:546	`attempted_models` leaks across commands in batch runs.	Address the reviewer finding.	codex
medium	open	pdd/prompts/llm_invoke_python.prompt:116	Prompt and generated code disagree on Click context ownership.	Address the reviewer finding.	codex
medium	open	.pdd/meta/llm_invoke_python.json:5	`.pdd` metadata for `llm_invoke` is stale after the sync.	Address the reviewer finding.	codex
low	open	error_log.txt:1	Generated debug log committed at repo root.	Address the reviewer finding.	codex

Fixer Rationale

pdd/llm_invoke.py:546: attempted_models leaks across commands in batch runs. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
pdd/prompts/llm_invoke_python.prompt:116: Prompt and generated code disagree on Click context ownership. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
.pdd/meta/llm_invoke_python.json:5: .pdd metadata for llm_invoke is stale after the sync. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
error_log.txt:1: Generated debug log committed at repo root. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)

Fixes Attempted

claude: success; verification=unverified; changed_files=README.md, pdd/llm_invoke.py, pdd/track_cost.py, tests/test_track_cost.py; Populated attempted_models from real fallback paths in llm_invoke (local + cloud), added Click-context propagation so cost tracking sees the chain without modifying every caller, migrated existing 6-column cost.csv files to add the new attempted_models header before appending rows, and clarified the README empty-state semantics. Added a migration test; updated two existing tests whose strict assert_called_once_with assertion no longer holds because the writer now also opens the CSV in read mode to detect header drift.
claude: success; verification=unverified; changed_files=pdd/llm_invoke.py, pdd/track_cost.py, error_log.txt; Hoisted attempted_models_chain above the cloud-vs-local split so cloud attempts ('cloud:auto', rewritten to 'cloud:<reported_model>' on success) are recorded and persist through fallback to local execution. Moved the local-candidate append from before _ensure_api_key() to after the credential check passes, so models skipped for missing keys never appear in the chain (matches prompt contract llm_invoke_python.prompt:111). Attached the chain via exc.attempted_models on both the exhaustion RuntimeError and the cloud-path InsufficientCreditsError, plus surfaced the chain via _propagate_attempted_models_to_ctx on failure. Updated track_cost so it writes a row when a command fails but a fallback chain is recorded (on the exception or ctx.obj), surfacing total-failure history. Removed trailing whitespace at track_cost.py:92. All 315 llm_invoke + track_cost tests pass. The 6 'cloud' suite failures observed during regression all reproduce on the unmodified base branch, so they are pre-existing and out of scope.
claude: failed; verification=unverified; changed_files=none; All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques

Serhan-Asad · 2026-05-19T19:53:00Z

Step 7/8: Review Loop Final Report

PR: #1087
Issue: #1086
issue_aligned: false
active-reviewer: codex
reviewer-status: codex=findings claude=fixer fresh-final=missing
fresh-final-review: missing
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

generated-code-only fix refused: pdd/llm_invoke.py is generated from pdd/prompts/llm_invoke_python.prompt. Update the prompt source or run the proper PDD sync path before re-running the review loop.

Per-Reviewer Status

Reviewer	Status
codex	findings
claude	fixer
fresh-final	missing

Findings

Severity	Status	Location	Finding	Required fix	Reviewer
medium	open	pdd/llm_invoke.py:3798	Successful OpenAI GPT-5 Responses API invocations do not surface attempted_models, so cost.csv can still omit the model history for a successful LLM call.	Before the Responses API success return, propagate attempted_models_chain to Click context and include 'attempted_models': list(attempted_models_chain) in the returned dict. Add a regression test for a gpt-5/litellm.responses success.	codex
medium	open	pdd/llm_invoke.py:3053	Cloud authentication failures are recorded as a cloud model attempt even when no cloud HTTP request or cloud model attempt happened.	Record the cloud sentinel only once an actual cloud request is about to be made, or remove the sentinel when _llm_invoke_cloud fails before the HTTP request. Keep the prompt and implementation aligned with the 'actual provider/cloud HTTP call' recording rule.	codex
low	open	context/cli_example.py:184	The CLI examples still document the wrong attempted_models empty-state semantics.	Update both CLI examples to say single-model successes contain the successful model, and the field is empty only when no LLM attempt was recorded.	codex

Fixer Rationale

pdd/llm_invoke.py:3798: Successful OpenAI GPT-5 Responses API invocations do not surface attempted_models, so cost.csv can still omit the model history for a successful LLM call. (claude: fixed - Added _propagate_attempted_models_to_ctx(attempted_models_chain) and 'attempted_models': list(attempted_models_chain) to the Responses API success return. New regression test asserts both the returned dict field and ctx propagation.)
pdd/llm_invoke.py:3053: Cloud authentication failures are recorded as a cloud model attempt even when no cloud HTTP request or cloud model attempt happened. (claude: fixed - Removed the pre-append of 'cloud:auto' in the outer llm_invoke. _llm_invoke_cloud now defines attempted_models_record=['cloud:auto'] only just before requests.post and attaches it to any cloud-specific exception raised at/after the HTTP boundary. Caller merges e.attempted_models into the chain. Pre-HTTP JWT auth failures raise without the attribute, so the chain stays empty for them.)
context/cli_example.py:184: The CLI examples still document the wrong attempted_models empty-state semantics. (claude: fixed - Both context/cli_example.py and context/core/cli_example.py updated to state a normal single-model success contains a single entry (the successful model) and is empty only when no LLM attempt was recorded.)

Fixes Attempted

claude: success; verification=unverified; changed_files=none; Scoped ctx.obj['attempted_models'] per @track_cost boundary (fixes cross-command leak); stopped early cloud propagation in _llm_invoke_cloud so the outer normalized chain isn't duplicated; moved attempted_models_chain.append to immediately before each provider call so context-window-rejected models aren't counted; rewrote the llm_invoke prompt section to permit best-effort Click propagation; updated architecture.json signatures/return types for track_cost and extract_cost_and_model; regenerated .pdd/meta/llm_invoke_python.json hashes and include_deps; added attempted_models docs to both CLI examples; removed error_log.txt; added a real-Click chained-group regression test that fails without the fix and passes with it. tests/test_track_cost.py + tests/test_llm_invoke.py + tests/test_llm_invoke_retry_cost.py all green (320 passed).
claude: success; verification=unverified; changed_files=context/cli_example.py, context/core/cli_example.py, pdd/llm_invoke.py, tests/test_llm_invoke.py; Applied fixes for all three open findings. (1) Responses API success path now propagates attempted_models to ctx and includes the chain in the returned dict; added a regression test test_llm_invoke_responses_api_success_returns_attempted_models. (2) cloud:auto sentinel is no longer pre-appended in the outer caller. _llm_invoke_cloud only attaches attempted_models=['cloud:auto'] to exceptions raised at or after the HTTP boundary, so pre-HTTP failures (JWT auth) leave the chain untouched. Caller's except handlers now merge e.attempted_models into the chain instead of relying on a pre-append. (3) Both context/cli_example.py and context/core/cli_example.py updated to state single-model successes contain one entry; empty only when no LLM attempt was recorded. All 317 llm_invoke+track_cost tests and 21 cloud tests pass.

- Scope attempted_models to each @track_cost invocation: snapshot, reset, capture this command's chain, then restore the prior shared value so chained / batched runs cannot leak a previous command's fallback chain into the next CSV row (issue #1086 regression). - Document the cloud /llmInvoke response contract requiring attemptedModels and add a multi-attempt cloud-response regression test. - Restore include_deps entries for auth_service_example, token_counter, and core/cloud in .pdd/meta/llm_invoke_python.json. - Sync architecture.json signature for extract_cost_and_model. - Remove transient error_log.txt from repo root. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Move attempted_models append in llm_invoke.py from immediately after credential check to immediately before each litellm.responses / batch_completion / completion call. Models rejected by pre-call context-window validation no longer pollute the attempted_models chain. Added regression test TestContextWindowValidation::test_context_window_skipped_model_not_in_attempted_models. - Strengthen llm_invoke prompt cloud contract: attemptedModels is now documented as REQUIRED in /llmInvoke success responses; the existing client-side fallback to [modelName] is preserved as a documented backward-compatibility degraded mode for non-conforming servers, and the cloud helper now emits a warning when the field is missing. Added test test_cloud_response_missing_attempted_models_warns_and_degrades. - Update track_cost prompt to encode the snapshot/reset/restore behavior for ctx.obj['attempted_models'] so a future PDD sync cannot regenerate away the cross-command leak fix (#1086). - Refresh .pdd/meta/llm_invoke_python.json (prompt_hash, code_hash, test_hash, test_files entry, timestamp) so drift checks audit the committed dev unit correctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Avoid O(n) re-read of cost.csv on every command. Only load+rewrite rows when the existing header is missing 'attempted_models'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Attach attempted_models_chain to ctx and any pre-candidate-loop exception so track_cost can record cloud attempts even when local setup fails before the candidate loop runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

When _propagate_attempted_models_to_ctx is called twice in one invocation and the incoming chain is a prefix-extension of the existing ctx chain, replace rather than append-with-dedup so the cloud prefix isn't recorded twice. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- track_cost: capture ctx.obj['attempted_models'] post-command regardless of snapshot state, so commands that create ctx.obj via ensure_object still record the chain. - llm_invoke: helper now called at most once per invocation. Drop prefix-extension shortcut; the per-element dedup loop suffices for cross-invocation composition and no longer collapses repeated chains within one Click command. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop per-element dedup in _propagate_attempted_models_to_ctx. Each llm_invoke chain is already pre-deduplicated; collapsing at the invocation boundary loses chronological history for commands that make multiple sibling llm_invoke calls reusing the same model (gpt-4 + gpt-4 must record both, not collapse to one). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The track_cost example was popping PYTEST_CURRENT_TEST at module top-level so it would write its CSV when run as a script. Under pytest that pop runs on import too and disables track_cost's CSV guard for the rest of the session. Move the pop inside main(). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Remove stale 'no duplicate consecutive model names' clause in llm_invoke_python.prompt:116; line 118's pure-concat rule is authoritative. - architecture.json track_cost returns is now Callable to match the prompt's interface block. - CSV migration writes a tempfile in the same directory and os.replaces atomically. Concurrent PDD processes can no longer corrupt a legacy cost.csv during first-run upgrade. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Worker threads' Click ctx is thread-local, so attempted_models propagation from inside _llm_invoke_cloud / candidate-loop didn't reach the main thread's ctx for parallel summarize_directory work. Capture attempted_models from each llm_result and aggregate at the main level via _propagate_attempted_models_to_ctx after the executor completes. Non-threaded code paths follow the same return-and-aggregate pattern for parity. auto_deps_main itself needs no code change — llm_invoke (via insert_includes) already best-effort propagates the chain to ctx on every success path per the round-4 'at most once' contract, so the @track_cost decorator on the auto-deps command already sees the chain. A regression test now pins this contract. Note: sync_orchestration calls auto_deps_main inside a @track_cost-decorated pdd sync, so the chain lands in the sync command's row — that's correct (sync paid for those calls). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cloud_fix_errors only parsed modelName/totalCost from the cloud response. The attemptedModels key (per issue #1086 cloud contract) was discarded, so cloud-only pdd fix runs wrote a single-model attempted_models column even when the server attempted fallbacks. Parse attemptedModels (with attempted_models / modelChain aliases and modelName fallback for non-conforming servers). Return as the 8th tuple slot, prefixed with cloud: to distinguish cloud attempts from local. In fix_error_loop's cloud branch, propagate the chain to the active Click ctx via _propagate_attempted_models_to_ctx so the @track_cost decorator on the outer pdd fix command records the full chain to cost.csv. Note: fix_main and commands/fix.py retain their legacy 6-/3-tuple return shapes — the spec's enriched-dict change had too large a blast radius (37+ callsites between sync_orchestration, pin_example_hack, and tests). ctx propagation achieves the same goal — @track_cost reads ctx.obj['attempted_models'] regardless of the result tuple shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- summarize_directory: only aggregate-and-propagate from worker threads, not in the serial path (llm_invoke's own ctx propagation already covers max_workers==1, so the parent saw 'm1;m1' for a single LLM call). - summarize_directory: capture e.attempted_models when a worker's llm_invoke raises, so failed-file rows still record the attempted chain instead of dropping it on the floor. - track_cost: serialize CSV migration + append with fcntl.flock (POSIX best-effort; Windows falls through unchanged). Prevents the lost-update race where two concurrent PDD processes hitting a legacy 6-column cost.csv would each migrate independently and the second os.replace would overwrite the first's appended row. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cloud /fixCode error paths (timeout, 4xx, 5xx) attach attempted_models=['cloud:auto'] to the raised RuntimeError so the caller can recover the chain. fix_error_loop's cloud branch now propagates e.attempted_models to ctx before falling back to local or re-raising. Without this, cost.csv recorded only the later local model (or stayed empty) on cloud failure, contradicting the issue #1086 contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Second-pass human review flagged prompt-vs-code drift introduced by the round-9 cloud_fix_errors 8-tuple change and stale meta for summarize_directory. Resync: - fix_error_loop_python.prompt: cloud_fix_errors return is 8-tuple including attempted_models. - architecture.json: cloud_fix_errors signature updated to match the 8-tuple plus protect_tests / failure_classification params; fix_error_loop return updated from None to the real 6-tuple. - .pdd/meta/summarize_directory_python.json: refresh hashes and timestamp so a future `pdd sync` doesn't see false drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three findings from the third human review pass: - .pdd/meta/{llm_invoke,summarize_directory}_python.json now match PDD's own calculate_prompt_hash (incorporates <include> dep contents) and calculate_sha256 from sync_determine_operation. Previous round used raw sha256(prompt_bytes), which doesn't match PDD's hash when the prompt has <include> tags. Refreshed code/ test/timestamp too. - pdd/fix_error_loop.py: wrap a scalar attemptedModels response in a single-element list before the cloud: prefix comprehension. Without the guard, a non-conforming cloud server returning attemptedModels="gpt-4" would record ['cloud:g','cloud:p', 'cloud:t','cloud:-','cloud:4'] in the cost CSV. - pdd/track_cost.py: replace fcntl.flock with a cross-platform atomic lock file (O_CREAT|O_EXCL with retry and 60s stale-lock steal). POSIX and Windows now share the same serialization path for the legacy 6-column cost.csv migration race — no more unlocked-degraded mode on Windows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three findings from the 4th human review pass on the portable O_CREAT|O_EXCL lock introduced in the 3rd pass: - Finding 1: timeout fell through unlocked after only 5s. Retry budget is now 30s, AND falling through emits a yellow Rich warning so the user knows the write isn't serialized — but only when real contention was observed (saw FileExistsError during retry), not when the filesystem just doesn't support O_EXCL. _acquire_atomic_lock now returns (handle, contended) so the caller can distinguish. - Finding 2: a legitimate slow migration holding the lock past 60s was treated as stale and stolen. Staleness is now determined by PID liveness (os.kill(pid, 0) → ProcessLookupError) read from a nonce we write at acquire. mtime is only a long-threshold safety net (10 minutes) for malformed lock files with no parseable PID; a LIVE PID is never stolen regardless of mtime. - Finding 3: after a stale-steal, the original process's release would unlink the new owner's lock file, letting a third process enter the critical section. Each acquire writes a unique pid-uuid nonce; release reads the file back and only unlinks when its nonce still matches ours. Three new regression tests pin each finding's repro. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-pass review) Three findings from the 5th human review pass on the portable O_CREAT|O_EXCL lock: - Finding 1: 30s timeout fell through to unlocked write WHILE the original holder was still in the migration critical section, reopening the lost-row race. Lock is now only REQUIRED for the migration path (read-modify-write of the whole file); a pure append to an already-migrated file is safe without the lock thanks to POSIX O_APPEND atomicity for sub-PIPE_BUF writes. When migration is needed AND the lock can't be acquired, SKIP the cost row with a loud warning naming the file — better to lose one row than to corrupt the legacy CSV via unlocked migration. - Finding 2: os.open(O_CREAT|O_EXCL) followed by os.write(nonce) could leave a malformed lock file if the write was partial or if the process died between the two syscalls. Future processes would then wait the full 10-minute mtime threshold. Acquire now verifies the full nonce was written; on partial write it closes + unlinks + retries. - Finding 3: _release_atomic_lock returned silently when _read_lock_owner couldn't parse the file (mocked-open test reproduced this), leaving the lock artifact behind. Release now unlinks an unparseable lock that we created via O_EXCL — we hold the path, the malformed content is necessarily ours, and leaving it would block future PDD invocations for the stale window. Three new regression tests pin each finding's repro: - test_migration_skips_write_on_lock_failure - test_acquire_atomic_lock_retries_on_partial_nonce_write - test_release_atomic_lock_unlinks_malformed_lock Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…pass review) Three findings from the 6th human review pass on the pre-lock header peek and the lock-fail policy: - Finding 1: pre-lock peek caught only OSError. A malformed first row (csv.Error, e.g. unterminated quoted field) bubbled up to the outer except and the cost row was silently dropped. Catch (OSError, csv.Error, UnicodeDecodeError, StopIteration); treat peek failure as "migration unknown" and fall through to the append path so the new row still lands. csv.writer appends regardless of pre-existing content corruption. - Finding 2: same peek-narrow-except bug for non-UTF-8 bytes. UnicodeDecodeError now folded into the broadened except. - Finding 3: when migration is needed AND the filesystem doesn't support O_EXCL (saw_contention=False, OSError on os.open), the previous policy skipped the cost row indefinitely until manual migration — a known tradeoff that the reviewer asked us to document or accept explicitly. New policy: split the lock-fail modes. On real contention (saw_contention=True) still skip with the loud warning. On unsupported filesystem attempt UNLOCKED migration with a one-line tradeoff warning ("safe for single-process use; concurrent PDD writers on this filesystem can lose rows during legacy migration"). Single-process use is the dominant case on such filesystems. Three new regression tests pin each finding's repro: - test_peek_survives_malformed_csv_first_row - test_peek_survives_non_utf8_csv - test_unsupported_fs_lock_attempts_unlocked_migration Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Serhan-Asad · 2026-05-22T17:52:55Z

Recommending close as superseded.

The attempted_models cost column + fallback-chain tracking that this PR adds is already on main via #1056 (merge commit 656df8b30) and its follow-ups under #897. Main's implementation is broader:

pdd/track_cost.py + pdd/llm_invoke.py both implement attempted_models, with per-@track_cost-boundary scoping so the chain doesn't leak across chained Click commands.
pdd/summarize_directory.py propagates the chain across worker threads (concurrent paths sort by file-submission index for determinism).
README.md on main (line ~816) documents the column's semantics, ordering rules, and ; sanitization.

Re-basing this PR on main produces a no-op or a regression: git merge-tree reports content conflicts in 17 files, including README.md, architecture.json, pdd/llm_invoke.py, pdd/track_cost.py, pdd/summarize_directory.py, 3 prompts, both CLI examples, context/track_cost_example.py, 3 tests, and 2 .pdd/meta/*.json files — i.e. essentially the entire surface area of #1056.

In addition, the current diff has a few issues that would need to be fixed even before considering rebase:

pdd/llm_invoke.py:711-736 calls response.json().get(...) in every cloud error branch but only catches requests.exceptions.* afterwards. Non-JSON 5xx bodies raise ValueError and bypass the local fallback, with no attempted_models attached.
pdd/track_cost.py:367 writes an empty attempted_models cell for legacy 3-tuple returns, contradicting README.md's claim that single-model success contains one entry. context/track_cost_example.py:161 even documents the empty-cell behavior, which conflicts with the README.
pdd/track_cost.py:466 calls _acquire_atomic_lock on the pure-append path; under contention it can wait the full _LOCK_RETRY_MAX * _LOCK_RETRY_INTERVAL (~30s) before giving up, despite the comment claiming O_APPEND atomicity makes the lock optional.

Suggest closing this PR and #1086 as already resolved by #1056 / #897.

prompt-driven-github Bot mentioned this pull request May 19, 2026

Model used UI #1086

Open

greptile-apps Bot reviewed May 19, 2026

View reviewed changes

chore: PDD sync changes for #1086

9b49954

prompt-driven-github Bot changed the title ~~feat: track attempted models in cost CSV (#1086)~~ chore: PDD sync for #1086 May 19, 2026

pdd-bot added 2 commits May 19, 2026 18:26

fix: address codex review-loop findings

85c9880

fix: address codex review-loop findings

253f74a

Serhan-Asad and others added 21 commits May 19, 2026 12:58

fix: address checkup attempted models findings

c7f7af4

test: harden attempted-model cost tracking coverage

9df4957

test: isolate fallback cwd unit test from cloud auth

d46638d

fix: header-only peek on track_cost append (codex P2)

464b7ce

Avoid O(n) re-read of cost.csv on every command. Only load+rewrite rows when the existing header is missing 'attempted_models'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: PDD sync for #1086#1087

chore: PDD sync for #1086#1087
prompt-driven-github[bot] wants to merge 25 commits into
mainfrom
change/issue-1086

prompt-driven-github Bot commented May 19, 2026 •

edited by Serhan-Asad

Loading

Uh oh!

greptile-apps Bot left a comment

Uh oh!

prompt-driven-github Bot commented May 19, 2026

Uh oh!

Serhan-Asad commented May 19, 2026

Uh oh!

Serhan-Asad commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

prompt-driven-github Bot commented May 19, 2026 • edited by Serhan-Asad Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes Made

Verification

Review Checklist

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

prompt-driven-github Bot commented May 19, 2026

Step 7/8: Review Loop Final Report

Summary

Per-Reviewer Status

Findings

Fixer Rationale

Fixes Attempted

Uh oh!

Serhan-Asad commented May 19, 2026

Step 7/8: Review Loop Final Report

Summary

Per-Reviewer Status

Findings

Fixer Rationale

Fixes Attempted

Uh oh!

Serhan-Asad commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

prompt-driven-github Bot commented May 19, 2026 •

edited by Serhan-Asad

Loading