Skip to content

chore: PDD sync for #1086#1087

Open
prompt-driven-github[bot] wants to merge 25 commits into
mainfrom
change/issue-1086
Open

chore: PDD sync for #1086#1087
prompt-driven-github[bot] wants to merge 25 commits into
mainfrom
change/issue-1086

Conversation

@prompt-driven-github
Copy link
Copy Markdown
Contributor

@prompt-driven-github prompt-driven-github Bot commented May 19, 2026

Summary

Adds full attempted-model history to the cost CSV so users can see PDD fallback decision-making, including cloud attempts, local retries, failure paths, and successful GPT-5 Responses API calls.

Closes #1086

Changes Made

  • llm_invoke now returns and propagates attempted_models for successful calls, failures, cloud fallback, and Responses API success paths.
  • track_cost writes a new attempted_models CSV column, migrates existing 6-column CSVs, and scopes command-level attempted models per invocation.
  • Prompt sources, generated code, metadata, architecture docs, README, and examples are aligned with the new contract.
  • Regression coverage was added for command scoping, CSV migration, fallback chains, cloud pre-auth behavior, and Responses API success reporting.

Verification

  • GitHub PR checks: passing.
  • Focused local checks: tests/test_track_cost.py, tests/test_llm_invoke.py, tests/test_llm_invoke_retry_cost.py passed before the final isolation patch; final focused isolation run passed (5 passed).
  • make cloud-test on d46638dd5: completed with 65/77 tasks passing and 12 failing. The failures are cloud backend/credit-dependent integration paths (Insufficient credits, currentBalance: 10, and cloud-only command cases), not issue Model used UI #1086-specific regressions.

Review Checklist

  • Prompt syntax is valid
  • PDD conventions followed
  • Documentation is up to date
  • Generated code and metadata are in sync

Created by pdd change workflow; updated after review fixes

Record models attempted during fallback (not just the model that
ultimately succeeded) in the cost CSV. Adds an `attempted_models`
field to llm_invoke's return dict and serializes it as a new
semicolon-delimited column in track_cost's CSV output.

Closes #1086
@prompt-driven-github prompt-driven-github Bot mentioned this pull request May 19, 2026
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@prompt-driven-github prompt-driven-github Bot changed the title feat: track attempted models in cost CSV (#1086) chore: PDD sync for #1086 May 19, 2026
@prompt-driven-github
Copy link
Copy Markdown
Contributor Author

Step 7/8: Review Loop Final Report

PR: #1087
Issue: #1086
issue_aligned: unknown
reviewer-status: codex=findings claude=fixer fresh-final=missing
fresh-final-review: missing
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

Fixer claude could not address codex's findings.

Per-Reviewer Status

Reviewer Status
codex findings
claude fixer
fresh-final missing

Findings

Severity Status Location Finding Required fix Reviewer
critical open pdd/llm_invoke.py:546 attempted_models leaks across commands in batch runs. Address the reviewer finding. codex
medium open pdd/prompts/llm_invoke_python.prompt:116 Prompt and generated code disagree on Click context ownership. Address the reviewer finding. codex
medium open .pdd/meta/llm_invoke_python.json:5 .pdd metadata for llm_invoke is stale after the sync. Address the reviewer finding. codex
low open error_log.txt:1 Generated debug log committed at repo root. Address the reviewer finding. codex

Fixer Rationale

  • pdd/llm_invoke.py:546: attempted_models leaks across commands in batch runs. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
  • pdd/prompts/llm_invoke_python.prompt:116: Prompt and generated code disagree on Click context ownership. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
  • .pdd/meta/llm_invoke_python.json:5: .pdd metadata for llm_invoke is stale after the sync. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)
  • error_log.txt:1: Generated debug log committed at repo root. (All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques)

Fixes Attempted

  • claude: success; verification=unverified; changed_files=README.md, pdd/llm_invoke.py, pdd/track_cost.py, tests/test_track_cost.py; Populated attempted_models from real fallback paths in llm_invoke (local + cloud), added Click-context propagation so cost tracking sees the chain without modifying every caller, migrated existing 6-column cost.csv files to add the new attempted_models header before appending rows, and clarified the README empty-state semantics. Added a migration test; updated two existing tests whose strict assert_called_once_with assertion no longer holds because the writer now also opens the CSV in read mode to detect header drift.
  • claude: success; verification=unverified; changed_files=pdd/llm_invoke.py, pdd/track_cost.py, error_log.txt; Hoisted attempted_models_chain above the cloud-vs-local split so cloud attempts ('cloud:auto', rewritten to 'cloud:<reported_model>' on success) are recorded and persist through fallback to local execution. Moved the local-candidate append from before _ensure_api_key() to after the credential check passes, so models skipped for missing keys never appear in the chain (matches prompt contract llm_invoke_python.prompt:111). Attached the chain via exc.attempted_models on both the exhaustion RuntimeError and the cloud-path InsufficientCreditsError, plus surfaced the chain via _propagate_attempted_models_to_ctx on failure. Updated track_cost so it writes a row when a command fails but a fallback chain is recorded (on the exception or ctx.obj), surfacing total-failure history. Removed trailing whitespace at track_cost.py:92. All 315 llm_invoke + track_cost tests pass. The 6 'cloud' suite failures observed during regression all reproduce on the unmodified base branch, so they are pre-existing and out of scope.
  • claude: failed; verification=unverified; changed_files=none; All agent providers failed: anthropic: Exit code 1: {"type":"result","subtype":"success","is_error":true,"api_error_status":429,"duration_ms":585,"duration_api_ms":0,"num_turns":1,"result":"You've hit your limit · resets 9:20pm (UTC)","stop_reason":"stop_sequence","session_id":"80041e05-fab9-49ab-bebd-6fdc7647803d","total_cost_usd":0,"usage":{"input_tokens":0,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"output_tokens":0,"server_tool_use":{"web_search_requests":0,"web_fetch_reques

@Serhan-Asad
Copy link
Copy Markdown
Collaborator

Step 7/8: Review Loop Final Report

PR: #1087
Issue: #1086
issue_aligned: false
active-reviewer: codex
reviewer-status: codex=findings claude=fixer fresh-final=missing
fresh-final-review: missing
max-rounds-reached: false
max-cost-reached: false
max-duration-reached: false

Summary

generated-code-only fix refused: pdd/llm_invoke.py is generated from pdd/prompts/llm_invoke_python.prompt. Update the prompt source or run the proper PDD sync path before re-running the review loop.

Per-Reviewer Status

Reviewer Status
codex findings
claude fixer
fresh-final missing

Findings

Severity Status Location Finding Required fix Reviewer
medium open pdd/llm_invoke.py:3798 Successful OpenAI GPT-5 Responses API invocations do not surface attempted_models, so cost.csv can still omit the model history for a successful LLM call. Before the Responses API success return, propagate attempted_models_chain to Click context and include 'attempted_models': list(attempted_models_chain) in the returned dict. Add a regression test for a gpt-5/litellm.responses success. codex
medium open pdd/llm_invoke.py:3053 Cloud authentication failures are recorded as a cloud model attempt even when no cloud HTTP request or cloud model attempt happened. Record the cloud sentinel only once an actual cloud request is about to be made, or remove the sentinel when _llm_invoke_cloud fails before the HTTP request. Keep the prompt and implementation aligned with the 'actual provider/cloud HTTP call' recording rule. codex
low open context/cli_example.py:184 The CLI examples still document the wrong attempted_models empty-state semantics. Update both CLI examples to say single-model successes contain the successful model, and the field is empty only when no LLM attempt was recorded. codex

Fixer Rationale

  • pdd/llm_invoke.py:3798: Successful OpenAI GPT-5 Responses API invocations do not surface attempted_models, so cost.csv can still omit the model history for a successful LLM call. (claude: fixed - Added _propagate_attempted_models_to_ctx(attempted_models_chain) and 'attempted_models': list(attempted_models_chain) to the Responses API success return. New regression test asserts both the returned dict field and ctx propagation.)
  • pdd/llm_invoke.py:3053: Cloud authentication failures are recorded as a cloud model attempt even when no cloud HTTP request or cloud model attempt happened. (claude: fixed - Removed the pre-append of 'cloud:auto' in the outer llm_invoke. _llm_invoke_cloud now defines attempted_models_record=['cloud:auto'] only just before requests.post and attaches it to any cloud-specific exception raised at/after the HTTP boundary. Caller merges e.attempted_models into the chain. Pre-HTTP JWT auth failures raise without the attribute, so the chain stays empty for them.)
  • context/cli_example.py:184: The CLI examples still document the wrong attempted_models empty-state semantics. (claude: fixed - Both context/cli_example.py and context/core/cli_example.py updated to state a normal single-model success contains a single entry (the successful model) and is empty only when no LLM attempt was recorded.)

Fixes Attempted

  • claude: success; verification=unverified; changed_files=none; Scoped ctx.obj['attempted_models'] per @track_cost boundary (fixes cross-command leak); stopped early cloud propagation in _llm_invoke_cloud so the outer normalized chain isn't duplicated; moved attempted_models_chain.append to immediately before each provider call so context-window-rejected models aren't counted; rewrote the llm_invoke prompt section to permit best-effort Click propagation; updated architecture.json signatures/return types for track_cost and extract_cost_and_model; regenerated .pdd/meta/llm_invoke_python.json hashes and include_deps; added attempted_models docs to both CLI examples; removed error_log.txt; added a real-Click chained-group regression test that fails without the fix and passes with it. tests/test_track_cost.py + tests/test_llm_invoke.py + tests/test_llm_invoke_retry_cost.py all green (320 passed).
  • claude: success; verification=unverified; changed_files=context/cli_example.py, context/core/cli_example.py, pdd/llm_invoke.py, tests/test_llm_invoke.py; Applied fixes for all three open findings. (1) Responses API success path now propagates attempted_models to ctx and includes the chain in the returned dict; added a regression test test_llm_invoke_responses_api_success_returns_attempted_models. (2) cloud:auto sentinel is no longer pre-appended in the outer caller. _llm_invoke_cloud only attaches attempted_models=['cloud:auto'] to exceptions raised at or after the HTTP boundary, so pre-HTTP failures (JWT auth) leave the chain untouched. Caller's except handlers now merge e.attempted_models into the chain instead of relying on a pre-append. (3) Both context/cli_example.py and context/core/cli_example.py updated to state single-model successes contain one entry; empty only when no LLM attempt was recorded. All 317 llm_invoke+track_cost tests and 21 cloud tests pass.

Serhan-Asad and others added 21 commits May 19, 2026 12:58
- Scope attempted_models to each @track_cost invocation: snapshot, reset,
  capture this command's chain, then restore the prior shared value so
  chained / batched runs cannot leak a previous command's fallback chain
  into the next CSV row (issue #1086 regression).
- Document the cloud /llmInvoke response contract requiring attemptedModels
  and add a multi-attempt cloud-response regression test.
- Restore include_deps entries for auth_service_example, token_counter,
  and core/cloud in .pdd/meta/llm_invoke_python.json.
- Sync architecture.json signature for extract_cost_and_model.
- Remove transient error_log.txt from repo root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move attempted_models append in llm_invoke.py from immediately after
  credential check to immediately before each litellm.responses /
  batch_completion / completion call. Models rejected by pre-call
  context-window validation no longer pollute the attempted_models chain.
  Added regression test
  TestContextWindowValidation::test_context_window_skipped_model_not_in_attempted_models.

- Strengthen llm_invoke prompt cloud contract: attemptedModels is now
  documented as REQUIRED in /llmInvoke success responses; the existing
  client-side fallback to [modelName] is preserved as a documented
  backward-compatibility degraded mode for non-conforming servers, and the
  cloud helper now emits a warning when the field is missing. Added test
  test_cloud_response_missing_attempted_models_warns_and_degrades.

- Update track_cost prompt to encode the snapshot/reset/restore behavior
  for ctx.obj['attempted_models'] so a future PDD sync cannot regenerate
  away the cross-command leak fix (#1086).

- Refresh .pdd/meta/llm_invoke_python.json (prompt_hash, code_hash,
  test_hash, test_files entry, timestamp) so drift checks audit the
  committed dev unit correctly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Avoid O(n) re-read of cost.csv on every command. Only load+rewrite
rows when the existing header is missing 'attempted_models'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Attach attempted_models_chain to ctx and any pre-candidate-loop
exception so track_cost can record cloud attempts even when local
setup fails before the candidate loop runs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When _propagate_attempted_models_to_ctx is called twice in one
invocation and the incoming chain is a prefix-extension of the
existing ctx chain, replace rather than append-with-dedup so the
cloud prefix isn't recorded twice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- track_cost: capture ctx.obj['attempted_models'] post-command
  regardless of snapshot state, so commands that create ctx.obj
  via ensure_object still record the chain.
- llm_invoke: helper now called at most once per invocation. Drop
  prefix-extension shortcut; the per-element dedup loop suffices
  for cross-invocation composition and no longer collapses
  repeated chains within one Click command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop per-element dedup in _propagate_attempted_models_to_ctx.
Each llm_invoke chain is already pre-deduplicated; collapsing at
the invocation boundary loses chronological history for commands
that make multiple sibling llm_invoke calls reusing the same
model (gpt-4 + gpt-4 must record both, not collapse to one).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The track_cost example was popping PYTEST_CURRENT_TEST at module
top-level so it would write its CSV when run as a script. Under
pytest that pop runs on import too and disables track_cost's CSV
guard for the rest of the session. Move the pop inside main().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Remove stale 'no duplicate consecutive model names' clause in
  llm_invoke_python.prompt:116; line 118's pure-concat rule is
  authoritative.
- architecture.json track_cost returns is now Callable to match
  the prompt's interface block.
- CSV migration writes a tempfile in the same directory and
  os.replaces atomically. Concurrent PDD processes can no longer
  corrupt a legacy cost.csv during first-run upgrade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker threads' Click ctx is thread-local, so attempted_models
propagation from inside _llm_invoke_cloud / candidate-loop didn't
reach the main thread's ctx for parallel summarize_directory work.
Capture attempted_models from each llm_result and aggregate at
the main level via _propagate_attempted_models_to_ctx after the
executor completes. Non-threaded code paths follow the same
return-and-aggregate pattern for parity.

auto_deps_main itself needs no code change — llm_invoke (via
insert_includes) already best-effort propagates the chain to
ctx on every success path per the round-4 'at most once'
contract, so the @track_cost decorator on the auto-deps command
already sees the chain. A regression test now pins this contract.

Note: sync_orchestration calls auto_deps_main inside a
@track_cost-decorated pdd sync, so the chain lands in the sync
command's row — that's correct (sync paid for those calls).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
cloud_fix_errors only parsed modelName/totalCost from the cloud
response. The attemptedModels key (per issue #1086 cloud contract)
was discarded, so cloud-only pdd fix runs wrote a single-model
attempted_models column even when the server attempted fallbacks.

Parse attemptedModels (with attempted_models / modelChain aliases
and modelName fallback for non-conforming servers). Return as the
8th tuple slot, prefixed with cloud: to distinguish cloud attempts
from local. In fix_error_loop's cloud branch, propagate the chain
to the active Click ctx via _propagate_attempted_models_to_ctx so
the @track_cost decorator on the outer pdd fix command records
the full chain to cost.csv.

Note: fix_main and commands/fix.py retain their legacy 6-/3-tuple
return shapes — the spec's enriched-dict change had too large a
blast radius (37+ callsites between sync_orchestration,
pin_example_hack, and tests). ctx propagation achieves the same
goal — @track_cost reads ctx.obj['attempted_models'] regardless
of the result tuple shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- summarize_directory: only aggregate-and-propagate from worker
  threads, not in the serial path (llm_invoke's own ctx propagation
  already covers max_workers==1, so the parent saw 'm1;m1' for a
  single LLM call).
- summarize_directory: capture e.attempted_models when a worker's
  llm_invoke raises, so failed-file rows still record the attempted
  chain instead of dropping it on the floor.
- track_cost: serialize CSV migration + append with fcntl.flock
  (POSIX best-effort; Windows falls through unchanged). Prevents the
  lost-update race where two concurrent PDD processes hitting a
  legacy 6-column cost.csv would each migrate independently and the
  second os.replace would overwrite the first's appended row.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud /fixCode error paths (timeout, 4xx, 5xx) attach
attempted_models=['cloud:auto'] to the raised RuntimeError so the
caller can recover the chain. fix_error_loop's cloud branch now
propagates e.attempted_models to ctx before falling back to local
or re-raising. Without this, cost.csv recorded only the later
local model (or stayed empty) on cloud failure, contradicting the
issue #1086 contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second-pass human review flagged prompt-vs-code drift introduced by
the round-9 cloud_fix_errors 8-tuple change and stale meta for
summarize_directory. Resync:

- fix_error_loop_python.prompt: cloud_fix_errors return is 8-tuple
  including attempted_models.
- architecture.json: cloud_fix_errors signature updated to match
  the 8-tuple plus protect_tests / failure_classification params;
  fix_error_loop return updated from None to the real 6-tuple.
- .pdd/meta/summarize_directory_python.json: refresh hashes and
  timestamp so a future `pdd sync` doesn't see false drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the third human review pass:

- .pdd/meta/{llm_invoke,summarize_directory}_python.json now match
  PDD's own calculate_prompt_hash (incorporates <include> dep
  contents) and calculate_sha256 from sync_determine_operation.
  Previous round used raw sha256(prompt_bytes), which doesn't match
  PDD's hash when the prompt has <include> tags. Refreshed code/
  test/timestamp too.

- pdd/fix_error_loop.py: wrap a scalar attemptedModels response in
  a single-element list before the cloud: prefix comprehension.
  Without the guard, a non-conforming cloud server returning
  attemptedModels="gpt-4" would record ['cloud:g','cloud:p',
  'cloud:t','cloud:-','cloud:4'] in the cost CSV.

- pdd/track_cost.py: replace fcntl.flock with a cross-platform
  atomic lock file (O_CREAT|O_EXCL with retry and 60s stale-lock
  steal). POSIX and Windows now share the same serialization path
  for the legacy 6-column cost.csv migration race — no more
  unlocked-degraded mode on Windows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from the 4th human review pass on the portable
O_CREAT|O_EXCL lock introduced in the 3rd pass:

- Finding 1: timeout fell through unlocked after only 5s. Retry
  budget is now 30s, AND falling through emits a yellow Rich
  warning so the user knows the write isn't serialized — but only
  when real contention was observed (saw FileExistsError during
  retry), not when the filesystem just doesn't support O_EXCL.
  _acquire_atomic_lock now returns (handle, contended) so the
  caller can distinguish.

- Finding 2: a legitimate slow migration holding the lock past 60s
  was treated as stale and stolen. Staleness is now determined by
  PID liveness (os.kill(pid, 0) → ProcessLookupError) read from a
  nonce we write at acquire. mtime is only a long-threshold safety
  net (10 minutes) for malformed lock files with no parseable PID;
  a LIVE PID is never stolen regardless of mtime.

- Finding 3: after a stale-steal, the original process's release
  would unlink the new owner's lock file, letting a third process
  enter the critical section. Each acquire writes a unique
  pid-uuid nonce; release reads the file back and only unlinks
  when its nonce still matches ours.

Three new regression tests pin each finding's repro.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-pass review)

Three findings from the 5th human review pass on the portable
O_CREAT|O_EXCL lock:

- Finding 1: 30s timeout fell through to unlocked write WHILE the
  original holder was still in the migration critical section,
  reopening the lost-row race. Lock is now only REQUIRED for the
  migration path (read-modify-write of the whole file); a pure
  append to an already-migrated file is safe without the lock
  thanks to POSIX O_APPEND atomicity for sub-PIPE_BUF writes. When
  migration is needed AND the lock can't be acquired, SKIP the
  cost row with a loud warning naming the file — better to lose
  one row than to corrupt the legacy CSV via unlocked migration.

- Finding 2: os.open(O_CREAT|O_EXCL) followed by os.write(nonce)
  could leave a malformed lock file if the write was partial or
  if the process died between the two syscalls. Future processes
  would then wait the full 10-minute mtime threshold. Acquire now
  verifies the full nonce was written; on partial write it
  closes + unlinks + retries.

- Finding 3: _release_atomic_lock returned silently when
  _read_lock_owner couldn't parse the file (mocked-open test
  reproduced this), leaving the lock artifact behind. Release now
  unlinks an unparseable lock that we created via O_EXCL — we
  hold the path, the malformed content is necessarily ours, and
  leaving it would block future PDD invocations for the stale
  window.

Three new regression tests pin each finding's repro:
  - test_migration_skips_write_on_lock_failure
  - test_acquire_atomic_lock_retries_on_partial_nonce_write
  - test_release_atomic_lock_unlinks_malformed_lock

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…pass review)

Three findings from the 6th human review pass on the pre-lock
header peek and the lock-fail policy:

- Finding 1: pre-lock peek caught only OSError. A malformed first
  row (csv.Error, e.g. unterminated quoted field) bubbled up to the
  outer except and the cost row was silently dropped. Catch
  (OSError, csv.Error, UnicodeDecodeError, StopIteration); treat
  peek failure as "migration unknown" and fall through to the
  append path so the new row still lands. csv.writer appends
  regardless of pre-existing content corruption.

- Finding 2: same peek-narrow-except bug for non-UTF-8 bytes.
  UnicodeDecodeError now folded into the broadened except.

- Finding 3: when migration is needed AND the filesystem doesn't
  support O_EXCL (saw_contention=False, OSError on os.open), the
  previous policy skipped the cost row indefinitely until manual
  migration — a known tradeoff that the reviewer asked us to
  document or accept explicitly. New policy: split the lock-fail
  modes. On real contention (saw_contention=True) still skip with
  the loud warning. On unsupported filesystem attempt UNLOCKED
  migration with a one-line tradeoff warning ("safe for
  single-process use; concurrent PDD writers on this filesystem
  can lose rows during legacy migration"). Single-process use is
  the dominant case on such filesystems.

Three new regression tests pin each finding's repro:
  - test_peek_survives_malformed_csv_first_row
  - test_peek_survives_non_utf8_csv
  - test_unsupported_fs_lock_attempts_unlocked_migration

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Serhan-Asad
Copy link
Copy Markdown
Collaborator

Recommending close as superseded.

The attempted_models cost column + fallback-chain tracking that this PR adds is already on main via #1056 (merge commit 656df8b30) and its follow-ups under #897. Main's implementation is broader:

  • pdd/track_cost.py + pdd/llm_invoke.py both implement attempted_models, with per-@track_cost-boundary scoping so the chain doesn't leak across chained Click commands.
  • pdd/summarize_directory.py propagates the chain across worker threads (concurrent paths sort by file-submission index for determinism).
  • README.md on main (line ~816) documents the column's semantics, ordering rules, and ; sanitization.

Re-basing this PR on main produces a no-op or a regression: git merge-tree reports content conflicts in 17 files, including README.md, architecture.json, pdd/llm_invoke.py, pdd/track_cost.py, pdd/summarize_directory.py, 3 prompts, both CLI examples, context/track_cost_example.py, 3 tests, and 2 .pdd/meta/*.json files — i.e. essentially the entire surface area of #1056.

In addition, the current diff has a few issues that would need to be fixed even before considering rebase:

  • pdd/llm_invoke.py:711-736 calls response.json().get(...) in every cloud error branch but only catches requests.exceptions.* afterwards. Non-JSON 5xx bodies raise ValueError and bypass the local fallback, with no attempted_models attached.
  • pdd/track_cost.py:367 writes an empty attempted_models cell for legacy 3-tuple returns, contradicting README.md's claim that single-model success contains one entry. context/track_cost_example.py:161 even documents the empty-cell behavior, which conflicts with the README.
  • pdd/track_cost.py:466 calls _acquire_atomic_lock on the pure-append path; under contention it can wait the full _LOCK_RETRY_MAX * _LOCK_RETRY_INTERVAL (~30s) before giving up, despite the comment claiming O_APPEND atomicity makes the lock optional.

Suggest closing this PR and #1086 as already resolved by #1056 / #897.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Model used UI

2 participants