fix(checkup): bounded PR-head freshness lease for pdd checkup --pr (#1116)#1134
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
Step 7/8: Verification & Final Report (Iteration 1)Test Results After Fixes
Build StatusPass — Overall StatusAll clear — Steps 6.1 (artifact removal), 6.2 (artifact-hygiene regression), 6.3 (freshness-lease integration tests) and the Issue Alignmentissue_aligned: true — The PR's commit history ( Issues Summary
Checkup complete. All Issues Fixed {
"success": true,
"message": "PR #1134 substantively implements the bounded PR-head freshness lease for `pdd checkup --pr` requested by #1116 (capture-then-recheck PR head SHA, bounded MAX_PR_HEAD_REFRESHES reruns with sidecar counter, rebase-conflict classification, no force-push). Step 7 verification: build passes; 460 checkup-scoped tests pass (1 skipped); 19 focused tests (4 new freshness-lease integration tests + 2 new artifact-hygiene tests + 13 existing/modified agentic_crash tests) pass. Iteration-1 fixes removed a stray `new_file.txt` artifact at repo root, fixed `test_file_system_changes_detected` to chdir into tmp_path so it can't recreate that artifact, and added regression coverage pinning both the no-stray-artifact invariant and the orchestrator↔review-loop 2-tuple contract / sidecar isolation / exhaustion-SHA-chain behavior.",
"tech_stack": ["python", "pytest", "click", "pydantic"],
"issue_aligned": true,
"issues": [
{
"module": "repo_root",
"file": "new_file.txt",
"severity": "low",
"category": "artifact",
"description": "Stray empty new_file.txt left at repo root by an earlier checkup run on a different issue.",
"fixed": true,
"fix_description": "Removed the untracked empty file in Step 6.1."
},
{
"module": "tests",
"file": "tests/test_agentic_crash.py",
"severity": "low",
"category": "test_failure",
"description": "test_file_system_changes_detected used Path.cwd() (real repo root) for the mocked agent side effect that writes new_file.txt, which could leave a stray artifact at the repo root if the test was run from the repo.",
"fixed": true,
"fix_description": "Added monkeypatch.chdir(tmp_path) so the mocked side effect writes into the test sandbox; documented the rationale referencing issue #1116."
},
{
"module": "tests",
"file": "tests/test_checkup_artifact_hygiene.py",
"severity": "low",
"category": "test_failure",
"description": "No regression test pinned the 'no stray new_file.txt / empty top-level .txt artifacts at repo root' invariant.",
"fixed": true,
"fix_description": "Added 2 parametrized assertions guarding against stray artifacts at the repo root."
},
{
"module": "tests",
"file": "tests/test_checkup_freshness_lease_integration.py",
"severity": "low",
"category": "interface_mismatch",
"description": "No cross-module integration coverage of the orchestrator↔review-loop 2-tuple contract for _commit_and_push_if_changed, sidecar refresh-counter isolation from clear_workflow_state, or the budget-exhaustion message's observed-SHA chain.",
"fixed": true,
"fix_description": "Added 4 integration tests covering the 2-tuple contract, helper-wrapper equivalence, sidecar isolation from workflow-state-dir clearing, and the SHA-chain content of the exhaustion error message."
}
],
"changed_files": [
"tests/test_agentic_crash.py",
"tests/test_checkup_artifact_hygiene.py",
"tests/test_checkup_freshness_lease_integration.py"
]
}PR Push StatusPushed fixes to PR branch. |
Issue #1116. `pdd checkup --pr` previously fail-closed when the remote PR head advanced during a run and the local fix commit couldn't be cleanly rebased onto the new head. The operator had to rerun manually. Now bounded auto-rerun: when the head advances mid-checkup, discard the stale work and restart the orchestrator against the new head, capped at MAX_PR_HEAD_REFRESHES=2. Past that budget the wrapper returns a clear failure listing observed SHA transitions. Design — rename-and-wrap (minimal blast radius): - run_agentic_checkup_orchestrator becomes a small public wrapper. Non-PR mode is pass-through; PR mode loops around the inner. - The renamed _run_agentic_checkup_orchestrator_inner gets two checkpoints that raise _PRHeadAdvancedRestart on stale-head detection: A) after Step 7's gate passes (fix mode), before the push block. B) inside `if not push_ok:`, only when the push message indicates a rebase failure AND a fresh refetch confirms the head moved (auth/ network errors never trigger a rerun). - Counter persists in `.pdd/checkup-pr-{N}/pr_head_refreshes` so a Ctrl-C + manual resume can't bypass the bound. Cleared on success. - --no-fix stays fail-closed (existing _finalize SHA-mismatch logic). - _commit_and_push_if_changed return shape stays (bool, str). - _run_final_checkup_on_pr in agentic_e2e_fix_orchestrator.py stays fail-closed as the outer safety net. Tests: 7 new tests under TestPrHeadAdvanceAutoRerun covering both checkpoints, budget exhaustion, --no-fix no-consume, auth-failure no-trigger, counter clear-on-success, and counter persistence. All 175 tests in test_agentic_checkup_orchestrator + test_checkup_pr_mode pass with no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three findings from codex's first review of the PR-head freshness lease implementation. 1. Checkpoint B classifier missed the case where the push helper's 3-attempt retry loop exhausts on remote-advance: the final return is `Failed to push fixes to PR branch: <git stderr>`, not the rebase-specific strings. Extended the classifier to also match that generic message when `_is_remote_advanced_push_error` recognizes the embedded git markers. 2. Added Checkpoint C: when `_commit_and_push_if_changed` returns "No changes to push." / "No eligible changes to push.", neither the rebase path nor `_run_post_push_reverify_if_needed` runs. The head could have advanced between Checkpoint A and the final report. Checkpoint C refetches just before the report and restarts if the head moved. 3. Added `test_refresh_counter_resumes_from_pre_seeded_value` which pre-seeds the sidecar at 1, forces one restart, and asserts exhaustion. Locks down that the wrapper actually loads the persisted counter on entry (the existing persistence test only verified the file is written). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ssifier
Three important findings + one nit from codex's second review of the
PR-head freshness lease implementation.
1. Outer wrapper now calls clear_workflow_state(...) explicitly when
catching _PRHeadAdvancedRestart. Previously it relied on the inner's
identity guard to discard stale cache via SHA mismatch; a flaky
refetch returning the old SHA could defeat the rerun. The exception
already proves the head advanced — don't depend on a fresh fetch to
re-prove it.
2. Checkpoint B classifier extended to recognize "Failed to reset to
original fixer commit before rebase: ..." from
_rebase_onto_updated_pr_head. That message is only reachable after
a remote-advanced push, so when the fresh-refetch SHA confirms
advance, it should follow the same bounded restart path.
3. Three new tests:
- Generic non-fast-forward push failure triggers restart via the
_is_remote_advanced_push_error branch (round-1 added the classifier
branch but no test exercised it).
- "No changes to push" head-advance before final report restarts and
does NOT post a stale Step 7 verdict (Checkpoint C coverage).
- Empty initial pr_head_sha fail-degrades to current behavior; no
restart, no budget consumption.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex round-3 finding. PDD treats prompts as source of truth; the generated module pdd/agentic_checkup_orchestrator.py now contains the pdd/prompts/agentic_checkup_orchestrator_python.prompt described the pre-#1116 behavior. A future pdd sync would regenerate from the stale prompt and drop the fix. Added a sub-bullet under Requirement 7 (PR-mode) describing: - MAX_PR_HEAD_REFRESHES bound and the three freshness checkpoints (A: after Step 7 gate, B: on rebase-related push failure with fresh-refetch confirmation, C: after no-change push outcome). - Rename-and-wrap structure with the internal _PRHeadAdvancedRestart exception caught by the outer wrapper. - Explicit clear_workflow_state on restart so a flaky refetch cannot defeat the rerun via the identity guard. - Sidecar counter at .pdd/checkup-pr-{N}/pr_head_refreshes (outside workflow-state dir so clear_workflow_state does not nuke it). - Budget exhaustion diagnostic listing observed SHA transitions. - --no-fix mode does not consume restart budget; the lease is fix-mode only. No production code change; tests unchanged (180/180 still passing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
F1: --no-fix --pr mode now refetches the remote PR head after Step 7's gate passes and fails closed (unverified verdict) if the head advanced during the run. Previously _finalize's fail-closed downgrade only runs under --review-loop, leaving --no-fix open to publishing a stale clean verdict. F2: _PRHeadAdvancedRestart now carries step_comments_set; the outer wrapper preserves it across restart iterations via _carried_step_comments so per-step issue comments aren't re-posted after clear_workflow_state wipes the stored field. F3: Added _force_skip_state_load kwarg to the inner; set True by the outer on restart iterations so a flaky GitHub state DELETE cannot reload stale cached step outputs on the next inner call. Tests: 3 new test scenarios in TestPrHeadAdvanceAutoRerun. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
F1a: fail-closed when freshness refetch returns empty SHA (unknown freshness). Previously an exception-swallowed refetch silently let the clean Step 7 verdict through; now "could not retrieve" triggers the same downgrade path. F1b: posted report body now includes the downgrade reason prepended to the Step 7 output so the PR/issue thread shows WHY the verdict was invalidated, not just the clean output that no longer applies. Prompt: documents --no-fix fail-closed behavior (both advance and unknown freshness), _carried_step_comments, and _force_skip_state_load so future pdd sync regeneration preserves all three guarantees. Tests: 2 new tests — unknown-freshness fail-closed, report body includes "Verdict downgraded" + SHA transition text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
External review filed three gaps on PR #1134 after rebase onto current main. All three reproduce on the rebased branch and are closed here. Finding #1 (rebase regression): main added test_scope and start_step_override params to _run_agentic_checkup_orchestrator_inner and threads them from agentic_checkup.run_agentic_checkup, but the new public wrapper run_agentic_checkup_orchestrator was added in this PR without those kwargs. The rebase merges cleanly but the call from agentic_checkup.py would TypeError at runtime. Fix: add test_scope and start_step_override to the wrapper signature; forward to both inner call sites (non-PR pass-through and PR-mode restart loop). Finding #2 (Checkpoint A2 gap): the registry-edit and prompt-source guard refusal paths each post the canonical Step 7 verdict via _post_pr_mode_final_report before returning. Between Checkpoint A's refetch (line 1953) and these refusal posts (lines 2298 / 2333) the remote head can advance, leaving the Step 7 verdict belonging to the OLD head. Fix: reuse the pr_metadata fetch already at the top of the guards-and-push block (no extra GitHub I/O) and raise _PRHeadAdvancedRestart on mismatch — symmetric with Checkpoint A. Finding #3 (entry-SHA empty gap): the --no-fix post-Step-7 freshness check at line 1996 was gated on a non-empty current_pr_head_sha. If the entry _fetch_pr_metadata returned empty (transient gh hiccup) but the post-run refetch succeeded with a non-empty SHA, the clean Step 7 verdict slipped through as "fresh" even though we had no baseline to compare against. Fix: add a third elif branch — when current_pr_head_sha is empty but nofix_fresh_sha is non-empty, fail-closed with a diagnostic explaining the entry SHA was unavailable. Tests: 2 new tests in TestPrHeadAdvanceAutoRerun — test_pr_head_advance_at_guards_block_triggers_rerun (A2 restart with exact metadata-call accounting) and test_no_fix_empty_entry_sha_fails_closed (empty entry + non-empty fresh -> downgrade, no budget consumed). Updated _run_orchestrator_capturing_context in test_agentic_checkup_orchestrator.py to mock _fetch_pr_metadata so the new no-fix freshness check sees stable SHAs (previously the gh-CLI call failed silently in the sandbox, which the pre-rebase no-fix path did not exercise). Prompt: updated agentic_checkup_orchestrator_python.prompt with the Checkpoint A2 bullet and the three-condition --no-fix downgrade so a future pdd sync regenerates with the fix in place. LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60 -> 204/204 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
31b041c to
ecdfde0
Compare
External review filed three follow-up gaps on PR #1134 after round-4: Finding 1 (successful push paths still have stale-report race): the round-1 Checkpoint C only fired on "No changes to push." / "No eligible changes to push.", leaving normal push and post-rebase reverify-passed paths open. If the remote head advances after our push but before _post_pr_mode_final_report, the canonical Step 7 verdict belongs to an older head than what the PR thread now shows. Fix: broaden to Checkpoint D — fire on every successful push outcome (failure paths are already covered by Checkpoint B). Baseline is _git_rev_parse_head(worktree_path) (the local pushed SHA) instead of current_pr_head_sha — after a real push the remote head moves BECAUSE of us, so comparing against entry SHA would false-positive every push. Local HEAD equals our pushed commit on normal/rebase paths and equals entry SHA on the no-change path, so Checkpoint D subsumes the round-1 Checkpoint C. Finding 2 (fix-mode disables freshness lease when entry SHA empty): every fix-mode checkpoint (A, A2, B, D) gates on a non-empty current_pr_head_sha. If the entry _fetch_pr_metadata returned empty (transient gh hiccup) the entire lease silently disabled, opening the same stale-verdict hole the lease was built to close. Fix: fail-closed at fix-mode PR-mode entry when current_pr_head_sha is empty. Symmetric with the round-3 --no-fix post-Step-7 fail-closed downgrade — the user retries once gh recovers. Finding 3 (README missing user-visible rerun behavior): the new auto-rerun affects latency/cost/recovery; operators need to recognize the budget-exhaustion failure surface and know where the sidecar lives. Fix: README section under PR Mode covering auto-rerun cap=2, sidecar path (.pdd/checkup-pr-{N}/pr_head_refreshes), budget-exhaustion diagnostic format, fix-mode entry-SHA fail-closed, and --no-fix fail-closed semantics. Tests: 1 new test (test_pr_head_advance_after_successful_real_push_ triggers_rerun) covering Checkpoint D on a real push followed by third-party advance; existing test_empty_entry_pr_head_sha_does_not_ consume_refresh_budget renamed and rewritten as test_empty_entry_pr_head_sha_fix_mode_fails_closed for the new fail-closed semantics. Updated metadata_mock.call_count and _git_rev_parse_head side_effects in three existing tests because Checkpoint D adds one metadata fetch per successful push iteration and expects local HEAD to equal the pushed SHA. Parametrized test_resume_discards_cache_when_either_side_sha_is_empty switched to no_fix=True so the new fix-mode entry-SHA gate does not mask the state-identity guard behavior under test; added a fetch mock to test_step_8_skipped_in_pr_mode and test_pr_mode_commits_and_pushes_ generated_fixes. Prompt source: agentic_checkup_orchestrator_python.prompt updated with Checkpoint D documentation and the fix-mode entry-SHA fail-closed contract so a future pdd sync regenerates with both guarantees. LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60 -> 205/205 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud-test on the round-5 commit (9dc17b2) passed 77/77 chunks with no failures. Captured durations refresh the chunk-balancing input so future runs distribute load against the current PR-mode test surface (the round-3/4/5 commits added ~200+ new test cases in test_checkup_pr_mode.py + test_agentic_checkup_orchestrator.py). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Closes #1116.
pdd checkup --prnow auto-reruns up to twice when the remote PR head advances mid-checkup, instead of failing closed and forcing operator manual rerun.run_agentic_checkup_orchestratoradds three freshness checkpoints (A after Step 7 gate, B on rebase-related push failures, C after no-change push outcomes). Each checkpoint refetches PR metadata and restarts the inner orchestrator from scratch when the head moved, capped atMAX_PR_HEAD_REFRESHES = 2. The restart counter persists in a sidecar file outside workflow state so a Ctrl-C + manual resume cannot bypass the bound._commit_and_push_if_changedreturn shape (still(bool, str)),pdd/checkup_review_loop.py,pdd/agentic_e2e_fix_orchestrator.py(_run_final_checkup_on_prstays fail-closed as the outer safety net), README, CHANGELOG. No force-push behavior added.--no-fixmode stays fail-closed via the existing_finalizeSHA-mismatch downgrade; it does NOT consume restart budget.Why this design (vs PR #1121)
The pre-existing PR #1121 wraps the entire orchestrator in
while True:, changes the push helper's return-tuple shape, and touches checkup_review_loop / README / CHANGELOG (2010+/1732- LOC). This PR uses a minimal rename-and-wrap (~1265 lines including 863 lines of tests):defrename to_run_agentic_checkup_orchestrator_inner._PRHeadAdvancedRestartexception.Codex (gpt-5.1-codex) reviewed 4 rounds and signed off (
VERDICT: ship).Commits (4)
f7ea6cf6ffeat(checkup): bounded PR-head freshness lease for pdd checkup --prdf003d79bfix(checkup): codex round-1 — close Checkpoint B/C gaps for Make pdd checkup --pr rerun when PR head advances mid-checkup #111611535aed5fix(checkup): codex round-2 — explicit state clear + rebase-reset classifierec23e9a5bdocs(checkup): sync prompt source for Make pdd checkup --pr rerun when PR head advances mid-checkup #1116 freshness leaseTest plan
TestPrHeadAdvanceAutoReruncovering all three checkpoints, budget exhaustion, sidecar persistence, sidecar resume-from-disk,--no-fixno-consume, auth-failure no-trigger, empty-SHA fail-degrade.LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60→ 180/180 passing.pdd checkup --prrun while an external commit lands on the PR branch mid-run; verify the rerun fires once and a clean second pass completes.Notes
pdd/prompts/agentic_checkup_orchestrator_python.promptupdated for parity so a futurepdd synccannot regenerate without the fix..pdd/checkup-pr-{pr_number}/pr_head_refreshes— outside.pdd/checkup-statesoclear_workflow_statenever deletes the counter.🤖 Generated with Claude Code