fix(checkup): bounded PR-head freshness lease for pdd checkup --pr (#1116) by Serhan-Asad · Pull Request #1134 · promptdriven/pdd

Serhan-Asad · 2026-05-21T22:43:05Z

Summary

Closes #1116. pdd checkup --pr now auto-reruns up to twice when the remote PR head advances mid-checkup, instead of failing closed and forcing operator manual rerun.

What changed: rename-and-wrap pattern around run_agentic_checkup_orchestrator adds three freshness checkpoints (A after Step 7 gate, B on rebase-related push failures, C after no-change push outcomes). Each checkpoint refetches PR metadata and restarts the inner orchestrator from scratch when the head moved, capped at MAX_PR_HEAD_REFRESHES = 2. The restart counter persists in a sidecar file outside workflow state so a Ctrl-C + manual resume cannot bypass the bound.
What did NOT change: _commit_and_push_if_changed return shape (still (bool, str)), pdd/checkup_review_loop.py, pdd/agentic_e2e_fix_orchestrator.py (_run_final_checkup_on_pr stays fail-closed as the outer safety net), README, CHANGELOG. No force-push behavior added.
--no-fix mode stays fail-closed via the existing _finalize SHA-mismatch downgrade; it does NOT consume restart budget.

Why this design (vs PR #1121)

The pre-existing PR #1121 wraps the entire orchestrator in while True:, changes the push helper's return-tuple shape, and touches checkup_review_loop / README / CHANGELOG (2010+/1732- LOC). This PR uses a minimal rename-and-wrap (~1265 lines including 863 lines of tests):

The 1168-line inner body gets a one-line def rename to _run_agentic_checkup_orchestrator_inner.
A new ~130-line public wrapper holds the bounded restart loop and catches an internal _PRHeadAdvancedRestart exception.
Two checkpoints inside the inner raise on stale-head detection.
A third checkpoint guards the no-change-push window between Checkpoint A and the canonical report.

Codex (gpt-5.1-codex) reviewed 4 rounds and signed off (VERDICT: ship).

Commits (4)

f7ea6cf6f feat(checkup): bounded PR-head freshness lease for pdd checkup --pr
df003d79b fix(checkup): codex round-1 — close Checkpoint B/C gaps for Make pdd checkup --pr rerun when PR head advances mid-checkup #1116
11535aed5 fix(checkup): codex round-2 — explicit state clear + rebase-reset classifier
ec23e9a5b docs(checkup): sync prompt source for Make pdd checkup --pr rerun when PR head advances mid-checkup #1116 freshness lease

Test plan

12 new tests in TestPrHeadAdvanceAutoRerun covering all three checkpoints, budget exhaustion, sidecar persistence, sidecar resume-from-disk, --no-fix no-consume, auth-failure no-trigger, empty-SHA fail-degrade.
LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60 → 180/180 passing.
CI greens on the full suite.
Manual smoke: drive a pdd checkup --pr run while an external commit lands on the PR branch mid-run; verify the rerun fires once and a clean second pass completes.

Notes

Supersedes PR Make pdd checkup --pr rerun when PR head advances mid-checkup #1121 (over-scoped — leave it for the owner to close or transplant any useful test coverage).
Prompt source pdd/prompts/agentic_checkup_orchestrator_python.prompt updated for parity so a future pdd sync cannot regenerate without the fix.
Sidecar path is .pdd/checkup-pr-{pr_number}/pr_head_refreshes — outside .pdd/checkup-state so clear_workflow_state never deletes the counter.

🤖 Generated with Claude Code

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

Serhan-Asad · 2026-05-22T00:00:23Z

Step 7/8: Verification & Final Report (Iteration 1)

Test Results After Fixes

Total: 460 tests (checkup-scoped) + 19 (focused new/modified)
Passed: 460 (+ 1 skipped) / 19
Failed: 0
Previously failing, now passing: 0 (no failures in prior steps)
New failures: 0

Build Status

Pass — python -m compileall pdd succeeds; package imports cleanly.

Overall Status

All clear — Steps 6.1 (artifact removal), 6.2 (artifact-hygiene regression), 6.3 (freshness-lease integration tests) and the test_agentic_crash.py chdir fix all verified. No stray new_file.txt at repo root.

Issue Alignment

issue_aligned: true — The PR's commit history (feat(checkup): bounded PR-head freshness lease, codex round-1/round-2 fixes, prompt-source sync) substantively implements the bounded PR-head freshness lease requested by #1116; the checkup-iteration changes here (artifact hygiene + cross-module integration tests pinning the 2-tuple contract, sidecar isolation from clear_workflow_state, and the exhaustion-message SHA chain) reinforce the same mechanism without altering its contract.

Issues Summary

Severity	Category	Module	Description	Fixed
low	artifact	repo root	Stray empty `new_file.txt` left by an earlier checkup run	Yes
low	test_failure	tests/test_agentic_crash.py	`test_file_system_changes_detected` could write `new_file.txt` into the real repo root	Yes (chdir(tmp_path) in test)
low	test_failure	tests	No regression test asserted "no stray top-level artifacts" invariant	Yes (test_checkup_artifact_hygiene.py)
low	interface_mismatch	tests	No integration coverage of the orchestrator↔review-loop 2-tuple contract and sidecar isolation	Yes (test_checkup_freshness_lease_integration.py, 4 tests)

Checkup complete.

All Issues Fixed

{
  "success": true,
  "message": "PR #1134 substantively implements the bounded PR-head freshness lease for `pdd checkup --pr` requested by #1116 (capture-then-recheck PR head SHA, bounded MAX_PR_HEAD_REFRESHES reruns with sidecar counter, rebase-conflict classification, no force-push). Step 7 verification: build passes; 460 checkup-scoped tests pass (1 skipped); 19 focused tests (4 new freshness-lease integration tests + 2 new artifact-hygiene tests + 13 existing/modified agentic_crash tests) pass. Iteration-1 fixes removed a stray `new_file.txt` artifact at repo root, fixed `test_file_system_changes_detected` to chdir into tmp_path so it can't recreate that artifact, and added regression coverage pinning both the no-stray-artifact invariant and the orchestrator↔review-loop 2-tuple contract / sidecar isolation / exhaustion-SHA-chain behavior.",
  "tech_stack": ["python", "pytest", "click", "pydantic"],
  "issue_aligned": true,
  "issues": [
    {
      "module": "repo_root",
      "file": "new_file.txt",
      "severity": "low",
      "category": "artifact",
      "description": "Stray empty new_file.txt left at repo root by an earlier checkup run on a different issue.",
      "fixed": true,
      "fix_description": "Removed the untracked empty file in Step 6.1."
    },
    {
      "module": "tests",
      "file": "tests/test_agentic_crash.py",
      "severity": "low",
      "category": "test_failure",
      "description": "test_file_system_changes_detected used Path.cwd() (real repo root) for the mocked agent side effect that writes new_file.txt, which could leave a stray artifact at the repo root if the test was run from the repo.",
      "fixed": true,
      "fix_description": "Added monkeypatch.chdir(tmp_path) so the mocked side effect writes into the test sandbox; documented the rationale referencing issue #1116."
    },
    {
      "module": "tests",
      "file": "tests/test_checkup_artifact_hygiene.py",
      "severity": "low",
      "category": "test_failure",
      "description": "No regression test pinned the 'no stray new_file.txt / empty top-level .txt artifacts at repo root' invariant.",
      "fixed": true,
      "fix_description": "Added 2 parametrized assertions guarding against stray artifacts at the repo root."
    },
    {
      "module": "tests",
      "file": "tests/test_checkup_freshness_lease_integration.py",
      "severity": "low",
      "category": "interface_mismatch",
      "description": "No cross-module integration coverage of the orchestrator↔review-loop 2-tuple contract for _commit_and_push_if_changed, sidecar refresh-counter isolation from clear_workflow_state, or the budget-exhaustion message's observed-SHA chain.",
      "fixed": true,
      "fix_description": "Added 4 integration tests covering the 2-tuple contract, helper-wrapper equivalence, sidecar isolation from workflow-state-dir clearing, and the SHA-chain content of the exhaustion error message."
    }
  ],
  "changed_files": [
    "tests/test_agentic_crash.py",
    "tests/test_checkup_artifact_hygiene.py",
    "tests/test_checkup_freshness_lease_integration.py"
  ]
}

PR Push Status

Pushed fixes to PR branch.

Issue #1116. `pdd checkup --pr` previously fail-closed when the remote PR head advanced during a run and the local fix commit couldn't be cleanly rebased onto the new head. The operator had to rerun manually. Now bounded auto-rerun: when the head advances mid-checkup, discard the stale work and restart the orchestrator against the new head, capped at MAX_PR_HEAD_REFRESHES=2. Past that budget the wrapper returns a clear failure listing observed SHA transitions. Design — rename-and-wrap (minimal blast radius): - run_agentic_checkup_orchestrator becomes a small public wrapper. Non-PR mode is pass-through; PR mode loops around the inner. - The renamed _run_agentic_checkup_orchestrator_inner gets two checkpoints that raise _PRHeadAdvancedRestart on stale-head detection: A) after Step 7's gate passes (fix mode), before the push block. B) inside `if not push_ok:`, only when the push message indicates a rebase failure AND a fresh refetch confirms the head moved (auth/ network errors never trigger a rerun). - Counter persists in `.pdd/checkup-pr-{N}/pr_head_refreshes` so a Ctrl-C + manual resume can't bypass the bound. Cleared on success. - --no-fix stays fail-closed (existing _finalize SHA-mismatch logic). - _commit_and_push_if_changed return shape stays (bool, str). - _run_final_checkup_on_pr in agentic_e2e_fix_orchestrator.py stays fail-closed as the outer safety net. Tests: 7 new tests under TestPrHeadAdvanceAutoRerun covering both checkpoints, budget exhaustion, --no-fix no-consume, auth-failure no-trigger, counter clear-on-success, and counter persistence. All 175 tests in test_agentic_checkup_orchestrator + test_checkup_pr_mode pass with no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three findings from codex's first review of the PR-head freshness lease implementation. 1. Checkpoint B classifier missed the case where the push helper's 3-attempt retry loop exhausts on remote-advance: the final return is `Failed to push fixes to PR branch: <git stderr>`, not the rebase-specific strings. Extended the classifier to also match that generic message when `_is_remote_advanced_push_error` recognizes the embedded git markers. 2. Added Checkpoint C: when `_commit_and_push_if_changed` returns "No changes to push." / "No eligible changes to push.", neither the rebase path nor `_run_post_push_reverify_if_needed` runs. The head could have advanced between Checkpoint A and the final report. Checkpoint C refetches just before the report and restarts if the head moved. 3. Added `test_refresh_counter_resumes_from_pre_seeded_value` which pre-seeds the sidecar at 1, forces one restart, and asserts exhaustion. Locks down that the wrapper actually loads the persisted counter on entry (the existing persistence test only verified the file is written). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ssifier Three important findings + one nit from codex's second review of the PR-head freshness lease implementation. 1. Outer wrapper now calls clear_workflow_state(...) explicitly when catching _PRHeadAdvancedRestart. Previously it relied on the inner's identity guard to discard stale cache via SHA mismatch; a flaky refetch returning the old SHA could defeat the rerun. The exception already proves the head advanced — don't depend on a fresh fetch to re-prove it. 2. Checkpoint B classifier extended to recognize "Failed to reset to original fixer commit before rebase: ..." from _rebase_onto_updated_pr_head. That message is only reachable after a remote-advanced push, so when the fresh-refetch SHA confirms advance, it should follow the same bounded restart path. 3. Three new tests: - Generic non-fast-forward push failure triggers restart via the _is_remote_advanced_push_error branch (round-1 added the classifier branch but no test exercised it). - "No changes to push" head-advance before final report restarts and does NOT post a stale Step 7 verdict (Checkpoint C coverage). - Empty initial pr_head_sha fail-degrades to current behavior; no restart, no budget consumption. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex round-3 finding. PDD treats prompts as source of truth; the generated module pdd/agentic_checkup_orchestrator.py now contains the pdd/prompts/agentic_checkup_orchestrator_python.prompt described the pre-#1116 behavior. A future pdd sync would regenerate from the stale prompt and drop the fix. Added a sub-bullet under Requirement 7 (PR-mode) describing: - MAX_PR_HEAD_REFRESHES bound and the three freshness checkpoints (A: after Step 7 gate, B: on rebase-related push failure with fresh-refetch confirmation, C: after no-change push outcome). - Rename-and-wrap structure with the internal _PRHeadAdvancedRestart exception caught by the outer wrapper. - Explicit clear_workflow_state on restart so a flaky refetch cannot defeat the rerun via the identity guard. - Sidecar counter at .pdd/checkup-pr-{N}/pr_head_refreshes (outside workflow-state dir so clear_workflow_state does not nuke it). - Budget exhaustion diagnostic listing observed SHA transitions. - --no-fix mode does not consume restart budget; the lease is fix-mode only. No production code change; tests unchanged (180/180 still passing). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

F1: --no-fix --pr mode now refetches the remote PR head after Step 7's gate passes and fails closed (unverified verdict) if the head advanced during the run. Previously _finalize's fail-closed downgrade only runs under --review-loop, leaving --no-fix open to publishing a stale clean verdict. F2: _PRHeadAdvancedRestart now carries step_comments_set; the outer wrapper preserves it across restart iterations via _carried_step_comments so per-step issue comments aren't re-posted after clear_workflow_state wipes the stored field. F3: Added _force_skip_state_load kwarg to the inner; set True by the outer on restart iterations so a flaky GitHub state DELETE cannot reload stale cached step outputs on the next inner call. Tests: 3 new test scenarios in TestPrHeadAdvanceAutoRerun. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

F1a: fail-closed when freshness refetch returns empty SHA (unknown freshness). Previously an exception-swallowed refetch silently let the clean Step 7 verdict through; now "could not retrieve" triggers the same downgrade path. F1b: posted report body now includes the downgrade reason prepended to the Step 7 output so the PR/issue thread shows WHY the verdict was invalidated, not just the clean output that no longer applies. Prompt: documents --no-fix fail-closed behavior (both advance and unknown freshness), _carried_step_comments, and _force_skip_state_load so future pdd sync regeneration preserves all three guarantees. Tests: 2 new tests — unknown-freshness fail-closed, report body includes "Verdict downgraded" + SHA transition text. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

External review filed three gaps on PR #1134 after rebase onto current main. All three reproduce on the rebased branch and are closed here. Finding #1 (rebase regression): main added test_scope and start_step_override params to _run_agentic_checkup_orchestrator_inner and threads them from agentic_checkup.run_agentic_checkup, but the new public wrapper run_agentic_checkup_orchestrator was added in this PR without those kwargs. The rebase merges cleanly but the call from agentic_checkup.py would TypeError at runtime. Fix: add test_scope and start_step_override to the wrapper signature; forward to both inner call sites (non-PR pass-through and PR-mode restart loop). Finding #2 (Checkpoint A2 gap): the registry-edit and prompt-source guard refusal paths each post the canonical Step 7 verdict via _post_pr_mode_final_report before returning. Between Checkpoint A's refetch (line 1953) and these refusal posts (lines 2298 / 2333) the remote head can advance, leaving the Step 7 verdict belonging to the OLD head. Fix: reuse the pr_metadata fetch already at the top of the guards-and-push block (no extra GitHub I/O) and raise _PRHeadAdvancedRestart on mismatch — symmetric with Checkpoint A. Finding #3 (entry-SHA empty gap): the --no-fix post-Step-7 freshness check at line 1996 was gated on a non-empty current_pr_head_sha. If the entry _fetch_pr_metadata returned empty (transient gh hiccup) but the post-run refetch succeeded with a non-empty SHA, the clean Step 7 verdict slipped through as "fresh" even though we had no baseline to compare against. Fix: add a third elif branch — when current_pr_head_sha is empty but nofix_fresh_sha is non-empty, fail-closed with a diagnostic explaining the entry SHA was unavailable. Tests: 2 new tests in TestPrHeadAdvanceAutoRerun — test_pr_head_advance_at_guards_block_triggers_rerun (A2 restart with exact metadata-call accounting) and test_no_fix_empty_entry_sha_fails_closed (empty entry + non-empty fresh -> downgrade, no budget consumed). Updated _run_orchestrator_capturing_context in test_agentic_checkup_orchestrator.py to mock _fetch_pr_metadata so the new no-fix freshness check sees stable SHAs (previously the gh-CLI call failed silently in the sandbox, which the pre-rebase no-fix path did not exercise). Prompt: updated agentic_checkup_orchestrator_python.prompt with the Checkpoint A2 bullet and the three-condition --no-fix downgrade so a future pdd sync regenerates with the fix in place. LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60 -> 204/204 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

External review filed three follow-up gaps on PR #1134 after round-4: Finding 1 (successful push paths still have stale-report race): the round-1 Checkpoint C only fired on "No changes to push." / "No eligible changes to push.", leaving normal push and post-rebase reverify-passed paths open. If the remote head advances after our push but before _post_pr_mode_final_report, the canonical Step 7 verdict belongs to an older head than what the PR thread now shows. Fix: broaden to Checkpoint D — fire on every successful push outcome (failure paths are already covered by Checkpoint B). Baseline is _git_rev_parse_head(worktree_path) (the local pushed SHA) instead of current_pr_head_sha — after a real push the remote head moves BECAUSE of us, so comparing against entry SHA would false-positive every push. Local HEAD equals our pushed commit on normal/rebase paths and equals entry SHA on the no-change path, so Checkpoint D subsumes the round-1 Checkpoint C. Finding 2 (fix-mode disables freshness lease when entry SHA empty): every fix-mode checkpoint (A, A2, B, D) gates on a non-empty current_pr_head_sha. If the entry _fetch_pr_metadata returned empty (transient gh hiccup) the entire lease silently disabled, opening the same stale-verdict hole the lease was built to close. Fix: fail-closed at fix-mode PR-mode entry when current_pr_head_sha is empty. Symmetric with the round-3 --no-fix post-Step-7 fail-closed downgrade — the user retries once gh recovers. Finding 3 (README missing user-visible rerun behavior): the new auto-rerun affects latency/cost/recovery; operators need to recognize the budget-exhaustion failure surface and know where the sidecar lives. Fix: README section under PR Mode covering auto-rerun cap=2, sidecar path (.pdd/checkup-pr-{N}/pr_head_refreshes), budget-exhaustion diagnostic format, fix-mode entry-SHA fail-closed, and --no-fix fail-closed semantics. Tests: 1 new test (test_pr_head_advance_after_successful_real_push_ triggers_rerun) covering Checkpoint D on a real push followed by third-party advance; existing test_empty_entry_pr_head_sha_does_not_ consume_refresh_budget renamed and rewritten as test_empty_entry_pr_head_sha_fix_mode_fails_closed for the new fail-closed semantics. Updated metadata_mock.call_count and _git_rev_parse_head side_effects in three existing tests because Checkpoint D adds one metadata fetch per successful push iteration and expects local HEAD to equal the pushed SHA. Parametrized test_resume_discards_cache_when_either_side_sha_is_empty switched to no_fix=True so the new fix-mode entry-SHA gate does not mask the state-identity guard behavior under test; added a fetch mock to test_step_8_skipped_in_pr_mode and test_pr_mode_commits_and_pushes_ generated_fixes. Prompt source: agentic_checkup_orchestrator_python.prompt updated with Checkpoint D documentation and the fix-mode entry-SHA fail-closed contract so a future pdd sync regenerates with both guarantees. LC_ALL=C pytest tests/test_agentic_checkup_orchestrator.py tests/test_checkup_pr_mode.py -m "not integration and not e2e and not real" --timeout=60 -> 205/205 passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Cloud-test on the round-5 commit (9dc17b2) passed 77/77 chunks with no failures. Captured durations refresh the chunk-balancing input so future runs distribute load against the current PR-mode test surface (the round-3/4/5 commits added ~200+ new test cases in test_checkup_pr_mode.py + test_agentic_checkup_orchestrator.py). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

greptile-apps Bot reviewed May 21, 2026

View reviewed changes

Serhan-Asad mentioned this pull request May 21, 2026

Make pdd checkup --pr rerun when PR head advances mid-checkup #1116

Closed

Serhan-Asad and others added 8 commits May 22, 2026 10:59

fix: apply checkup fixes for #1116

795ba27

Serhan-Asad force-pushed the fix/issue-1116-pr-head-rerun branch from 31b041c to ecdfde0 Compare May 22, 2026 18:13

Serhan-Asad and others added 2 commits May 22, 2026 11:36

gltanaka merged commit 20f7025 into main May 22, 2026
9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(checkup): bounded PR-head freshness lease for pdd checkup --pr (#1116)#1134

fix(checkup): bounded PR-head freshness lease for pdd checkup --pr (#1116)#1134
gltanaka merged 10 commits into
mainfrom
fix/issue-1116-pr-head-rerun

Serhan-Asad commented May 21, 2026

Uh oh!

greptile-apps Bot left a comment

Uh oh!

Serhan-Asad commented May 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Serhan-Asad commented May 21, 2026

Summary

Why this design (vs PR #1121)

Commits (4)

Test plan

Notes

Uh oh!

greptile-apps Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Serhan-Asad commented May 22, 2026

Step 7/8: Verification & Final Report (Iteration 1)

Test Results After Fixes

Build Status

Overall Status

Issue Alignment

Issues Summary

PR Push Status

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants