Problem
pdd checkup --pr --review-loop can still accept an LLM reviewer/verifier clean result as the final readiness signal even when a deterministic local check would fail in the PR worktree.
That is a product bug. PDD is meant to review arbitrary user repositories, not only this repo. If a repo has a reliable local format, whitespace, syntax, compile, lint, type, or targeted test gate, a clean LLM verdict must not override that gate.
Concrete failure class: a PR is reported clean while a configured format check such as prettier --check, git diff --check, python -m py_compile, or an equivalent repo-local deterministic check fails. The loop should surface the failed gate as blocking evidence, let the fixer address it when the failure is code-owned, then rerun the gate before accepting the PR.
Product requirement
The gate layer must be repository-agnostic and safe for untrusted PRs.
Do not design this around PDD-specific files, PDD's Makefile, Conda, architecture.json, or our own test suite. The implementation should work for a small JavaScript app, a Python package, a mixed repo, or a repo with only generic Git checks.
PR contents are untrusted. Do not blindly execute scripts, Make targets, CI jobs, package-manager lifecycle hooks, deploy commands, install commands, or arbitrary commands found in PR-modified files. A malicious fork PR can change package.json, Makefile, or config files. Default gates must be built from fixed, conservative commands and narrowly parsed trusted config, not from unchecked PR-authored command strings.
Proposed direction
Implement this in upstream pdd checkup --pr --review-loop as a generic deterministic-gate layer:
- Discover a conservative gate set from the PR worktree and changed-file scope, with no PDD-specific assumptions.
- Always include safe universal checks when applicable, especially PR-range whitespace/conflict-marker checks such as
git diff --check <merge-base>...HEAD rather than only checking the clean working tree.
- Add ecosystem-specific gates only when they are local, deterministic, bounded, and non-mutating. Examples: changed-file Python syntax compile, direct local Prettier check, direct local TypeScript typecheck, direct local lint/type tools when clearly configured.
- Prefer direct tool invocation over package-manager scripts. If package scripts are supported at all, they must not execute lifecycle hooks or shell chains, and they must be parsed/validated so a PR cannot smuggle
curl, rm, deploys, installs, background jobs, or command substitution into the gate.
- Do not run generic Makefile targets, arbitrary CI workflows, pre-commit hooks, package install commands, cloud jobs, e2e suites, or deploy-like commands by default. Those require an explicit trusted allowlist/config design, preferably read from base-trusted configuration or operator input.
- Run gates inside the loop-owned PR worktree before every path that would mark the PR clean: initial reviewer clean, review-only clean, fallback-reviewer clean, post-fix verifier clean, and any pending-findings-empty shortcut.
- Convert repo-owned gate failures into synthetic blocking findings with command, exit code, source, and bounded output excerpt. Feed those findings to the normal fixer path and rerun gates after fixes.
- Distinguish repo failures from gate-runner infrastructure failures. Both should block a clean verdict, but missing tools, timeouts, permission errors, and runner crashes should be reported as unverified/diagnostic gate failures, not blindly assigned to the fixer as if application code can always fix them.
- Persist gate artifacts under the existing checkup artifact directory and include a concise final-report section plus machine-readable state for downstream verdict adapters.
- Scrub secrets before truncating, persisting, logging, or rendering any subprocess output or exception text. Gate output can contain env vars, tokens, URLs, or auth headers.
Non-goals / caution
This should not try to reproduce full CI in v1. Full CI can be slow, flaky, credentialed, external-service-dependent, or side-effectful. The goal is a small, trustworthy local hard stop that catches deterministic failures the LLM can miss.
This should not become a PDD-only quality gate. PDD-specific sync/prompt/architecture checks can remain separate or be added as optional repo-specific gates, but the default product behavior must work on arbitrary repositories.
Acceptance criteria
- A failing discovered deterministic gate prevents a clean checkup verdict even if the LLM reviewer/verifier returns
clean.
- Gates run before every clean-exit path in the review loop.
- Gate discovery and execution are generic: tests use synthetic non-PDD repos, at minimum one Python repo and one JavaScript/TypeScript-style repo.
- A committed whitespace/conflict-marker failure in the PR diff is caught via a PR-range
git diff --check, not missed because the worktree is clean.
- A malicious or PR-modified
package.json script with shell chaining, lifecycle hooks, network calls, install/deploy behavior, or command substitution is not executed by default.
- Repo-owned gate failures become actionable blocking findings and are rerun after the fixer changes the worktree.
- Runner/infrastructure failures block clean but are clearly labeled as gate-runner diagnostics/unverified status.
- Gate command output and exception text are bounded, persisted for audit, rendered in the final report, and secret-scrubbed before any public/logging surface.
- The feature has an opt-out only for diagnostics/emergency use; the default path must be fail-closed for discovered gate failures.
Problem
pdd checkup --pr --review-loopcan still accept an LLM reviewer/verifiercleanresult as the final readiness signal even when a deterministic local check would fail in the PR worktree.That is a product bug. PDD is meant to review arbitrary user repositories, not only this repo. If a repo has a reliable local format, whitespace, syntax, compile, lint, type, or targeted test gate, a clean LLM verdict must not override that gate.
Concrete failure class: a PR is reported clean while a configured format check such as
prettier --check,git diff --check,python -m py_compile, or an equivalent repo-local deterministic check fails. The loop should surface the failed gate as blocking evidence, let the fixer address it when the failure is code-owned, then rerun the gate before accepting the PR.Product requirement
The gate layer must be repository-agnostic and safe for untrusted PRs.
Do not design this around PDD-specific files, PDD's Makefile, Conda,
architecture.json, or our own test suite. The implementation should work for a small JavaScript app, a Python package, a mixed repo, or a repo with only generic Git checks.PR contents are untrusted. Do not blindly execute scripts, Make targets, CI jobs, package-manager lifecycle hooks, deploy commands, install commands, or arbitrary commands found in PR-modified files. A malicious fork PR can change
package.json,Makefile, or config files. Default gates must be built from fixed, conservative commands and narrowly parsed trusted config, not from unchecked PR-authored command strings.Proposed direction
Implement this in upstream
pdd checkup --pr --review-loopas a generic deterministic-gate layer:git diff --check <merge-base>...HEADrather than only checking the clean working tree.curl,rm, deploys, installs, background jobs, or command substitution into the gate.Non-goals / caution
This should not try to reproduce full CI in v1. Full CI can be slow, flaky, credentialed, external-service-dependent, or side-effectful. The goal is a small, trustworthy local hard stop that catches deterministic failures the LLM can miss.
This should not become a PDD-only quality gate. PDD-specific sync/prompt/architecture checks can remain separate or be added as optional repo-specific gates, but the default product behavior must work on arbitrary repositories.
Acceptance criteria
clean.git diff --check, not missed because the worktree is clean.package.jsonscript with shell chaining, lifecycle hooks, network calls, install/deploy behavior, or command substitution is not executed by default.