Skip to content

pdd checkup --pr should enforce repo-agnostic deterministic gates before clean verdict #1092

@Serhan-Asad

Description

@Serhan-Asad

Problem

pdd checkup --pr --review-loop can still accept an LLM reviewer/verifier clean result as the final readiness signal even when a deterministic local check would fail in the PR worktree.

That is a product bug. PDD is meant to review arbitrary user repositories, not only this repo. If a repo has a reliable local format, whitespace, syntax, compile, lint, type, or targeted test gate, a clean LLM verdict must not override that gate.

Concrete failure class: a PR is reported clean while a configured format check such as prettier --check, git diff --check, python -m py_compile, or an equivalent repo-local deterministic check fails. The loop should surface the failed gate as blocking evidence, let the fixer address it when the failure is code-owned, then rerun the gate before accepting the PR.

Product requirement

The gate layer must be repository-agnostic and safe for untrusted PRs.

Do not design this around PDD-specific files, PDD's Makefile, Conda, architecture.json, or our own test suite. The implementation should work for a small JavaScript app, a Python package, a mixed repo, or a repo with only generic Git checks.

PR contents are untrusted. Do not blindly execute scripts, Make targets, CI jobs, package-manager lifecycle hooks, deploy commands, install commands, or arbitrary commands found in PR-modified files. A malicious fork PR can change package.json, Makefile, or config files. Default gates must be built from fixed, conservative commands and narrowly parsed trusted config, not from unchecked PR-authored command strings.

Proposed direction

Implement this in upstream pdd checkup --pr --review-loop as a generic deterministic-gate layer:

  1. Discover a conservative gate set from the PR worktree and changed-file scope, with no PDD-specific assumptions.
  2. Always include safe universal checks when applicable, especially PR-range whitespace/conflict-marker checks such as git diff --check <merge-base>...HEAD rather than only checking the clean working tree.
  3. Add ecosystem-specific gates only when they are local, deterministic, bounded, and non-mutating. Examples: changed-file Python syntax compile, direct local Prettier check, direct local TypeScript typecheck, direct local lint/type tools when clearly configured.
  4. Prefer direct tool invocation over package-manager scripts. If package scripts are supported at all, they must not execute lifecycle hooks or shell chains, and they must be parsed/validated so a PR cannot smuggle curl, rm, deploys, installs, background jobs, or command substitution into the gate.
  5. Do not run generic Makefile targets, arbitrary CI workflows, pre-commit hooks, package install commands, cloud jobs, e2e suites, or deploy-like commands by default. Those require an explicit trusted allowlist/config design, preferably read from base-trusted configuration or operator input.
  6. Run gates inside the loop-owned PR worktree before every path that would mark the PR clean: initial reviewer clean, review-only clean, fallback-reviewer clean, post-fix verifier clean, and any pending-findings-empty shortcut.
  7. Convert repo-owned gate failures into synthetic blocking findings with command, exit code, source, and bounded output excerpt. Feed those findings to the normal fixer path and rerun gates after fixes.
  8. Distinguish repo failures from gate-runner infrastructure failures. Both should block a clean verdict, but missing tools, timeouts, permission errors, and runner crashes should be reported as unverified/diagnostic gate failures, not blindly assigned to the fixer as if application code can always fix them.
  9. Persist gate artifacts under the existing checkup artifact directory and include a concise final-report section plus machine-readable state for downstream verdict adapters.
  10. Scrub secrets before truncating, persisting, logging, or rendering any subprocess output or exception text. Gate output can contain env vars, tokens, URLs, or auth headers.

Non-goals / caution

This should not try to reproduce full CI in v1. Full CI can be slow, flaky, credentialed, external-service-dependent, or side-effectful. The goal is a small, trustworthy local hard stop that catches deterministic failures the LLM can miss.

This should not become a PDD-only quality gate. PDD-specific sync/prompt/architecture checks can remain separate or be added as optional repo-specific gates, but the default product behavior must work on arbitrary repositories.

Acceptance criteria

  • A failing discovered deterministic gate prevents a clean checkup verdict even if the LLM reviewer/verifier returns clean.
  • Gates run before every clean-exit path in the review loop.
  • Gate discovery and execution are generic: tests use synthetic non-PDD repos, at minimum one Python repo and one JavaScript/TypeScript-style repo.
  • A committed whitespace/conflict-marker failure in the PR diff is caught via a PR-range git diff --check, not missed because the worktree is clean.
  • A malicious or PR-modified package.json script with shell chaining, lifecycle hooks, network calls, install/deploy behavior, or command substitution is not executed by default.
  • Repo-owned gate failures become actionable blocking findings and are rerun after the fixer changes the worktree.
  • Runner/infrastructure failures block clean but are clearly labeled as gate-runner diagnostics/unverified status.
  • Gate command output and exception text are bounded, persisted for audit, rendered in the final report, and secret-scrubbed before any public/logging surface.
  • The feature has an opt-out only for diagnostics/emergency use; the default path must be fail-closed for discovered gate failures.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions