26 commits
5d5b435
feat: add /pdd budget control comments for GitHub App runs (#1128)
pdd-bot May 21, 2026
d3ba80f
Merge upstream/main into change/issue-1128
Serhan-Asad May 22, 2026
3631371
fix(prompts): address PR #1131 review findings 2-5 (budget control)
Serhan-Asad May 22, 2026
7a2852e
fix(track_cost prompt): align CSV reader contract with watcher's acce…
Serhan-Asad May 22, 2026
b46bf0d
feat(budget-control): generate runtime modules + tests for GitHub App…
Serhan-Asad May 22, 2026
e461203
fix(budget-control): correct watcher tz, status race, route allowlist…
Serhan-Asad May 22, 2026
60f5e2f
fix(budget-control): UTC timestamps, submit-time validation, current-…
Serhan-Asad May 22, 2026
45fcd1a
fix(budget-control): reject node/max on non-issue, expose node_count,…
Serhan-Asad May 22, 2026
c7b67f3
fix(budget-control): reject command=issue in default executor, strict…
Serhan-Asad May 22, 2026
7005568
fix(budget-control): always wire per-job CSV at submit, write rows on…
Serhan-Asad May 22, 2026
207a66f
fix(budget-control): absolutize explicit CSV path, job_id column, llm…
Serhan-Asad May 22, 2026
12cdb01
fix(budget-control): legacy-CSV watcher fallback, PDD_JOB_ID for cust…
Serhan-Asad May 22, 2026
028cff5
fix(budget-control): drop os.environ race, migrate mid CSV header, sy…
Serhan-Asad May 23, 2026
9f55088
fix(budget-control): PDD_JOB_ID safety net for legacy executors, lega…
Serhan-Asad May 23, 2026
565a104
fix(budget-control): safety net wires CSV path, locked migration, pro…
Serhan-Asad May 23, 2026
03da1f3
fix(budget-control): write-lock spans full block, lock file never unl…
Serhan-Asad May 23, 2026
bc32ccd
fix(budget-control): synchronous flush at job end, CSV fallback in ge…
Serhan-Asad May 23, 2026
00b2c54
fix(budget-control): serialise consume, pure read_spent_now, fresh ge…
Serhan-Asad May 23, 2026
134c270
fix(budget-control): update_budget awaits handler, queued-job baselin…
Serhan-Asad May 23, 2026
2a137fa
fix(budget-control): address post-implementation review findings
Serhan-Asad May 23, 2026
a541ff8
fix(budget-control): close three second-pass review findings
Serhan-Asad May 23, 2026
33a080e
fix(track_cost): isolate partial_cost/last_model between tracked comm…
Serhan-Asad May 23, 2026
7b6f437
fix(budget-control): close three fourth-pass review findings
Serhan-Asad May 23, 2026
7d352b6
fix(budget-control): close three fifth-pass review findings
Serhan-Asad May 23, 2026
71b3239
fix(budget-control): close three sixth-pass review findings
Serhan-Asad May 23, 2026
9be8bd1
fix(budget-control): close three seventh-pass review findings
Serhan-Asad May 23, 2026
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,10 @@

## Unreleased

### Add

- **github-app**: add `/pdd` budget control comments for GitHub App runs (#1128). The App now posts a startup settings comment for every label-triggered run (`pdd-bug`, `pdd-change`, `pdd-fix`, `pdd-sync`, `pdd-issue`) and accepts `/pdd budget N`, `/pdd budget node N`, `/pdd budget max N`, `/pdd settings`, and `/pdd stop` in issue comments. It enforces the active cap at subprocess boundaries by polling the existing `track_cost` CSV; because `track_cost` appends a row only when a PDD subprocess exits, the watcher stops the run before the next subprocess spawns rather than mid-call. `pdd-issue` defaults to `$80` per node and `$400` total (effective cap `min($80 x node count, $400)`); normal commands show `Budget cap: none` until set. New public modules `cost_budget_watcher`, `server/budget_settings`, `server/slash_command_parser`, and `server/budget_comments`; `Job` / `JobManager.submit` accept `budget_cap` / `node_budget` / `max_total_cap`; new `GET`/`POST /commands/jobs/{job_id}/budget` endpoints; new `BUDGET_EXCEEDED` job status.

### Fix

- **checkup**: enforce a SHA-backed verification trust boundary in `pdd checkup --pr --review-loop` so unverified fixer attempts are never rendered as completed fixes. `FixResult` now carries `fixer_result`/`push_status`/`local_fixer_commit_sha`/`pushed_head_sha`, `ReviewLoopState` carries `verified_head_sha`/`remote_pr_head_sha`/`verification_status_by_round`, and the final report renders fixed-field `### Fixes Attempted` bullets plus header `verified-head-sha:` / `remote-pr-head-sha:` lines. Before promoting `fresh-final-review: clean` or `verification=verified`, the loop re-fetches the remote PR head and downgrades to `verification=unverified` on mismatch or budget exhaustion (#1088).
65 changes: 65 additions & 0 deletions README.md
@@ -835,6 +835,71 @@ pdd [GLOBAL OPTIONS] fix --budget 5.0 [OTHER OPTIONS] [ARGS]...
```
This sets a maximum budget of $5.00 for the fix operation.

### GitHub App control comments

When PDD is triggered through the GitHub App via the existing `pdd-bug`, `pdd-change`, `pdd-fix`, `pdd-sync`, or `pdd-issue` labels, the App posts a **startup settings comment** to the issue summarising the active run's budget and the comment-driven controls available during the run. No new labels are required — budget is controlled entirely by `/pdd` slash commands in issue comments.

**Startup comment — commands with no default cap** (e.g. `pdd bug`):

```md
PDD is starting `pdd bug`.

Budget cap: none

You can add a cap by commenting:
/pdd budget 30

Other controls:
/pdd settings
/pdd stop
```

**Startup comment — `pdd-issue`** (autonomous solving has defaults `$80` per node and `$400` total):

```md
PDD is starting autonomous solving.

Budget:
- node budget: $80 per node
- max total cap: $400
- effective cap: min($80 x node count, $400)

You can change this run by commenting:
/pdd budget node 50
/pdd budget max 200
```

**Available `/pdd` commands** (post these as new issue comments while a run is active; the App parses only the first non-fenced, non-blank line of each comment):

| Command | Applies to | Effect |
|---------|------------|--------|
| `/pdd budget N` | Normal commands | Sets the total cap for the current run to `$N`. |
| `/pdd budget N` | `pdd-issue` | Alias for `/pdd budget max N` (updates the tree-wide cap). |
| `/pdd budget node N` | `pdd-issue` | Updates the per-node budget. Effective cap recomputes as `min(node_budget x node_count, max_total_cap)`. |
| `/pdd budget max N` | `pdd-issue` | Updates the tree-wide ceiling. Effective cap recomputes as above. |
| `/pdd settings` | Any command | Read-only. Replies with the current command, budgets, effective cap, spend so far, and run status. |
| `/pdd stop` | Any command | Terminates the active run and posts a final spend summary. |
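
The effective-cap formula in the table can be sketched in a few lines. This is only an illustration of the arithmetic `min(node_budget x node_count, max_total_cap)`; the function name `effective_cap` is hypothetical and is not taken from the PDD codebase.

```python
def effective_cap(node_budget: float, node_count: int, max_total_cap: float) -> float:
    """Illustrative only: min(node_budget x node_count, max_total_cap)."""
    return min(node_budget * node_count, max_total_cap)


# pdd-issue defaults from this README: $80 per node, $400 total.
print(effective_cap(80, 3, 400))   # 240 (per-node budget is the binding limit)
print(effective_cap(80, 10, 400))  # 400 (tree-wide ceiling is the binding limit)

# After `/pdd budget node 50` on the same 3-node run:
print(effective_cap(50, 3, 400))   # 150
```

With the defaults, the tree-wide ceiling only becomes the binding limit once the run has more than five nodes, since `80 x 5 = 400`.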

**Defaults and `Budget cap: none`:**

- For `pdd-bug`, `pdd-change`, `pdd-fix`, and `pdd-sync`, the startup comment shows `Budget cap: none` until a `/pdd budget N` comment is posted.
- For `pdd-issue`, the defaults are `node budget = $80` and `max total cap = $400`, yielding `effective cap = min($80 x node count, $400)`.
- All amounts are positive USD values; valid forms include `30`, `30.5`, `$30`, `30.00`. Negatives, zero, NaN, and values above the project's hard ceiling (`$10000`) are rejected with a usage hint.
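
As a rough sketch of the amount rules above, the following validator accepts the listed forms and rejects the listed failure cases. The names `parse_budget_amount` and `HARD_CEILING_USD` are hypothetical, and treating exactly `$10000` as accepted is an assumption based on the phrase "values above the ceiling"; the real parser may differ.

```python
import math
import re
from typing import Optional

HARD_CEILING_USD = 10_000.0  # ceiling quoted in this README

# Optional leading "$", then digits with an optional decimal part.
_AMOUNT_RE = re.compile(r"\$?([0-9]+(?:\.[0-9]+)?)")


def parse_budget_amount(text: str) -> Optional[float]:
    """Return the amount in USD, or None if the input should be rejected."""
    match = _AMOUNT_RE.fullmatch(text.strip())
    if match is None:
        return None  # negatives, "nan", words, empty strings
    value = float(match.group(1))
    # Assumption: a value equal to the ceiling is still accepted.
    if not math.isfinite(value) or value <= 0 or value > HARD_CEILING_USD:
        return None
    return value


for raw in ["30", "30.5", "$30", "30.00", "-5", "0", "nan", "10001"]:
    print(repr(raw), "->", parse_budget_amount(raw))
```

A caller would reply with the usage hint whenever the function returns `None`, leaving the current settings unchanged.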

**Parser rules:**

- The App matches `/pdd ...` only on the first non-fenced, non-blank line of an `issue_comment.created` event. Lines inside fenced code blocks are skipped, so the startup comment's own examples cannot re-trigger commands. Bot-authored comments are also skipped, and repeated webhook deliveries are de-duplicated by comment ID.
- Authorisation is scoped to the verb, not to the `/pdd` prefix: budget-mutating verbs (`/pdd budget`, `/pdd budget node`, `/pdd budget max`, `/pdd stop`) require the commenter to be the issue author or a user with `OWNER` / `MEMBER` / `COLLABORATOR` association on the repo. The read-only verb `/pdd settings` is open to anyone whose comment is parsed (i.e. not filtered as fenced, bot, or duplicate). This matches the unauthorized-reply wording, which redirects rejected commenters to `/pdd settings`.
- Invalid `/pdd` commands get a single helpful reply and do not change settings.
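
The parser and authorisation rules above might look roughly like this. It is a simplified sketch, not the code in `server/slash_command_parser`: the function names are hypothetical, only backtick fences are recognised, and it assumes that a comment whose first non-fenced, non-blank line is not a `/pdd` command is ignored entirely.

```python
from typing import Optional

MUTATING_VERBS = {"budget", "stop"}  # `/pdd settings` is read-only
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def first_command_line(body: str) -> Optional[str]:
    """Return the first non-blank line outside code fences if it is a /pdd command."""
    in_fence = False
    for raw in body.splitlines():
        line = raw.strip()
        if line.startswith("```"):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence or not line:
            continue
        # Only this first eligible line is considered.
        return line if line.split()[0] == "/pdd" else None
    return None


def is_authorized(verb: str, commenter: str, issue_author: str, association: str) -> bool:
    """Mutating verbs need the issue author or a trusted repo association."""
    if verb not in MUTATING_VERBS:
        return True
    return commenter == issue_author or association in TRUSTED_ASSOCIATIONS


print(first_command_line("\n/pdd budget 30\nthanks!"))     # /pdd budget 30
print(first_command_line("```md\n/pdd stop\n```"))         # None (fenced)
print(is_authorized("stop", "alice", "bob", "NONE"))       # False
print(is_authorized("settings", "alice", "bob", "NONE"))   # True
```

Bot filtering and de-duplication by comment ID would sit in front of these helpers and are omitted here.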

**Enforcement:**

- Budget enforcement watches the same cost CSV that `track_cost` writes for every PDD command (the `--output-cost` / `PDD_OUTPUT_COST_PATH` file). `track_cost` only appends a row when a PDD subprocess exits — never mid-call — so the watcher's enforcement boundary is the **subprocess boundary**, not the LLM call.
- For `pdd-issue` (which spawns many nested PDD subprocesses: `change`, `sync`, `bug`, `fix`, `generate`, `test`, ...), the watcher polls after each nested subprocess writes its cost row and stops the run before the next subprocess is spawned once cumulative spend crosses the active effective cap. Rows are filtered by the set of nested command names, never by `{"issue"}`, because `pdd-issue` itself never writes a row under that command name.
- For single-subprocess commands (`pdd bug`, `pdd change`, `pdd fix`, `pdd sync`), the cost row is only written when the command exits, so the cap effectively means "after this subprocess finishes, stop spawning more". A single long command can therefore overshoot the cap by whatever it spends beyond the cap before it exits; `budget_exceeded` fires only after its row is written.
- When cumulative spend on the run reaches the active effective cap, the executor terminates the run via the same path `/pdd stop` uses and the App posts a final `budget_exceeded` comment.
- `/pdd budget`, `/pdd budget node`, and `/pdd budget max` comments posted *during* an active run apply immediately to the in-flight job — they update the watcher's cap and are evaluated at the next subprocess boundary.
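
To make the subprocess-boundary check concrete, here is a minimal sketch of summing a cost CSV and comparing it with the cap. It is not the `cost_budget_watcher` implementation: the column names `command` and `cost`, the helper names, the sample rows, and the `>=` comparison are all assumptions, and `NESTED_COMMANDS` lists only the names spelled out above (the README's list is open-ended).

```python
import csv
import io
from typing import Iterable, Optional, TextIO

# Only the nested commands named in this README; the real set may be larger.
NESTED_COMMANDS = frozenset({"change", "sync", "bug", "fix", "generate", "test"})


def read_spent(csv_file: TextIO, commands: Iterable[str]) -> float:
    """Sum the cost of rows whose command is in `commands` (assumed column names)."""
    wanted = set(commands)
    total = 0.0
    for row in csv.DictReader(csv_file):
        if row.get("command") not in wanted:
            continue
        try:
            total += float(row.get("cost") or 0)
        except ValueError:
            continue  # skip malformed cost cells rather than abort the poll
    return total


def should_stop(spent: float, cap: Optional[float]) -> bool:
    """No cap means never stop; otherwise stop once spend reaches the cap."""
    return cap is not None and spent >= cap


# Illustrative rows only; real files are written by track_cost.
sample = (
    "timestamp,model,command,cost\n"
    "2026-05-22T10:00:00Z,example-model,change,30.0\n"
    "2026-05-22T10:05:00Z,example-model,sync,55.5\n"
)
spent = read_spent(io.StringIO(sample), NESTED_COMMANDS)
print(spent, should_stop(spent, 80.0))  # 85.5 True
```

In this example the second row pushes spend past an `$80` cap, so the check would only trip after the `sync` subprocess has already finished, which is the overshoot behaviour described above.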

## Commands

Here are the main commands provided by PDD: