feat: add public surface and test churn gates to pdd sync (#1012)#1015
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
c5f79fc to
dc3d57d
Compare
When a prior test-churn attempt failed, accumulate its cost (and model fallback) and fold them into the successful retry's TestResult so the caller charges every attempt against the sync budget. Mirrors the generate-op and one-session-sync accumulation patterns (Codex PR #1015 high finding). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ollow-ups) Three independent gate-correctness fixes called out by Codex iter-3 medium review (#1015): - `BREAKING-CHANGE:` opt-out parser now accepts identifier wrappers of matching back-tick, single-quote, OR double-quote (`"old_helper"`, `'Class.method'`). Mismatched quotes are still rejected so a typo can never silently whitelist a removal. - `_snapshot_public_surface` and `_snapshot_public_signatures` recurse into public nested classes at all depths so a removed/signature-changed `Outer.Inner.method` is caught (previously only direct class methods participated, letting nested-class drift escape both diffs). - `_get_test_churn_threshold` accepts `"40%"`/`"100%"` percent strings (strips the suffix, divides by 100, then clamps). Unparseable values still default to 0.40 but now log a warning so the silent default is visible to operators. Prompt at `pdd/prompts/code_generator_main_python.prompt` updated to match the new contract; new test coverage in `tests/test_code_generator_main.py` exercises each path including mismatched-wrapper rejection and triple-nested classes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
69a18d7 to
741a66c
Compare
…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address four findings from the external reviewer: 1. Blocker (pdd/one_session_sync.py): `except TestChurnError` now restores the pre-session code file in addition to the test file, mirroring the `PublicSurfaceRegressionError` handler's pattern. A failed one-session attempt previously left mutated code on disk even when the test rewrite was rejected, so subsequent `pdd sync` runs treated the mutated code as the baseline. 2. High (pdd/code_generator_main.py): `_format_python_signature` now emits a literal `/` token between positional-only and regular args so `def f(x, /, y)` and `def f(x, y)` produce DIFFERENT snapshots. Previously both forms snapshotted as `(x, y)` and the public-surface gate missed a real ABI break (kwarg callers succeed against the second form, fail against the first). `skip_first` strips from the posonly group when present, otherwise from `args.args`, so the marker insertion index stays correct. 3. Medium (pdd/prompts/sync_orchestration_python.prompt): the non-generate code-write surface gate bullet now says `break`-ing the operation loop (matching the actual code at sync_orchestration.py:~2825) and mirrors the inline rationale that surface regression is a contract violation, not a transient failure — `continue` would let `sync_determine_operation` re-select the same operation and spin indefinitely on deterministic bad output. 4. High (architecture.json): refreshed signature strings for `update_file_pair` (added `strength`/`temperature` params) and `commit_and_push` (now takes `Sequence[DriftInfo]` with defaults and the new `finalized_modules` parameter) to match the live function signatures. Tests: - TestOneSessionRollback::test_test_churn_restores_code_and_test pins the blocker fix: mock rewrites both code and test, asserts both files restored after TestChurnError propagates. - TestPublicSurfaceBindingKind: four new tests for the posonly fix (added/removed/unchanged + a direct snapshot-string assertion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three edge cases reviewer probes found after the iter-3 fix landed: * `_snapshot_public_signatures` now records top-level imports as `[import]` / `[import:<module>]` / `[import:from <module>:<source>]` entries (parity with the surface helper). Without this entry, a silent re-export break like `from pathlib import Path` -> `def Path(): pass` produced no snapshot diff because the surface set unchanged while the signatures dict treated the new function as a brand-new symbol. * `_walk_class` now synthesises a constructor signature from class- level `AnnAssign` annotations when an `@dataclass` / `@dataclasses. dataclass` decorated class has no explicit `__init__`. Adding a required field (`name: str` -> `name: str; age: int`) now flips the snapshot. `ClassVar` / `InitVar` annotations are excluded; explicit `__init__` still takes precedence. Third-party `attrs` / `pydantic` decorators are out of scope (documented). * `build_public_surface_hard_failure_from_error` and `build_test_churn_hard_failure_from_error` now include the `BREAKING-CHANGE:` opt-in instruction line required by item 9d of the runner prompt. Tests cover the four import-flip variants, six dataclass scenarios, and two hard-fail string format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses 4 review findings on commit 6523cfa: - High: dataclass init synthesizer now honours @DataClass(kw_only=True), the in-body `_: KW_ONLY` sentinel, and InitVar[...] fields. The sentinel check runs before the underscore-prefix skip so the canonical marker is recognised; decorator wins over sentinel (single `*`); a trailing `*` without kw-only fields is stripped. - Medium: fields declared with `field(init=False, ...)` (bare or `dataclasses.field` attribute form) are excluded from the synth, matching the stdlib runtime. `init=True`, missing `init` kwarg, and non-`field` calls still include the field. - Medium: AsyncSyncRunner's internal `_build_public_surface_hard_failure` and `_build_test_churn_hard_failure` blocks now carry the BREAKING-CHANGE: opt-out instruction so end users see the directive on the subprocess hard-failure path (not just the importable builders). - Medium: prompt and architecture.json describe the new dataclass synth rules (kw_only / KW_ONLY / InitVar included; ClassVar + init=False excluded; explicit __init__ wins) plus the kind-prefix invariant and import-binding formats so prompt-driven regeneration preserves them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two more dataclass edge cases probed by the external reviewer on ``93d7fe69d``: 1. ``@dataclass(init=False)`` was ignored. The synth unconditionally built a constructor signature from class-body ``AnnAssign`` nodes when a dataclass decorator was present, but ``init=False`` keeps the class's natural (zero-arg) ``__init__``. Now detect the kwarg on Call decorators (``@dataclass(init=False)`` and ``@dataclasses.dataclass(init=False)``) and emit ``()`` instead of walking the fields. Flipping ``init=False`` -> default trips the gate; adding fields under ``init=False`` does not. 2. Inherited dataclass fields were not merged into the synthesised constructor signature. Now walk same-module ``Name`` bases left- to-right depth-first with a cycle guard, prepending each ``@dataclass`` base's fields (and ITS bases' fields, recursively) in front of the derived class's own fields. Override semantics: when both base and derived declare the same field name, the derived's annotation/default replaces the base's, preserving the base's slot via insertion-order dict semantics — matching what ``@dataclass`` actually synthesises. Cross-module / attribute-form bases (``class User(pkg.Base)``) that we cannot resolve from this module's AST get an ``[inherited_unresolved]`` marker prepended to the synth signature so local field changes still diff while invisible base changes correctly slip through. V1 limitations (documented in the helper docstring): - A base decorated with ``@dataclass(init=False)`` still contributes declared fields to a derived ``@dataclass``'s synth at runtime, but we treat such bases as non-dataclass for the purposes of inherited- field merging. - The ``KW_ONLY`` sentinel does not propagate across the inheritance boundary — each class re-emits its own kw-only context. Tests in ``TestPublicSurfaceBindingKind``: - ``test_dataclass_init_false_decorator_flip_to_default_is_detected`` - ``test_dataclass_init_false_decorator_keeps_zero_arg_signature`` - ``test_dataclasses_dataclass_init_false_form_works`` - ``test_dataclass_inherited_field_added_to_private_base_is_detected`` - ``test_dataclass_unchanged_inheritance_is_allowed`` - ``test_dataclass_unresolved_base_is_marked_uncertain`` - ``test_dataclass_multiple_inheritance_left_to_right`` - ``test_dataclass_explicit_init_still_takes_precedence_with_inheritance`` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Walk dataclass bases in reverse-MRO order so the synthesised ``__init__`` matches Python's ``@dataclass`` semantics (``__dataclass_fields__`` is populated by reversed iteration so later bases override earlier ones in dict insertion order). For ``class C(A, B)`` the synth becomes ``(b, a, c)``. - Stop skipping ``@dataclass(init=False)`` bases: their fields STILL live in ``__dataclass_fields__`` and the derived class merges them. ``init=False`` only suppresses the base's OWN init synthesis. - Update the existing multiple-inheritance test (renamed to ``test_dataclass_multiple_inheritance_mro_order``) and add diamond, single-inheritance regression, and init=False-base inheritance tests. - Document reverse-MRO ordering and init=False inheritance in the prompt and architecture.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Mirror prompt deps into architecture.json for update_main (was 2/9) and ci_drift_heal (was 7/9) so global sync can topologically order stale dependents correctly. - Surface repair_directive (#1012 F-H) on the canonical signatures for cmd_test_main and run_agentic_test_generate in architecture.json, plus the cmd_test_main_python.prompt Inputs section and the agentic_test_generate_python.prompt Entry Point. Also fix run_agentic_test_generate's pre-existing stale 4-arg Tuple[bool, str, str] signature in architecture.json. - Honor the anchored BREAKING-CHANGE prompt opt-out (parsed by _prompt_allows_test_churn) in the deletion-as-churn branches at cmd_test_main.py:230 and one_session_sync.py:609, closing the asymmetry where the prompt opt-out worked for wholesale rewrites but silently failed when the agent emptied/deleted the test file. Import the helper into both files and add regression tests in test_cmd_test_main.py and test_one_session_sync.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address three findings from the external review of the public-surface and test-churn gates: 1. P1 — `_is_test_output_path` now also matches `<name>_test.go` / `<name>_test.rb` / `<name>_spec.rb` so Go and Ruby sibling test files no longer bypass the churn gate when they live next to production code instead of under `tests/`. 2. P1 — `_snapshot_public_surface` skips `from __future__ import ...` directives. Removing the directive after a Python-version bump is harmless cleanup, but the gate previously flagged `annotations` (and any other future name) as a removed public symbol. 3. P2 — `_verify_public_surface_regression` expands BREAKING-CHANGE class-removal directives to cover every captured descendant. With `BREAKING-CHANGE: remove Service` the caller no longer has to enumerate `Service.run`, `Service.Inner.go`, etc. Signature-change directives stay strict — auto-expansion there would silently mask real regressions. Additionally: - sync_main retry message: report "Public surface regression" / "Test churn threshold exceeded" instead of mislabeling them as "Architecture conformance failed". - README: document the full public-surface coverage (constants and re-exported imports) and the new class-removal descendant rule. - CHANGELOG: restore the v0.0.238 Chore/Test subsections the original feat commit accidentally dropped. Regression coverage added in TestSyncCompatibilityGates: Go/Ruby sibling-file churn, `__future__` import ignored by the gate, and `BREAKING-CHANGE: remove Class` covering nested descendants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address three follow-up findings from the next review pass:
1. P1 — `_snapshot_public_signatures` also skips `from __future__
import …`. Without this the surface helper would return early but
the signatures dict still records `[import:from __future__]`, so a
regenerated module that drops the directive and binds a real
`annotations = …` value would falsely diff as a
`[import:from __future__]` → `[assignment]` signature change.
2. P1 — `_is_test_output_path` now also matches PascalCase JVM-style
suffixes: `Test.java`, `Tests.java`, `Test.kt`, `Tests.kt`,
`Spec.kt`. The agentic test prompt already names `Test.java` as a
recognised convention; without this match `src/test/java/FooTest.java`
bypassed the churn gate. Case-sensitive on purpose so `latest.kt`,
`manifest.java`, `request.scala` do not false-positive.
3. P1 — Prompt drift: `pdd/prompts/code_generator_main_python.prompt`
now documents the `__future__` skip in BOTH `_snapshot_public_surface`
and `_snapshot_public_signatures`, the full PascalCase / sibling-
file test-suffix list in the test-churn gate description, and the
`BREAKING-CHANGE: remove <Class>` descendant-expansion rule. A
future `pdd sync code_generator_main` can now regenerate without
reverting iter-9 + iter-10.
Nit:
- `run_agentic_test_generate` annotated return type was a stale
`tuple[str, float, str, bool]` (and the docstring claimed a 4-tuple)
while the actual return is `TestResult` (5 fields). Updated both.
Regression coverage added in TestSyncCompatibilityGates:
- `from __future__ import annotations` rebinding to a real
`annotations = {...}` value (the exact reviewer-named scenario).
- `FooTest.java` and `WidgetSpec.kt` churn detection.
- `latest.kt` NOT classified as a test (PascalCase false-positive
regression guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-up findings from the next review pass: 1. P1 — Expand `_is_test_output_path` to cover the remaining common test-file conventions the gate was skipping: Maven failsafe integration tests (`FooIT.java`), older JUnit/TestNG TestCase convention (`FooTestCase.java`), ScalaTest (`Test.scala`, `Tests.scala`, `Spec.scala`), Swift (`Tests.swift`), .NET xUnit/ NUnit (`Test.cs`, `Tests.cs`), and Rust sibling tests (`_test.rs`). All PascalCase JVM/.NET/Swift entries stay case-sensitive so `latest.kt`/`manifest.java`/`request.scala` continue NOT to false-positive. 2. P1 — Close the alt-path test churn loophole: when `run_agentic_test_generate` falls back to reading an alternate path (e.g. `__tests__/foo.test.ts`) because the canonical `output_test_file` is absent, recover the alt path's pre-existing content via `git show HEAD:<rel>` and re-run the test-churn check against THAT baseline (not the empty canonical). On `TestChurnError`: attach `total_cost`/`model_name` and restore the pre-existing content to disk so the repair loop sees the canonical pre-sync state. Untracked / never-committed alt files harmlessly return empty and fall through to first-time-generation behavior. Limitation: when the agent writes to BOTH the canonical output AND an alt path, only the canonical churn is gated; the alt-path delta in that dual-write case is still unguarded. The reviewer's named scenario (canonical absent, alt rewritten) is fully covered. 3. P1 — `_collect_bound_module_names` also skips `from __future__ import …` so `__all__ = ["annotations"]` no longer promotes the future binding into the public surface via the `__all__`- authoritative branch. Without this skip a clean-up that drops `from __future__ import annotations` AND the matching `__all__` entry would falsely diff as `removed: annotations`. Prompt sync: - `code_generator_main_python.prompt`: full extended test-suffix list in the test-churn gate description, and the parallel `__future__` skip in `_collect_bound_module_names` documented alongside the earlier skip notes for `_snapshot_public_surface` and `_snapshot_public_signatures`. - `agentic_test_generate_python.prompt`: documents the alt-path git-recovery churn check, the `_read_prior_content_from_git` helper contract, and the restore-on-failure requirement. Regression coverage: - `FooIT.java`, `FooTestCase.java`, `FooSpec.scala`, and `widget_test.rs` churn detection. - `__all__ = ["annotations"]` + `from __future__ import annotations` removal MUST NOT trip the gate. - `run_agentic_test_generate` alt-path rewrite with a tracked prior (mocked `_read_prior_content_from_git` returning 20 tests) raises `TestChurnError` AND restores the prior content; untracked prior (mocked empty) falls through with no raise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
5fb3564 to
13cb692
Compare
…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address four findings from the external reviewer: 1. Blocker (pdd/one_session_sync.py): `except TestChurnError` now restores the pre-session code file in addition to the test file, mirroring the `PublicSurfaceRegressionError` handler's pattern. A failed one-session attempt previously left mutated code on disk even when the test rewrite was rejected, so subsequent `pdd sync` runs treated the mutated code as the baseline. 2. High (pdd/code_generator_main.py): `_format_python_signature` now emits a literal `/` token between positional-only and regular args so `def f(x, /, y)` and `def f(x, y)` produce DIFFERENT snapshots. Previously both forms snapshotted as `(x, y)` and the public-surface gate missed a real ABI break (kwarg callers succeed against the second form, fail against the first). `skip_first` strips from the posonly group when present, otherwise from `args.args`, so the marker insertion index stays correct. 3. Medium (pdd/prompts/sync_orchestration_python.prompt): the non-generate code-write surface gate bullet now says `break`-ing the operation loop (matching the actual code at sync_orchestration.py:~2825) and mirrors the inline rationale that surface regression is a contract violation, not a transient failure — `continue` would let `sync_determine_operation` re-select the same operation and spin indefinitely on deterministic bad output. 4. High (architecture.json): refreshed signature strings for `update_file_pair` (added `strength`/`temperature` params) and `commit_and_push` (now takes `Sequence[DriftInfo]` with defaults and the new `finalized_modules` parameter) to match the live function signatures. Tests: - TestOneSessionRollback::test_test_churn_restores_code_and_test pins the blocker fix: mock rewrites both code and test, asserts both files restored after TestChurnError propagates. - TestPublicSurfaceBindingKind: four new tests for the posonly fix (added/removed/unchanged + a direct snapshot-string assertion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three edge cases reviewer probes found after the iter-3 fix landed: * `_snapshot_public_signatures` now records top-level imports as `[import]` / `[import:<module>]` / `[import:from <module>:<source>]` entries (parity with the surface helper). Without this entry, a silent re-export break like `from pathlib import Path` -> `def Path(): pass` produced no snapshot diff because the surface set unchanged while the signatures dict treated the new function as a brand-new symbol. * `_walk_class` now synthesises a constructor signature from class- level `AnnAssign` annotations when an `@dataclass` / `@dataclasses. dataclass` decorated class has no explicit `__init__`. Adding a required field (`name: str` -> `name: str; age: int`) now flips the snapshot. `ClassVar` / `InitVar` annotations are excluded; explicit `__init__` still takes precedence. Third-party `attrs` / `pydantic` decorators are out of scope (documented). * `build_public_surface_hard_failure_from_error` and `build_test_churn_hard_failure_from_error` now include the `BREAKING-CHANGE:` opt-in instruction line required by item 9d of the runner prompt. Tests cover the four import-flip variants, six dataclass scenarios, and two hard-fail string format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses 4 review findings on commit 6523cfa: - High: dataclass init synthesizer now honours @DataClass(kw_only=True), the in-body `_: KW_ONLY` sentinel, and InitVar[...] fields. The sentinel check runs before the underscore-prefix skip so the canonical marker is recognised; decorator wins over sentinel (single `*`); a trailing `*` without kw-only fields is stripped. - Medium: fields declared with `field(init=False, ...)` (bare or `dataclasses.field` attribute form) are excluded from the synth, matching the stdlib runtime. `init=True`, missing `init` kwarg, and non-`field` calls still include the field. - Medium: AsyncSyncRunner's internal `_build_public_surface_hard_failure` and `_build_test_churn_hard_failure` blocks now carry the BREAKING-CHANGE: opt-out instruction so end users see the directive on the subprocess hard-failure path (not just the importable builders). - Medium: prompt and architecture.json describe the new dataclass synth rules (kw_only / KW_ONLY / InitVar included; ClassVar + init=False excluded; explicit __init__ wins) plus the kind-prefix invariant and import-binding formats so prompt-driven regeneration preserves them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two more dataclass edge cases probed by the external reviewer on ``93d7fe69d``: 1. ``@dataclass(init=False)`` was ignored. The synth unconditionally built a constructor signature from class-body ``AnnAssign`` nodes when a dataclass decorator was present, but ``init=False`` keeps the class's natural (zero-arg) ``__init__``. Now detect the kwarg on Call decorators (``@dataclass(init=False)`` and ``@dataclasses.dataclass(init=False)``) and emit ``()`` instead of walking the fields. Flipping ``init=False`` -> default trips the gate; adding fields under ``init=False`` does not. 2. Inherited dataclass fields were not merged into the synthesised constructor signature. Now walk same-module ``Name`` bases left- to-right depth-first with a cycle guard, prepending each ``@dataclass`` base's fields (and ITS bases' fields, recursively) in front of the derived class's own fields. Override semantics: when both base and derived declare the same field name, the derived's annotation/default replaces the base's, preserving the base's slot via insertion-order dict semantics — matching what ``@dataclass`` actually synthesises. Cross-module / attribute-form bases (``class User(pkg.Base)``) that we cannot resolve from this module's AST get an ``[inherited_unresolved]`` marker prepended to the synth signature so local field changes still diff while invisible base changes correctly slip through. V1 limitations (documented in the helper docstring): - A base decorated with ``@dataclass(init=False)`` still contributes declared fields to a derived ``@dataclass``'s synth at runtime, but we treat such bases as non-dataclass for the purposes of inherited- field merging. - The ``KW_ONLY`` sentinel does not propagate across the inheritance boundary — each class re-emits its own kw-only context. Tests in ``TestPublicSurfaceBindingKind``: - ``test_dataclass_init_false_decorator_flip_to_default_is_detected`` - ``test_dataclass_init_false_decorator_keeps_zero_arg_signature`` - ``test_dataclasses_dataclass_init_false_form_works`` - ``test_dataclass_inherited_field_added_to_private_base_is_detected`` - ``test_dataclass_unchanged_inheritance_is_allowed`` - ``test_dataclass_unresolved_base_is_marked_uncertain`` - ``test_dataclass_multiple_inheritance_left_to_right`` - ``test_dataclass_explicit_init_still_takes_precedence_with_inheritance`` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Walk dataclass bases in reverse-MRO order so the synthesised ``__init__`` matches Python's ``@dataclass`` semantics (``__dataclass_fields__`` is populated by reversed iteration so later bases override earlier ones in dict insertion order). For ``class C(A, B)`` the synth becomes ``(b, a, c)``. - Stop skipping ``@dataclass(init=False)`` bases: their fields STILL live in ``__dataclass_fields__`` and the derived class merges them. ``init=False`` only suppresses the base's OWN init synthesis. - Update the existing multiple-inheritance test (renamed to ``test_dataclass_multiple_inheritance_mro_order``) and add diamond, single-inheritance regression, and init=False-base inheritance tests. - Document reverse-MRO ordering and init=False inheritance in the prompt and architecture.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Review verdict: needed, but not merge-ready yet. The underlying issue in #1012 is real: the existing architecture conformance gate only checks declared architecture/interface symbols and does not protect incumbent public helpers or broad existing test coverage. A compatibility gate in this area is warranted. Blocking required change: the agentic test-churn gate still misses existing alternate-path test rewrites whenever the canonical output file already exists. In I reproduced this locally against the PR head with:
Observed result: Required changes before merge:
The focused PR tests I ran are green ( |
…e (Greg review #1015) Greg's review of PR #1015 surfaced a false negative in the agentic test-churn gate: when the canonical output file exists with content AND the agent rewrites a separate pre-existing test-like file (e.g. shrinks `__tests__/widget.test.ts` from 20 tests to 1) while leaving the canonical untouched, no `TestChurnError` was raised. Root cause — `pdd/agentic_test_generate.py`: The alt-path branch (`if not generated_content and changed_files`) only ran a churn check on alt-path files when the canonical was EMPTY. With canonical present and unchanged, the alt-path scan was skipped entirely; `cmd_test_main`'s outer churn check then compared the (unchanged) canonical against itself and trivially passed. Root cause — `pdd/one_session_sync.py`: The churn gate compared only `test_path` (canonical). The one-session agent runs under a mega-prompt with no scope constraint and can touch any test-like file in the project, so the same false negative applied. Fix: 1. `pdd/agentic_test_generate.py`: replace the alt-path-only churn check with a broad sweep over every changed test-like file that existed pre-run (snapshot already taken via `_snapshot_pre_test_contents`). Restore EVERY pre-run snapshot before raising so the retry loop's next attempt re-snapshots from the true pre-run baseline — restoring only the violating file would let attempt-N rewrites become attempt-N+1's new baseline and permanently weaken the gate. 2. `pdd/one_session_sync.py`: add a pre-session snapshot of every test-like file in the project (lives outside the retry loop, single source of truth across attempts). Inside the existing churn gate try-block, sweep changed test-like files against the snapshot before running the canonical-only check. Skip the canonical path in the sweep so the canonical's deletion-as-churn shortcut and repair directive remain intact. The opt-out env flags and prompt- side `BREAKING-CHANGE: rewrite tests` were moved above the canonical-empty early-return so they apply to the alt-path sweep too. The except handler now restores every snapshot in addition to the canonical + code paths. Test coverage: - `tests/test_agentic_test_generate.py`: - `test_canonical_present_alt_path_rewrite_raises_churn_error` (Greg's repro: canonical unchanged, alt-path 20→1) - `test_canonical_present_multiple_alt_paths_all_restored_on_failure` (restore-all invariant) - `tests/test_one_session_sync.py::TestOneSessionRollback`: - `test_alt_path_rewrite_with_canonical_unchanged_raises_churn_error` (analogous one-session repro) Both new test bodies were confirmed to FAIL against the pre-fix code (via `git stash` of the patched module) and PASS against the patched code. Existing 12-scenario gate matrix remains 12/12 PASS. Closes Greg-review blocking concern on PR #1015. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses @gltanaka review — iter-15 (commit
|
| Layer | Pre-fix | Post-fix |
|---|---|---|
| 8 PR-touched test files (focused gate suites) | 983 pass, 1 skip | 986 pass, 1 skip (+3 new regression tests) |
12-scenario gate matrix (scenario_matrix.py) |
12/12 | 12/12 |
All existing test_one_session_sync.py cases |
81 pass | 81 pass (no behavior drift on canonical-only paths) |
CI + cloud-batch
GitHub CI re-triggered automatically on the push. Cloud Batch re-submitted (run-20260522-152907-7c8b3bb22, currently uploading source). I'll post the cloud-batch result once it lands; given iter-13 + iter-14 already passed 77/77 and this commit is additive (no existing test deletions, no behavior changes on the canonical-only path), I'd expect the same outcome — but verifying.
Open items intentionally not in this commit
- Deletion-as-churn for alt-paths. Your repro is a rewrite; this commit doesn't extend deletion-as-churn to alt-paths (the canonical deletion case is still handled). If you'd like that covered, easiest follow-up is to add a sibling sweep over
pre_test_contentskeys that no longer exist post-session. Happy to add it on request — kept scope tight per your "the canonical-present + alt-path-existing case" framing. PDD_TEST_CHURN_THRESHOLD=0.40default. Unchanged this commit; carry-over flag from prior verification rounds. Independent of the false-negative fix.
Adds two repair-loop gates to prevent pdd sync from silently removing mature module APIs/tests when regenerating code. **PublicSurfaceRegressionError** — for mature Python modules (non-empty pre-generation code file), snapshots the public surface via stdlib `ast` before writing the new generation and raises when symbols are removed or signatures change. Captured surface includes: - Top-level functions and classes (recursive nested classes/methods at every depth — `Outer.Inner.method`). - Module-level public Assign / AnnAssign-with-value / Import / ImportFrom targets (so `import git` removal raises). - `__all__` is honored as authoritative when declared (last-assignment wins; computed forms fall back to the underscore-prefix heuristic; classes listed in `__all__` have their members recursively walked). - Implicit `__init__` is tracked so adding a required-arg constructor on a previously zero-arg class raises. **TestChurnError** — when overwriting an existing non-empty test file through code_generator_main, cmd_test_main, or one_session_sync, computes the stdlib `difflib.unified_diff` churn ratio between pre and post test bodies and raises when the ratio exceeds `PDD_TEST_CHURN_THRESHOLD` (default 0.40; trailing `%` accepted; invalid values log a warning and default). Deletion of a pre-existing test file is treated as maximal churn (ratio=1.0). Both gates accept anchored `BREAKING-CHANGE:` directives in the prompt body. Lines must start at column 0 (after optional whitespace), with a verb (`remove`/`rename`/`change signature`/`rewrite tests`/...) and a comma-separated symbol list. The verb's object must adjoin the relevant target (e.g. `rewriting tests` opts out churn; `rewrite docs and update tests` does not). Bare markers, prose-only mentions, and mid-line references do not opt out. Both errors carry a `repair_directive` and route through the existing `PDD_REPAIR_DIRECTIVE` plumbing. All four retry helpers (`code_generator_main`, `cmd_test_main`, `_run_test_op_with_churn_retry`, `run_one_session_sync`) follow the same contract: - Pop `PDD_REPAIR_DIRECTIVE` before attempt 1 so stale outer env doesn't contaminate the first try. - Loop-local directive variable holds the in-loop state; only set after catching a typed repair exception raised by THIS loop. - `cmd_test_main` accepts an explicit `repair_directive` keyword parameter so retry helpers pass it directly rather than via env. - The env-var write is preserved for cross-process inheritance into re-entrant PDD CLI subprocesses spawned by agentic generation. - `finally` block restores the prior env value (pop if originally unset). The gate fires on every code-write path in pdd sync: - In-process generate (`code_generator_main`). - Per-op orchestration (`crash_main`, `fix_verification_main`, `fix_main`) via `_verify_code_surface_after_write` — on regression, cost is charged and the operation loop breaks (no infinite loop on repeatable verify regressions). - One-session agentic path (`run_one_session_sync`) — surface check runs before churn check on each attempt; both files (code and test) are restored on exhaustion. 826 tests pass across the changed module set (`test_code_generator_main`, `test_cmd_test_main`, `test_sync_orchestration`, `test_sync_main`, `test_one_session_sync`, `test_agentic_sync_runner`, `commands/test_maintenance`). Closes #1012 Closes #1030 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Top-level `assignment ↔ def/class` kind flips were asymmetric: the iter-2 fallback caught `def/class → assignment` (signature dropped, surface kept), but `assignment → def/class` left the before-side without a signature entry — the regenerated `def Foo()` looked like a new symbol, not a kind flip. `_snapshot_public_signatures` now also records public top-level `Assign` / `AnnAssign-with-value` targets as `[assignment]`, honoring `__all__` via the existing `_is_public_top_level` predicate. Both flip directions hit the primary `changed_set` diff path. Adds 4 regression tests in `TestPublicSurfaceBindingKind` covering both flip directions and the unchanged-assignment sanity check, plus an inline note in `_walk_class` acknowledging the known class-body-redefinition limitation flagged by codex as a debatable nit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address four findings from the external reviewer: 1. Blocker (pdd/one_session_sync.py): `except TestChurnError` now restores the pre-session code file in addition to the test file, mirroring the `PublicSurfaceRegressionError` handler's pattern. A failed one-session attempt previously left mutated code on disk even when the test rewrite was rejected, so subsequent `pdd sync` runs treated the mutated code as the baseline. 2. High (pdd/code_generator_main.py): `_format_python_signature` now emits a literal `/` token between positional-only and regular args so `def f(x, /, y)` and `def f(x, y)` produce DIFFERENT snapshots. Previously both forms snapshotted as `(x, y)` and the public-surface gate missed a real ABI break (kwarg callers succeed against the second form, fail against the first). `skip_first` strips from the posonly group when present, otherwise from `args.args`, so the marker insertion index stays correct. 3. Medium (pdd/prompts/sync_orchestration_python.prompt): the non-generate code-write surface gate bullet now says `break`-ing the operation loop (matching the actual code at sync_orchestration.py:~2825) and mirrors the inline rationale that surface regression is a contract violation, not a transient failure — `continue` would let `sync_determine_operation` re-select the same operation and spin indefinitely on deterministic bad output. 4. High (architecture.json): refreshed signature strings for `update_file_pair` (added `strength`/`temperature` params) and `commit_and_push` (now takes `Sequence[DriftInfo]` with defaults and the new `finalized_modules` parameter) to match the live function signatures. Tests: - TestOneSessionRollback::test_test_churn_restores_code_and_test pins the blocker fix: mock rewrites both code and test, asserts both files restored after TestChurnError propagates. - TestPublicSurfaceBindingKind: four new tests for the posonly fix (added/removed/unchanged + a direct snapshot-string assertion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three edge cases reviewer probes found after the iter-3 fix landed: * `_snapshot_public_signatures` now records top-level imports as `[import]` / `[import:<module>]` / `[import:from <module>:<source>]` entries (parity with the surface helper). Without this entry, a silent re-export break like `from pathlib import Path` -> `def Path(): pass` produced no snapshot diff because the surface set unchanged while the signatures dict treated the new function as a brand-new symbol. * `_walk_class` now synthesises a constructor signature from class- level `AnnAssign` annotations when an `@dataclass` / `@dataclasses. dataclass` decorated class has no explicit `__init__`. Adding a required field (`name: str` -> `name: str; age: int`) now flips the snapshot. `ClassVar` / `InitVar` annotations are excluded; explicit `__init__` still takes precedence. Third-party `attrs` / `pydantic` decorators are out of scope (documented). * `build_public_surface_hard_failure_from_error` and `build_test_churn_hard_failure_from_error` now include the `BREAKING-CHANGE:` opt-in instruction line required by item 9d of the runner prompt. Tests cover the four import-flip variants, six dataclass scenarios, and two hard-fail string format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Addresses 4 review findings on commit 6523cfa: - High: dataclass init synthesizer now honours @DataClass(kw_only=True), the in-body `_: KW_ONLY` sentinel, and InitVar[...] fields. The sentinel check runs before the underscore-prefix skip so the canonical marker is recognised; decorator wins over sentinel (single `*`); a trailing `*` without kw-only fields is stripped. - Medium: fields declared with `field(init=False, ...)` (bare or `dataclasses.field` attribute form) are excluded from the synth, matching the stdlib runtime. `init=True`, missing `init` kwarg, and non-`field` calls still include the field. - Medium: AsyncSyncRunner's internal `_build_public_surface_hard_failure` and `_build_test_churn_hard_failure` blocks now carry the BREAKING-CHANGE: opt-out instruction so end users see the directive on the subprocess hard-failure path (not just the importable builders). - Medium: prompt and architecture.json describe the new dataclass synth rules (kw_only / KW_ONLY / InitVar included; ClassVar + init=False excluded; explicit __init__ wins) plus the kind-prefix invariant and import-binding formats so prompt-driven regeneration preserves them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two more dataclass edge cases probed by the external reviewer on ``93d7fe69d``: 1. ``@dataclass(init=False)`` was ignored. The synth unconditionally built a constructor signature from class-body ``AnnAssign`` nodes when a dataclass decorator was present, but ``init=False`` keeps the class's natural (zero-arg) ``__init__``. Now detect the kwarg on Call decorators (``@dataclass(init=False)`` and ``@dataclasses.dataclass(init=False)``) and emit ``()`` instead of walking the fields. Flipping ``init=False`` -> default trips the gate; adding fields under ``init=False`` does not. 2. Inherited dataclass fields were not merged into the synthesised constructor signature. Now walk same-module ``Name`` bases left- to-right depth-first with a cycle guard, prepending each ``@dataclass`` base's fields (and ITS bases' fields, recursively) in front of the derived class's own fields. Override semantics: when both base and derived declare the same field name, the derived's annotation/default replaces the base's, preserving the base's slot via insertion-order dict semantics — matching what ``@dataclass`` actually synthesises. Cross-module / attribute-form bases (``class User(pkg.Base)``) that we cannot resolve from this module's AST get an ``[inherited_unresolved]`` marker prepended to the synth signature so local field changes still diff while invisible base changes correctly slip through. V1 limitations (documented in the helper docstring): - A base decorated with ``@dataclass(init=False)`` still contributes declared fields to a derived ``@dataclass``'s synth at runtime, but we treat such bases as non-dataclass for the purposes of inherited- field merging. - The ``KW_ONLY`` sentinel does not propagate across the inheritance boundary — each class re-emits its own kw-only context. Tests in ``TestPublicSurfaceBindingKind``: - ``test_dataclass_init_false_decorator_flip_to_default_is_detected`` - ``test_dataclass_init_false_decorator_keeps_zero_arg_signature`` - ``test_dataclasses_dataclass_init_false_form_works`` - ``test_dataclass_inherited_field_added_to_private_base_is_detected`` - ``test_dataclass_unchanged_inheritance_is_allowed`` - ``test_dataclass_unresolved_base_is_marked_uncertain`` - ``test_dataclass_multiple_inheritance_left_to_right`` - ``test_dataclass_explicit_init_still_takes_precedence_with_inheritance`` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Walk dataclass bases in reverse-MRO order so the synthesised ``__init__`` matches Python's ``@dataclass`` semantics (``__dataclass_fields__`` is populated by reversed iteration so later bases override earlier ones in dict insertion order). For ``class C(A, B)`` the synth becomes ``(b, a, c)``. - Stop skipping ``@dataclass(init=False)`` bases: their fields STILL live in ``__dataclass_fields__`` and the derived class merges them. ``init=False`` only suppresses the base's OWN init synthesis. - Update the existing multiple-inheritance test (renamed to ``test_dataclass_multiple_inheritance_mro_order``) and add diamond, single-inheritance regression, and init=False-base inheritance tests. - Document reverse-MRO ordering and init=False inheritance in the prompt and architecture.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Mirror prompt deps into architecture.json for update_main (was 2/9) and ci_drift_heal (was 7/9) so global sync can topologically order stale dependents correctly. - Surface repair_directive (#1012 F-H) on the canonical signatures for cmd_test_main and run_agentic_test_generate in architecture.json, plus the cmd_test_main_python.prompt Inputs section and the agentic_test_generate_python.prompt Entry Point. Also fix run_agentic_test_generate's pre-existing stale 4-arg Tuple[bool, str, str] signature in architecture.json. - Honor the anchored BREAKING-CHANGE prompt opt-out (parsed by _prompt_allows_test_churn) in the deletion-as-churn branches at cmd_test_main.py:230 and one_session_sync.py:609, closing the asymmetry where the prompt opt-out worked for wholesale rewrites but silently failed when the agent emptied/deleted the test file. Import the helper into both files and add regression tests in test_cmd_test_main.py and test_one_session_sync.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Address three findings from the external review of the public-surface and test-churn gates: 1. P1 — `_is_test_output_path` now also matches `<name>_test.go` / `<name>_test.rb` / `<name>_spec.rb` so Go and Ruby sibling test files no longer bypass the churn gate when they live next to production code instead of under `tests/`. 2. P1 — `_snapshot_public_surface` skips `from __future__ import ...` directives. Removing the directive after a Python-version bump is harmless cleanup, but the gate previously flagged `annotations` (and any other future name) as a removed public symbol. 3. P2 — `_verify_public_surface_regression` expands BREAKING-CHANGE class-removal directives to cover every captured descendant. With `BREAKING-CHANGE: remove Service` the caller no longer has to enumerate `Service.run`, `Service.Inner.go`, etc. Signature-change directives stay strict — auto-expansion there would silently mask real regressions. Additionally: - sync_main retry message: report "Public surface regression" / "Test churn threshold exceeded" instead of mislabeling them as "Architecture conformance failed". - README: document the full public-surface coverage (constants and re-exported imports) and the new class-removal descendant rule. - CHANGELOG: restore the v0.0.238 Chore/Test subsections the original feat commit accidentally dropped. Regression coverage added in TestSyncCompatibilityGates: Go/Ruby sibling-file churn, `__future__` import ignored by the gate, and `BREAKING-CHANGE: remove Class` covering nested descendants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address three follow-up findings from the next review pass:
1. P1 — `_snapshot_public_signatures` also skips `from __future__
import …`. Without this the surface helper would return early but
the signatures dict still records `[import:from __future__]`, so a
regenerated module that drops the directive and binds a real
`annotations = …` value would falsely diff as a
`[import:from __future__]` → `[assignment]` signature change.
2. P1 — `_is_test_output_path` now also matches PascalCase JVM-style
suffixes: `Test.java`, `Tests.java`, `Test.kt`, `Tests.kt`,
`Spec.kt`. The agentic test prompt already names `Test.java` as a
recognised convention; without this match `src/test/java/FooTest.java`
bypassed the churn gate. Case-sensitive on purpose so `latest.kt`,
`manifest.java`, `request.scala` do not false-positive.
3. P1 — Prompt drift: `pdd/prompts/code_generator_main_python.prompt`
now documents the `__future__` skip in BOTH `_snapshot_public_surface`
and `_snapshot_public_signatures`, the full PascalCase / sibling-
file test-suffix list in the test-churn gate description, and the
`BREAKING-CHANGE: remove <Class>` descendant-expansion rule. A
future `pdd sync code_generator_main` can now regenerate without
reverting iter-9 + iter-10.
Nit:
- `run_agentic_test_generate` annotated return type was a stale
`tuple[str, float, str, bool]` (and the docstring claimed a 4-tuple)
while the actual return is `TestResult` (5 fields). Updated both.
Regression coverage added in TestSyncCompatibilityGates:
- `from __future__ import annotations` rebinding to a real
`annotations = {...}` value (the exact reviewer-named scenario).
- `FooTest.java` and `WidgetSpec.kt` churn detection.
- `latest.kt` NOT classified as a test (PascalCase false-positive
regression guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three follow-up findings from the next review pass: 1. P1 — Expand `_is_test_output_path` to cover the remaining common test-file conventions the gate was skipping: Maven failsafe integration tests (`FooIT.java`), older JUnit/TestNG TestCase convention (`FooTestCase.java`), ScalaTest (`Test.scala`, `Tests.scala`, `Spec.scala`), Swift (`Tests.swift`), .NET xUnit/ NUnit (`Test.cs`, `Tests.cs`), and Rust sibling tests (`_test.rs`). All PascalCase JVM/.NET/Swift entries stay case-sensitive so `latest.kt`/`manifest.java`/`request.scala` continue NOT to false-positive. 2. P1 — Close the alt-path test churn loophole: when `run_agentic_test_generate` falls back to reading an alternate path (e.g. `__tests__/foo.test.ts`) because the canonical `output_test_file` is absent, recover the alt path's pre-existing content via `git show HEAD:<rel>` and re-run the test-churn check against THAT baseline (not the empty canonical). On `TestChurnError`: attach `total_cost`/`model_name` and restore the pre-existing content to disk so the repair loop sees the canonical pre-sync state. Untracked / never-committed alt files harmlessly return empty and fall through to first-time-generation behavior. Limitation: when the agent writes to BOTH the canonical output AND an alt path, only the canonical churn is gated; the alt-path delta in that dual-write case is still unguarded. The reviewer's named scenario (canonical absent, alt rewritten) is fully covered. 3. P1 — `_collect_bound_module_names` also skips `from __future__ import …` so `__all__ = ["annotations"]` no longer promotes the future binding into the public surface via the `__all__`- authoritative branch. Without this skip a clean-up that drops `from __future__ import annotations` AND the matching `__all__` entry would falsely diff as `removed: annotations`. Prompt sync: - `code_generator_main_python.prompt`: full extended test-suffix list in the test-churn gate description, and the parallel `__future__` skip in `_collect_bound_module_names` documented alongside the earlier skip notes for `_snapshot_public_surface` and `_snapshot_public_signatures`. - `agentic_test_generate_python.prompt`: documents the alt-path git-recovery churn check, the `_read_prior_content_from_git` helper contract, and the restore-on-failure requirement. Regression coverage: - `FooIT.java`, `FooTestCase.java`, `FooSpec.scala`, and `widget_test.rs` churn detection. - `__all__ = ["annotations"]` + `from __future__ import annotations` removal MUST NOT trip the gate. - `run_agentic_test_generate` alt-path rewrite with a tracked prior (mocked `_read_prior_content_from_git` returning 20 tests) raises `TestChurnError` AND restores the prior content; untracked prior (mocked empty) falls through with no raise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…shot; broaden test-file extension coverage
Structural pivots in response to the next external review pass —
the iterative "add named language" / "fall back to HEAD" pattern
was diverging, so both fixes change the underlying approach:
F1+F2 — Replace `git show HEAD` recovery with a working-tree
content snapshot. `run_agentic_test_generate` now snapshots every
existing file under `project_root` matching `_is_test_output_path`
BEFORE the agent runs (bounded by per-file size of ~1 MiB) and uses
that snapshot as the alt-path churn baseline. This covers all three
prior-state shapes uniformly:
- tracked-clean (matched HEAD's content anyway)
- tracked-dirty (HEAD-based restore would have silently clobbered
local edits; snapshot preserves the working-tree content)
- untracked (HEAD-based recovery returned empty, treating
pre-existing coverage as first-time generation; snapshot picks
it up correctly)
The `_read_prior_content_from_git` helper and the `subprocess`
import are removed — replaced by `_snapshot_pre_test_contents`,
which lazy-imports `_is_test_output_path` to avoid the circular
dependency.
F3 — Replace the per-language `sibling_test_suffixes` tuple with a
derived check against `_LANGUAGE_TEST_FILE_EXTS` (module-level
tuple seeded with `.py`, `.go`, `.rb`, `.rs`, `.exs`, `.ex`,
`.dart`, `.clj`, `.cljc`, `.lua`, `.php`). `_is_test_output_path`
now matches `_test.<ext>` / `_spec.<ext>` for every extension in
that set, so Elixir, Dart, Clojure, Lua, and PHP are covered in
one go — and a future language addition to `language_format.csv`
only needs one tuple-append, not another `_is_test_output_path`
fix round. PascalCase JVM/.NET/Swift suffixes stay enumerated
(case-sensitive on `Test`/`Tests`/`Spec`/`IT`/`TestCase` to dodge
`latest.kt`/`manifest.java`/`request.scala`/`latest.groovy` false
positives) and pick up Groovy's `Spec.groovy`/`Test.groovy`/
`Tests.groovy` (Spock framework).
Prompt + architecture sync:
- `code_generator_main_python.prompt`: documents the two-family
test-suffix shape (data-driven `_LANGUAGE_TEST_FILE_EXTS` vs
enumerated PascalCase) with an instruction to append the tuple
when a new language is added rather than patching the gate.
- `agentic_test_generate_python.prompt`: documents
`_snapshot_pre_test_contents` and the restore-on-failure
contract; the stale `_read_prior_content_from_git` /
`git show HEAD` notes are gone.
- `architecture.json`: `run_agentic_test_generate` sideEffects
describe the snapshot contract; `_snapshot_pre_test_contents`
listed as a module helper.
Regression coverage:
- `FooSpec.groovy` (Spock) and `Greatest.groovy` false-positive
guard in `TestSyncCompatibilityGates`.
- `_test.exs` / `_test.dart` / `_test.clj` / `_spec.lua` churn
detection (Elixir/Dart/Clojure/Lua arriving via the derived
extension set, not extra code paths).
- `run_agentic_test_generate` alt-path tests rewritten to exercise
the real snapshot code path (no git mocking): untracked prior
raises; tracked-dirty prior raises and restore preserves dirty
edits; brand-new alt path falls through to first-time-generation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sting files
BLOCKING: the conformance and public-surface / test-churn gate-call
sites in `code_generator_main` previously gated on `and
generated_code_content` (a truthy check), which evaluated False for
empty string. When the LLM provider returned "", every gate was
skipped and the writer further down then ran
`p_output.write_text(final_content)` — silently truncating an
existing public module / test file to 0 bytes with NO
`PublicSurfaceRegressionError` / `TestChurnError` raised. Reviewer
reproduced this with mocked local generation: existing public module
and `tests/test_existing.py` both went to 0 bytes.
Fix in three parts:
1. Gate condition — change `and generated_code_content` to
`and generated_code_content is not None` at the architecture-
conformance gate-call (line 4022) and at the public-surface /
test-churn gate-call (line 4043). With `is not None` an empty
generation still runs the gates: `_snapshot_public_surface("")`
returns `set()` so every prior symbol surfaces as removed
(PublicSurfaceRegressionError); `_compute_test_churn_ratio(...)`
returns 1.0 against an empty rewrite so any non-empty test file
trips TestChurnError; `_verify_architecture_conformance` finds
no symbols so every declared arch entry surfaces as missing.
The existing compat-gate restore branch then re-writes
`existing_code_content` to disk so the repair loop sees the
pre-sync baseline.
2. Safety guard — for file types neither gate can inspect (non-
Python source, JSON, YAML, prompt files, etc.) the writer now
refuses to overwrite non-empty existing content with empty or
whitespace-only generated content, raising a
`click.UsageError("Refusing to overwrite ...")`. Empty output is
almost always an upstream provider/parse failure, and silent
truncation loses real work. `PDD_ALLOW_EMPTY_GENERATION=1` is
the explicit escape hatch for the rare intentional case.
3. Documentation — prompt + README sync the empty-generation gating
contract and the new env flag so a future `pdd sync
code_generator_main` does not regenerate away the fixes and
users hitting the safety guard discover the bypass flag.
Regression coverage (three named end-user surfaces):
- empty generation over existing Python public module →
`PublicSurfaceRegressionError`, file restored to pre-sync content.
- empty generation over existing `test_existing.py` →
`TestChurnError`, file restored to pre-sync content.
- empty generation over existing `config.yaml` → safety-guard
`click.UsageError("Refusing to overwrite ...")`, file untouched.
The branch is rebased onto current `origin/main` (one
auto-generated test-durations conflict resolved by taking main's
version), so the PR now presents a clean linear history.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ecture.json External review iter-13 docs nit: the empty-generation safety guard added in 13cb692 and the new `PDD_ALLOW_EMPTY_GENERATION` escape- hatch env flag are documented in README.md and the `code_generator_main_python.prompt` body, but architecture.json's `code_generator_main` description and CHANGELOG.md still did not mention them. - architecture.json: extend the existing `code_generator_main` description (which already names `PDD_TEST_CHURN_THRESHOLD` / `PDD_REPAIR_DIRECTIVE`) with the `is not None` gating contract, the non-Python safety guard, and the `PDD_ALLOW_EMPTY_GENERATION` bypass — keeping the env-flag inventory complete for downstream consumers that read architecture.json as the API/behavior catalog. - CHANGELOG.md: add an "Unreleased" section summarising the PR #1015 / #1012 gate feature with the full env-flag list so the next release-notes bump can promote it verbatim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cloud-test run on 792e7d7 surfaced two failures (75 passed, 2 failed of 77 tasks); both addressed here so the next run goes clean: 1. Stub mismatch — `tests/test_e2e_issue_342_syspath_isolation.py` monkeypatches `pdd.agentic_test_generate.run_agentic_test_generate` via a `fake_run_agentic_test_generate` stub. The real signature gained a `repair_directive: str | None = None` kwarg in iter-9 (#1012, F-A / F-H), but the stub still had the pre-iter-9 signature, so any code path that forwarded the new kwarg failed with `TypeError: ... got an unexpected keyword argument 'repair_directive'`. Stub now accepts the kwarg (default `None`). 2. Test-churn gate false-positive in `tests/regression.sh` and `tests/cloud_regression.sh` — both scripts pre-existed PR #1015 and seed small fixture files (~24-line `test_simple_math.py`) as prerequisites. When the gate-bearing `pdd test` then regenerates against that small fixture, the resulting test suite naturally exceeds the 40% churn threshold and raises `TestChurnError` (exit code 2). These regression scripts validate baseline CLI behavior — that `pdd generate/test/example/fix/...` succeed and produce output — NOT the new gates' behavior (the gates have their own targeted pytest suites). Exporting `PDD_SKIP_TEST_CHURN_GATE=1` and `PDD_SKIP_PUBLIC_SURFACE_GATE=1` at the top of each script opts the run out of the new gates without touching the `PDD_SKIP_CONFORMANCE` umbrella (so architecture conformance still helps if it ever fires through this surface). A multi-line comment in each script explains why so a future regression-script editor doesn't unwind the bypass. 3. Refresh `ci/cloud-batch/test-durations.json` from this run's junitxml (348 files / 349 total). Updated automatically by `submit.sh` post-job; included so the next cloud-test schedule benefits from the new measurements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Auto-updated by ci/cloud-batch/submit.sh after the post-iter-14 cloud-test run (77/77 passed, 0 failed). Including so the next cloud-test schedule chunks tests with the latest measurements. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…e (Greg review #1015) Greg's review of PR #1015 surfaced a false negative in the agentic test-churn gate: when the canonical output file exists with content AND the agent rewrites a separate pre-existing test-like file (e.g. shrinks `__tests__/widget.test.ts` from 20 tests to 1) while leaving the canonical untouched, no `TestChurnError` was raised. Root cause — `pdd/agentic_test_generate.py`: The alt-path branch (`if not generated_content and changed_files`) only ran a churn check on alt-path files when the canonical was EMPTY. With canonical present and unchanged, the alt-path scan was skipped entirely; `cmd_test_main`'s outer churn check then compared the (unchanged) canonical against itself and trivially passed. Root cause — `pdd/one_session_sync.py`: The churn gate compared only `test_path` (canonical). The one-session agent runs under a mega-prompt with no scope constraint and can touch any test-like file in the project, so the same false negative applied. Fix: 1. `pdd/agentic_test_generate.py`: replace the alt-path-only churn check with a broad sweep over every changed test-like file that existed pre-run (snapshot already taken via `_snapshot_pre_test_contents`). Restore EVERY pre-run snapshot before raising so the retry loop's next attempt re-snapshots from the true pre-run baseline — restoring only the violating file would let attempt-N rewrites become attempt-N+1's new baseline and permanently weaken the gate. 2. `pdd/one_session_sync.py`: add a pre-session snapshot of every test-like file in the project (lives outside the retry loop, single source of truth across attempts). Inside the existing churn gate try-block, sweep changed test-like files against the snapshot before running the canonical-only check. Skip the canonical path in the sweep so the canonical's deletion-as-churn shortcut and repair directive remain intact. The opt-out env flags and prompt- side `BREAKING-CHANGE: rewrite tests` were moved above the canonical-empty early-return so they apply to the alt-path sweep too. The except handler now restores every snapshot in addition to the canonical + code paths. Test coverage: - `tests/test_agentic_test_generate.py`: - `test_canonical_present_alt_path_rewrite_raises_churn_error` (Greg's repro: canonical unchanged, alt-path 20→1) - `test_canonical_present_multiple_alt_paths_all_restored_on_failure` (restore-all invariant) - `tests/test_one_session_sync.py::TestOneSessionRollback`: - `test_alt_path_rewrite_with_canonical_unchanged_raises_churn_error` (analogous one-session repro) Both new test bodies were confirmed to FAIL against the pre-fix code (via `git stash` of the patched module) and PASS against the patched code. Existing 12-scenario gate matrix remains 12/12 PASS. Closes Greg-review blocking concern on PR #1015. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
7c8b3bb to
20eec1b
Compare
|
Follow-up review on updated head The iter-15 update fixes the previous rewrite false negative, but the same multi-file gate still misses deletion of pre-existing alternate test files. Blocking scenario:
Observed locally against
The one-session path has the same issue: Root cause:
Required changes before merge:
The current focused tests pass ( |
…(Greg follow-up review #1015) Greg's iter-15 follow-up review surfaced a residual false negative in the multi-file churn sweep: the iter-15 sweep filtered out paths that no longer existed (`if not snap_path.exists(): continue`), so an agent could DELETE a pre-existing alternate test file while leaving the canonical untouched and the gate would not fire. Symmetric to the canonical-deletion shortcut already in place; the alt-path deletion case was simply missing. Root cause — `pdd/agentic_test_generate.py`: The iter-15 sweep iterated `changed_files` and `continue`d when the changed path no longer existed. `pre_test_contents` keys for files the agent had deleted were never visited. Root cause — `pdd/one_session_sync.py`: The iter-15 sweep explicitly skipped non-existent snapshot keys (`if not snap_path.exists(): continue`) with a comment saying the canonical-deletion branch covered the canonical case — but that branch only fires for the canonical, not alt-paths. Fix: 1. Both sweeps now iterate `pre_test_contents.items()` directly so deletions are visited uniformly with rewrites. When the snapshotted file no longer exists post-agent, the sweep raises `TestChurnError(churn_ratio=1.0, post_line_count=0, ...)` with a deletion-specific repair directive mirroring the canonical shortcut. Honors the same opt-outs (`PDD_SKIP_TEST_CHURN_GATE`, `PDD_SKIP_CONFORMANCE`, prompt-side `BREAKING-CHANGE: rewrite tests`) — inline because the deletion branch raises directly without going through `_verify_test_churn`. 2. Restore loops now `mkdir(parents=True, exist_ok=True)` before `write_text`, so files (and parent directories) the agent deleted are recreated when the gate fires. Without this, restoration of a deleted file in a deleted directory would raise OSError and be swallowed, leaving the file lost. 3. `agentic_test_generate.py` also factors restoration into a `_restore_snapshots()` helper used by both the rewrite and deletion branches. Test coverage: - `tests/test_agentic_test_generate.py::test_canonical_present_alt_path_deletion_raises_churn_error` — Greg's repro: canonical unchanged, alt-path deleted by mocked agent. Asserts `TestChurnError(churn_ratio=1.0)` raised, alt-path restored from snapshot, canonical untouched. - `tests/test_agentic_test_generate.py::test_alt_path_deletion_honors_skip_test_churn_gate_env` — `PDD_SKIP_TEST_CHURN_GATE=1` bypasses the deletion check (since it raises directly without `_verify_test_churn`). - `tests/test_one_session_sync.py::TestOneSessionRollback::test_alt_path_deletion_with_canonical_unchanged_raises_churn_error` — Analogous one-session repro. Each new test was confirmed to FAIL against the iter-15 code (via git stash) and PASS against iter-16. Existing 989 focused tests + 12/12 scenario matrix continue to pass. Closes the iter-15 deletion follow-up flagged on PR #1015. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Follow-up review on updated head Good news: I reran the prior deletion repros and both now behave correctly:
Remaining blocker: if a one-session attempt causes a public-surface regression AND deletes/rewrites an alternate test file in the same attempt, the public-surface gate fires first and the churn gate is skipped. The surface handler restores only the code file and canonical test file, so the alternate test deletion remains on disk after the failed sync. I reproduced locally against
Observed result:
This still violates the PR's safety goal: a failed sync can discard broad existing test coverage as a side effect of a higher-priority surface failure. Required change before merge:
Focused checks I ran after this review:
|
… (Greg iter-16 follow-up) Greg's iter-16 follow-up review surfaced a residual issue in `run_one_session_sync`: when the same attempt regresses public surface AND rewrites/deletes an alt-path test file, the public- surface gate fires first (higher priority) and `continue`s before the test-churn gate runs. The surface handler restored only the code file + canonical test file, leaving the alt-path damage on disk — silently discarding broad existing coverage as a side effect of a higher-priority failure. Root cause: The `except PublicSurfaceRegressionError` handler in `run_one_session_sync` restored `code_path` from `existing_code_content` and `test_path` from `existing_test_content` but had no equivalent of the iter-15/16 churn-handler loop that walks `pre_test_contents.items()` and writes every snapshot back. Fix: Mirror the churn-handler restore loop in the surface handler. Iterate `pre_test_contents.items()`, skip the canonical (already restored above), and `mkdir(parents=True, exist_ok=True)` + `write_text` every snapshotted alt-path test file — so both rewrites AND deletions performed by the failed attempt are rolled back regardless of which gate ultimately raised. Best-effort: `OSError` swallowed per the existing pattern. Test coverage: - `tests/test_one_session_sync.py::TestOneSessionRollback::test_surface_failure_with_alt_path_deletion_restores_all` — Greg's repro: pre-existing public `keep_me()`, canonical test, alt-path `src/widget_test.py` with 20 tests. Mocked agent removes `keep_me()` AND deletes alt-path in the same attempt. Asserts `PublicSurfaceRegressionError` raised, code + canonical + alt-path ALL restored to pre-session bytes. Confirmed the test fails against iter-16 (alt-path stayed deleted) and passes against iter-17. Existing 990 focused tests + 12/12 scenario matrix continue to pass. Closes Greg's iter-16 surface-handler restore-coverage follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Extends the existing
ArchitectureConformanceErrorrepair-loop scaffolding with parallelPublicSurfaceRegressionErrorandTestChurnErrorgates sopdd syncfails fast with an actionable diagnostic instead of silently rewriting mature modules and discarding broad test coverage.Closes #1012
Changes Made
Prompts Modified
pdd/prompts/code_generator_main_python.prompt— Adds snapshot/diff requirements (steps 5b, 5c), two new exception classes (PublicSurfaceRegressionError,TestChurnError), three stdlib-only helpers, theBREAKING-CHANGE:opt-out marker, and thePDD_TEST_CHURN_THRESHOLDenv var (default 40%). First-time generation is exempt.pdd/prompts/sync_orchestration_python.prompt— Extends the generate-op retry loop to catch the new errors, pins.pddrc/strengthacross retries, and pre-snapshots the existing module before regeneration.pdd/prompts/agentic_sync_runner_python.prompt— Adds_parse_public_surface_failureand_parse_test_churn_failureplus two hard-failure formatters; generalises the stuck-symbol short-circuit.pdd/prompts/sync_main_python.prompt— Extends the one-session Phase 1 retry loop and widens the two import-select lists to include the new error types.Documentation Updated
README.md— Brief mention of the new compatibility gate and theBREAKING-CHANGE:opt-out.architecture.json— Records the 9 new public symbols and 1 new env var (all additive; no removed symbols).Strategy
Reuses the proven
ArchitectureConformanceErrorplumbing verbatim:PDD_REPAIR_DIRECTIVE,MAX_CONFORMANCE_ATTEMPTS, the env-var protocol, and the hard-failure formatter pattern. Conservative defaults (40% churn threshold), explicit opt-out viaBREAKING-CHANGE:in the prompt body, and stdlib-only helpers keep the risk profile low.Acceptance Criteria Mapping
PublicSurfaceRegressionError.BREAKING-CHANGE:is the opt-out.TestChurnErrorwithPDD_TEST_CHURN_THRESHOLD.agentic_sync_runner_python.prompt.Review Checklist
architecture.jsoninterface entries match the promptsNext Steps After Merge
pytest -vv tests/to verify functionality.Created by pdd change workflow