feat: add public surface and test churn gates to pdd sync (#1012) by prompt-driven-github[bot] · Pull Request #1015 · promptdriven/pdd

prompt-driven-github · 2026-05-14T20:48:31Z

Summary

Extends the existing ArchitectureConformanceError repair-loop scaffolding with parallel PublicSurfaceRegressionError and TestChurnError gates so pdd sync fails fast with an actionable diagnostic instead of silently rewriting mature modules and discarding broad test coverage.

Closes #1012

Changes Made

Prompts Modified

pdd/prompts/code_generator_main_python.prompt — Adds snapshot/diff requirements (steps 5b, 5c), two new exception classes (PublicSurfaceRegressionError, TestChurnError), three stdlib-only helpers, the BREAKING-CHANGE: opt-out marker, and the PDD_TEST_CHURN_THRESHOLD env var (default 40%). First-time generation is exempt.
pdd/prompts/sync_orchestration_python.prompt — Extends the generate-op retry loop to catch the new errors, pins .pddrc/strength across retries, and pre-snapshots the existing module before regeneration.
pdd/prompts/agentic_sync_runner_python.prompt — Adds _parse_public_surface_failure and _parse_test_churn_failure plus two hard-failure formatters; generalises the stuck-symbol short-circuit.
pdd/prompts/sync_main_python.prompt — Extends the one-session Phase 1 retry loop and widens the two import-select lists to include the new error types.

Documentation Updated

README.md — Brief mention of the new compatibility gate and the BREAKING-CHANGE: opt-out.
architecture.json — Records the 9 new public symbols and 1 new env var (all additive; no removed symbols).

Strategy

Reuses the proven ArchitectureConformanceError plumbing verbatim: PDD_REPAIR_DIRECTIVE, MAX_CONFORMANCE_ATTEMPTS, the env-var protocol, and the hard-failure formatter pattern. Conservative defaults (40% churn threshold), explicit opt-out via BREAKING-CHANGE: in the prompt body, and stdlib-only helpers keep the risk profile low.

Acceptance Criteria Mapping

✅ Compatibility/conformance gate comparing generated public symbols/signatures vs. pre-sync module — PublicSurfaceRegressionError.
✅ Externally imported helpers remain present unless explicitly removed — snapshot/diff in step 5b catches removals; BREAKING-CHANGE: is the opt-out.
✅ Generated test rewrites do not discard broad coverage — TestChurnError with PDD_TEST_CHURN_THRESHOLD.
✅ Broad unrelated churn fails with actionable diagnostic — hard-failure formatters in agentic_sync_runner_python.prompt.

Review Checklist

Prompt syntax is valid
PDD conventions followed (reuses existing repair-loop plumbing)
architecture.json interface entries match the prompts
README mentions the new gate and opt-out

Next Steps After Merge

Regenerate code from the modified prompts in dependency order:

./.pdd/sync_order_change.sh

Or manually:

pdd sync code_generator_main
pdd sync agentic_sync_runner
pdd sync one_session_sync
pdd sync durable_sync_runner
pdd sync prompts
pdd sync sync_main
pdd sync agentic_sync
pdd sync checkup
pdd sync ci_drift_heal
pdd sync generate
pdd sync maintenance
pdd sync agentic_checkup
pdd sync pin_example_hack
pdd sync sync_orchestration

Run pytest -vv tests/ to verify functionality.
Verify the new gates trip on a deliberate regression in a sandbox module.

Created by pdd change workflow

greptile-apps

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

When a prior test-churn attempt failed, accumulate its cost (and model fallback) and fold them into the successful retry's TestResult so the caller charges every attempt against the sync budget. Mirrors the generate-op and one-session-sync accumulation patterns (Codex PR #1015 high finding). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ollow-ups) Three independent gate-correctness fixes called out by Codex iter-3 medium review (#1015): - `BREAKING-CHANGE:` opt-out parser now accepts identifier wrappers of matching back-tick, single-quote, OR double-quote (`"old_helper"`, `'Class.method'`). Mismatched quotes are still rejected so a typo can never silently whitelist a removal. - `_snapshot_public_surface` and `_snapshot_public_signatures` recurse into public nested classes at all depths so a removed/signature-changed `Outer.Inner.method` is caught (previously only direct class methods participated, letting nested-class drift escape both diffs). - `_get_test_churn_threshold` accepts `"40%"`/`"100%"` percent strings (strips the suffix, divides by 100, then clamps). Unparseable values still default to 0.40 but now log a warning so the silent default is visible to operators. Prompt at `pdd/prompts/code_generator_main_python.prompt` updated to match the new contract; new test coverage in `tests/test_code_generator_main.py` exercises each path including mismatched-wrapper rejection and triple-nested classes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@staticmethod

…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address four findings from the external reviewer: 1. Blocker (pdd/one_session_sync.py): `except TestChurnError` now restores the pre-session code file in addition to the test file, mirroring the `PublicSurfaceRegressionError` handler's pattern. A failed one-session attempt previously left mutated code on disk even when the test rewrite was rejected, so subsequent `pdd sync` runs treated the mutated code as the baseline. 2. High (pdd/code_generator_main.py): `_format_python_signature` now emits a literal `/` token between positional-only and regular args so `def f(x, /, y)` and `def f(x, y)` produce DIFFERENT snapshots. Previously both forms snapshotted as `(x, y)` and the public-surface gate missed a real ABI break (kwarg callers succeed against the second form, fail against the first). `skip_first` strips from the posonly group when present, otherwise from `args.args`, so the marker insertion index stays correct. 3. Medium (pdd/prompts/sync_orchestration_python.prompt): the non-generate code-write surface gate bullet now says `break`-ing the operation loop (matching the actual code at sync_orchestration.py:~2825) and mirrors the inline rationale that surface regression is a contract violation, not a transient failure — `continue` would let `sync_determine_operation` re-select the same operation and spin indefinitely on deterministic bad output. 4. High (architecture.json): refreshed signature strings for `update_file_pair` (added `strength`/`temperature` params) and `commit_and_push` (now takes `Sequence[DriftInfo]` with defaults and the new `finalized_modules` parameter) to match the live function signatures. Tests: - TestOneSessionRollback::test_test_churn_restores_code_and_test pins the blocker fix: mock rewrites both code and test, asserts both files restored after TestChurnError propagates. - TestPublicSurfaceBindingKind: four new tests for the posonly fix (added/removed/unchanged + a direct snapshot-string assertion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three edge cases reviewer probes found after the iter-3 fix landed: * `_snapshot_public_signatures` now records top-level imports as `[import]` / `[import:<module>]` / `[import:from <module>:<source>]` entries (parity with the surface helper). Without this entry, a silent re-export break like `from pathlib import Path` -> `def Path(): pass` produced no snapshot diff because the surface set unchanged while the signatures dict treated the new function as a brand-new symbol. * `_walk_class` now synthesises a constructor signature from class- level `AnnAssign` annotations when an `@dataclass` / `@dataclasses. dataclass` decorated class has no explicit `__init__`. Adding a required field (`name: str` -> `name: str; age: int`) now flips the snapshot. `ClassVar` / `InitVar` annotations are excluded; explicit `__init__` still takes precedence. Third-party `attrs` / `pydantic` decorators are out of scope (documented). * `build_public_surface_hard_failure_from_error` and `build_test_churn_hard_failure_from_error` now include the `BREAKING-CHANGE:` opt-in instruction line required by item 9d of the runner prompt. Tests cover the four import-flip variants, six dataclass scenarios, and two hard-fail string format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addresses 4 review findings on commit 6523cfa: - High: dataclass init synthesizer now honours @DataClass(kw_only=True), the in-body `_: KW_ONLY` sentinel, and InitVar[...] fields. The sentinel check runs before the underscore-prefix skip so the canonical marker is recognised; decorator wins over sentinel (single `*`); a trailing `*` without kw-only fields is stripped. - Medium: fields declared with `field(init=False, ...)` (bare or `dataclasses.field` attribute form) are excluded from the synth, matching the stdlib runtime. `init=True`, missing `init` kwarg, and non-`field` calls still include the field. - Medium: AsyncSyncRunner's internal `_build_public_surface_hard_failure` and `_build_test_churn_hard_failure` blocks now carry the BREAKING-CHANGE: opt-out instruction so end users see the directive on the subprocess hard-failure path (not just the importable builders). - Medium: prompt and architecture.json describe the new dataclass synth rules (kw_only / KW_ONLY / InitVar included; ClassVar + init=False excluded; explicit __init__ wins) plus the kind-prefix invariant and import-binding formats so prompt-driven regeneration preserves them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two more dataclass edge cases probed by the external reviewer on ``93d7fe69d``: 1. ``@dataclass(init=False)`` was ignored. The synth unconditionally built a constructor signature from class-body ``AnnAssign`` nodes when a dataclass decorator was present, but ``init=False`` keeps the class's natural (zero-arg) ``__init__``. Now detect the kwarg on Call decorators (``@dataclass(init=False)`` and ``@dataclasses.dataclass(init=False)``) and emit ``()`` instead of walking the fields. Flipping ``init=False`` -> default trips the gate; adding fields under ``init=False`` does not. 2. Inherited dataclass fields were not merged into the synthesised constructor signature. Now walk same-module ``Name`` bases left- to-right depth-first with a cycle guard, prepending each ``@dataclass`` base's fields (and ITS bases' fields, recursively) in front of the derived class's own fields. Override semantics: when both base and derived declare the same field name, the derived's annotation/default replaces the base's, preserving the base's slot via insertion-order dict semantics — matching what ``@dataclass`` actually synthesises. Cross-module / attribute-form bases (``class User(pkg.Base)``) that we cannot resolve from this module's AST get an ``[inherited_unresolved]`` marker prepended to the synth signature so local field changes still diff while invisible base changes correctly slip through. V1 limitations (documented in the helper docstring): - A base decorated with ``@dataclass(init=False)`` still contributes declared fields to a derived ``@dataclass``'s synth at runtime, but we treat such bases as non-dataclass for the purposes of inherited- field merging. - The ``KW_ONLY`` sentinel does not propagate across the inheritance boundary — each class re-emits its own kw-only context. Tests in ``TestPublicSurfaceBindingKind``: - ``test_dataclass_init_false_decorator_flip_to_default_is_detected`` - ``test_dataclass_init_false_decorator_keeps_zero_arg_signature`` - ``test_dataclasses_dataclass_init_false_form_works`` - ``test_dataclass_inherited_field_added_to_private_base_is_detected`` - ``test_dataclass_unchanged_inheritance_is_allowed`` - ``test_dataclass_unresolved_base_is_marked_uncertain`` - ``test_dataclass_multiple_inheritance_left_to_right`` - ``test_dataclass_explicit_init_still_takes_precedence_with_inheritance`` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Walk dataclass bases in reverse-MRO order so the synthesised ``__init__`` matches Python's ``@dataclass`` semantics (``__dataclass_fields__`` is populated by reversed iteration so later bases override earlier ones in dict insertion order). For ``class C(A, B)`` the synth becomes ``(b, a, c)``. - Stop skipping ``@dataclass(init=False)`` bases: their fields STILL live in ``__dataclass_fields__`` and the derived class merges them. ``init=False`` only suppresses the base's OWN init synthesis. - Update the existing multiple-inheritance test (renamed to ``test_dataclass_multiple_inheritance_mro_order``) and add diamond, single-inheritance regression, and init=False-base inheritance tests. - Document reverse-MRO ordering and init=False inheritance in the prompt and architecture.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Mirror prompt deps into architecture.json for update_main (was 2/9) and ci_drift_heal (was 7/9) so global sync can topologically order stale dependents correctly. - Surface repair_directive (#1012 F-H) on the canonical signatures for cmd_test_main and run_agentic_test_generate in architecture.json, plus the cmd_test_main_python.prompt Inputs section and the agentic_test_generate_python.prompt Entry Point. Also fix run_agentic_test_generate's pre-existing stale 4-arg Tuple[bool, str, str] signature in architecture.json. - Honor the anchored BREAKING-CHANGE prompt opt-out (parsed by _prompt_allows_test_churn) in the deletion-as-churn branches at cmd_test_main.py:230 and one_session_sync.py:609, closing the asymmetry where the prompt opt-out worked for wholesale rewrites but silently failed when the agent emptied/deleted the test file. Import the helper into both files and add regression tests in test_cmd_test_main.py and test_one_session_sync.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address three findings from the external review of the public-surface and test-churn gates: 1. P1 — `_is_test_output_path` now also matches `<name>_test.go` / `<name>_test.rb` / `<name>_spec.rb` so Go and Ruby sibling test files no longer bypass the churn gate when they live next to production code instead of under `tests/`. 2. P1 — `_snapshot_public_surface` skips `from __future__ import ...` directives. Removing the directive after a Python-version bump is harmless cleanup, but the gate previously flagged `annotations` (and any other future name) as a removed public symbol. 3. P2 — `_verify_public_surface_regression` expands BREAKING-CHANGE class-removal directives to cover every captured descendant. With `BREAKING-CHANGE: remove Service` the caller no longer has to enumerate `Service.run`, `Service.Inner.go`, etc. Signature-change directives stay strict — auto-expansion there would silently mask real regressions. Additionally: - sync_main retry message: report "Public surface regression" / "Test churn threshold exceeded" instead of mislabeling them as "Architecture conformance failed". - README: document the full public-surface coverage (constants and re-exported imports) and the new class-removal descendant rule. - CHANGELOG: restore the v0.0.238 Chore/Test subsections the original feat commit accidentally dropped. Regression coverage added in TestSyncCompatibilityGates: Go/Ruby sibling-file churn, `__future__` import ignored by the gate, and `BREAKING-CHANGE: remove Class` covering nested descendants. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Address three follow-up findings from the next review pass: 1. P1 — `_snapshot_public_signatures` also skips `from __future__ import …`. Without this the surface helper would return early but the signatures dict still records `[import:from __future__]`, so a regenerated module that drops the directive and binds a real `annotations = …` value would falsely diff as a `[import:from __future__]` → `[assignment]` signature change. 2. P1 — `_is_test_output_path` now also matches PascalCase JVM-style suffixes: `Test.java`, `Tests.java`, `Test.kt`, `Tests.kt`, `Spec.kt`. The agentic test prompt already names `Test.java` as a recognised convention; without this match `src/test/java/FooTest.java` bypassed the churn gate. Case-sensitive on purpose so `latest.kt`, `manifest.java`, `request.scala` do not false-positive. 3. P1 — Prompt drift: `pdd/prompts/code_generator_main_python.prompt` now documents the `__future__` skip in BOTH `_snapshot_public_surface` and `_snapshot_public_signatures`, the full PascalCase / sibling- file test-suffix list in the test-churn gate description, and the `BREAKING-CHANGE: remove <Class>` descendant-expansion rule. A future `pdd sync code_generator_main` can now regenerate without reverting iter-9 + iter-10. Nit: - `run_agentic_test_generate` annotated return type was a stale `tuple[str, float, str, bool]` (and the docstring claimed a 4-tuple) while the actual return is `TestResult` (5 fields). Updated both. Regression coverage added in TestSyncCompatibilityGates: - `from __future__ import annotations` rebinding to a real `annotations = {...}` value (the exact reviewer-named scenario). - `FooTest.java` and `WidgetSpec.kt` churn detection. - `latest.kt` NOT classified as a test (PascalCase false-positive regression guard). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three follow-up findings from the next review pass: 1. P1 — Expand `_is_test_output_path` to cover the remaining common test-file conventions the gate was skipping: Maven failsafe integration tests (`FooIT.java`), older JUnit/TestNG TestCase convention (`FooTestCase.java`), ScalaTest (`Test.scala`, `Tests.scala`, `Spec.scala`), Swift (`Tests.swift`), .NET xUnit/ NUnit (`Test.cs`, `Tests.cs`), and Rust sibling tests (`_test.rs`). All PascalCase JVM/.NET/Swift entries stay case-sensitive so `latest.kt`/`manifest.java`/`request.scala` continue NOT to false-positive. 2. P1 — Close the alt-path test churn loophole: when `run_agentic_test_generate` falls back to reading an alternate path (e.g. `__tests__/foo.test.ts`) because the canonical `output_test_file` is absent, recover the alt path's pre-existing content via `git show HEAD:<rel>` and re-run the test-churn check against THAT baseline (not the empty canonical). On `TestChurnError`: attach `total_cost`/`model_name` and restore the pre-existing content to disk so the repair loop sees the canonical pre-sync state. Untracked / never-committed alt files harmlessly return empty and fall through to first-time-generation behavior. Limitation: when the agent writes to BOTH the canonical output AND an alt path, only the canonical churn is gated; the alt-path delta in that dual-write case is still unguarded. The reviewer's named scenario (canonical absent, alt rewritten) is fully covered. 3. P1 — `_collect_bound_module_names` also skips `from __future__ import …` so `__all__ = ["annotations"]` no longer promotes the future binding into the public surface via the `__all__`- authoritative branch. Without this skip a clean-up that drops `from __future__ import annotations` AND the matching `__all__` entry would falsely diff as `removed: annotations`. Prompt sync: - `code_generator_main_python.prompt`: full extended test-suffix list in the test-churn gate description, and the parallel `__future__` skip in `_collect_bound_module_names` documented alongside the earlier skip notes for `_snapshot_public_surface` and `_snapshot_public_signatures`. - `agentic_test_generate_python.prompt`: documents the alt-path git-recovery churn check, the `_read_prior_content_from_git` helper contract, and the restore-on-failure requirement. Regression coverage: - `FooIT.java`, `FooTestCase.java`, `FooSpec.scala`, and `widget_test.rs` churn detection. - `__all__ = ["annotations"]` + `from __future__ import annotations` removal MUST NOT trip the gate. - `run_agentic_test_generate` alt-path rewrite with a tracked prior (mocked `_read_prior_content_from_git` returning 20 tests) raises `TestChurnError` AND restores the prior content; untracked prior (mocked empty) falls through with no raise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@staticmethod

…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Address four findings from the external reviewer: 1. Blocker (pdd/one_session_sync.py): `except TestChurnError` now restores the pre-session code file in addition to the test file, mirroring the `PublicSurfaceRegressionError` handler's pattern. A failed one-session attempt previously left mutated code on disk even when the test rewrite was rejected, so subsequent `pdd sync` runs treated the mutated code as the baseline. 2. High (pdd/code_generator_main.py): `_format_python_signature` now emits a literal `/` token between positional-only and regular args so `def f(x, /, y)` and `def f(x, y)` produce DIFFERENT snapshots. Previously both forms snapshotted as `(x, y)` and the public-surface gate missed a real ABI break (kwarg callers succeed against the second form, fail against the first). `skip_first` strips from the posonly group when present, otherwise from `args.args`, so the marker insertion index stays correct. 3. Medium (pdd/prompts/sync_orchestration_python.prompt): the non-generate code-write surface gate bullet now says `break`-ing the operation loop (matching the actual code at sync_orchestration.py:~2825) and mirrors the inline rationale that surface regression is a contract violation, not a transient failure — `continue` would let `sync_determine_operation` re-select the same operation and spin indefinitely on deterministic bad output. 4. High (architecture.json): refreshed signature strings for `update_file_pair` (added `strength`/`temperature` params) and `commit_and_push` (now takes `Sequence[DriftInfo]` with defaults and the new `finalized_modules` parameter) to match the live function signatures. Tests: - TestOneSessionRollback::test_test_churn_restores_code_and_test pins the blocker fix: mock rewrites both code and test, asserts both files restored after TestChurnError propagates. - TestPublicSurfaceBindingKind: four new tests for the posonly fix (added/removed/unchanged + a direct snapshot-string assertion). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Three edge cases reviewer probes found after the iter-3 fix landed: * `_snapshot_public_signatures` now records top-level imports as `[import]` / `[import:<module>]` / `[import:from <module>:<source>]` entries (parity with the surface helper). Without this entry, a silent re-export break like `from pathlib import Path` -> `def Path(): pass` produced no snapshot diff because the surface set unchanged while the signatures dict treated the new function as a brand-new symbol. * `_walk_class` now synthesises a constructor signature from class- level `AnnAssign` annotations when an `@dataclass` / `@dataclasses. dataclass` decorated class has no explicit `__init__`. Adding a required field (`name: str` -> `name: str; age: int`) now flips the snapshot. `ClassVar` / `InitVar` annotations are excluded; explicit `__init__` still takes precedence. Third-party `attrs` / `pydantic` decorators are out of scope (documented). * `build_public_surface_hard_failure_from_error` and `build_test_churn_hard_failure_from_error` now include the `BREAKING-CHANGE:` opt-in instruction line required by item 9d of the runner prompt. Tests cover the four import-flip variants, six dataclass scenarios, and two hard-fail string format checks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Addresses 4 review findings on commit 6523cfa: - High: dataclass init synthesizer now honours @DataClass(kw_only=True), the in-body `_: KW_ONLY` sentinel, and InitVar[...] fields. The sentinel check runs before the underscore-prefix skip so the canonical marker is recognised; decorator wins over sentinel (single `*`); a trailing `*` without kw-only fields is stripped. - Medium: fields declared with `field(init=False, ...)` (bare or `dataclasses.field` attribute form) are excluded from the synth, matching the stdlib runtime. `init=True`, missing `init` kwarg, and non-`field` calls still include the field. - Medium: AsyncSyncRunner's internal `_build_public_surface_hard_failure` and `_build_test_churn_hard_failure` blocks now carry the BREAKING-CHANGE: opt-out instruction so end users see the directive on the subprocess hard-failure path (not just the importable builders). - Medium: prompt and architecture.json describe the new dataclass synth rules (kw_only / KW_ONLY / InitVar included; ClassVar + init=False excluded; explicit __init__ wins) plus the kind-prefix invariant and import-binding formats so prompt-driven regeneration preserves them. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two more dataclass edge cases probed by the external reviewer on ``93d7fe69d``: 1. ``@dataclass(init=False)`` was ignored. The synth unconditionally built a constructor signature from class-body ``AnnAssign`` nodes when a dataclass decorator was present, but ``init=False`` keeps the class's natural (zero-arg) ``__init__``. Now detect the kwarg on Call decorators (``@dataclass(init=False)`` and ``@dataclasses.dataclass(init=False)``) and emit ``()`` instead of walking the fields. Flipping ``init=False`` -> default trips the gate; adding fields under ``init=False`` does not. 2. Inherited dataclass fields were not merged into the synthesised constructor signature. Now walk same-module ``Name`` bases left- to-right depth-first with a cycle guard, prepending each ``@dataclass`` base's fields (and ITS bases' fields, recursively) in front of the derived class's own fields. Override semantics: when both base and derived declare the same field name, the derived's annotation/default replaces the base's, preserving the base's slot via insertion-order dict semantics — matching what ``@dataclass`` actually synthesises. Cross-module / attribute-form bases (``class User(pkg.Base)``) that we cannot resolve from this module's AST get an ``[inherited_unresolved]`` marker prepended to the synth signature so local field changes still diff while invisible base changes correctly slip through. V1 limitations (documented in the helper docstring): - A base decorated with ``@dataclass(init=False)`` still contributes declared fields to a derived ``@dataclass``'s synth at runtime, but we treat such bases as non-dataclass for the purposes of inherited- field merging. - The ``KW_ONLY`` sentinel does not propagate across the inheritance boundary — each class re-emits its own kw-only context. Tests in ``TestPublicSurfaceBindingKind``: - ``test_dataclass_init_false_decorator_flip_to_default_is_detected`` - ``test_dataclass_init_false_decorator_keeps_zero_arg_signature`` - ``test_dataclasses_dataclass_init_false_form_works`` - ``test_dataclass_inherited_field_added_to_private_base_is_detected`` - ``test_dataclass_unchanged_inheritance_is_allowed`` - ``test_dataclass_unresolved_base_is_marked_uncertain`` - ``test_dataclass_multiple_inheritance_left_to_right`` - ``test_dataclass_explicit_init_still_takes_precedence_with_inheritance`` Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Walk dataclass bases in reverse-MRO order so the synthesised ``__init__`` matches Python's ``@dataclass`` semantics (``__dataclass_fields__`` is populated by reversed iteration so later bases override earlier ones in dict insertion order). For ``class C(A, B)`` the synth becomes ``(b, a, c)``. - Stop skipping ``@dataclass(init=False)`` bases: their fields STILL live in ``__dataclass_fields__`` and the derived class merges them. ``init=False`` only suppresses the base's OWN init synthesis. - Update the existing multiple-inheritance test (renamed to ``test_dataclass_multiple_inheritance_mro_order``) and add diamond, single-inheritance regression, and init=False-base inheritance tests. - Document reverse-MRO ordering and init=False inheritance in the prompt and architecture.json. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

gltanaka · 2026-05-22T21:37:48Z

Review verdict: needed, but not merge-ready yet.

The underlying issue in #1012 is real: the existing architecture conformance gate only checks declared architecture/interface symbols and does not protect incumbent public helpers or broad existing test coverage. A compatibility gate in this area is warranted.

Blocking required change: the agentic test-churn gate still misses existing alternate-path test rewrites whenever the canonical output file already exists. In pdd/agentic_test_generate.py, generated_content is read from output_test_file first, and the alternate-path scan only runs under if not generated_content and changed_files (around lines 336-391). So if tests/widget.test.ts already exists and the agent rewrites a pre-existing __tests__/widget.test.ts from 20 tests down to 1 while leaving the canonical file intact, run_agentic_test_generate returns success and no TestChurnError is raised. cmd_test_main then only verifies/writes the canonical path (around lines 171-212), leaving the high-churn alt-path rewrite on disk.

I reproduced this locally against the PR head with:

pre-existing canonical tests/widget.test.ts unchanged
pre-existing alt-path __tests__/widget.test.ts with 20 tests
mocked agentic run rewriting only the alt-path file to 1 test and returning success

Observed result: run_agentic_test_generate returned agentic_success=True, and the alt-path file remained reduced to 1 line. This is a false negative for the PR's stated test-churn guarantee.

Required changes before merge:

Snapshot all existing test-like files before agentic runs and, after the run, compare every changed pre-existing test-like file against its snapshot, regardless of whether the canonical output file exists or has content.
Restore every rejected changed test file to its pre-run content before raising TestChurnError.
Add regression coverage for the canonical-present + alt-path-existing case, e.g. canonical test file unchanged while __tests__/widget.test.ts shrinks from many tests to one. Please cover the analogous one-session path too if that agent can touch multiple test files.

The focused PR tests I ran are green (tests/test_agentic_test_generate.py, tests/test_cmd_test_main.py -k "churn or agentic", and tests/test_one_session_sync.py -k "churn or surface"), but they do not cover this false-negative scenario.

…e (Greg review #1015) Greg's review of PR #1015 surfaced a false negative in the agentic test-churn gate: when the canonical output file exists with content AND the agent rewrites a separate pre-existing test-like file (e.g. shrinks `__tests__/widget.test.ts` from 20 tests to 1) while leaving the canonical untouched, no `TestChurnError` was raised. Root cause — `pdd/agentic_test_generate.py`: The alt-path branch (`if not generated_content and changed_files`) only ran a churn check on alt-path files when the canonical was EMPTY. With canonical present and unchanged, the alt-path scan was skipped entirely; `cmd_test_main`'s outer churn check then compared the (unchanged) canonical against itself and trivially passed. Root cause — `pdd/one_session_sync.py`: The churn gate compared only `test_path` (canonical). The one-session agent runs under a mega-prompt with no scope constraint and can touch any test-like file in the project, so the same false negative applied. Fix: 1. `pdd/agentic_test_generate.py`: replace the alt-path-only churn check with a broad sweep over every changed test-like file that existed pre-run (snapshot already taken via `_snapshot_pre_test_contents`). Restore EVERY pre-run snapshot before raising so the retry loop's next attempt re-snapshots from the true pre-run baseline — restoring only the violating file would let attempt-N rewrites become attempt-N+1's new baseline and permanently weaken the gate. 2. `pdd/one_session_sync.py`: add a pre-session snapshot of every test-like file in the project (lives outside the retry loop, single source of truth across attempts). Inside the existing churn gate try-block, sweep changed test-like files against the snapshot before running the canonical-only check. Skip the canonical path in the sweep so the canonical's deletion-as-churn shortcut and repair directive remain intact. The opt-out env flags and prompt- side `BREAKING-CHANGE: rewrite tests` were moved above the canonical-empty early-return so they apply to the alt-path sweep too. The except handler now restores every snapshot in addition to the canonical + code paths. Test coverage: - `tests/test_agentic_test_generate.py`: - `test_canonical_present_alt_path_rewrite_raises_churn_error` (Greg's repro: canonical unchanged, alt-path 20→1) - `test_canonical_present_multiple_alt_paths_all_restored_on_failure` (restore-all invariant) - `tests/test_one_session_sync.py::TestOneSessionRollback`: - `test_alt_path_rewrite_with_canonical_unchanged_raises_churn_error` (analogous one-session repro) Both new test bodies were confirmed to FAIL against the pre-fix code (via `git stash` of the patched module) and PASS against the patched code. Existing 12-scenario gate matrix remains 12/12 PASS. Closes Greg-review blocking concern on PR #1015. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Serhan-Asad · 2026-05-22T22:39:29Z

Addresses @gltanaka review — iter-15 (commit `7c8b3bb22`)

Pushed 7c8b3bb22 on change/issue-1012. Closes the multi-file churn false negative you described. Summary of what changed and how it was verified.

Root cause — confirmed

In pdd/agentic_test_generate.py, the alt-path churn block at lines 365-421 ran only under if not generated_content and changed_files: — i.e., the canonical-empty branch. When the canonical output file existed with content AND the agent rewrote a separate pre-existing test-like file (your __tests__/widget.test.ts 20→1 repro), alt_path_used stayed None, the alt-path churn re-check was skipped, and cmd_test_main's outer churn check then compared the (unchanged) canonical against itself and trivially passed. False negative.

pdd/one_session_sync.py had the analogous gap: the gate compared only test_path (canonical), but the one-session agent runs under a mega-prompt with no scope constraint and can touch any test-like file. Same false negative when the agent rewrites an alt-path while leaving canonical alone.

Fix — both files

pdd/agentic_test_generate.py — replace the alt-path-only check with a broad sweep over EVERY changed test-like file that existed pre-run (the working-tree snapshot was already being taken via _snapshot_pre_test_contents; we just weren't using it broadly enough). On the first churn violation, restore every pre-run snapshot (not just the violating file) before raising — so the repair loop's next attempt re-snapshots from the true pre-run baseline. Restoring only the violator would let attempt-N rewrites become attempt-N+1's new baseline and permanently weaken the gate.

pdd/one_session_sync.py — add a pre-session _snapshot_pre_test_contents call outside the retry loop (single source of truth across attempts). Inside the existing churn-gate try block, sweep changed test-like files against the snapshot before running the canonical-only check. Skip the canonical path in the sweep so its deletion-as-churn shortcut + repair directive remain canonical-specific. The opt-out env flags + prompt-side BREAKING-CHANGE: rewrite tests parser were moved above the canonical-empty early-return so they apply to the alt-path sweep too. The except TestChurnError handler now restores every snapshotted file in addition to the canonical + code paths.

Test coverage — all three of your asks

1. Canonical-present + alt-path-existing on the agentic path:

tests/test_agentic_test_generate.py::test_canonical_present_alt_path_rewrite_raises_churn_error — exact repro of your scenario (canonical unchanged, alt-path 20→1). Asserts TestChurnError raised, alt-path restored to pre-run snapshot, canonical untouched, cost/model attached.

2. Multi-violator restore-all invariant:

tests/test_agentic_test_generate.py::test_canonical_present_multiple_alt_paths_all_restored_on_failure — agent rewrites two alt-path files in one run. Asserts BOTH are restored (not just the one whose violation raised), so the next attempt's snapshot reflects the true pre-run state.

3. Analogous one-session path:

tests/test_one_session_sync.py::TestOneSessionRollback::test_alt_path_rewrite_with_canonical_unchanged_raises_churn_error — mirror of pdd install_completion: install_completion() got unexpected keyword argument 'quiet' #1 driven through run_one_session_sync with the mega-prompt agent rewriting a src/widget_test.py alt-path while canonical + code stay intact.

Confirmed test-fails-without-fix

For each new regression test, I verified the test FAILS against the pre-fix code by:

git stash push pdd/<module>.py (revert just the fixed module)
Run the new test → DID NOT RAISE <class 'TestChurnError'>
git stash pop → re-apply fix
Run the new test → PASS

So the new tests genuinely close your reported false negative; they aren't testing what they assert on a no-op patch.

Wider verification

Layer	Pre-fix	Post-fix
8 PR-touched test files (focused gate suites)	983 pass, 1 skip	986 pass, 1 skip (+3 new regression tests)
12-scenario gate matrix (`scenario_matrix.py`)	12/12	12/12
All existing `test_one_session_sync.py` cases	81 pass	81 pass (no behavior drift on canonical-only paths)

CI + cloud-batch

GitHub CI re-triggered automatically on the push. Cloud Batch re-submitted (run-20260522-152907-7c8b3bb22, currently uploading source). I'll post the cloud-batch result once it lands; given iter-13 + iter-14 already passed 77/77 and this commit is additive (no existing test deletions, no behavior changes on the canonical-only path), I'd expect the same outcome — but verifying.

Open items intentionally not in this commit

Deletion-as-churn for alt-paths. Your repro is a rewrite; this commit doesn't extend deletion-as-churn to alt-paths (the canonical deletion case is still handled). If you'd like that covered, easiest follow-up is to add a sibling sweep over pre_test_contents keys that no longer exist post-session. Happy to add it on request — kept scope tight per your "the canonical-present + alt-path-existing case" framing.
PDD_TEST_CHURN_THRESHOLD=0.40 default. Unchanged this commit; carry-over flag from prior verification rounds. Independent of the false-negative fix.

Adds two repair-loop gates to prevent pdd sync from silently removing mature module APIs/tests when regenerating code. **PublicSurfaceRegressionError** — for mature Python modules (non-empty pre-generation code file), snapshots the public surface via stdlib `ast` before writing the new generation and raises when symbols are removed or signatures change. Captured surface includes: - Top-level functions and classes (recursive nested classes/methods at every depth — `Outer.Inner.method`). - Module-level public Assign / AnnAssign-with-value / Import / ImportFrom targets (so `import git` removal raises). - `__all__` is honored as authoritative when declared (last-assignment wins; computed forms fall back to the underscore-prefix heuristic; classes listed in `__all__` have their members recursively walked). - Implicit `__init__` is tracked so adding a required-arg constructor on a previously zero-arg class raises. **TestChurnError** — when overwriting an existing non-empty test file through code_generator_main, cmd_test_main, or one_session_sync, computes the stdlib `difflib.unified_diff` churn ratio between pre and post test bodies and raises when the ratio exceeds `PDD_TEST_CHURN_THRESHOLD` (default 0.40; trailing `%` accepted; invalid values log a warning and default). Deletion of a pre-existing test file is treated as maximal churn (ratio=1.0). Both gates accept anchored `BREAKING-CHANGE:` directives in the prompt body. Lines must start at column 0 (after optional whitespace), with a verb (`remove`/`rename`/`change signature`/`rewrite tests`/...) and a comma-separated symbol list. The verb's object must adjoin the relevant target (e.g. `rewriting tests` opts out churn; `rewrite docs and update tests` does not). Bare markers, prose-only mentions, and mid-line references do not opt out. Both errors carry a `repair_directive` and route through the existing `PDD_REPAIR_DIRECTIVE` plumbing. All four retry helpers (`code_generator_main`, `cmd_test_main`, `_run_test_op_with_churn_retry`, `run_one_session_sync`) follow the same contract: - Pop `PDD_REPAIR_DIRECTIVE` before attempt 1 so stale outer env doesn't contaminate the first try. - Loop-local directive variable holds the in-loop state; only set after catching a typed repair exception raised by THIS loop. - `cmd_test_main` accepts an explicit `repair_directive` keyword parameter so retry helpers pass it directly rather than via env. - The env-var write is preserved for cross-process inheritance into re-entrant PDD CLI subprocesses spawned by agentic generation. - `finally` block restores the prior env value (pop if originally unset). The gate fires on every code-write path in pdd sync: - In-process generate (`code_generator_main`). - Per-op orchestration (`crash_main`, `fix_verification_main`, `fix_main`) via `_verify_code_surface_after_write` — on regression, cost is charged and the operation loop breaks (no infinite loop on repeatable verify regressions). - One-session agentic path (`run_one_session_sync`) — surface check runs before churn check on each attempt; both files (code and test) are restored on exhaustion. 826 tests pass across the changed module set (`test_code_generator_main`, `test_cmd_test_main`, `test_sync_orchestration`, `test_sync_main`, `test_one_session_sync`, `test_agentic_sync_runner`, `commands/test_maintenance`). Closes #1012 Closes #1030 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

@staticmethod

…n gates Addresses the 2 Blocker + 2 Important findings from codex review iter-1 of PR #1015. All findings have a focused regression test added. - Blocker 1 (binding kind): _snapshot_public_signatures now prefixes each class-body method signature with [instance]/[staticmethod]/[classmethod]/ [property] so a method-binding flip (def f(self, v) -> @staticmethod def f(v)) is detected as a signature change even when the receiver-stripped params match. Previously Class.f(1) callers could break silently. - Blocker 2 (YAML front matter): defensive hardening that strips a leading YAML --- ... --- front matter block before BREAKING-CHANGE: parsing, so an indented opt-out buried in metadata cannot whitelist surface removals or test-churn. Verified no shipped prompt currently uses front matter (pdd/prompts/ scan), so this guards against a future convention rather than fixing a current exploit. - Important 1 (contract docs): tightens the docstring on _verify_code_surface_after_write and the matching README sentence to make explicit that crash/fix/verify surface failures are hard failures (no PDD_REPAIR_DIRECTIVE retry); the repair loop applies to generate and one-session paths only. Backed by a regression test that asserts the gate runs exactly once and returns the typed error without retry. - Important 2 (retry budget): splits the one-session retry accounting so public-surface repair uses MAX_CONFORMANCE_ATTEMPTS (parity with the generate path) while test-churn keeps its tighter _MAX_TEST_CHURN_ATTEMPTS cap. Previously a single min() shared the smaller cap across both gates, cutting surface repair off at 2 attempts. Backed by two tests: one asserts surface attempts use the full budget, the other pins the churn cap at 2. Test coverage: - 10 new tests in tests/test_code_generator_main.py covering binding-kind detection (instance/staticmethod/classmethod/property flips) and YAML front-matter stripping (with unterminated-fence and empty-prompt edge cases). - 2 new tests in tests/test_sync_orchestration.py pinning the hard-failure contract of _verify_code_surface_after_write. - 2 new tests in tests/test_one_session_sync.py exercising the full MAX_CONFORMANCE_ATTEMPTS budget for surface repair and the _MAX_TEST_CHURN_ATTEMPTS cap for churn repair. Local pytest run: 570 passed, 0 failed across test_code_generator_main.py, test_sync_orchestration.py, test_one_session_sync.py, test_cmd_test_main.py. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…n gates Address three new blockers introduced by the iter-1 follow-up commit on PR #1015: * `_strip_yaml_front_matter` now tolerates a leading UTF-8 BOM, CRLF / mixed line endings, trailing whitespace on the fence line, and a closing fence at EOF (no trailing newline). The previous regex only matched `---\n...---\n`, so a Windows-saved prompt could leave `BREAKING-CHANGE:` metadata visible to the directive parser and silently opt out of the gates. * `_snapshot_public_signatures` now accumulates property accessor roles per class member before recording the final entry. Previously `@x.setter def x(self, value)` overwrote the `@property` getter in the signatures dict and was misclassified as `[instance] (value)`, so a rewrite that replaced the whole descriptor with a plain `def x(self, value)` produced an identical snapshot and bypassed the gate. The merged entry is now `[property:<sorted-roles>] <getter-signature>`, distinct from any plain-method snapshot. * Top-level public symbols now carry a kind prefix (`[function]`, `[async_function]`, `[class]`) so swapping a public class with a same-named function (or a sync function with `async def`) is detected even when the receiver-stripped parameter list happens to match. The class self-entry inside `_walk_class` mirrors this with `[class]`, applying transitively to nested classes. Adds 17 tests across `TestPublicSurfaceBindingKind` and `TestFrontMatterStripping` covering each new edge case (property / property+setter / setter-loss flips, class/function/async flips, CRLF / BOM / EOF-terminated / trailing-whitespace fences, and end-to-end verification that buried directives no longer opt out of the removal gate). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Top-level `assignment ↔ def/class` kind flips were asymmetric: the iter-2 fallback caught `def/class → assignment` (signature dropped, surface kept), but `assignment → def/class` left the before-side without a signature entry — the regenerated `def Foo()` looked like a new symbol, not a kind flip. `_snapshot_public_signatures` now also records public top-level `Assign` / `AnnAssign-with-value` targets as `[assignment]`, honoring `__all__` via the existing `_is_public_top_level` predicate. Both flip directions hit the primary `changed_set` diff path. Adds 4 regression tests in `TestPublicSurfaceBindingKind` covering both flip directions and the unchanged-assignment sanity check, plus an inline note in `_walk_class` acknowledging the known class-body-redefinition limitation flagged by codex as a debatable nit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>