45 changes: 40 additions & 5 deletions architecture.json
@@ -2220,7 +2220,8 @@
"agentic_common_python.prompt",
"agentic_update_python.prompt",
"sync_determine_operation_python.prompt",
"operation_log_python.prompt"
"operation_log_python.prompt",
"drift_classifier_python.prompt"
],
"priority": 62,
"filename": "update_main_python.prompt",
@@ -2479,11 +2480,42 @@
"y": 2000
}
},
{
"reason": "Shared, pure drift classifier consumed by sync, update, ci-heal, change-preflight, and global scan paths.",
"description": "Pure side-effect-free Pydantic v2 classifier that takes stored fingerprints, current artifact hashes, resolved include/source-doc hashes, optional architecture/PRD metadata, optional git change set, and a caller scope, and returns a structured DriftVerdict (is_drift, canonical-ordered reasons, stale_artifacts, confidence, details). Git change set is an explicit input rather than CI-only correction logic. Caller adapters perform I/O; the classifier never reads the filesystem, runs git, or invokes the LLM.",
"dependencies": [],
"priority": 66,
"filename": "drift_classifier_python.prompt",
"filepath": "pdd/drift_classifier.py",
"tags": [
"module",
"python",
"sync",
"drift-detection"
],
"interface": {
"type": "module",
"module": {
"functions": [
{
"name": "classify_drift",
"signature": "(inputs: DriftInputs) -> DriftVerdict",
"returns": "DriftVerdict"
}
]
}
},
"position": {
"x": 10800,
"y": 2120
}
},
{
"reason": "Intelligently determines which sync operations to run.",
"description": "Analyzes prompt state, file timestamps, and previous results to decide which workflow steps are needed.",
"dependencies": [
"agentic_langtest_python.prompt"
"agentic_langtest_python.prompt",
"drift_classifier_python.prompt"
],
"priority": 67,
"filename": "sync_determine_operation_python.prompt",
@@ -5255,7 +5287,8 @@
"dependencies": [
"core/cloud_python.prompt",
"code_generator_main_python.prompt",
"agentic_sync_runner_python.prompt"
"agentic_sync_runner_python.prompt",
"drift_classifier_python.prompt"
],
"priority": 148,
"filename": "sync_main_python.prompt",
@@ -5782,7 +5815,8 @@
"description": "Orchestrates the 13-step agentic change workflow. Includes Step 8.5 (pre-flight drift heal) — detects prompts whose code has drifted and runs `pdd update` per module inside the worktree before Step 9 rewrites the prompts. Includes Step 10.5 (doc-sync contract verifier) — before Step 10, calls pdd.sync_order.discover_associated_documents to populate the LLM's associated_documents context, using the authoritative changed-file set so Step 9's worktree fallback path cannot bypass discovery when FILES_* markers are missing; after Step 10, enforces that every discovered doc appears in exactly one of ASSOCIATED_DOCS_MODIFIED / ASSOCIATED_DOCS_CONFLICTS / ASSOCIATED_DOCS_UNCHANGED. Silent drops and bucket overlaps are appended as ORCHESTRATOR_POSTCHECK_WARNINGS and routed to Step 11 via step10_output; PDD_STRICT_DOC_SYNC=1 turns violations into hard workflow aborts (issue #739).",
"dependencies": [
"architecture_sync_python.prompt",
"agentic_common_python.prompt"
"agentic_common_python.prompt",
"drift_classifier_python.prompt"
],
"priority": 168,
"filename": "agentic_change_orchestrator_python.prompt",
@@ -7952,7 +7986,8 @@
"auto_include_python.prompt",
"agentic_langtest_python.prompt",
"architecture_sync_python.prompt",
"metadata_sync_python.prompt"
"metadata_sync_python.prompt",
"drift_classifier_python.prompt"
],
"priority": 80,
"filename": "ci_drift_heal_python.prompt",
162 changes: 162 additions & 0 deletions docs/drift_classifier.md
@@ -0,0 +1,162 @@
# Shared Drift Classifier (design)

Status: design / first implementation.
Tracking: promptdriven/pdd#884 (relates to #860, promptdriven/pdd#727, #1369, #1370).

## Why

PDD classifies artifact drift in several places today:

- `sync_determine_operation()` fingerprint state machine
- `pdd update --all` code-hash / include-dep / git detection
- CI auto-heal detection plus git-diff reclassification
- `pdd change` preflight drift healing
- no-arg `pdd sync` Tier 1 global scan

These paths can legitimately run different repair operations, but the
*stale / artifact-drift classification itself* should be shared so the answer
to "is this repo in sync?" is not command-dependent.

## Scope

This module unifies **classification only**. Repair-plan selection
(`generate` vs `update` vs `auto-deps` vs `ci heal`) remains
command-specific in this PR. Repair unification is tracked separately.

## Public API (sketch)

```python
# pdd/drift_classifier.py

from enum import Enum
from typing import Any, Literal, Mapping, Optional, Sequence
from pydantic import BaseModel


class DriftReason(str, Enum):
    IN_SYNC = "in_sync"
    PROMPT_CHANGED = "prompt_changed"
    CODE_CHANGED = "code_changed"
    EXAMPLE_CHANGED = "example_changed"
    TEST_CHANGED = "test_changed"
    INCLUDE_DEP_CHANGED = "include_dep_changed"
    SOURCE_DOC_CHANGED = "source_doc_changed"
    ARCHITECTURE_CHANGED = "architecture_changed"
    MISSING_ARTIFACT = "missing_artifact"
    FINGERPRINT_MISSING = "fingerprint_missing"
    GIT_CHANGE_OVERRIDE = "git_change_override"
    UNKNOWN = "unknown"


class ArtifactHashes(BaseModel):
    prompt: Optional[str] = None
    code: Optional[str] = None
    example: Optional[str] = None
    tests: Mapping[str, str] = {}
    include_deps: Mapping[str, str] = {}
    source_docs: Mapping[str, str] = {}


class GitChangeSet(BaseModel):
    changed_paths: Sequence[str] = ()
    base_ref: Optional[str] = None
    head_ref: Optional[str] = None


class DriftInputs(BaseModel):
    target: str  # module basename or prompt path
    scope: Literal["sync", "update", "change", "ci", "scan"]  # caller identity; telemetry-only, never branches the verdict
    stored_fingerprint: Optional[Mapping[str, Any]] = None
    stored_run_report: Optional[Mapping[str, Any]] = None
    current_hashes: ArtifactHashes
    architecture_metadata: Optional[Mapping[str, Any]] = None
    prd_metadata: Optional[Mapping[str, Any]] = None
    git_change_set: Optional[GitChangeSet] = None


class DriftVerdict(BaseModel):
    is_drift: bool
    reasons: Sequence[DriftReason]  # canonical-ordered (sorted by enum value)
    stale_artifacts: Sequence[str]  # canonical-ordered subset of {"prompt","code","example","tests","include_deps","source_docs","architecture"}
    confidence: Literal["high", "medium", "low"]
    details: Mapping[str, Any] = {}  # diagnostic, not behavior-bearing


def classify_drift(inputs: DriftInputs) -> DriftVerdict: ...
```

The function is **pure**: no filesystem, network, or git calls. Adapters
at each call site collect hashes / fingerprints / diffs and pass them in.
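
The adapter/classifier split can be illustrated with the standard library alone. This is a minimal sketch, not the module's implementation: `sha256_of`, `collect_hashes`, and `changed_artifacts` are hypothetical names, SHA-256 is an assumed hash function, and plain dicts stand in for `DriftInputs` and `ArtifactHashes`.

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Mapping, Optional


def sha256_of(path: Path) -> Optional[str]:
    """Adapter side (does I/O): hash a file, or return None if it is missing."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def collect_hashes(paths: Mapping[str, Path]) -> Dict[str, Optional[str]]:
    """Adapter side: resolve every artifact path to a hash before classifying."""
    return {name: sha256_of(path) for name, path in paths.items()}


def changed_artifacts(
    stored: Mapping[str, Optional[str]],
    current: Mapping[str, Optional[str]],
) -> List[str]:
    """Classifier side (pure): compare hashes only, never touch the filesystem."""
    return sorted(
        name for name, digest in current.items() if stored.get(name) != digest
    )
```

Because `changed_artifacts` sees only hashes, it can be tested with literal dicts and no temporary files, which is the property the pure-function contract is meant to preserve.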

### Canonical ordering contract

Both `reasons` and `stale_artifacts` are returned in a stable canonical
order so test diffs and telemetry are reproducible regardless of Python
dict iteration order:

- `reasons` is sorted by `DriftReason` enum value.
- `stale_artifacts` is sorted lexicographically over its closed vocabulary
(`"architecture"`, `"code"`, `"example"`, `"include_deps"`, `"prompt"`,
`"source_docs"`, `"tests"`).

Callers MUST NOT rely on any other ordering.
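
The ordering contract could be enforced by two small helpers such as the ones below. This is a sketch under assumptions: the helper names are hypothetical, only a subset of `DriftReason` is reproduced, and de-duplication and rejection of out-of-vocabulary artifact names are choices made here rather than requirements stated above.

```python
from enum import Enum
from typing import Iterable, List


class DriftReason(str, Enum):
    # Subset of the enum from the API sketch, enough to show the ordering.
    PROMPT_CHANGED = "prompt_changed"
    CODE_CHANGED = "code_changed"
    TEST_CHANGED = "test_changed"


STALE_VOCABULARY = frozenset(
    {"architecture", "code", "example", "include_deps", "prompt", "source_docs", "tests"}
)


def canonical_reasons(reasons: Iterable[DriftReason]) -> List[DriftReason]:
    """Sort reasons by enum value; duplicates are dropped (an assumption)."""
    return sorted(set(reasons), key=lambda reason: reason.value)


def canonical_stale_artifacts(artifacts: Iterable[str]) -> List[str]:
    """Sort artifact names lexicographically within the closed vocabulary."""
    unique = set(artifacts)
    unknown = unique - STALE_VOCABULARY
    if unknown:
        raise ValueError(f"not in stale-artifact vocabulary: {sorted(unknown)}")
    return sorted(unique)
```

For example, `canonical_stale_artifacts(["tests", "code"])` returns `["code", "tests"]` regardless of the order in which the caller discovered the stale artifacts.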

### Confidence semantics

`confidence` is a closed three-value categorical (`"high" | "medium" | "low"`),
not a numeric score:

- **`high`** — full `stored_fingerprint` present, current hashes computed for
every artifact the fingerprint claims, and no `UNKNOWN` reason emitted.
- **`medium`** — partial inputs (e.g. fingerprint present but optional
metadata missing, or only `git_change_set` drives the verdict, or
`mtime_skew` downgraded a `high` verdict).
- **`low`** — `stored_fingerprint is None` and no `git_change_set` to
anchor the decision, or any `UNKNOWN` reason emitted.

This keeps callers from synthesizing thresholds on a float and forces them
to handle the three cases explicitly.
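
One way to read the three rules above as code is the sketch below. The function name and its boolean parameters are hypothetical, and the precedence (an `UNKNOWN` reason always wins, then a missing fingerprint, then the partial-input downgrades) is an assumption; the design text does not fix an evaluation order.

```python
from typing import Literal

Confidence = Literal["high", "medium", "low"]


def derive_confidence(
    *,
    has_fingerprint: bool,
    has_git_change_set: bool,
    all_claimed_hashes_present: bool,
    optional_metadata_missing: bool,
    unknown_emitted: bool,
    mtime_skew: bool = False,
) -> Confidence:
    """Map input completeness to the closed three-value confidence scale."""
    # Any UNKNOWN reason forces "low" (assumed to take precedence).
    if unknown_emitted:
        return "low"
    # Without a fingerprint, only a git change set can anchor the decision.
    if not has_fingerprint:
        return "medium" if has_git_change_set else "low"
    # Fingerprint present but inputs are partial, or skew downgraded the verdict.
    if not all_claimed_hashes_present or optional_metadata_missing or mtime_skew:
        return "medium"
    return "high"
```

Returning a `Literal` rather than a float means a type checker can flag a caller that handles only two of the three cases.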

### Architecture and PRD metadata

`DriftInputs.architecture_metadata` and `DriftInputs.prd_metadata` are
both optional, both opaque `Mapping[str, Any]`, and both feed the
`ARCHITECTURE_CHANGED` reason / `"architecture"` stale-artifact slot.
Either may be `None` for callers that do not track that signal. When
both are present, either differing from its stored snapshot is sufficient
to mark `"architecture"` stale; the classifier does not distinguish
architecture-only drift from PRD-only drift in the verdict (callers can
inspect `details` if they need that split).
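
The "either one is sufficient" rule might look like the following sketch. The function name, the `details` keys, and the way stored snapshots are passed in are all hypothetical; the design above does not say where the stored snapshots live. A current value of `None` is treated as "caller does not track this signal" and never marks anything stale.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

Metadata = Optional[Mapping[str, Any]]


def architecture_staleness(
    current_arch: Metadata,
    stored_arch: Metadata,
    current_prd: Metadata,
    stored_prd: Metadata,
) -> Tuple[bool, Dict[str, bool]]:
    """Return (is "architecture" stale, details showing which signal differed)."""
    # None means the caller does not track the signal, so it cannot cause drift.
    arch_changed = current_arch is not None and current_arch != stored_arch
    prd_changed = current_prd is not None and current_prd != stored_prd
    details = {"architecture_changed": arch_changed, "prd_changed": prd_changed}
    return arch_changed or prd_changed, details
```

The verdict collapses both signals into one stale slot, while the returned details preserve the architecture-only versus PRD-only split for callers that need it.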

## CI git-diff as explicit input

Today CI auto-heal applies a post-hoc git-diff reclassification step.
Under the shared classifier, the git change set is just a field on
`DriftInputs` (`git_change_set`). Any caller (not only CI) may supply it,
and the classifier treats a path appearing in the diff as a first-class
reason to mark its owning artifact stale.
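
Treating the diff as data could be sketched as below. The function name and the artifact-to-paths mapping are hypothetical: the design does not specify how a path is resolved to its owning artifact, so this sketch assumes the adapter supplies that mapping and has already normalized paths to the same repo-relative form used in the diff.

```python
from typing import List, Mapping, Sequence


def stale_from_git(
    changed_paths: Sequence[str],
    artifact_paths: Mapping[str, Sequence[str]],
) -> List[str]:
    """Return artifact names owning at least one changed path, in canonical order."""
    changed = set(changed_paths)
    return sorted(
        name
        for name, paths in artifact_paths.items()
        if any(path in changed for path in paths)
    )
```

Paths in the diff that belong to no artifact of the module are ignored, so an unrelated `README.md` edit does not mark the module stale.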

## Call sites

| Path | Today | After this PR |
|------|-------|---------------|
| `sync_determine_operation()` | inline fingerprint state machine | builds `DriftInputs`, calls `classify_drift`, then maps verdict → operation |
| `pdd update --all` | inline code-hash + include-dep + git checks | builds `DriftInputs` (with `git_change_set`), calls `classify_drift` |
| CI auto-heal detection | inline + git-diff correction | passes `git_change_set` into `classify_drift` |
| `pdd change` preflight | inline drift healing trigger | calls `classify_drift` to decide whether to heal |
| no-arg `pdd sync` Tier 1 scan | inline per-module loop | calls `classify_drift` per module |

Each call site keeps its own *repair* logic. Only the classification
step moves behind `classify_drift`.
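
As an illustration of a call site keeping its own repair decision, the `pdd change` preflight row could reduce to a predicate over the verdict. In this sketch `Verdict` is a hypothetical stand-in for the two `DriftVerdict` fields used, and the rule (heal when `"prompt"` is among the stale artifacts) follows the orchestrator prompt changed in this PR; the function name is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Verdict:
    """Hypothetical stand-in for the DriftVerdict fields this call site reads."""

    is_drift: bool
    stale_artifacts: List[str] = field(default_factory=list)


def should_run_preflight_heal(verdict: Verdict) -> bool:
    """Call-site policy: heal only when the prompt itself is reported stale."""
    return verdict.is_drift and "prompt" in verdict.stale_artifacts
```

The classifier reports what is stale; whether that triggers `pdd update`, a sync operation, or nothing remains a per-command decision.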

## Behavior preservation

The first PR ships characterization tests that pin current outputs for:

- `sync_determine_operation` (full state-machine matrix)
- CI auto-heal detection + git-diff correction
- `pdd change` preflight drift detection

These tests pass against both the old inline logic and the new
`classify_drift`, so the refactor is observationally identical.
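
A characterization harness of this kind can be sketched generically. The helpers below are hypothetical and are not the PR's test code: they enumerate a boolean state matrix and report every case where a legacy decision function and its replacement disagree.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Case = Dict[str, bool]


def build_matrix(flags: List[str]) -> List[Case]:
    """Enumerate every True/False combination of the named state flags."""
    return [dict(zip(flags, values)) for values in product([False, True], repeat=len(flags))]


def find_mismatches(
    legacy: Callable[[Case], object],
    new: Callable[[Case], object],
    cases: List[Case],
) -> List[Tuple[Case, object, object]]:
    """Return (case, legacy_output, new_output) for every disagreement."""
    mismatches = []
    for case in cases:
        old_out, new_out = legacy(case), new(case)
        if old_out != new_out:
            mismatches.append((case, old_out, new_out))
    return mismatches
```

A test would then assert that `find_mismatches(...)` returns an empty list; a non-empty result names the exact input states where behavior diverged.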

## Non-goals (first PR)

- Unifying repair plans across commands.
- Changing exit codes or CLI output.
- Replacing `failure_classification` (runtime test/build failures are a
different axis from artifact drift).
3 changes: 3 additions & 0 deletions pdd/prompts/agentic_change_orchestrator_python.prompt
@@ -13,6 +13,7 @@

<pdd-dependency>architecture_sync_python.prompt</pdd-dependency>
<pdd-dependency>agentic_common_python.prompt</pdd-dependency>
<pdd-dependency>drift_classifier_python.prompt</pdd-dependency>

% You are an expert Python engineer. Your goal is to write the `pdd/agentic_change_orchestrator.py` module.

@@ -293,6 +294,8 @@ Between Step 8 (Analyze Prompt Changes) and Step 9 (Implement), the orchestrator

4. Heal failures are **non-fatal** except metadata-finalization failures: the workflow continues when ordinary subprocess-level module heals fail, and operators inspect `state["preflight_failed_heal_modules"]` for visibility. If a preflight heal subprocess reports `metadata finalization failed`, `metadata staging verification failed`, or the `pdd update --sync-metadata` `[metadata-sync]` error prefix, raise a hard failure instead of continuing, because issue #1006 requires preflight drift-heal to fail loudly rather than publish half-synced metadata.

5. **Shared Drift Classifier (issue #884)**: The stale-artifact decision driving preflight heal (`decision.operation == "update"` from `sync_determine_operation`) is the same classification concern shared with `update_main`, `ci_drift_heal`, and the no-arg `pdd sync` global scan. `_preflight_drift_heal` MUST obtain its per-module verdict via `pdd.drift_classifier.classify_drift` (passing `scope="change"` in `DriftInputs`) and map a drift verdict whose `stale_artifacts` includes `"prompt"` to the existing `pdd update` heal path. Existing healing logic (subprocess dispatch, metadata-finalization escalation, worktree containment) is unchanged; only the "is this module stale?" decision is shared. Characterization tests pin the current preflight-detection matrix. See `docs/drift_classifier.md`.

% Step 10.5 — Associated-Document Contract Verifier (Issue #739)

Step 10 now has two responsibilities: (a) update `architecture.json` metadata as before, and (b) sync any associated documentation files reachable from the modified prompts' `<include>` / `<include-many>` graph. The verifier below enforces the contract deterministically so an LLM omission cannot silently ship a doc-out-of-sync PR.
5 changes: 4 additions & 1 deletion pdd/prompts/ci_drift_heal_python.prompt
@@ -26,6 +26,7 @@
<pdd-dependency>agentic_langtest_python.prompt</pdd-dependency>
<pdd-dependency>architecture_sync_python.prompt</pdd-dependency>
<pdd-dependency>metadata_sync_python.prompt</pdd-dependency>
<pdd-dependency>drift_classifier_python.prompt</pdd-dependency>

% You are an expert Python engineer. Your goal is to write a standalone CI script that detects prompt/example drift across all PDD modules and auto-heals drifted modules.

@@ -92,7 +93,9 @@ A standalone CI script (`pdd/ci_drift_heal.py`) that orchestrates drift detectio

20. **Output:** Rich console with drift summary table before healing. Per-module heal result and cost. Final summary of healed/failed/skipped counts.

21. **Finalization log surface:** On every successful return from `_run_metadata_sync_safe`, emit exactly one log line of the form `metadata finalized for <basename>: <meta_path>` where `<meta_path>` is the absolute or repo-relative `.pdd/meta/<safe_basename>_<language>.json` path. On finalization failure, emit `metadata finalization failed for <basename>: <reason>` as required by Requirement 6. On metadata-staging-verification failure, emit `metadata staging verification failed: missing <meta_path>` as required by Requirement 14. These three log surfaces are stable assertion points for regression tests in `tests/test_ci_drift_heal.py` covering both the success path (metadata committed alongside prompt/code) and the failure path (workflow exits non-zero with explicit error).
21. **Shared Drift Classifier (issue #884)**: The detection phase (Requirement 2) and the git-based reclassification step (Requirement 3) are classification concerns shared with `sync_determine_operation`, `update_main`, `pdd change` preflight, and the no-arg `pdd sync` global scan. The CI adapter MUST build a `DriftInputs` (with `scope="ci"`, populating `git_change_set` from the `--diff-base` fetch so the diff is an explicit classifier input rather than CI-only correction logic) and call `pdd.drift_classifier.classify_drift` to produce the per-module verdict. The CI-specific reclassification rules (clean-CI fingerprint-absent handling) are encoded by passing the git change set through `DriftInputs` and consuming `DriftVerdict.reasons`; this module's repair dispatch in Requirements 6–17 remains unchanged. Behavior MUST be observationally identical for the existing detection matrix; characterization tests pin the current outputs. See `docs/drift_classifier.md`.

22. **Finalization log surface:** On every successful return from `_run_metadata_sync_safe`, emit exactly one log line of the form `metadata finalized for <basename>: <meta_path>` where `<meta_path>` is the absolute or repo-relative `.pdd/meta/<safe_basename>_<language>.json` path. On finalization failure, emit `metadata finalization failed for <basename>: <reason>` as required by Requirement 6. On metadata-staging-verification failure, emit `metadata staging verification failed: missing <meta_path>` as required by Requirement 14. These three log surfaces are stable assertion points for regression tests in `tests/test_ci_drift_heal.py` covering both the success path (metadata committed alongside prompt/code) and the failure path (workflow exits non-zero with explicit error).

% Dependencies
<pdd.sync_determine_operation>