docs(standards): add a test-contract gate to characterization-principles#840
Merged
Conversation
…iples The principles doc covered test correctness mechanics (constant-target, kill/relaunch, cycle labels) but nothing forced an author to confirm what a test is FOR or what PASS means before editing the 20+ matrix YAML arms. With that config surface churning constantly, a wrong intent assumption silently yields a test that measures the wrong thing. Add a "state this first" contract section: behavior-under-test, platform/device, boundary+reps, what-varies vs what's-held-constant, and the pass/fail DATA signal — quiz on blanks, then echo back the resolved config before any run. Test-authoring sibling of the data-contract gate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jonathaneoliver
added a commit
that referenced
this pull request
Jun 23, 2026
) ## What Adds `.claude/skills/test-author/SKILL.md` — turns the just-merged test-contract gate (#840) into a walked authoring procedure. ## Why The gate (in `characterization-principles.md` + an always-on memory) says *confirm intent + echo resolved config before any run*. This skill carries the **authoritative knob vocabulary** so the quiz is concrete and the echo-back is a filled spec, not prose — which matters because the matrix-YAML / server_behavior config surface churns constantly. ## Covers - **char-matrix YAML** — class config|fault; the `is.*` (client, cold-relaunch) / `proxy.*` (server, config-on-connect) knob reference from `internal/charmatrix/spec.go`; `axes:` cartesian vs `groups/compare` A/B; points at the worked-example YAMLs as living templates. - **server-behavior** — the baseline-plus-tolerance contract shape + `sb_common_test.go` helpers. Procedure: classify family/class → contract quiz → draft → **echo resolved config** → `--dry-run` / `-run` verify, before any run. ## Boundaries Defers correctness to `characterization-principles.md`, follows `CONVENTIONS.md`, and is explicitly **not** for running tests (`make characterize-*` / `harness char matrix`), the unattended loop (`sweep`), analysis (`forensics`), or Roku. Closes #841. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What
Adds a "The test contract — state this first" section to
.claude/standards/characterization-principles.md.Why
The doc already covers test correctness mechanics (constant-target rule, kill/relaunch boundary, cycle-label schema) — what makes a test analysable. It had nothing about intent capture: nothing forces an author to confirm what a test is for or what counts as pass before editing the 20+
matrix/*.yamlarms. With that config surface changing constantly, a wrong intent assumption silently produces a test that measures the wrong thing or passes for the wrong reason.The new section requires, before writing/modifying any test:
-is.flag.*/CHAR_*) before any run.It's the test-authoring sibling of the existing data-contract gate. Paired with an always-on memory so the behavior fires every session.
Follow-up (not in this PR): a
test-authorskill that operationalizes the gate with per-family templates (matrix YAML + server_behavior).🤖 Generated with Claude Code