regression-canary: break PascalCase FHIR citation parsing #6

Open
tylerxia8 wants to merge 1 commit into main from regression-canary-citation-regex

@tylerxia8

Owner

Deliberate regression to demonstrate the W2 eval gate catches real breaks per the PRD's hard-gate requirement: "graders will introduce a small regression and confirm your CI gate fails."

Change: the structural verifier's CITATION_RE used [A-Za-z_]+, which matches both lowercase OpenEMR tables and PascalCase FHIR resource types. This PR drops the uppercase range, leaving only [a-z_]+, so PascalCase resources stop parsing as citations.
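A minimal sketch of the effect, for illustration only. The PR states just the character class; the surrounding pattern here (a word boundary, then `#<id>`) and the helper name `cited_resources` are assumptions inferred from the citation examples in this description, not the verifier's actual code.

```python
import re

# Hypothetical full patterns: only the character class comes from the PR text.
# The leading \b and the "#<id>" suffix are assumed for this illustration.
CITATION_RE_BEFORE = re.compile(r"\b([A-Za-z_]+)#(\w+)")
CITATION_RE_AFTER = re.compile(r"\b([a-z_]+)#(\w+)")


def cited_resources(pattern, text):
    """Return the resource/table name of every citation the pattern finds."""
    return [match.group(1) for match in pattern.finditer(text)]


sample = "See DocumentReference#12 and MedicationRequest#7; table row lists#3."

before = cited_resources(CITATION_RE_BEFORE, sample)
after = cited_resources(CITATION_RE_AFTER, sample)

print(before)  # PascalCase and lowercase names are both recognized
print(after)   # only the lowercase name survives the narrowed class
```

With the narrowed class, the PascalCase citations are not found at all, which is what would make citation-dependent eval categories fail.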

Expected eval-gate effect:

  • extraction_lab citations are DocumentReference#... (PascalCase) -> fail
  • citation_validity_meds wants MedicationRequest#... -> fail
  • evidence wants Guideline#... -> fail
  • golden UC-1 briefing wants Condition#, Encounter#, etc. -> fail
  • Each of these is a drop of ~10pp or more -> well past the 5pp regression-delta -> CI fails
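The gate arithmetic implied by the list above can be sketched as follows. This is not the project's runner: the function name, the assumption that rates are stored as fractions between 0 and 1, the strict "greater than" comparison, and the example numbers are all hypothetical. Only the 5pp threshold and the category names come from this description.

```python
REGRESSION_DELTA_PP = 5.0  # threshold stated in the PR description


def regressed_categories(baseline_rates, current_rates, delta_pp=REGRESSION_DELTA_PP):
    """Return {category: drop in percentage points} for drops larger than delta_pp.

    Rates are assumed to be pass-rate fractions in [0, 1]; a category missing
    from the current run is treated as a 0.0 pass rate.
    """
    failures = {}
    for category, baseline in baseline_rates.items():
        current = current_rates.get(category, 0.0)
        drop_pp = (baseline - current) * 100.0
        if drop_pp > delta_pp:
            failures[category] = round(drop_pp, 1)
    return failures


# Made-up numbers purely to exercise the arithmetic; not real eval results.
baseline = {"extraction_lab": 0.90, "citation_validity_meds": 1.00, "evidence": 0.80}
current = {"extraction_lab": 0.70, "citation_validity_meds": 0.50, "evidence": 0.78}

failures = regressed_categories(baseline, current)
gate_passes = not failures
print(failures, gate_passes)
```

In this sketch a 2pp dip stays inside the tolerance while drops of 10pp or more trip the gate, which is the behavior the canary is meant to demonstrate.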

DO NOT MERGE. This PR is the standing demonstration that the W2 eval gate has teeth — it sits open on GitHub as concrete proof of the regression-detection property the PRD requires.

tylerxia8 added a commit that referenced this pull request May 8, 2026
- .env.example: add VOYAGE_API_KEY, COHERE_API_KEY, REDIS_URL.
  Used by the hybrid-RAG layer + per-patient context cache;
  previously only documented in agent-service/README.md and
  W2_ARCHITECTURE.md, not in the root .env.example a setup
  reader sees first.

- .gitignore: ignore .oauth-creds-*.txt files generated by
  the documented OAuth recovery procedure.

- AUDIT.md §1.5: drop the stale "ARCHITECTURE §11 (Sunday)
  tracks completion" line; document the active volume
  mitigation (mounted at sites/default/documents/, seeded
  from /opt/openemr-documents-template/ on first boot) and
  add a 4-step recovery procedure for the key-mismatch
  failure mode encountered 2026-05-08 — re-register OAuth
  client, enable, swap env vars, verify.

- .github/workflows/eval-gate.yml: pre-flight ping
  ${AGENT_URL}/healthz before the 19-min eval run so a
  Railway hiccup fails fast with "Staging agent unreachable"
  instead of looking like a real regression catch. Also add
  a baseline-drift guard that emits a ::warning:: annotation
  whenever a PR modifies baseline.json, so silent re-locks
  against a regressed agent are surfaced for reviewer ack.

- agent-service/evals/w2/baseline.json: self-documenting
  _meta block — purpose, lock timestamp, and re-lock checklist
  referencing the regression-canary PR #6 and the unit tests
  in tests/test_eval_runner.py. The runner only reads
  category_rates/rubric_rates, so the new key is ignored;
  gate logic unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
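
The baseline.json note in the commit message above (the runner reads only category_rates/rubric_rates, so the new _meta key is ignored) can be illustrated with a small sketch. The loader name, the _meta field names, and the values are hypothetical; only the three top-level key names come from the commit message.

```python
import json


def load_baseline_rates(raw_json):
    """Keep only the two keys the gate is described as reading; ignore the rest."""
    data = json.loads(raw_json)
    return {
        "category_rates": data.get("category_rates", {}),
        "rubric_rates": data.get("rubric_rates", {}),
    }


# Hypothetical file contents; field names inside _meta are placeholders.
raw = json.dumps({
    "_meta": {"purpose": "placeholder", "locked_at": "placeholder"},
    "category_rates": {"extraction_lab": 0.9},
    "rubric_rates": {"citation_present": 0.95},
})

rates = load_baseline_rates(raw)
print(sorted(rates))
```

Because the loader selects keys explicitly, adding documentation keys such as _meta would leave the gate's inputs unchanged under these assumptions.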