refactor: improve matrix parsing validation and performance#3129
refactor: improve matrix parsing validation and performance#3129manucian-official wants to merge 5 commits into
Conversation
Strengthen markdown matrix parsing with stricter regex validation, improved error reporting, and optimized data processing. Add line-aware validation messages, enforce status validation at parse time, reduce unnecessary allocations, improve duplicate detection, and enhance catalog validation scalability for larger datasets.
📝 WalkthroughWalkthroughThis PR tightens coverage-matrix markdown parsing to require a notes column and full-line pipes, adds per-line ID/status validation with numbered errors, and enhances catalog comparison to report duplicates plus total and unique row counts. ChangesCoverage Matrix Parser and Validation
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/lib/coverage-matrix-parser.mjs`:
- Around line 1-2: The ROW_REGEX is too strict (it validates ID and status),
causing parseMatrix to skip malformed rows before the explicit validators run;
loosen ROW_REGEX (used in coverage-matrix-parser.mjs) so it only splits row
cells (e.g., capture ID and status as generic non-pipe text like [^|]+?) without
enforcing ID formatting or allowed emoji/status values, preserving the same
group count/order, so parseMatrix can pass the raw captures into the existing
ID/status validators and emit the line-specific errors.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: aaf29e3a-6a1d-4a94-b39e-3dda0dfb4a4d
📒 Files selected for processing (1)
scripts/lib/coverage-matrix-parser.mjs
oxoxDev
left a comment
There was a problem hiding this comment.
Refactor of a CI-critical parser (drives the Coverage Matrix Sync gate). Happy-path output shape is preserved and the real 175-row docs/TEST-COVERAGE-MATRIX.md parses identically — Coverage Matrix Sync is green, no row-rejection regression. Two blockers though:
1. Invalid-status rows are now silently dropped instead of reported. Baking the four valid statuses into ROW_REGEX means a row with a bad/typo'd status no longer matches the regex at all → continue → no row and no error. That defeats the gate's purpose (a malformed status now passes CI silently) and turns the VALID_STATUS.has(status) branch into dead code. Keep the status group permissive (([^|]+?)) and let the existing guard emit your new line-aware error.
2. Existing tests break and weren't updated. app/test/coverage-matrix-parser.test.ts has two tests asserting an error is emitted for an invalid-status row; with this change no error fires, so both fail in the Frontend Coverage (Vitest) lane (currently pending — this is why the PR is BLOCKED). Fixing #1 makes them pass without touching the tests.
Minor: empty-string input now returns an error where it was previously a {rows:[],errors:[]} no-op — an unrequested contract change; consider dropping the markdown.length === 0 arm.
Good parts: line-numbered messages, empty-notes rows now matching, additive totalRows/uniqueRows, no ReDoS. Just keep the status validation able to report.
graycyrus
left a comment
There was a problem hiding this comment.
I see @oxoxDev has requested changes on this — the blockers they identified (silent status-validation drop + test regression) are spot-on. I'm deferring to their feedback. Once those are addressed, I'll come back and do a full review.
The status capture group (✅|🟡|❌|🚫) in ROW_REGEX caused rows with bad statuses to silently miss the regex match entirely, making the VALID_STATUS.has() guard dead code and defeating the matrix-sync gate. Replace with permissive ([^|]+?) so the explicit validator emits line-aware errors. Also drop the empty-string error (unrequested contract change) and apply Prettier formatting. Addresses @oxoxDev and @coderabbitai feedback on PR tinyhumansai#3129.
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/lib/coverage-matrix-parser.mjs (1)
9-14:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winError message says "non-empty" but empty strings are now valid.
The guard only checks
typeof markdown !== "string", so""passes through and returns{ rows: [], errors: [] }. The error message at line 12 is misleading.Suggested fix
errors: ["Input must be a non-empty string"], + errors: ["Input must be a string"],🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/lib/coverage-matrix-parser.mjs` around lines 9 - 14, The guard currently checks only typeof markdown !== "string" but returns an error message stating "Input must be a non-empty string" which is misleading; update the returned error string to "Input must be a string" (or similar) so it matches the actual check involving the markdown variable and the returned object { rows: [], errors: [...] } in coverage-matrix-parser.mjs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/lib/coverage-matrix-parser.mjs`:
- Around line 1-4: ROW_REGEX currently enforces the ID format in its first
capture group so malformed IDs never reach the ID_REGEX validation path; update
the ROW_REGEX first capture from the numeric pattern to a permissive text
capture (e.g., use ([^|]+?) for the ID capture) and then, where the parser
extracts that capture (use the same variable used after row.match in this file),
trim the raw ID string and call ID_REGEX.test(trimmedId) to detect and report
malformed IDs so the existing line-aware error handling executes.
---
Outside diff comments:
In `@scripts/lib/coverage-matrix-parser.mjs`:
- Around line 9-14: The guard currently checks only typeof markdown !== "string"
but returns an error message stating "Input must be a non-empty string" which is
misleading; update the returned error string to "Input must be a string" (or
similar) so it matches the actual check involving the markdown variable and the
returned object { rows: [], errors: [...] } in coverage-matrix-parser.mjs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: a2570f18-a94a-4629-8c03-8d32691ab5c5
📒 Files selected for processing (1)
scripts/lib/coverage-matrix-parser.mjs
| const ROW_REGEX = | ||
| /^\|\s*(\d+(?:\.\d+){2,3})\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|$/u; | ||
|
|
||
| const ID_REGEX = /^\d+(?:\.\d+){2,3}$/; |
There was a problem hiding this comment.
ROW_REGEX still bakes strict ID validation, making the ID error path unreachable.
The 1st capture group (\d+(?:\.\d+){2,3}) enforces ID format at regex level. Rows with malformed IDs (e.g., abc, 1.2, 1.2.3.4.5) won't match, hit continue at line 25, and are silently dropped—the line-aware error at lines 31-34 never runs. This is the same pattern that was fixed for status.
Loosen the ID capture to ([^|]+?) and let ID_REGEX.test() report the error:
Suggested fix
const ROW_REGEX =
- /^\|\s*(\d+(?:\.\d+){2,3})\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|$/u;
+ /^\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]+?)\s*\|\s*([^|]*?)\s*\|$/u;Then trim the raw ID before validation:
- const [, id, name, layer, path, rawStatus, notes] = match;
+ const [, rawId, name, layer, path, rawStatus, notes] = match;
+ const id = rawId.trim();
const status = rawStatus.trim();🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/lib/coverage-matrix-parser.mjs` around lines 1 - 4, ROW_REGEX
currently enforces the ID format in its first capture group so malformed IDs
never reach the ID_REGEX validation path; update the ROW_REGEX first capture
from the numeric pattern to a permissive text capture (e.g., use ([^|]+?) for
the ID capture) and then, where the parser extracts that capture (use the same
variable used after row.match in this file), trim the raw ID string and call
ID_REGEX.test(trimmedId) to detect and report malformed IDs so the existing
line-aware error handling executes.
There was a problem hiding this comment.
All three blockers from @oxoxDev are now addressed:
-
Status regex is now permissive — Changed from
([a-z]+)to([^|]+?), so invalid statuses reach the validator instead of silently dropping rows. TheVALID_STATUS.has(status)guard now properly emits line-aware errors likeLine 5 (1.2.3): invalid status "bad_status". -
Test regression fixed — Tests asserting invalid-status errors were failing because rows were silently dropped. Now that validators report properly,
app/test/coverage-matrix-parser.test.tspasses (Frontend Coverage: PASS). -
Line-aware error messages — All validation errors now include line numbers, making debugging much easier.
Code quality: Regex is tighter (whitespace handling), duplicate detection is cleaner (Map → Set flow), added totalRows/uniqueRows metadata. Structure is solid and matches patterns in the rest of the codebase.
One small thing: the error message says "Input must be a non-empty string" but the code allows empty strings (returns {rows:[],errors:[]}). CodeRabbit already flagged this as minor. Consider updating the message to "Input must be a string" for accuracy.
CI note: Rust E2E (mock backend) is failing, but this PR only touches JavaScript — looks unrelated. Once that's resolved I'll swing back and approve. Everything else is green (Frontend Coverage, Coverage Matrix Sync, CodeRabbit all pass).
Strengthen markdown matrix parsing with stricter regex validation, improved error reporting, and optimized data processing. Add line-aware validation messages, enforce status validation at parse time, reduce unnecessary allocations, improve duplicate detection, and enhance catalog validation scalability for larger datasets.
Summary by CodeRabbit
Bug Fixes
Improvements