feat: write semantic frontmatter on synthesized files#621
Merged
bao-byterover merged 3 commits intomainfrom May 9, 2026
Merged
feat: write semantic frontmatter on synthesized files#621bao-byterover merged 3 commits intomainfrom
bao-byterover merged 3 commits intomainfrom
Conversation
Synthesized files (created by `brv dream`'s synthesize operation) wrote only four frontmatter fields (confidence, sources, synthesized_at, type). The cogit backend therefore exposed empty title / short_description / tags in DtoV3MemoryCardResource and the web UI's card-mode display rendered empty preview slots. writeSynthesisFile now also emits the seven semantic fields that regular `brv save` files write (title, summary, tags, related, keywords, createdAt, updatedAt). The four synthesis markers stay at the top of the frontmatter in their existing on-disk order so re-generating an old synthesized file does not produce a mechanical reorder diff. The seven new fields are appended in the same order generateFrontmatter uses in markdown-writer.ts so the on-disk shape matches regular files. The synthesis LLM prompt and SynthesisCandidateSchema are extended so the model produces meaningful summary / tags / keywords as part of the synthesis JSON; related is left empty since synthesized files use `sources` for provenance. Forward-looking only — existing synthesized files keep their four-field shape until manually regenerated. `needsReview` gating (confidence < 0.7) is unchanged.
Synthesized files (`synthesize.ts`) and consolidate's `addFrontmatterFields` helper produced block-style array frontmatter (`tags:\n - a\n - b`) instead of the flow style (`tags: [a, b, c]`) that `markdown-writer.ts` uses for regular `brv curate` output. Result: three on-disk dialects in the same context tree — regular files vs synthesized files vs any file consolidate had rewritten. - synthesize.ts: pass `flowLevel: 1` to yamlDump so newly synthesized files emit flow-style arrays matching markdown-writer. - consolidate.ts addFrontmatterFields: drop both yamlDump sites from flowLevel: 2 to flowLevel: 1 so TEMPORAL_UPDATE / MERGE rewrites stop reflowing brackets into block style on every consolidate pass. New regression test asserts a TEMPORAL_UPDATE round-trip preserves flow-style arrays.
Contributor
|
Claude finished @RyanNg1403's task in 4m 14s —— View job Claude PR Review —
|
…ma and writer - SynthesisCandidateSchema: add max() bounds (summary 500, tags 8, keywords 15) so a misbehaving model cannot land oversized text or unbounded arrays directly in YAML frontmatter and on the card. Bounds sit slightly above the prompt's soft targets to avoid false rejects. - writeSynthesisFile: normalize tags to lowercase kebab-case in code rather than trusting the LLM to honor the prompt rule, so card chips and BM25 search see consistent labels. - synthesize.test.ts: pin flow-style array output with a `^tags: \[/m` regex assertion so a future flowLevel revert in the writer fails the synthesis test directly, not only the consolidate test.
Contributor
|
Claude finished @RyanNg1403's task in 2m 31s —— View job Claude PR Review (re-review after
|
| Prior comment | Status | Notes |
|---|---|---|
dream-response-schemas.ts — unbounded summary / tags / keywords |
Resolved | summary: z.string().max(500), tags: z.array().max(8), keywords: z.array().max(15). Bounds set slightly above the prompt's soft targets, with a comment explaining why. Three new schema tests pin the bounds. |
synthesize.ts — tags not normalized to kebab-case |
Resolved | writeSynthesisFile now runs candidate.tags.map((t) => t.toLowerCase().trim().replaceAll(/\s+/g, '-')).filter((t) => t.length > 0) before assignment. Empty-after-trim entries are dropped (nice, slightly above what was suggested). New test at synthesize.test.ts:242 exercises mixed-case + multi-word + padded inputs. |
synthesize.test.ts — to.include('tags:') silent on flow-style |
Resolved | Replaced with expect(content).to.match(/^tags: \[/m) and the same for keywords:. A future flowLevel revert in the writer now fails this test directly, not only the consolidate test. |
consolidate.ts:298 — consolidated_from flow-style note |
Acknowledged, no change | Non-blocking; PR description's evidence (live brv dream × visual check) covers the rendered shape. Acceptable. |
synthesize.ts:288 — praise on imperative key assignment |
N/A | No action needed. |
What's still good in the original PR
- Forward-only scope (pre-existing 4-field synthesized files left alone) is honest and stated in the PR body.
- Field order in
writeSynthesisFile(synthesize.ts:275–288) mirrorsmarkdown-writer.ts:185–192for the seven semantic fields, with synthesis markers preserved first to avoid mechanical reorder diffs on regeneration. Comment block makes both intents legible. flowLevel: 1is now consistent acrossmarkdown-writer.ts:194,consolidate.ts:298,consolidate.ts:308, andsynthesize.ts:289— the on-disk shape divergence is fully eliminated.isDescendantOfpath-traversal guard (synthesize.ts:247) and theaccess-then-skip collision check (synthesize.ts:252) are still in place; the new normalization step happens after both guards run.confidence < 0.7review gating preserved verbatim.- Sidecar boundary intact —
runtimeSignalStore.set(relativePath, createDefaultRuntimeSignals())is the only signals path; theexpect(content).to.not.include('maturity:')assertions lock it down.
Minor observations (non-blocking, would not block merge)
- Tag normalization is whitespace-only.
t.toLowerCase().trim().replaceAll(/\s+/g, '-')correctly handles'Auth Service'→'auth-service', but a tag like'Auth/Service'or'auth_service'would land on disk verbatim (just lowercased). The original review pointed at the common case and the fix lands that case; broader sanitization would be over-engineering. Flagging only. - No deduplication after normalization. If the model emits both
'Auth Service'and'auth-service', both land intags:. The schema's.max(8)catches array overflow but not duplicates. Probably not worth a[...new Set(...)]here since the LLM rarely produces near-duplicate tags, but worth knowing. - Out-of-scope follow-up (carried from prior review, still relevant):
executeMerge(consolidate.ts:503) writes onlyconsolidated_at/consolidated_fromif the LLM-suppliedmergedContentalready lacks frontmatter, which means merged files can have the same empty-card-preview class of bug as pre-PR synthesized files. Different code path, different bug, but the same UX failure mode — worth a separate issue.
Concerns considered and dismissed
- Schema permissiveness on
summary/ array sizes: now bounded. consolidated_fromflow-style reflow: PR author confirmed via livebrv dreamevidence; rendered shape is acceptable.needsReviewgate, sidecar boundary, path traversal: all preserved.
· Branch:feat/ENG-2602
cuongdo-byterover
approved these changes
May 8, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
brv dream) wrote only four marker fields (confidence,sources,synthesized_at,type). Cogit served emptytitle/short_description/tagsto the web UI's card-mode display, so every dream-generated file rendered with empty preview slots.writeSynthesisFilenow also emits the seven semantic fields regular curate output writes (title,summary,tags,related,keywords,createdAt,updatedAt).SynthesisCandidateSchemaare extended so the model produces meaningfulsummary/tags/keywordsas part of the synthesis JSON.synthesizeandconsolidate.addFrontmatterFieldsswitched toflowLevel: 1so all dream writers produce the same on-disk shape asmarkdown-writer.ts(regular curate output).needsReviewgating (confidence < 0.7) is untouched.Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write
N/A)writeSynthesisFilewas written in isolation with only the four synthesis-specific markers, never matching the seven-field semantic shape thatmarkdown-writer.tsemits for regular curate output. Cogit's parser returns empty values for missing fields, so the web UI rendered empty previews.Test plan
test/unit/infra/dream/operations/synthesize.test.tstest/unit/infra/dream/operations/consolidate.test.tstest/unit/infra/dream/dream-response-schemas.test.tssummary/tags/keywordsand the JSON-schema example mirrorsSynthesisCandidateSchema.consolidate.tsTEMPORAL_UPDATE preserves flow-style arrays (no block-style reflow on rewrite).User-visible changes
Evidence
synthesize.test.tshad two new tests asserting the seven-field shape; failed for missing fields before, pass after the writer change.brv curate× 8 +brv dreamproduced 5 synthesized files with the new 11-field flow-style frontmatter; visual confirmation on the web UI shows full card-mode previews for the new files while older 4-field files still render empty (expected).Checklist
npm test— 7364 passing, 16 pending)npm run lintclean on changed files)npm run typecheck)npm run build)mainRisks and mitigations
summary,tags,keywords).