feat: write semantic frontmatter on synthesized files by RyanNg1403 · Pull Request #621 · campfirein/byterover-cli

RyanNg1403 · 2026-05-08T10:04:21Z

Summary

Problem: Synthesized files (from brv dream) wrote only four marker fields (confidence, sources, synthesized_at, type). Cogit served empty title / short_description / tags to the web UI's card-mode display, so every dream-generated file rendered with empty preview slots.
Why it matters: Card-mode is the primary browsing view for context files; synthesized cards were unreadable.
What changed:
- writeSynthesisFile now also emits the seven semantic fields regular curate output writes (title, summary, tags, related, keywords, createdAt, updatedAt).
- The synthesis LLM prompt and SynthesisCandidateSchema are extended so the model produces meaningful summary / tags / keywords as part of the synthesis JSON.
- YAML serialization in synthesize and consolidate.addFrontmatterFields switched to flowLevel: 1 so all dream writers produce the same on-disk shape as markdown-writer.ts (regular curate output).
What did NOT change:
- Pre-existing synthesized files keep their four-field shape until manually regenerated (out of scope; can be a separate migration).
- needsReview gating (confidence < 0.7) is untouched.
- The sidecar runtime-signals path is untouched; ranking data still lives outside markdown frontmatter.

Type of change

Bug fix
New feature

Scope (select all touched areas)

Agent / Tools
Server / Daemon

Linked issues

Closes # N/A (linkage via branch name + commit subjects)
Related # N/A

Root cause (bug fixes only, otherwise write `N/A`)

Root cause: writeSynthesisFile was written in isolation with only the four synthesis-specific markers, never matching the seven-field semantic shape that markdown-writer.ts emits for regular curate output. Cogit's parser returns empty values for missing fields, so the web UI rendered empty previews.
Why this was not caught earlier: card-mode display in the web UI exposed the gap; the divergence pre-dated card-mode and went unnoticed because raw markdown files looked plausible on disk.

Test plan

Coverage added:
- Unit test
- Manual verification only (live curate + dream + push to remote, visual card-mode check)
Test file(s):
- test/unit/infra/dream/operations/synthesize.test.ts
- test/unit/infra/dream/operations/consolidate.test.ts
- test/unit/infra/dream/dream-response-schemas.test.ts
Key scenarios covered:
- Synthesized file emits all seven semantic fields plus the four synthesis markers.
- Synthesized frontmatter is parseable as the regular semantic shape (round-trip safe).
- LLM prompt instructs the model to produce summary / tags / keywords and the JSON-schema example mirrors SynthesisCandidateSchema.
- consolidate.ts TEMPORAL_UPDATE preserves flow-style arrays (no block-style reflow on rewrite).

User-visible changes

New synthesized files render with full title / summary excerpt / tag chips / "N minutes ago" in the web UI's card-mode display.
Pre-existing synthesized files keep their old card appearance (empty preview) until manually regenerated.

Evidence

Failing test/log before + passing after — synthesize.test.ts had two new tests asserting the seven-field shape; failed for missing fields before, pass after the writer change.
Trace/log snippets — live brv curate × 8 + brv dream produced 5 synthesized files with the new 11-field flow-style frontmatter; visual confirmation on the web UI shows full card-mode previews for the new files while older 4-field files still render empty (expected).

Checklist

Tests added or updated and passing (npm test — 7364 passing, 16 pending)
Lint passes (npm run lint clean on changed files)
Type check passes (npm run typecheck)
Build succeeds (npm run build)
Commits follow Conventional Commits format
Documentation updated (N/A — no doc surfaces touched)
No breaking changes
Branch is up to date with main

Risks and mitigations

Risk: Existing synthesized files keep their old 4-field shape and continue to render with empty card-mode previews until manually regenerated.
- Mitigation: Forward-looking only by design. A separate migration can scrub or regenerate stale entries.
Risk: Slight LLM cost increase from the three new required JSON output fields (summary, tags, keywords).
- Mitigation: ~50–100 extra output tokens per synthesis candidate; small relative to the UX improvement.

Synthesized files (created by `brv dream`'s synthesize operation) wrote only four frontmatter fields (confidence, sources, synthesized_at, type). The cogit backend therefore exposed empty title / short_description / tags in DtoV3MemoryCardResource and the web UI's card-mode display rendered empty preview slots. writeSynthesisFile now also emits the seven semantic fields that regular `brv save` files write (title, summary, tags, related, keywords, createdAt, updatedAt). The four synthesis markers stay at the top of the frontmatter in their existing on-disk order so re-generating an old synthesized file does not produce a mechanical reorder diff. The seven new fields are appended in the same order generateFrontmatter uses in markdown-writer.ts so the on-disk shape matches regular files. The synthesis LLM prompt and SynthesisCandidateSchema are extended so the model produces meaningful summary / tags / keywords as part of the synthesis JSON; related is left empty since synthesized files use `sources` for provenance. Forward-looking only — existing synthesized files keep their four-field shape until manually regenerated. `needsReview` gating (confidence < 0.7) is unchanged.

Synthesized files (`synthesize.ts`) and consolidate's `addFrontmatterFields` helper produced block-style array frontmatter (`tags:\n - a\n - b`) instead of the flow style (`tags: [a, b, c]`) that `markdown-writer.ts` uses for regular `brv curate` output. Result: three on-disk dialects in the same context tree — regular files vs synthesized files vs any file consolidate had rewritten. - synthesize.ts: pass `flowLevel: 1` to yamlDump so newly synthesized files emit flow-style arrays matching markdown-writer. - consolidate.ts addFrontmatterFields: drop both yamlDump sites from flowLevel: 2 to flowLevel: 1 so TEMPORAL_UPDATE / MERGE rewrites stop reflowing brackets into block style on every consolidate pass. New regression test asserts a TEMPORAL_UPDATE round-trip preserves flow-style arrays.

github-actions · 2026-05-08T10:04:37Z

Claude finished @RyanNg1403's task in 4m 14s —— View job

Claude PR Review — `feat: write semantic frontmatter on synthesized files`

Read changed files (synthesize.ts, consolidate.ts, dream-response-schemas.ts)
Read corresponding test files
Cross-check with sibling code (markdown-writer.ts field order + flowLevel: 1)
Provide inline review comments (5 posted)
Post summary review with score

Summary

Score: 5 / 5 — clean, well-scoped, well-tested fix. Recommend merge.

What this PR does well

Correctly diagnosed the root cause. writeSynthesisFile was emitting only the four synthesis markers; cogit's parser maps the seven semantic fields (title, summary, tags, related, keywords, createdAt, updatedAt) into DtoV3MemoryCardResource, so missing-field-becomes-empty-card was deterministic.
Field order is load-bearing and the code documents why (synthesize.ts 261–268). Synthesis markers first preserves the historical shape for any future regenerator; the seven semantic fields below mirror markdown-writer.ts:185–192. Insertion order via imperative property assignment + sortKeys: false is the right tool.
flowLevel: 1 alignment is a real consistency fix. Pre-PR, writeSynthesisFile had no flowLevel option (default -1 → block-style arrays), and consolidate.addFrontmatterFields used flowLevel: 2 — both diverged from markdown-writer.ts's flowLevel: 1. After this PR, every dream writer emits the same on-disk shape.
Tests cover the contract, not just the implementation. The "parseable as the regular semantic shape" test (yamlLoad round-trip + per-property type checks) and the "preserves flow-style arrays" test (regex rejecting block-style reflow) both lock down behavior cogit and the web UI depend on.
Forward-only scope is honest. PR description correctly flags that pre-existing 4-field synthesized files keep their old shape until manually regenerated, and that this is by design.

Inline comments (all non-blocking)

dream-response-schemas.ts:33 — schema is permissive on summary length and tags/keywords count. Optional .max() constraints would bound LLM misbehavior.
consolidate.ts:298 — flowLevel 2 → 1 also flips consolidated_from: Array<{...}> from block-style array of flow objects to single-line flow. Worth a visual sanity check on rendered consolidation metadata, but parseable.
synthesize.ts:282 — praise for the deliberate insertion-order pattern + comment block.
synthesize.ts:372 — the prompt asks for "lowercase, kebab-case" tags but nothing normalizes. A one-liner in writeSynthesisFile would close the gap without trusting the LLM.
synthesize.test.ts:217 — to.include('tags:') doesn't assert flow style. The companion tests cover it in aggregate, but a single match(/^tags: \[/m) here would catch a flowLevel revert sooner.

Concerns considered and dismissed

Existing-file collision check (synthesize.ts:252): the access-then-skip path means old 4-field files can't be silently rewritten with a different format — good, that matches the PR's forward-only intent.
Path traversal (synthesize.ts:247): isDescendantOf guard already prevents LLM-supplied placement: "../../etc". Untouched and still correct.
needsReview gating: confidence < 0.7 threshold preserved verbatim.
Sidecar boundary: runtime-signals still flow into the sidecar via runtimeSignalStore.set(relativePath, createDefaultRuntimeSignals()); nothing leaks into markdown frontmatter. Tests assert this (expect(content).to.not.include('maturity:')).

Out-of-scope but worth a follow-up issue

executeMerge writes consolidation frontmatter via addFrontmatterFields, but if the LLM-supplied mergedContent lacks frontmatter, the resulting file gets only consolidated_at/consolidated_from — same empty-card-preview class of bug for merged files. Different code path, different bug, but related.
· Branch: feat/ENG-2602

…ma and writer - SynthesisCandidateSchema: add max() bounds (summary 500, tags 8, keywords 15) so a misbehaving model cannot land oversized text or unbounded arrays directly in YAML frontmatter and on the card. Bounds sit slightly above the prompt's soft targets to avoid false rejects. - writeSynthesisFile: normalize tags to lowercase kebab-case in code rather than trusting the LLM to honor the prompt rule, so card chips and BM25 search see consistent labels. - synthesize.test.ts: pin flow-style array output with a `^tags: \[/m` regex assertion so a future flowLevel revert in the writer fails the synthesis test directly, not only the consolidate test.

github-actions · 2026-05-08T10:18:44Z

Claude finished @RyanNg1403's task in 2m 31s —— View job

Claude PR Review (re-review after `233ac51`) — `feat: write semantic frontmatter on synthesized files`

Read changed files at current HEAD
Verify which previous review comments were addressed in commit 233ac51
Cross-check sibling code (markdown-writer.ts field order + flowLevel: 1)
Provide inline review comments
Post summary review with score

Summary

Score: 5 / 5 — follow-up commit 233ac51 directly addressed the actionable items from the prior review without scope creep. Recommend merge.

Disposition of prior review comments (all addressed cleanly)

Prior comment	Status	Notes
`dream-response-schemas.ts` — unbounded `summary` / `tags` / `keywords`	Resolved	`summary: z.string().max(500)`, `tags: z.array().max(8)`, `keywords: z.array().max(15)`. Bounds set slightly above the prompt's soft targets, with a comment explaining why. Three new schema tests pin the bounds.
`synthesize.ts` — tags not normalized to kebab-case	Resolved	`writeSynthesisFile` now runs `candidate.tags.map((t) => t.toLowerCase().trim().replaceAll(/\s+/g, '-')).filter((t) => t.length > 0)` before assignment. Empty-after-trim entries are dropped (nice, slightly above what was suggested). New test at `synthesize.test.ts:242` exercises mixed-case + multi-word + padded inputs.
`synthesize.test.ts` — `to.include('tags:')` silent on flow-style	Resolved	Replaced with `expect(content).to.match(/^tags: \[/m)` and the same for `keywords:`. A future `flowLevel` revert in the writer now fails this test directly, not only the consolidate test.
`consolidate.ts:298` — `consolidated_from` flow-style note	Acknowledged, no change	Non-blocking; PR description's evidence (live `brv dream` × visual check) covers the rendered shape. Acceptable.
`synthesize.ts:288` — praise on imperative key assignment	N/A	No action needed.

What's still good in the original PR

Forward-only scope (pre-existing 4-field synthesized files left alone) is honest and stated in the PR body.
Field order in writeSynthesisFile (synthesize.ts:275–288) mirrors markdown-writer.ts:185–192 for the seven semantic fields, with synthesis markers preserved first to avoid mechanical reorder diffs on regeneration. Comment block makes both intents legible.
flowLevel: 1 is now consistent across markdown-writer.ts:194, consolidate.ts:298, consolidate.ts:308, and synthesize.ts:289 — the on-disk shape divergence is fully eliminated.
isDescendantOf path-traversal guard (synthesize.ts:247) and the access-then-skip collision check (synthesize.ts:252) are still in place; the new normalization step happens after both guards run.
confidence < 0.7 review gating preserved verbatim.
Sidecar boundary intact — runtimeSignalStore.set(relativePath, createDefaultRuntimeSignals()) is the only signals path; the expect(content).to.not.include('maturity:') assertions lock it down.

Minor observations (non-blocking, would not block merge)

Tag normalization is whitespace-only. t.toLowerCase().trim().replaceAll(/\s+/g, '-') correctly handles 'Auth Service' → 'auth-service', but a tag like 'Auth/Service' or 'auth_service' would land on disk verbatim (just lowercased). The original review pointed at the common case and the fix lands that case; broader sanitization would be over-engineering. Flagging only.
No deduplication after normalization. If the model emits both 'Auth Service' and 'auth-service', both land in tags:. The schema's .max(8) catches array overflow but not duplicates. Probably not worth a [...new Set(...)] here since the LLM rarely produces near-duplicate tags, but worth knowing.
Out-of-scope follow-up (carried from prior review, still relevant): executeMerge (consolidate.ts:503) writes only consolidated_at / consolidated_from if the LLM-supplied mergedContent already lacks frontmatter, which means merged files can have the same empty-card-preview class of bug as pre-PR synthesized files. Different code path, different bug, but the same UX failure mode — worth a separate issue.

Concerns considered and dismissed

Schema permissiveness on summary / array sizes: now bounded.
consolidated_from flow-style reflow: PR author confirmed via live brv dream evidence; rendered shape is acceptable.
needsReview gate, sidecar boundary, path traversal: all preserved.
· Branch: feat/ENG-2602

RyanNg1403 added 2 commits May 8, 2026 12:32

RyanNg1403 requested review from bao-byterover and leehpham as code owners May 8, 2026 10:04