Skip to content

feat: write semantic frontmatter on synthesized files#621

Merged
bao-byterover merged 3 commits intomainfrom
feat/ENG-2602
May 9, 2026
Merged

feat: write semantic frontmatter on synthesized files#621
bao-byterover merged 3 commits intomainfrom
feat/ENG-2602

Conversation

@RyanNg1403
Copy link
Copy Markdown
Collaborator

Summary

  • Problem: Synthesized files (from brv dream) wrote only four marker fields (confidence, sources, synthesized_at, type). Cogit served empty title / short_description / tags to the web UI's card-mode display, so every dream-generated file rendered with empty preview slots.
  • Why it matters: Card-mode is the primary browsing view for context files; synthesized cards were unreadable.
  • What changed:
    • writeSynthesisFile now also emits the seven semantic fields regular curate output writes (title, summary, tags, related, keywords, createdAt, updatedAt).
    • The synthesis LLM prompt and SynthesisCandidateSchema are extended so the model produces meaningful summary / tags / keywords as part of the synthesis JSON.
    • YAML serialization in synthesize and consolidate.addFrontmatterFields switched to flowLevel: 1 so all dream writers produce the same on-disk shape as markdown-writer.ts (regular curate output).
  • What did NOT change:
    • Pre-existing synthesized files keep their four-field shape until manually regenerated (out of scope; can be a separate migration).
    • needsReview gating (confidence < 0.7) is untouched.
    • The sidecar runtime-signals path is untouched; ranking data still lives outside markdown frontmatter.

Type of change

  • Bug fix
  • New feature

Scope (select all touched areas)

  • Agent / Tools
  • Server / Daemon

Linked issues

  • Closes # N/A (linkage via branch name + commit subjects)
  • Related # N/A

Root cause (bug fixes only, otherwise write N/A)

  • Root cause: writeSynthesisFile was written in isolation with only the four synthesis-specific markers, never matching the seven-field semantic shape that markdown-writer.ts emits for regular curate output. Cogit's parser returns empty values for missing fields, so the web UI rendered empty previews.
  • Why this was not caught earlier: card-mode display in the web UI exposed the gap; the divergence pre-dated card-mode and went unnoticed because raw markdown files looked plausible on disk.

Test plan

  • Coverage added:
    • Unit test
    • Manual verification only (live curate + dream + push to remote, visual card-mode check)
  • Test file(s):
    • test/unit/infra/dream/operations/synthesize.test.ts
    • test/unit/infra/dream/operations/consolidate.test.ts
    • test/unit/infra/dream/dream-response-schemas.test.ts
  • Key scenarios covered:
    • Synthesized file emits all seven semantic fields plus the four synthesis markers.
    • Synthesized frontmatter is parseable as the regular semantic shape (round-trip safe).
    • LLM prompt instructs the model to produce summary / tags / keywords and the JSON-schema example mirrors SynthesisCandidateSchema.
    • consolidate.ts TEMPORAL_UPDATE preserves flow-style arrays (no block-style reflow on rewrite).

User-visible changes

  • New synthesized files render with full title / summary excerpt / tag chips / "N minutes ago" in the web UI's card-mode display.
  • Pre-existing synthesized files keep their old card appearance (empty preview) until manually regenerated.

Evidence

  • Failing test/log before + passing after — synthesize.test.ts had two new tests asserting the seven-field shape; failed for missing fields before, pass after the writer change.
  • Trace/log snippets — live brv curate × 8 + brv dream produced 5 synthesized files with the new 11-field flow-style frontmatter; visual confirmation on the web UI shows full card-mode previews for the new files while older 4-field files still render empty (expected).

Checklist

  • Tests added or updated and passing (npm test — 7364 passing, 16 pending)
  • Lint passes (npm run lint clean on changed files)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (N/A — no doc surfaces touched)
  • No breaking changes
  • Branch is up to date with main

Risks and mitigations

  • Risk: Existing synthesized files keep their old 4-field shape and continue to render with empty card-mode previews until manually regenerated.
    • Mitigation: Forward-looking only by design. A separate migration can scrub or regenerate stale entries.
  • Risk: Slight LLM cost increase from the three new required JSON output fields (summary, tags, keywords).
    • Mitigation: ~50–100 extra output tokens per synthesis candidate; small relative to the UX improvement.

RyanNg1403 added 2 commits May 8, 2026 12:32
Synthesized files (created by `brv dream`'s synthesize operation) wrote
only four frontmatter fields (confidence, sources, synthesized_at, type).
The cogit backend therefore exposed empty title / short_description /
tags in DtoV3MemoryCardResource and the web UI's card-mode display
rendered empty preview slots.

writeSynthesisFile now also emits the seven semantic fields that regular
`brv save` files write (title, summary, tags, related, keywords,
createdAt, updatedAt). The four synthesis markers stay at the top of the
frontmatter in their existing on-disk order so re-generating an old
synthesized file does not produce a mechanical reorder diff. The seven
new fields are appended in the same order generateFrontmatter uses in
markdown-writer.ts so the on-disk shape matches regular files.

The synthesis LLM prompt and SynthesisCandidateSchema are extended so
the model produces meaningful summary / tags / keywords as part of the
synthesis JSON; related is left empty since synthesized files use
`sources` for provenance.

Forward-looking only — existing synthesized files keep their four-field
shape until manually regenerated. `needsReview` gating (confidence < 0.7)
is unchanged.
Synthesized files (`synthesize.ts`) and consolidate's `addFrontmatterFields`
helper produced block-style array frontmatter (`tags:\n  - a\n  - b`)
instead of the flow style (`tags: [a, b, c]`) that `markdown-writer.ts`
uses for regular `brv curate` output. Result: three on-disk dialects in
the same context tree — regular files vs synthesized files vs any file
consolidate had rewritten.

- synthesize.ts: pass `flowLevel: 1` to yamlDump so newly synthesized
  files emit flow-style arrays matching markdown-writer.
- consolidate.ts addFrontmatterFields: drop both yamlDump sites from
  flowLevel: 2 to flowLevel: 1 so TEMPORAL_UPDATE / MERGE rewrites stop
  reflowing brackets into block style on every consolidate pass.

New regression test asserts a TEMPORAL_UPDATE round-trip preserves
flow-style arrays.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Claude finished @RyanNg1403's task in 4m 14s —— View job


Claude PR Review — feat: write semantic frontmatter on synthesized files

  • Read changed files (synthesize.ts, consolidate.ts, dream-response-schemas.ts)
  • Read corresponding test files
  • Cross-check with sibling code (markdown-writer.ts field order + flowLevel: 1)
  • Provide inline review comments (5 posted)
  • Post summary review with score

Summary

Score: 5 / 5 — clean, well-scoped, well-tested fix. Recommend merge.

What this PR does well

  • Correctly diagnosed the root cause. writeSynthesisFile was emitting only the four synthesis markers; cogit's parser maps the seven semantic fields (title, summary, tags, related, keywords, createdAt, updatedAt) into DtoV3MemoryCardResource, so missing-field-becomes-empty-card was deterministic.
  • Field order is load-bearing and the code documents why (synthesize.ts 261–268). Synthesis markers first preserves the historical shape for any future regenerator; the seven semantic fields below mirror markdown-writer.ts:185–192. Insertion order via imperative property assignment + sortKeys: false is the right tool.
  • flowLevel: 1 alignment is a real consistency fix. Pre-PR, writeSynthesisFile had no flowLevel option (default -1 → block-style arrays), and consolidate.addFrontmatterFields used flowLevel: 2 — both diverged from markdown-writer.ts's flowLevel: 1. After this PR, every dream writer emits the same on-disk shape.
  • Tests cover the contract, not just the implementation. The "parseable as the regular semantic shape" test (yamlLoad round-trip + per-property type checks) and the "preserves flow-style arrays" test (regex rejecting block-style reflow) both lock down behavior cogit and the web UI depend on.
  • Forward-only scope is honest. PR description correctly flags that pre-existing 4-field synthesized files keep their old shape until manually regenerated, and that this is by design.

Inline comments (all non-blocking)

  • dream-response-schemas.ts:33 — schema is permissive on summary length and tags/keywords count. Optional .max() constraints would bound LLM misbehavior.
  • consolidate.ts:298flowLevel 2 → 1 also flips consolidated_from: Array<{...}> from block-style array of flow objects to single-line flow. Worth a visual sanity check on rendered consolidation metadata, but parseable.
  • synthesize.ts:282 — praise for the deliberate insertion-order pattern + comment block.
  • synthesize.ts:372 — the prompt asks for "lowercase, kebab-case" tags but nothing normalizes. A one-liner in writeSynthesisFile would close the gap without trusting the LLM.
  • synthesize.test.ts:217to.include('tags:') doesn't assert flow style. The companion tests cover it in aggregate, but a single match(/^tags: \[/m) here would catch a flowLevel revert sooner.

Concerns considered and dismissed

  • Existing-file collision check (synthesize.ts:252): the access-then-skip path means old 4-field files can't be silently rewritten with a different format — good, that matches the PR's forward-only intent.
  • Path traversal (synthesize.ts:247): isDescendantOf guard already prevents LLM-supplied placement: "../../etc". Untouched and still correct.
  • needsReview gating: confidence < 0.7 threshold preserved verbatim.
  • Sidecar boundary: runtime-signals still flow into the sidecar via runtimeSignalStore.set(relativePath, createDefaultRuntimeSignals()); nothing leaks into markdown frontmatter. Tests assert this (expect(content).to.not.include('maturity:')).

Out-of-scope but worth a follow-up issue

  • executeMerge writes consolidation frontmatter via addFrontmatterFields, but if the LLM-supplied mergedContent lacks frontmatter, the resulting file gets only consolidated_at/consolidated_from — same empty-card-preview class of bug for merged files. Different code path, different bug, but related.
    · Branch: feat/ENG-2602

Comment thread src/server/infra/dream/dream-response-schemas.ts Outdated
Comment thread src/server/infra/dream/operations/consolidate.ts
Comment thread src/server/infra/dream/operations/synthesize.ts
Comment thread src/server/infra/dream/operations/synthesize.ts
Comment thread test/unit/infra/dream/operations/synthesize.test.ts Outdated
…ma and writer

- SynthesisCandidateSchema: add max() bounds (summary 500, tags 8,
  keywords 15) so a misbehaving model cannot land oversized text or
  unbounded arrays directly in YAML frontmatter and on the card. Bounds
  sit slightly above the prompt's soft targets to avoid false rejects.
- writeSynthesisFile: normalize tags to lowercase kebab-case in code
  rather than trusting the LLM to honor the prompt rule, so card chips
  and BM25 search see consistent labels.
- synthesize.test.ts: pin flow-style array output with a
  `^tags: \[/m` regex assertion so a future flowLevel revert in the
  writer fails the synthesis test directly, not only the consolidate
  test.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

Claude finished @RyanNg1403's task in 2m 31s —— View job


Claude PR Review (re-review after 233ac51) — feat: write semantic frontmatter on synthesized files

  • Read changed files at current HEAD
  • Verify which previous review comments were addressed in commit 233ac51
  • Cross-check sibling code (markdown-writer.ts field order + flowLevel: 1)
  • Provide inline review comments
  • Post summary review with score

Summary

Score: 5 / 5 — follow-up commit 233ac51 directly addressed the actionable items from the prior review without scope creep. Recommend merge.

Disposition of prior review comments (all addressed cleanly)

Prior comment Status Notes
dream-response-schemas.ts — unbounded summary / tags / keywords Resolved summary: z.string().max(500), tags: z.array().max(8), keywords: z.array().max(15). Bounds set slightly above the prompt's soft targets, with a comment explaining why. Three new schema tests pin the bounds.
synthesize.ts — tags not normalized to kebab-case Resolved writeSynthesisFile now runs candidate.tags.map((t) => t.toLowerCase().trim().replaceAll(/\s+/g, '-')).filter((t) => t.length > 0) before assignment. Empty-after-trim entries are dropped (nice, slightly above what was suggested). New test at synthesize.test.ts:242 exercises mixed-case + multi-word + padded inputs.
synthesize.test.tsto.include('tags:') silent on flow-style Resolved Replaced with expect(content).to.match(/^tags: \[/m) and the same for keywords:. A future flowLevel revert in the writer now fails this test directly, not only the consolidate test.
consolidate.ts:298consolidated_from flow-style note Acknowledged, no change Non-blocking; PR description's evidence (live brv dream × visual check) covers the rendered shape. Acceptable.
synthesize.ts:288 — praise on imperative key assignment N/A No action needed.

What's still good in the original PR

  • Forward-only scope (pre-existing 4-field synthesized files left alone) is honest and stated in the PR body.
  • Field order in writeSynthesisFile (synthesize.ts:275–288) mirrors markdown-writer.ts:185–192 for the seven semantic fields, with synthesis markers preserved first to avoid mechanical reorder diffs on regeneration. Comment block makes both intents legible.
  • flowLevel: 1 is now consistent across markdown-writer.ts:194, consolidate.ts:298, consolidate.ts:308, and synthesize.ts:289 — the on-disk shape divergence is fully eliminated.
  • isDescendantOf path-traversal guard (synthesize.ts:247) and the access-then-skip collision check (synthesize.ts:252) are still in place; the new normalization step happens after both guards run.
  • confidence < 0.7 review gating preserved verbatim.
  • Sidecar boundary intact — runtimeSignalStore.set(relativePath, createDefaultRuntimeSignals()) is the only signals path; the expect(content).to.not.include('maturity:') assertions lock it down.

Minor observations (non-blocking, would not block merge)

  • Tag normalization is whitespace-only. t.toLowerCase().trim().replaceAll(/\s+/g, '-') correctly handles 'Auth Service''auth-service', but a tag like 'Auth/Service' or 'auth_service' would land on disk verbatim (just lowercased). The original review pointed at the common case and the fix lands that case; broader sanitization would be over-engineering. Flagging only.
  • No deduplication after normalization. If the model emits both 'Auth Service' and 'auth-service', both land in tags:. The schema's .max(8) catches array overflow but not duplicates. Probably not worth a [...new Set(...)] here since the LLM rarely produces near-duplicate tags, but worth knowing.
  • Out-of-scope follow-up (carried from prior review, still relevant): executeMerge (consolidate.ts:503) writes only consolidated_at / consolidated_from if the LLM-supplied mergedContent already lacks frontmatter, which means merged files can have the same empty-card-preview class of bug as pre-PR synthesized files. Different code path, different bug, but the same UX failure mode — worth a separate issue.

Concerns considered and dismissed

  • Schema permissiveness on summary / array sizes: now bounded.
  • consolidated_from flow-style reflow: PR author confirmed via live brv dream evidence; rendered shape is acceptable.
  • needsReview gate, sidecar boundary, path traversal: all preserved.
    · Branch: feat/ENG-2602

Comment thread src/server/infra/dream/operations/synthesize.ts
Comment thread src/server/infra/dream/dream-response-schemas.ts
Comment thread test/unit/infra/dream/operations/synthesize.test.ts
@bao-byterover bao-byterover merged commit da4cfa6 into main May 9, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants