Add YAML CST as a pipeline source and harden the CST surface (kyo-schema) #1655

Merged
fwbrasil merged 56 commits into getkyo:main from DamianReeves:codex/kyo-schema-yaml-cst-followup on Jun 14, 2026

@DamianReeves

Collaborator

Problem

kyo-schema gained YAML support, including an opt-in concrete syntax tree (CST) that preserves comments, trivia, scalar styles, and source spans. Two gaps remained after the initial CST work:

  1. Review surfaced correctness and clarity issues in the CST surface: block scalars were corrupted on the trivia render path, structural edits silently acted on the first of duplicate mapping keys, complex collection keys rendered as a debug toString, and processor-backed cstAll could diverge in document count from the source-backed path.
  2. A CST could be produced (parse, edit, render) but could not be consumed: nothing accepted a Yaml.Cst.Document/Yaml.Cst.Stream as a decode/render/visit source, so a parsed-and-edited CST had to be rendered back to text and re-parsed to decode it.

Solution

Two themes.

CST hardening:

  • Render block scalars (|/>) on the trivia path instead of double-quoting them, reusing the writer's appendBlock (with a headerSuffix so trailing comments survive on the header line).
  • Fail clearly on ambiguous edits: duplicate mapping keys along an edit path error instead of editing the first match; insert fails on an existing key rather than overwriting.
  • Render complex collection keys as valid flow YAML rather than a node toString.
  • Align processor cstAll with the source-backed path (drop a leading marker-only segment), and document that cstAll returns every document.
  • Test hygiene: group multi-assert YAML tests into single assertResult blocks over named tuples.
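
The "fail clearly on ambiguous edits" rule above can be illustrated with a minimal sketch. It does not use the kyo-schema CST types: `Node`, `Scalar`, `Mapping`, `CstEdits`, and the error strings are hypothetical stand-ins, chosen only to show the two behaviors (a duplicate key along the path is an error, and insert refuses to overwrite).

```scala
// Hypothetical, simplified model: NOT the kyo-schema CST types.
sealed trait Node
final case class Scalar(value: String) extends Node
// Entries are an ordered list so that duplicate keys stay observable.
final case class Mapping(entries: List[(String, Node)]) extends Node

object CstEdits {
  // Replace the value at `path`. Fails when a key on the path is missing
  // or appears more than once, instead of editing the first match.
  def set(root: Node, path: List[String], value: Node): Either[String, Node] =
    path match {
      case Nil => Right(value)
      case key :: rest =>
        root match {
          case Mapping(entries) =>
            entries.count(_._1 == key) match {
              case 0 => Left(s"key '$key' not found")
              case 1 =>
                val idx = entries.indexWhere(_._1 == key)
                set(entries(idx)._2, rest, value).map { updated =>
                  Mapping(entries.updated(idx, key -> updated))
                }
              case n => Left(s"key '$key' is ambiguous: $n duplicate entries")
            }
          case _ => Left(s"cannot descend into a non-mapping node at '$key'")
        }
    }

  // Append a new entry. Fails on an existing key rather than overwriting it.
  def insert(mapping: Mapping, key: String, value: Node): Either[String, Mapping] =
    if (mapping.entries.exists(_._1 == key)) Left(s"key '$key' already exists")
    else Right(Mapping(mapping.entries :+ (key -> value)))
}
```

Returning `Either` keeps the ambiguity visible to the caller, which is the point of the change: an edit either lands on exactly one entry or reports why it could not.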

CST as a source (new capability):

  • YamlCstBuilder emits events from a CST (emitDocumentBody/emitStream).
  • Yaml.Pipeline gains CST-source overloads: visit, render, parse, parseAll, decode, decodeAll over Cst.Document/Cst.Stream. Stream decoding honors ReaderConfig.documentIndex/documentMode exactly as the String path does (same precedence, same error messages), including a CST-level top-level-mapping merge.
  • Two opt-in builder stages, throughCst (per document) and throughCstStream (whole stream), composed CST-first: the String read terminals build a source-backed CST, apply the stream transform then the per-document transform, emit events, then run any event processors. This lets a pipeline edit structure with comments intact and decode/render the edited result in one pass.
  • Matching top-level Yaml.decode/decodeAll/render/parse/parseAll over a CST, delegating to the default pipeline.
  • README doctest and scaladoc for the new surface.
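
The CST-first ordering described above (stream transform, then per-document transform, then event emission, then event processors) can be sketched as follows. This is a toy model, not the real `Pipeline` builder: `CstDoc`, `CstStream`, `CstStages`, `withProcessor`, and the string "events" are all hypothetical; only the stage names `throughCst` and `throughCstStream` come from this PR.

```scala
// Hypothetical stand-ins; the real Yaml.Cst types and Pipeline builder are richer.
final case class CstDoc(lines: List[String])
final case class CstStream(documents: List[CstDoc])

final case class CstStages(
    streamTransform: CstStream => CstStream = (s: CstStream) => s,
    documentTransform: CstDoc => CstDoc = (d: CstDoc) => d,
    eventProcessors: List[List[String] => List[String]] = Nil
) {
  def throughCstStream(f: CstStream => CstStream): CstStages =
    copy(streamTransform = streamTransform.andThen(f))

  def throughCst(f: CstDoc => CstDoc): CstStages =
    copy(documentTransform = documentTransform.andThen(f))

  // Hypothetical name for attaching an event processor.
  def withProcessor(p: List[String] => List[String]): CstStages =
    copy(eventProcessors = eventProcessors :+ p)

  // The order is fixed by `run`, not by the order of the builder calls:
  // 1. whole-stream transform, 2. per-document transform on the survivors,
  // 3. event emission, 4. event processors.
  def run(source: CstStream): List[String] = {
    val afterStream = streamTransform(source)
    val afterDocs   = afterStream.documents.map(documentTransform)
    val events      = afterDocs.flatMap(d => "doc-start" :: d.lines ::: List("doc-end"))
    eventProcessors.foldLeft(events)((acc, p) => p(acc))
  }
}
```

For example, a stream transform that drops the first document followed by a per-document transform that appends a line leaves only the surviving document edited, even when `throughCst` is registered before `throughCstStream`.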

Notes

  • Fast paths are unchanged: with no CST transform and no processor, Yaml.decode(String) and Yaml.pipeline.decode/render/parse(String) are byte-identical to before and construct no CST. This is pinned by a regression test.
  • throughCst-staged render uses the trivia-aware CST renderer (not the event renderer) so comments on unedited nodes survive; render(stream) re-derives YamlCstRenderer.stream framing (document markers, ---/... separators). A stream transform clears the stream's originalSource so a structural edit (e.g. dropping a document) cannot render stale source.
  • Yaml.render(doc) always emits canonically through the writer; Cst.Document.render returns the original source verbatim until edited. Both coexist and are documented.
  • The CST merge for MergeTopLevelMappings is structural (skips non-mapping-rooted documents), which differs from the byte-level String merge; this is documented and tested.
  • Verification: full focused YAML suites (286 tests), kyo-schema/doctest (66 blocks), and kyo-bench/compile all green; git diff --check clean.
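
As a rough illustration of what "structural" means for the MergeTopLevelMappings note above, the sketch below merges only mapping-rooted documents and skips the rest. The types and the `mergeTopLevelMappings` helper are hypothetical, and the collision policy (a later document overrides an earlier value while the key keeps its first position) is an assumption made for the example, not something this PR states.

```scala
// Hypothetical, simplified model: NOT the kyo-schema CST types.
sealed trait Root
final case class MappingRoot(entries: List[(String, String)]) extends Root
final case class ScalarRoot(value: String) extends Root
final case class SequenceRoot(items: List[String]) extends Root

object StructuralMerge {
  // Merge the top-level mappings of all documents into one mapping.
  // Documents whose root is not a mapping are skipped, not treated as errors.
  // ASSUMPTION: on a key collision the later document wins and the key
  // keeps the position of its first occurrence.
  def mergeTopLevelMappings(docs: List[Root]): MappingRoot = {
    val merged = docs.foldLeft(Vector.empty[(String, String)]) {
      case (acc, MappingRoot(entries)) =>
        entries.foldLeft(acc) { case (inner, (k, v)) =>
          val idx = inner.indexWhere(_._1 == k)
          if (idx >= 0) inner.updated(idx, k -> v) else inner :+ (k -> v)
        }
      case (acc, _) => acc // scalar- or sequence-rooted document: skip
    }
    MappingRoot(merged.toList)
  }
}
```

Because the merge works on parsed roots rather than on source bytes, a scalar-rooted document sitting between two mappings simply drops out of the result.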

@DamianReeves DamianReeves marked this pull request as draft June 2, 2026 23:50
@DamianReeves DamianReeves force-pushed the codex/kyo-schema-yaml-cst-followup branch from c295f4d to fd4d863 Compare June 3, 2026 08:08
@DamianReeves DamianReeves marked this pull request as ready for review June 3, 2026 15:26
@DamianReeves DamianReeves force-pushed the codex/kyo-schema-yaml-cst-followup branch 2 times, most recently from 9325897 to d535bac Compare June 6, 2026 12:29
Detect duplicate mapping keys along an edit path and fail instead of
silently editing the first match. Make insert fail on an existing key
rather than overwriting it. Render complex collection keys as flow YAML
instead of emitting a node toString in the trivia path.

Process the same document bodies the source-backed stream parser
produces by dropping a leading marker-only segment, so processor and
no-processor cstAll return matching document counts. Replace the
hand-rolled Result sequencing and its synthetic exception with a plain
Result match, and document that cstAll returns every document.

Document that source-backed render ignores WriterConfig until edited,
comment the unreachable event-emission failure boundaries and the
known collection size hint, convert appendIndent to a tail-recursive
loop, and skip the clearSource rebuild when a node carries no source.

Replace multiple per-test asserts with grouped assertResult blocks over
named tuples, matching the existing YAML test style.

Collapse tests with multiple per-test asserts into single grouped
assertResult blocks over named tuples, matching the existing YAML test
style. Stateful reader checks hoist intermediate reads into ordered vals
to preserve evaluation order. No assertions dropped or weakened.

Add emitDocumentBody and emitStream to YamlCstBuilder. emitDocument now
delegates its body emission to emitDocumentBody. emitStream emits one
stream boundary around all document bodies via a tail-recursive helper.

Add visit/render/parse overloads to Pipeline for Cst.Document and
Cst.Stream inputs, and a parseAll overload for Cst.Stream. The stream
render path joins documents with --- separators, mirroring
YamlCstRenderer.stream, and short-circuits to the original source when
no processors are attached.

Add the stream-end `...` marker to render(stream) when documentMarkers
is StartAndEnd, fix the scaladoc to explain the real reason for the
per-document render-and-join approach, avoid allocating a child Pipeline
for single-document streams, expand scaladocs on the CST visit overloads,
and strengthen the parseAll test with concrete Mapping type assertions.

Add `throughCst` and `throughCstStream` builder methods on `Pipeline[Err]`. When set, String read terminals (decode/render/parse) build a source-backed CST, apply the stream transform then the per-document transform, and continue through event processors. The fast path (no CST transform) is unchanged.

Add a test exercising throughCstStream + throughCst together, proving stream transform runs before per-document transform on survivors. Also consolidate the duplicated widening closures in throughCst/throughCstStream to use the shared documentTransformWiden/streamTransformWiden helpers, and add a clarifying comment on the per-document render path.

Rebasing onto the kyo-native test framework (getkyo#1658) removed ScalaTest's
assertResult. Convert the 182 assertResult calls across the YAML test
suites to kyo-test assert(actual == expected), splitting named-tuple
comparisons into one assertion per field. Helper methods that call
fail or assert outside a leaf body now take a using kyo.test.AssertScope.

Fix two failures the migration surfaced: restore the SOH control byte in
the writer control-character test, and compare the pipeline
single-document error by message substring instead of full equality,
since ParseException messages embed the call-site frame.

Bind the per-document parse result to a typed val so the exhaustivity
checker can resolve the Success/Failure/Panic match on the opaque Result
type. The rebased compiler emitted an E029 warning on the inferred
flatMap result type at this site; the sibling mapDocuments loop already
matches on a cleanly typed scrutinee and does not warn.
@DamianReeves DamianReeves force-pushed the codex/kyo-schema-yaml-cst-followup branch from f4ff51c to 9b15e3c Compare June 13, 2026 18:16
@fwbrasil fwbrasil merged commit 3b23a71 into getkyo:main Jun 14, 2026
9 of 10 checks passed
@DamianReeves DamianReeves deleted the codex/kyo-schema-yaml-cst-followup branch June 14, 2026 21:31