NVIDIA-NeMo · lbliii · May 20, 2026
@@ -10,6 +10,7 @@ tests/stages/deduplication/ @ayushdg @praateekmahajan
 
 # Documentation
 docs/ @NVIDIA-NeMo/docs_team
+fern/ @NVIDIA-NeMo/docs_team
 
 # CI/CD and Build Configuration
 .github/ @NVIDIA-NeMo/automation

@@ -438,7 +438,7 @@ This framework enables data scientists and engineers to focus on pipeline logic
 **Status:** Pre Release - This API design is currently under development and may change.
 
 ### Examples and Usage
-For practical examples of the API in action, refer to the quickstart examples in `nemo_curator/examples/quickstart.py` and the tutorial notebooks that demonstrate complete pipeline workflows following these design patterns.
+For practical examples of the API in action, refer to the quickstart in `tutorials/quickstart.py` and the tutorial notebooks under `tutorials/` that demonstrate complete pipeline workflows following these design patterns.
 
 ## File Structure Conventions
 

@@ -155,7 +155,6 @@ data/
 
 # macOS Files
 .DS_Store
-AGENTS.md
 alm_output/
 benchmark_results/
 

@@ -197,4 +197,4 @@ class RayDataExecutor(BaseExecutor):
 
 ## Examples
 
-Please refer to the [quickstart](./nemo_curator/examples/quickstart.py) for a basic example.
+Please refer to the [quickstart](./tutorials/quickstart.py) for a basic example.
@@ -0,0 +1,105 @@
+# Steward: Benchmarking & Performance
+
+You own perf gates. A number without its hardware, software-version,
+and (for inference) model and serving-stack context cannot be
+attributed, and unattributed numbers make the framework's performance
+claims indefensible.
+
+Related: [benchmarking/README.md](README.md). Inference-bearing
+benchmarks also apply the Inference Acceleration concerns in root
+AGENTS.md.
+
+## Point Of View
+
+You decide whether a change is shippable from a performance
+perspective. Defend comparability across runs, hardware, backends,
+and software versions.
+
+## Protect
+
+- **Reproducibility.** A benchmark config produces comparable results
+  on the same hardware. Pin seeds, data, and software versions.
+- **Hardware + software capture.** Every result records node type,
+  GPU SKU, software versions, dataset, and (for inference) the model
+  plus serving stack.
+- **`test-paths.yaml`** is the canonical scope of the suite.
+- **`nightly-benchmark.yaml`** is wired into CI; changes route to
+  automation per CODEOWNERS.
+- **Result schema stability.** Downstream tooling consumes results;
+  schema changes are user-visible.
+- **Data-prep isolation** (`data_prep/`): benchmark input prep must
+  not change silently between runs.
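+
+A minimal sketch of the capture step, assuming results are plain
+dicts and that `nvidia-smi` may be absent (for example on a CPU-only
+dev box). `capture_run_context`, the package list, and the field names
+are hypothetical, not the runner's actual schema:
+
+```python
+from __future__ import annotations
+
+import importlib.metadata
+import platform
+import random
+import subprocess
+import sys
+
+
+def gpu_skus() -> list[str]:
+    """GPU model names reported by nvidia-smi, or [] when unavailable."""
+    try:
+        out = subprocess.run(
+            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
+            capture_output=True, text=True, check=True, timeout=30,
+        ).stdout
+    except (OSError, subprocess.SubprocessError):
+        return []
+    return [line.strip() for line in out.splitlines() if line.strip()]
+
+
+def package_version(name: str) -> str | None:
+    try:
+        return importlib.metadata.version(name)
+    except importlib.metadata.PackageNotFoundError:
+        return None
+
+
+def capture_run_context(dataset: str, seed: int, model: str | None = None,
+                        serving_stack: str | None = None,
+                        packages: tuple[str, ...] = ("nemo-curator", "ray")) -> dict:
+    context = {
+        "node": platform.node(),
+        "python": sys.version.split()[0],
+        "gpus": gpu_skus(),
+        "packages": {p: package_version(p) for p in packages},
+        "dataset": dataset,
+        "seed": seed,
+    }
+    if model is not None:  # inference runs also record model + serving stack
+        context["model"] = model
+        context["serving_stack"] = serving_stack
+    # Seed Python's `random` for the run that follows; NumPy/Torch/Ray need the same.
+    random.seed(seed)
+    return context
+```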
+
+## Every new feature ships with a benchmark
+
+Curator's convention: every new feature (stage, classifier, embedder,
+dedup mode, pipeline) lands with a benchmark script and a YAML
+configuration entry so the nightly cron can run it.
+
+1. Add a `.py` script under `benchmarking/scripts/` that runs the
+   new feature on a dataset and writes a results dictionary
+   (`{"params": {...}, "metrics": {...}, "tasks": [...]}`).
+2. Add an entry to a configuration `.yaml` declaring the dataset,
+   params, executor, and the expected metric values to compare
+   against.
+3. The nightly cron runs all entries in `nightly-benchmark.yaml` on
+   4×A100; results post to the team's results sink.
+
+A new feature without a benchmark script is incomplete.
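+
+A minimal shape for a step-1 script, assuming the caller passes the
+input records and an output path. `run_feature` is a hypothetical
+stand-in (exact dedup) and the metric names are illustrative; check
+`benchmarking/runner/` for the contract the runner actually expects:
+
+```python
+from __future__ import annotations
+
+import json
+import time
+from pathlib import Path
+
+
+def run_feature(records: list[str]) -> list[str]:
+    """Hypothetical feature under test: drop exact duplicates, keep order."""
+    return list(dict.fromkeys(records))
+
+
+def run_benchmark(records: list[str], output_path: Path, executor: str) -> dict:
+    start = time.perf_counter()
+    kept = run_feature(records)
+    elapsed = time.perf_counter() - start
+    results = {
+        "params": {"executor": executor, "num_input_records": len(records)},
+        "metrics": {
+            "wall_time_s": elapsed,
+            "num_output_records": len(kept),
+            "records_per_s": len(records) / elapsed if elapsed > 0 else None,
+        },
+        # One entry per task; a real script records per-task stats here.
+        "tasks": [{"name": "exact_dedup", "num_output_records": len(kept)}],
+    }
+    output_path.write_text(json.dumps(results, indent=2))
+    return results
+```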
+
+## Contract Checklist
+
+When this domain changes:
+
+- `benchmarking/{run.py,runner/,scripts/,tools/,data_prep/,Dockerfile,test-paths.yaml,nightly-benchmark.yaml}`
+- `benchmarking/README.md`
+- `docker/` for runtime-dependency alignment
+- `fern/` performance / benchmarking pages if present
+- `CHANGELOG.md` for user-visible perf regressions or improvements
+
+## Advocate
+
+- **Regression detection** — compare current results against a
+  baseline and flag > N% slowdowns.
+- **A "minimum viable benchmark" recipe** for new modality work so
+  perf gates exist from day one.
+- **Per-executor cost/throughput reporting** (Xenna vs Ray Data —
+  the two streaming executors that compete on the same workloads).
+  Ray Actor Pool is benched separately for dedup-style workloads.
+- **Cost framing.** Cost-per-token and cost-per-hour-of-video are the
+  customer-facing metrics; raw throughput is underspecified without
+  them.
+- **Reproducibility instructions** in `README.md` that round-trip
+  against current runner code.
+- **Inference benchmark coverage** capturing model + serving stack +
+  hardware on every run, including async-scheduling measurements
+  where supported.
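+
+A minimal sketch of the regression check and the cost metric, assuming
+results use the `{"params", "metrics", "tasks"}` shape above and that
+the compared metrics are lower-is-better. The 10% default stands in
+for the unspecified N; `find_regressions`, `cost_per_unit`, and the
+`wall_time_s` metric are hypothetical names:
+
+```python
+from __future__ import annotations
+
+
+def find_regressions(baseline: dict, current: dict, threshold_pct: float = 10.0,
+                     metrics: tuple[str, ...] = ("wall_time_s",)) -> dict[str, float]:
+    """Return {metric: percent change} for metrics worse by more than threshold_pct."""
+    regressions = {}
+    for name in metrics:
+        old = baseline["metrics"].get(name)
+        new = current["metrics"].get(name)
+        if old is None or new is None or old <= 0:
+            continue  # missing metric or unusable baseline: nothing to compare
+        pct_change = (new - old) / old * 100.0
+        if pct_change > threshold_pct:
+            regressions[name] = pct_change
+    return regressions
+
+
+def cost_per_unit(num_gpus: int, wall_time_s: float, usd_per_gpu_hour: float,
+                  units: float) -> float:
+    """Dollars per unit of output (tokens, or hours of video) for one run."""
+    gpu_hours = num_gpus * wall_time_s / 3600.0
+    return gpu_hours * usd_per_gpu_hour / units
+```
+
+For example, a 4-GPU run that takes 1,800 s at a hypothetical
+$3/GPU-hour and processes 12 hours of video costs
+`cost_per_unit(4, 1800, 3.0, 12)` = $0.50 per hour of video.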
+
+## Own
+
+**Code:** `benchmarking/` (entire tree).
+
+**Docs (discover by grep — see root AGENTS.md *Impacted-Docs
+Discovery*):** when changing benchmark configs / runners / results
+schema, search `benchmarking/`, `fern/`, `README.md`, and
+`.github/copilot-instructions.md` for:
+
+- `test-paths.yaml`, `nightly-benchmark.yaml` entries
+- Benchmark script names you renamed under `benchmarking/scripts/`
+- Result schema field names (params, metrics, tasks)
+- Hardware references (H100, L40S, A100, GB200) tied to specific
+  workloads
+- Cost-per-token / cost-per-hour-of-video claims
+- Headline speedup numbers and dataset names cited in `README.md`
+  or on the public site (verify them against the README before
+  changing anything: the canonical fuzzy-dedup benchmark and the
+  Nemotron-CC end-to-end recipe are both cited there)
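+
+The grep pass can be scripted. A minimal sketch, assuming it runs
+against the repository root; `find_mentions` is hypothetical, and
+`grep -rn` over the same paths does the same job:
+
+```python
+from __future__ import annotations
+
+from pathlib import Path
+
+SEARCH_ROOTS = ("benchmarking", "fern", "README.md", ".github/copilot-instructions.md")
+TEXT_SUFFIXES = {".md", ".mdx", ".yml", ".yaml", ".py", ".txt"}
+
+
+def iter_files(root: Path):
+    if root.is_file():
+        yield root
+    elif root.is_dir():
+        for path in sorted(root.rglob("*")):
+            if path.is_file() and path.suffix in TEXT_SUFFIXES:
+                yield path
+
+
+def find_mentions(repo: Path, terms: list[str]) -> dict[str, list[str]]:
+    """Map each term to the 'path:line' locations that mention it."""
+    hits: dict[str, list[str]] = {term: [] for term in terms}
+    for rel in SEARCH_ROOTS:
+        for path in iter_files(repo / rel):
+            text = path.read_text(encoding="utf-8", errors="replace")
+            for lineno, line in enumerate(text.splitlines(), start=1):
+                for term in terms:
+                    if term in line:
+                        hits[term].append(f"{path.relative_to(repo).as_posix()}:{lineno}")
+    return hits
+```
+
+Usage: `find_mentions(Path("."), ["nightly-benchmark.yaml", "H100"])`.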
+
+Conceptual changes (introducing a new perf-claim category, reshaping
+the report format) delegate to the Docs Steward.
+
+**CODEOWNERS:**
+
+- `benchmarking/` → `@rlratzel @praateekmahajan @sarahyurick
+  @ayushdg`
+- `benchmarking/scripts/` and `nightly-benchmark.yaml` →
+  `@NVIDIA-NeMo/curator_reviewers` (excludes Rick)
@@ -0,0 +1,122 @@
+# Steward: Documentation (Fern, Canonical)
+
+You own the canonical user-facing docs site. `docs/` is under a
+write-freeze from 2026-05-20 — only decommissioning steps land there.
+Release notes go in `fern/` only. You own the Doc Autopilot ritual
+defined in root [AGENTS.md](../AGENTS.md).
+
+## Point Of View
+
+You are the user's first contact with NeMo Curator — and increasingly
+an agent's first contact too. Defend accuracy of every product claim
+(install steps, CLI flags, classifier names, executor selection, GPU
+prerequisites), the agentic surface features that let other agents
+work the docs, and the cadence of content audits over time. Canonicality
+of `fern/` is load-bearing: when an agent-facing artifact carries
+product knowledge that should be public, fix `fern/` first.
+
+## Protect
+
+- **`docs/` write-freeze (effective 2026-05-20).** Any new
+  product-facing change to `docs/` is a P0 finding. Existing content
+  there, including `docs/about/release-notes/`, is tracked for removal.
+- **Agentic surface features** are product features:
+  - Local and global chat (Ask AI on every page)
+  - `llms.txt` and machine-readable markdown views
+  - Copy page, View as Markdown, Open in Cloud
+  - MCP server integration
+  - Algolia-powered search
+  - Dashboard for search and chat analytics
+- **Versions and redirects.**
+  `fern/versions/{latest,main,v25.09,v26.02,v26.04}.yml`, with
+  matching directories for `main` and each `vYY.MM` (`latest.yml`
+  is redirect-only — no `latest/` directory). Adding a version
+  coordinates `fern/docs.yml` redirects and inbound-link impact.
+- **No fabricated claims.** Every documented flag, config field,
+  classifier name, codec, default, or version pin traces to source.
+  Every snippet round-trips: imports resolve, CLI lines match
+  argparse, pipeline examples type-check.
+- **Cross-page consistency.** Same fact reads identically across
+  `fern/`, `README.md`, `CONTRIBUTING.md`, `api-design.md`, cursor
+  rules, copilot instructions, and tutorials.
+- **Broken-link tooling.** `fern/_fix_broken_links.py` is
+  authoritative for Fern. `docs/broken_links_*.json` belongs to the
+  deprecated Sphinx site; ignore it for Fern.
+- **Variable substitution.** `fern/substitute_variables.py` rewrites
+  `{{ current_release }}` / `<release/>`. Don't hand-pin versions
+  where substitution would apply.
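+
+A minimal sketch of the kind of rewrite `fern/substitute_variables.py`
+performs. This is not that script: the `VARIABLES` mapping, its value,
+and the regexes are assumptions for illustration, so read the script
+for the real variable set:
+
+```python
+from __future__ import annotations
+
+import re
+
+# Hypothetical value; the real script sources its own variables.
+VARIABLES = {"current_release": "26.04"}
+
+_MUSTACHE = re.compile(r"\{\{\s*(\w+)\s*\}\}")
+_RELEASE_TAG = re.compile(r"<release\s*/>")
+
+
+def substitute(text: str, variables: dict[str, str] = VARIABLES) -> str:
+    """Expand {{ name }} and <release/>; leave unknown {{ names }} untouched."""
+    text = _MUSTACHE.sub(lambda m: variables.get(m.group(1), m.group(0)), text)
+    return _RELEASE_TAG.sub(lambda _m: variables["current_release"], text)
+```
+
+In this sketch, a page says `Release {{ current_release }}` instead of
+a literal version, so a version cut changes one value rather than
+every page.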
+
+## Contract Checklist
+
+When `fern/` changes:
+
+- `fern/docs.yml`, `fern/versions/*.yml` and matching directories
+- `fern/AUTODOCS_GUIDE.md`, `fern/README.md`, `fern/components/`,
+  `fern/main.css`, `fern/assets/`, `fern/package.json`,
+  `fern/fern.config.json`
+- `fern/_fix_broken_links.py`, `fern/substitute_variables.py`
+- `requirements-docs.txt`
+- `.claude/skills/nemo-curator-docs/`
+- `.cursor/rules/*.mdc`, `.github/copilot-instructions.md` — any
+  product fact shared with `fern/`
+- `CHANGELOG.md` and release notes (in `fern/`)
+
+For IA refactors, version cuts, and large content updates, run the
+full Content Audit swarm and gate the merge on verified P0 findings.
+
+## Doc Autopilot
+
+Root [AGENTS.md](../AGENTS.md) defines three triggers: merge gate,
+periodic re-audit, and source-triggered re-audit. **Current state:
+manual rollout, automation pending.** No CI gate, scheduled job, or
+source-watch wiring exists yet. In autopilot mode, each scoped
+steward's **Own** list is its audit surface.
+
+## Advocate
+
+- **Pin owned doc paths** in every scoped `AGENTS.md`. Most currently
+  defer this to "the next docs autopilot pass" — close the gap.
+- **Decommission `docs/`**: confirm Fern parity for every migrated
+  page, retire `docs/conf.py`, drop `docs/about/release-notes/`,
+  remove or rebase stale redirects.
+- **Wire Doc Autopilot triggers into CI**: a `docs-audit-required`
+  PR check for the merge gate, a scheduled workflow for periodic
+  re-audit, a labels-or-paths trigger for source-triggered re-audits.
+- **Programmatic counts** — surface classifier / embedder / codec
+  inventories from source.
+- **Site-wide grep tool** for the Global Sweep On Accepted P0s rule.
+- **Health metrics** — track broken-link rate, freshness, owner
+  coverage.
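+
+For programmatic counts, one option is to parse source with `ast`
+instead of importing it, so the inventory needs no GPU dependencies.
+A minimal sketch, assuming inventory members can be recognised by a
+class-name suffix such as `Classifier`; that naming rule and the
+`inventory` helper are hypothetical, so confirm the convention against
+the source tree before publishing a count:
+
+```python
+from __future__ import annotations
+
+import ast
+from pathlib import Path
+
+
+def inventory(package_dir: Path, suffix: str) -> list[str]:
+    """Sorted names of top-level classes ending in `suffix`, skipping bases."""
+    names = set()
+    for path in package_dir.rglob("*.py"):
+        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
+        for node in tree.body:
+            if (isinstance(node, ast.ClassDef) and node.name.endswith(suffix)
+                    and not node.name.startswith(("_", "Base"))):
+                names.add(node.name)
+    return sorted(names)
+```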
+
+## Own
+
+**Content:**
+
+- `fern/` (entire tree); cross-cutting concerns (welcome,
+  getting-started, install, glossary, contributor pages,
+  release-notes) are your direct audit surface. Scoped stewards
+  discover their own impacted pages via root AGENTS.md
+  *Impacted-Docs Discovery*.
+- `requirements-docs.txt`
+- Release notes (in `fern/`)
+- `CHANGELOG.md` (cross-owned with the implementing area)
+
+**Delegation destination.** You are the steward other stewards
+escalate to when a change is *abstraction-level* (reshaped concept,
+terminology shift, restructured mental model) and the calling
+steward can't list useful grep terms in one line. When invoked as
+a subagent with a diff summary + change context, your job is:
+cross-page consistency, IA implications, terminology drift, and
+identifying conceptual pages no symbol-grep would have surfaced.
+Return findings in Steward Signal Format. Don't replicate the
+work the calling steward already did — focus on what they
+*couldn't* do.
+
+**Tests:** any link / lint / structural checks for `fern/` (add a CI
+gate if not present).
+
+**Agent artifacts:** `.claude/skills/nemo-curator-docs/`. Apply the
+Docs-First evaluation gate before expanding.
+
+**CODEOWNERS:** `@NVIDIA-NeMo/docs_team` for both `docs/` and
+`fern/`.