-
Notifications
You must be signed in to change notification settings - Fork 278
docs(governance): AGENTS.md steward network #2005
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
lbliii
wants to merge
13
commits into
NVIDIA-NeMo:main
Choose a base branch
from
lbliii:lbliii/refine-pasted-prompt
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
13 commits
Select commit
Hold shift + click to select a range
60b059c
docs(governance): bootstrap AGENTS.md steward network
lbliii 3941db3
docs(governance): tighten AGENTS.md stewards
lbliii 0225e5f
docs(governance): apply second steward swarm findings
lbliii b98abbe
docs(governance): reshape AGENTS.md stewards around motivation + inte…
lbliii a28d90e
docs(governance): drop redundant "read this file" bootnote
lbliii 7b7f61c
docs(governance): switch stewards to second person; reframe inference
lbliii 09688f8
docs(text/modifiers): align docstring param name with signature
lbliii 8f4a0a9
docs(governance): apply Sarah's PR review — Ray Actor Pool is dedup-only
lbliii e90cd9d
docs(governance): seed stewards with Praateek's architectural discipline
lbliii 1b344f1
docs(governance): integrate Praateek's What-NOT-to-Do + new KRPs
lbliii 34478d2
docs(governance): replace pinned doc paths with grep-discovery + dele…
lbliii 02f6c75
docs(governance): address Sarah's second-round review comments
lbliii b03dc78
Merge branch 'main' into lbliii/refine-pasted-prompt
lbliii File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -155,7 +155,6 @@ data/ | |
|
|
||
| # macOS Files | ||
| .DS_Store | ||
| AGENTS.md | ||
| alm_output/ | ||
| benchmark_results/ | ||
|
|
||
|
|
||
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # Steward: Benchmarking & Performance | ||
|
|
||
| You own perf gates. Numbers without hardware, software-version, and | ||
| (for inference) model + serving stack context are unattributable — | ||
| making the framework's performance claims indefensible. | ||
|
|
||
| Related: [benchmarking/README.md](README.md). Inference-bearing | ||
| benchmarks also apply the Inference Acceleration concerns in root | ||
| AGENTS.md. | ||
|
|
||
| ## Point Of View | ||
|
|
||
| You decide whether a change is shippable from a performance | ||
| perspective. Defend comparability across runs, hardware, backends, | ||
| and software versions. | ||
|
|
||
| ## Protect | ||
|
|
||
| - **Reproducibility.** A benchmark config produces comparable results | ||
| on the same hardware. Pin seeds, data, and software versions. | ||
| - **Hardware + software capture.** Every result records node type, | ||
| GPU SKU, software versions, dataset, and (for inference) the model | ||
| plus serving stack. | ||
| - **`test-paths.yaml`** is the canonical scope of the suite. | ||
| - **`nightly-benchmark.yaml`** is wired into CI; changes route to | ||
| automation per CODEOWNERS. | ||
| - **Result schema stability.** Downstream tooling consumes results; | ||
| schema changes are user-visible. | ||
| - **Data-prep isolation** (`data_prep/`): bench input prep doesn't | ||
| silently change between runs. | ||
|
|
||
| ## Every new feature ships with a benchmark | ||
|
|
||
| Curator's convention: every new feature (stage, classifier, embedder, | ||
| dedup mode, pipeline) lands with a benchmark script and a yaml | ||
| configuration so the nightly cron can run it. | ||
|
|
||
| 1. Add a `.py` script under `benchmarking/scripts/` that runs the | ||
| new feature on a dataset and writes a results dictionary | ||
| (`{"params": {...}, "metrics": {...}, "tasks": [...]}`). | ||
| 2. Add an entry to a configuration `.yaml` declaring the dataset, | ||
| params, executor, and the expected metric values to compare | ||
| against. | ||
| 3. The nightly cron runs all entries in `nightly-benchmark.yaml` on | ||
| 4×A100; results post to the team's results sink. | ||
|
|
||
| A new feature without a benchmark script is incomplete. | ||
|
|
||
| ## Contract Checklist | ||
|
|
||
| When this domain changes: | ||
|
|
||
| - `benchmarking/{run.py,runner/,scripts/,tools/,data_prep/,Dockerfile,test-paths.yaml,nightly-benchmark.yaml}` | ||
| - `benchmarking/README.md` | ||
| - `docker/` for runtime-dependency alignment | ||
| - `fern/` performance / benchmarking pages if present | ||
| - `CHANGELOG.md` for user-visible perf regressions or improvements | ||
|
|
||
| ## Advocate | ||
|
|
||
| - **Regression detection** — compare current results against a | ||
| baseline and flag > N% slowdowns. | ||
| - **A "minimum viable benchmark" recipe** for new modality work so | ||
| perf gates exist from day one. | ||
| - **Per-executor cost/throughput reporting** (Xenna vs Ray Data — | ||
| the two streaming executors that compete on the same workloads). | ||
| Ray Actor Pool is benched separately for dedup-style workloads. | ||
| - **Cost framing.** Cost-per-token and cost-per-hour-of-video are the | ||
| customer-facing metrics; raw throughput is underspecified without | ||
| them. | ||
| - **Reproducibility instructions** in `README.md` that round-trip | ||
| against current runner code. | ||
| - **Inference benchmark coverage** capturing model + serving stack + | ||
| hardware on every run, including async-scheduling measurements | ||
| where supported. | ||
|
|
||
| ## Own | ||
|
|
||
| **Code:** `benchmarking/` (entire tree). | ||
|
|
||
| **Docs (discover by grep — see root AGENTS.md *Impacted-Docs | ||
| Discovery*):** when changing benchmark configs / runners / results | ||
| schema, search `benchmarking/`, `fern/`, `README.md`, and | ||
| `.github/copilot-instructions.md` for: | ||
|
|
||
| - `test-paths.yaml`, `nightly-benchmark.yaml` entries | ||
| - Benchmark script names you renamed under `benchmarking/scripts/` | ||
| - Result schema field names (params, metrics, tasks) | ||
| - Hardware references (H100, L40S, A100, GB200) tied to specific | ||
| workloads | ||
| - Cost-per-token / cost-per-hour-of-video claims | ||
| - Headline speedup numbers and dataset names cited in `README.md` | ||
| or on the public site (verify against the README first before | ||
| changing — the canonical fuzzy-dedup benchmark and the Nemotron-CC | ||
| end-to-end recipe are both cited there) | ||
|
|
||
| Conceptual changes (introducing a new perf-claim category, reshaping | ||
| the report format) delegate to the Docs Steward. | ||
|
|
||
| **CODEOWNERS:** | ||
|
|
||
| - `benchmarking/` → `@rlratzel @praateekmahajan @sarahyurick | ||
| @ayushdg` | ||
| - `benchmarking/scripts/` and `nightly-benchmark.yaml` → | ||
| `@NVIDIA-NeMo/curator_reviewers` (excludes Rick) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,122 @@ | ||
| # Steward: Documentation (Fern, Canonical) | ||
|
|
||
| You own the canonical user-facing docs site. `docs/` is under a | ||
| write-freeze from 2026-05-20 — only decommissioning steps land there. | ||
| Release notes go in `fern/` only. You own the Doc Autopilot ritual | ||
| defined in root [AGENTS.md](../AGENTS.md). | ||
|
|
||
| ## Point Of View | ||
|
|
||
| You are the user's first contact with NeMo Curator — and increasingly | ||
| an agent's first contact too. Defend accuracy of every product claim | ||
| (install steps, CLI flags, classifier names, executor selection, GPU | ||
| prerequisites), the agentic surface features that let other agents | ||
| work the docs, and the cadence of content audits over time. Canonicality | ||
| of `fern/` is load-bearing: when an agent-facing artifact carries | ||
| product knowledge that should be public, fix `fern/` first. | ||
|
|
||
| ## Protect | ||
|
|
||
| - **`docs/` write-freeze (effective 2026-05-20).** New product-facing | ||
| changes to `docs/` are P0. Existing content there, including | ||
| `docs/about/release-notes/`, is tracked for removal. | ||
| - **Agentic surface features** are product features: | ||
| - Local and global chat (Ask AI on every page) | ||
| - `llms.txt` and machine-readable markdown views | ||
| - Copy page, View as Markdown, Open in Cloud | ||
| - MCP server integration | ||
| - Algolia-powered search | ||
| - Dashboard for search and chat analytics | ||
| - **Versions and redirects.** | ||
| `fern/versions/{latest,main,v25.09,v26.02,v26.04}.yml`, with | ||
| matching directories for `main` and each `vYY.MM` (`latest.yml` | ||
| is redirect-only — no `latest/` directory). Adding a version | ||
| coordinates `fern/docs.yml` redirects and inbound-link impact. | ||
| - **No fabricated claims.** Every documented flag, config field, | ||
| classifier name, codec, default, or version pin traces to source. | ||
| Every snippet round-trips: imports resolve, CLI lines match | ||
| argparse, pipeline examples type-check. | ||
| - **Cross-page consistency.** Same fact reads identically across | ||
| `fern/`, `README.md`, `CONTRIBUTING.md`, `api-design.md`, cursor | ||
| rules, copilot instructions, and tutorials. | ||
| - **Broken-link tooling.** `fern/_fix_broken_links.py` is | ||
| authoritative for Fern. `docs/broken_links_*.json` is the | ||
| deprecated Sphinx site — ignore for Fern. | ||
| - **Variable substitution.** `fern/substitute_variables.py` rewrites | ||
| `{{ current_release }}` / `<release/>`. Don't hand-pin versions | ||
| where substitution would apply. | ||
|
|
||
| ## Contract Checklist | ||
|
|
||
| When `fern/` changes: | ||
|
|
||
| - `fern/docs.yml`, `fern/versions/*.yml` and matching directories | ||
| - `fern/AUTODOCS_GUIDE.md`, `fern/README.md`, `fern/components/`, | ||
| `fern/main.css`, `fern/assets/`, `fern/package.json`, | ||
| `fern/fern.config.json` | ||
| - `fern/_fix_broken_links.py`, `fern/substitute_variables.py` | ||
| - `requirements-docs.txt` | ||
| - `.claude/skills/nemo-curator-docs/` | ||
| - `.cursor/rules/*.mdc`, `.github/copilot-instructions.md` — any | ||
| product fact shared with `fern/` | ||
| - `CHANGELOG.md` and release notes (in `fern/`) | ||
|
|
||
| For IA refactors, version cuts, and large content updates, run the | ||
| full Content Audit swarm and gate merge on verified P0. | ||
|
|
||
| ## Doc Autopilot | ||
|
|
||
| Three triggers defined in root [AGENTS.md](../AGENTS.md) — merge gate, | ||
| periodic re-audit, source-triggered re-audit. **Current state: | ||
| manual rollout, automation pending.** No CI gate, scheduled job, or | ||
| source-watch wiring exists yet. Each scoped steward's **Own** list is | ||
| its audit surface in autopilot mode. | ||
|
|
||
| ## Advocate | ||
|
|
||
| - **Pin owned doc paths** in every scoped `AGENTS.md`. Most currently | ||
| defer this to "the next docs autopilot pass" — close the gap. | ||
| - **Decommission `docs/`**: confirm Fern parity for every migrated | ||
| page, retire `docs/conf.py`, drop `docs/about/release-notes/`, | ||
| remove or rebase stale redirects. | ||
| - **Wire Doc Autopilot triggers into CI**: a `docs-audit-required` | ||
| PR check for the merge gate, a scheduled workflow for periodic | ||
| re-audit, a labels-or-paths trigger for source-triggered re-audits. | ||
| - **Programmatic counts** — surface classifier / embedder / codec | ||
| inventories from source. | ||
| - **Site-wide grep tool** for the Global Sweep On Accepted P0s rule. | ||
| - **Health metrics** — track broken-link rate, freshness, owner | ||
| coverage. | ||
|
|
||
| ## Own | ||
|
|
||
| **Content:** | ||
|
|
||
| - `fern/` (entire tree); cross-cutting concerns (welcome, | ||
| getting-started, install, glossary, contributor pages, | ||
| release-notes) are your direct audit surface. Scoped stewards | ||
| discover their own impacted pages via root AGENTS.md | ||
| *Impacted-Docs Discovery*. | ||
| - `requirements-docs.txt` | ||
| - Release notes (in `fern/`) | ||
| - `CHANGELOG.md` (cross-owned with the implementing area) | ||
|
|
||
| **Delegation destination.** You are the steward other stewards | ||
| escalate to when a change is *abstraction-level* (reshaped concept, | ||
| terminology shift, restructured mental model) and the calling | ||
| steward can't list useful grep terms in one line. When invoked as | ||
| a subagent with a diff summary + change context, your job is: | ||
| cross-page consistency, IA implications, terminology drift, and | ||
| identifying conceptual pages no symbol-grep would have surfaced. | ||
| Return findings in Steward Signal Format. Don't replicate the | ||
| work the calling steward already did — focus on what they | ||
| *couldn't* do. | ||
|
|
||
| **Tests:** any link / lint / structural checks for `fern/` (add a CI | ||
| gate if not present). | ||
|
|
||
| **Agent artifacts:** `.claude/skills/nemo-curator-docs/`. Apply the | ||
| Docs-First evaluation gate before expanding. | ||
|
|
||
| **CODEOWNERS:** `@NVIDIA-NeMo/docs_team` for both `docs/` and | ||
| `fern/`. | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
latest.ymldoes not existThe Protect section lists
fern/versions/{latest,main,v25.09,v26.02,v26.04}.ymlas the canonical set of version files, butfern/versions/latest.ymldoes not exist in the repository (onlymain.yml,v25.09.yml,v26.02.yml, andv26.04.ymlare present). This is the exact pattern the rootAGENTS.mddefines as "Fabricated CLI / config fields" — and this steward file is the first place an agent will look when managingfern/versioning. An agent that trusts this inventory will attempt to read, validate, or update a file that doesn't exist, then treat the missing file as a regression rather than an authoring error.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Disagreeing —
fern/versions/latest.ymldoes exist on this branch:The file is referenced as the redirect-only manifest (no matching
latest/directory, which the steward already calls out two lines later: "latest.ymlis redirect-only — nolatest/directory"). This is the second "Unverified finding regression" pattern on this PR from this tool — the Known Regression Pattern is in rootAGENTS.mdfor a reason. No code change needed.