From 7c2e8af0523685667be72589c53e3c4741093fc4 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Mon, 26 Jan 2026 11:22:02 -0500 Subject: [PATCH 01/18] staging Signed-off-by: Lawrence Lane --- GH-TOPICS.MD | 52 ++++++++ profile/README.md | 317 ++++++++++++++++++++++++++++++++++------------ 2 files changed, 287 insertions(+), 82 deletions(-) create mode 100644 GH-TOPICS.MD diff --git a/GH-TOPICS.MD b/GH-TOPICS.MD new file mode 100644 index 0000000..973ab13 --- /dev/null +++ b/GH-TOPICS.MD @@ -0,0 +1,52 @@ +## GitHub Topics Strategy + +### Topic Categories + +| Category | Topics | Purpose | +|----------|--------|---------| +| **Pipeline Stage** | `stage-data`, `stage-training`, `stage-alignment`, `stage-evaluation`, `stage-deployment`, `stage-safety` | Filter by workflow stage | +| **Model Type** | `model-llm`, `model-vlm`, `model-speech`, `model-diffusion`, `model-omni` | Filter by model modality | +| **Training Method** | `method-pretraining`, `method-sft`, `method-lora`, `method-rl`, `method-dpo`, `method-grpo` | Filter by training technique | +| **Backend** | `backend-megatron`, `backend-pytorch`, `backend-vllm`, `backend-tensorrt` | Filter by infrastructure | +| **Meta** | `nvidia-nemo` | All repos in the framework | + + + +### Example Filter Links + +Once topics are applied, you can add convenience links to your project's README and docs: + +```markdown +**Browse by pipeline stage:** +[Data](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:stage-data) · +[Training](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:stage-training) · +[Alignment](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:stage-alignment) · +[Evaluation](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:stage-evaluation) · +[Deployment](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:stage-deployment) + +**Browse by model type:** +[LLM](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:model-llm) · 
+[VLM](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:model-vlm) · +[Speech](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:model-speech) · +[Diffusion](https://github.com/orgs/NVIDIA-NeMo/repositories?q=topic:model-diffusion) +``` \ No newline at end of file diff --git a/profile/README.md b/profile/README.md index 47f739c..e29b3ca 100644 --- a/profile/README.md +++ b/profile/README.md @@ -3,90 +3,243 @@ SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. SPDX-License-Identifier: Apache-2.0 --> -## NVIDIA NeMo Framework Overview - -NeMo Framework is NVIDIA's GPU accelerated, fully open-source, end-to-end training framework for large language models (LLMs), multi-modal models, diffusion and speech models. It enables seamless scaling of pretraining, post-training, and reinforcement learning workloads from single GPU to thousand-node clusters for both 🤗Hugging Face/PyTorch and Megatron models. This GitHub organization includes a suite of libraries and recipe collections to help users train models from end to end. - -NeMo Framework is also a part of the NVIDIA NeMo software suite for managing the AI agent lifecycle. 
- -## Latest 📣 announcements and 🗣️ discussions -### 🐳 NeMo AutoModel -- [10/6/2025][Enabling PyTorch Native Pipeline Parallelism for 🤗 Hugging Face Transformer Models](https://github.com/NVIDIA-NeMo/Automodel/discussions/589) -- [9/22/2025][Fine-tune Hugging Face Models Instantly with Day-0 Support with NVIDIA NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel/discussions/477) -- [9/18/2025][🚀 NeMo Framework Now Supports Google Gemma 3n: Efficient Multimodal Fine-tuning Made Simple](https://github.com/NVIDIA-NeMo/Automodel/discussions/494) - -### 🔬 NeMo RL -- [10/1/2025][On-policy Distillation](https://github.com/NVIDIA-NeMo/RL/discussions/1445) -- [9/27/2025][FP8 Quantization in NeMo RL](https://github.com/NVIDIA-NeMo/RL/discussions/1216) -- [8/15/2025][NeMo-RL: Journey of Optimizing Weight Transfer in Large MoE Models by 10x](https://github.com/NVIDIA-NeMo/RL/discussions/1189) - -### 💬 NeMo Speech -- [8/1/2025][Guide to Fine-tune Nvidia NeMo models with Granary Data](https://github.com/NVIDIA-NeMo/NeMo/discussions/14758) - -More to come and stay tuned! 
- -## Getting Started - -||Installation|Checkpoint Conversion HF<>Megatron|LLM example recipes and scripts|VLM example recipes and scripts| -|-|-|-|-|-| -|1 ~ 1,000 GPUs|[NeMo Automodel](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#getting-started), [NeMo RL](https://github.com/NVIDIA-NeMo/RL?tab=readme-ov-file#prerequisites)|No Need|[Pre-training](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-pre-training), [SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-supervised-fine-tuning-sft), [LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-parameter-efficient-fine-tuning-peft), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py)|[SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-supervised-fine-tuning-sft), [LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-parameter-efficient-fine-tuning-peft), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py) -|Over 1,000 GPUs|[NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge?tab=readme-ov-file#-installation), [NeMo RL](https://github.com/NVIDIA-NeMo/RL?tab=readme-ov-file#prerequisites)|[Conversion](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/README.md)|[Pretrain, SFT, and LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py) with [megatron_cfg](https://github.com/NVIDIA-NeMo/RL/blob/fa379fffbc9c5580301fa748dbba269c7d90f883/examples/configs/dpo.yaml#L99), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py) with [megatron_cfg](https://github.com/NVIDIA-NeMo/RL/blob/fa379fffbc9c5580301fa748dbba269c7d90f883/examples/configs/grpo_math_1B_megatron.yaml#L79)|[SFT, 
LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [GRPO megatron config](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/configs/vlm_grpo_3B_megatron.yaml)| - -## Repo organization under NeMo Framework - -### Summary of key functionalities and container strategy of each repo - -Visit the individual repos to find out more 🔍, raise :bug:, contribute ✍️ and participate in discussion forums 🗣️! - -Note: The NeMo Framework is currently in the process of restructuring. The original NeMo 2.0 repository will now focus specifically on speech-related components, while other parts of the framework are being modularized into separate libraries such as NeMo Automodel, NeMo Gym, NeMo RL, and more. This transition aims to make NeMo more modular and developer-friendly. -

- -|Repo|Key Functionality & Documentation Link|Training Loop|Training Backends|Infernece Backends|Model Coverage|Container| -|-|-|-|-|-|-|-| -|[NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)|[Pretraining, LoRA, SFT](https://docs.nvidia.com/nemo/megatron-bridge/latest/)|PyT native loop|Megatron-core|NA|LLM & VLM|NeMo Framework Container -|[NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)|[Pretraining, LoRA, SFT](https://docs.nvidia.com/nemo/automodel/latest/index.html)|PyT native loop|PyTorch|NA|LLM, VLM, Omni, VFM|NeMo AutoModel Container| -|[Previous NeMo 2.0 Repo -> will be repurposed to focus on Speech](https://github.com/NVIDIA-NeMo/NeMo)|[Pretraining,SFT](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html)|PyTorch Lightning Loop|Megatron-core & PyTorch|RIVA|Speech|NA| -|[NeMo RL](https://github.com/NVIDIA-NeMo/RL)|[SFT, RL](https://docs.nvidia.com/nemo/rl/latest/index.html)|PyT native loop|Megatron-core & PyTorch|vLLM|LLM, VLM|NeMo RL container| -|[NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)|[RL Environment, integrate with RL Framework](https://docs.nvidia.com/nemo/gym/latest/index.html)|NA|NA|NA|NA|NeMo RL Container (WIP)| -|[NeMo Aligner (deprecated)](https://github.com/NVIDIA/NeMo-Aligner)|SFT, RL|PyT Lightning Loop|Megatron-core|TRTLLM|LLM|NA -|[NeMo Curator](https://github.com/NVIDIA-NeMo/Curator)|[Data curation](https://docs.nvidia.com/nemo/curator/latest/)|NA|NA|NA|Agnostic|NeMo Curator Container| -|[NeMo Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)|[Model evaluation](https://docs.nvidia.com/nemo/evaluator/latest/)|NA|NA||Agnostic|NeMo Framework Container| -|[NeMo Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy)|[Export to Production](https://docs.nvidia.com/nemo/export-deploy/latest/index.html)|NA|NA|vLLM, TRT, TRTLLM, ONNX|Agnostic|NeMo Framework Container| -|[NeMo Run](https://github.com/NVIDIA-NeMo/Run)|[Experiment 
launcher](https://docs.nvidia.com/nemo/run/latest/)|NA|NA|NA|Agnostic|NeMo Framework Container| -|[NeMo Guardrails](https://github.com/NVIDIA-NeMo/Guardrails)|[Guardrail model response](https://docs.nvidia.com/nemo/guardrails/latest/)|NA|NA|NA||NA| -|[NeMo Skills](https://github.com/NVIDIA-NeMo/Skills)|[Reference pipeline for SDG & Eval](https://nvidia.github.io/NeMo-Skills/)|NA|NA|NA|Agnostic|NA| -|[NeMo Emerging Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers)|[Collection of Optimizers](https://docs.nvidia.com/nemo/emerging-optimizers/0.1.0/index.html)|NA|Agnostic|NA|NA|NA| -|[NeMo DFM](https://github.com/NVIDIA-NeMo/DFM/tree/main)|[Diffusion foundation model training](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs)|PyT native loop|Megatron-core and PyTorch|NA|Diffusion models|NA| -|[Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)|Developer asset hub for Nemotron models|NA|NA|NA|Nemotron models|NA| -|[NeMo Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner)|Synthetic data generation library|NA|NA|NA|NA|NA| - -
- Table 1. NeMo Framework Repos -
-

- -### Diagram Ilustration of Repos under NeMo Framework (WIP) - - ![image](/RepoDiagram.png) - -
- Figure 1. NeMo Framework Repo Overview -
-

- -### Some background motivations and historical contexts -The NeMo GitHub Org and its repo collections are created to address the following problems -* **Need for composability**: The [Previous NeMo 2.0 version](https://github.com/NVIDIA/NeMo) is monolithic and encompasses too many things, making it hard for users to find what they need. Container size is also an issue. Breaking down the Monolithic repo into a series of functional-focused repos to facilitate code discovery. -* **Need for customizability**: The [Previous NeMo 2.0 version](https://github.com/NVIDIA/NeMo) uses PyTorch Lighting as the default trainer loop, which provides some out of the box functionality but making it hard to customize. [NeMo Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel), and [NeMo RL](https://github.com/NVIDIA-NeMo/RL) have adopted pytorch native custom loop to improve flexibility and ease of use for developers. +# NVIDIA NeMo Framework - +This GitHub org contains libraries for training, data curation, evaluation, alignment, and deployment. Scale from a single GPU to 10,000+ nodes with day-0 Hugging Face support or Megatron backends for maximum throughput. + +--- + +## Choose Your Path + + + + + + + +
+ +### Get Started + +**Start with [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)** – the simplest path to fine-tuning Hugging Face models on NVIDIA GPUs. + +```bash +pip install nemo-automodel +``` + +```python +from nemo_automodel import AutoModelForCausalLM, Trainer + +model = AutoModelForCausalLM.from_pretrained( + "meta-llama/Llama-3.3-70B-Instruct" +) +trainer = Trainer(model=model, train_dataset=dataset) +trainer.train() +``` + +[→ AutoModel Quick Start](https://docs.nvidia.com/nemo/automodel/latest/launcher/local-workstation.html#quick-start-choose-your-job-launch-option) + + + +### Scale Training + +Choose your training approach: + +- **< 1,000 GPUs**: [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) +- **1,000+ GPUs**: [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) +- **RLHF/DPO**: [NeMo RL](https://github.com/NVIDIA-NeMo/RL) + +[→ Training Recipes](#training-recipes) + +### Manage Experiments + +[NeMo Run](https://github.com/NVIDIA-NeMo/Run) for launching and tracking experiments across: + +- Local machines +- SLURM clusters +- Kubernetes + +[→ Run Documentation](https://docs.nvidia.com/nemo/run/latest/) + + + +### Explore Libraries + +- [Curator](https://github.com/NVIDIA-NeMo/Curator) – Data curation at scale +- [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) – Model benchmarking +- [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) – Production deployment +- [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) – Safety rails + +[→ All Libraries](#all-libraries) + +### Use Containers + +Pull optimized containers to get started fast. 
+ +- [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) +- [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) +- [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) +- [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) + +[→ Explore NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/containers) +
+ +
+📋 Decision Guide — Which library should I use? + +| I want to... | Models | Scale | Library | Docs | +|--------------|--------|-------|---------|------| +| **Train/fine-tune** | LLM, VLM | ≤1K GPUs | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| **Train at scale** | LLM, VLM | 1K+ GPUs | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| **Align** (DPO/GRPO) | LLM, VLM | Any | [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | +| **Curate data** | — | Any | [Curator](https://github.com/NVIDIA-NeMo/Curator) | [docs](https://docs.nvidia.com/nemo/curator/latest/) | +| **Evaluate** | Any | — | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | +| **Deploy** | Any | — | [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| **Speech AI** | ASR, TTS | Any | [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | [docs](https://docs.nvidia.com/nemo/speech/latest/) | + +
+ +--- + +## Training Recipes + +| Library | LLM Recipes | VLM Recipes | +|---------|-------------|-------------| +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [Llama](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/llama3_2), [Qwen](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/qwen), [Gemma](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/gemma), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_pretrain/deepseekv3_pretrain.yaml), [Mistral](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/mistral), [Phi](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/phi) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen2_5), [Gemma 3n VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3n) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [Llama](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [Qwen](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/deepseek/deepseek_v3.py), [Gemma 3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma/gemma3.py), [Nemotron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/nemotronh/nemotronh.py) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma3_vl/gemma3_vl.py), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [Qwen3 
VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen3vl.py) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/grpo.md), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md) | + + +--- + +## All Libraries + +### Pipeline Overview + +```mermaid +flowchart LR + subgraph Data + Curator + DataDesigner[Data Designer] + Skills + end + + subgraph Training + AutoModel + MBridge[Megatron-Bridge] + end + + subgraph Alignment + RL[NeMo RL] + end + + subgraph Evaluation + Evaluator + end + + subgraph Deployment + Export[Export-Deploy] + Guardrails + end + + Gym[NeMo Gym] + + Data --> Training + Training --> Alignment + Training --> Evaluation + Alignment --> Evaluation + Evaluation --> Deployment + + Gym -.-> RL + Skills -.-> Evaluator +``` + +### Data + +| Repo | Description | Docs | Container | +|------|-------------|------|-----------| +| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data curation at scale | [docs](https://docs.nvidia.com/nemo/curator/latest/) | [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) | +| [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic data generation | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) | — | +| [Skills](https://github.com/NVIDIA-NeMo/Skills) | SDG pipelines (math, code, science datasets) | [docs](https://nvidia-nemo.github.io/Skills/) | — | + +### Training + +| Repo | Description | Backend | Models | Docs | Container | +|------|-------------|---------|--------|------|-----------| +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Pretraining, SFT, LoRA | PyTorch | LLM, VLM, Omni | 
[docs](https://docs.nvidia.com/nemo/automodel/latest/) | [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Pretraining, SFT, LoRA | Megatron-core | LLM, VLM | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | Pretraining, SFT | Megatron-core | Speech | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | — | +| [DFM](https://github.com/NVIDIA-NeMo/DFM) | Diffusion training | Megatron-core | Diffusion | [docs](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | — | + +### Alignment + +| Repo | Description | Backend | Models | Docs | Container | +|------|-------------|---------|--------|------|-----------| +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | SFT, DPO, GRPO | Megatron-core, vLLM | LLM, VLM | [docs](https://docs.nvidia.com/nemo/rl/latest/) | [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) | +| [Gym](https://github.com/NVIDIA-NeMo/Gym) | RL environments | — | LLM, VLM | [docs](https://docs.nvidia.com/nemo/gym/latest/index.html) | — | + +### Evaluation + +| Repo | Description | Docs | Container | +|------|-------------|------|-----------| +| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Model benchmarking | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| [Skills](https://github.com/NVIDIA-NeMo/Skills) | Evaluation pipelines (math, code, science, etc.) 
| [docs](https://nvidia-nemo.github.io/Skills/) | — | + +### Deployment + +| Repo | Description | Backends | Docs | Container | +|------|-------------|----------|------|-----------| +| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Export to production | vLLM, TRT-LLM, ONNX | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Safety rails | — | [docs](https://docs.nvidia.com/nemo/guardrails/latest/) | — | + +### Infrastructure + +| Repo | Description | Docs | Container | +|------|-------------|------|-----------| +| [Run](https://github.com/NVIDIA-NeMo/Run) | Experiment launcher | [docs](https://docs.nvidia.com/nemo/run/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Collection of optimizers | [docs](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) | — | +| [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes for Nemotron models | [docs](https://github.com/NVIDIA-NeMo/Nemotron#readme) | — | + +### Architecture Reference + +![Framework Architecture](RepoDiagram.png) + +*Architectural layers and dependencies across the NeMo Framework.* + +--- + +## Community + + + + + + +
+ +### 💬 Get Involved + +**[GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions)** — Questions, ideas, and announcements + +- [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) +- [Contributing Guide](https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md) +- [Release Notes](https://docs.nvidia.com/nemo/releases/) + + + +### 📣 Latest + +**🐳 AutoModel** +- [PyTorch Native Pipeline Parallelism for HF Models](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Oct 2025)* +- [Day-0 Hugging Face Support](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Sep 2025)* +- [Gemma 3n Multimodal Fine-tuning](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Sep 2025)* + +**🔬 NeMo RL** — [On-policy Distillation](https://github.com/orgs/NVIDIA-NeMo/discussions), [FP8 Quantization](https://github.com/orgs/NVIDIA-NeMo/discussions), [10× MoE Weight Transfer](https://github.com/orgs/NVIDIA-NeMo/discussions) + +
## License -Apache 2.0 licensed with third-party attributions documented in each repository. +Apache 2.0. Third-party attributions in each repository. From ac8672747786b56236d9c8f0b61a767647be2616 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Mon, 26 Jan 2026 11:29:20 -0500 Subject: [PATCH 02/18] update Signed-off-by: Lawrence Lane --- profile/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/profile/README.md b/profile/README.md index e29b3ca..1563c32 100644 --- a/profile/README.md +++ b/profile/README.md @@ -50,7 +50,7 @@ Choose your training approach: [→ Training Recipes](#training-recipes) -### Manage Experiments +### Experiment [NeMo Run](https://github.com/NVIDIA-NeMo/Run) for launching and tracking experiments across: @@ -109,7 +109,7 @@ Pull optimized containers to get started fast. |---------|-------------|-------------| | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [Llama](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/llama3_2), [Qwen](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/qwen), [Gemma](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/gemma), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_pretrain/deepseekv3_pretrain.yaml), [Mistral](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/mistral), [Phi](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/phi) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen2_5), [Gemma 3n VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3n) | | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [Llama](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), 
[Qwen](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/deepseek/deepseek_v3.py), [Gemma 3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma/gemma3.py), [Nemotron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/nemotronh/nemotronh.py) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma3_vl/gemma3_vl.py), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [Qwen3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen3vl.py) | -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/grpo.md), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/docs/guides/sft.md) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_sft.py) | --- @@ -221,8 +221,8 @@ flowchart LR **[GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions)** — Questions, ideas, and announcements - [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) -- [Contributing Guide](https://github.com/NVIDIA/NeMo/blob/stable/CONTRIBUTING.md) -- [Release 
Notes](https://docs.nvidia.com/nemo/releases/) +- Follow each repo's CONTRIBUTING guide to get started +- [Release Notes](https://docs.nvidia.com/nemo-framework/user-guide/latest/changelog.html) From d759a438f7adcb4a3f8bb390594243b6497c466b Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Mon, 26 Jan 2026 11:33:41 -0500 Subject: [PATCH 03/18] alignment Signed-off-by: Lawrence Lane --- profile/README.md | 3 --- 1 file changed, 3 deletions(-) diff --git a/profile/README.md b/profile/README.md index 1563c32..14d46de 100644 --- a/profile/README.md +++ b/profile/README.md @@ -42,8 +42,6 @@ trainer.train() ### Scale Training -Choose your training approach: - - **< 1,000 GPUs**: [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) - **1,000+ GPUs**: [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) - **RLHF/DPO**: [NeMo RL](https://github.com/NVIDIA-NeMo/RL) @@ -68,7 +66,6 @@ Choose your training approach: - [Curator](https://github.com/NVIDIA-NeMo/Curator) – Data curation at scale - [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) – Model benchmarking - [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) – Production deployment -- [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) – Safety rails [→ All Libraries](#all-libraries) From 6971797f2e6fdf3c9be19054922b1323657d574e Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Mon, 26 Jan 2026 11:34:26 -0500 Subject: [PATCH 04/18] word Signed-off-by: Lawrence Lane --- profile/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/profile/README.md b/profile/README.md index 14d46de..5a5d830 100644 --- a/profile/README.md +++ b/profile/README.md @@ -63,7 +63,7 @@ trainer.train() ### Explore Libraries -- [Curator](https://github.com/NVIDIA-NeMo/Curator) – Data curation at scale +- [Curator](https://github.com/NVIDIA-NeMo/Curator) – Data curation - [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) – Model benchmarking - 
[Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) – Production deployment From 1682c52ba73ee8950ec40f908cfcd8df37f17589 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Wed, 28 Jan 2026 10:01:01 -0500 Subject: [PATCH 05/18] update Signed-off-by: Lawrence Lane --- profile/README.md | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/profile/README.md b/profile/README.md index 5a5d830..55d77a4 100644 --- a/profile/README.md +++ b/profile/README.md @@ -169,6 +169,7 @@ flowchart LR | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Pretraining, SFT, LoRA | Megatron-core | LLM, VLM | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | | [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | Pretraining, SFT | Megatron-core | Speech | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | — | | [DFM](https://github.com/NVIDIA-NeMo/DFM) | Diffusion training | Megatron-core | Diffusion | [docs](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | — | +| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Collection of optimizers | — | — | [docs](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) | — | ### Alignment @@ -191,13 +192,17 @@ flowchart LR | [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Export to production | vLLM, TRT-LLM, ONNX | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | | [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Safety rails | — | [docs](https://docs.nvidia.com/nemo/guardrails/latest/) | — | +### Models and Recipes + +| Repo | Description | Docs | Container | +|------|-------------|------|-----------| +| [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes for Nemotron models | 
[docs](https://github.com/NVIDIA-NeMo/Nemotron#readme) | — | + ### Infrastructure | Repo | Description | Docs | Container | |------|-------------|------|-----------| | [Run](https://github.com/NVIDIA-NeMo/Run) | Experiment launcher | [docs](https://docs.nvidia.com/nemo/run/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Collection of optimizers | [docs](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) | — | -| [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes for Nemotron models | [docs](https://github.com/NVIDIA-NeMo/Nemotron#readme) | — | ### Architecture Reference From 38e37c3cd4dae45ef448d1e082bdb175fd6c8cc8 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Mon, 2 Feb 2026 14:01:02 -0500 Subject: [PATCH 06/18] linkfixes Signed-off-by: Lawrence Lane --- profile/README.md | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/profile/README.md b/profile/README.md index 55d77a4..074bc68 100644 --- a/profile/README.md +++ b/profile/README.md @@ -232,11 +232,13 @@ flowchart LR ### 📣 Latest **🐳 AutoModel** -- [PyTorch Native Pipeline Parallelism for HF Models](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Oct 2025)* -- [Day-0 Hugging Face Support](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Sep 2025)* -- [Gemma 3n Multimodal Fine-tuning](https://github.com/orgs/NVIDIA-NeMo/discussions) *(Sep 2025)* +- [Enabling PyTorch Native Pipeline Parallelism for HF Models](https://github.com/NVIDIA-NeMo/Automodel/discussions/589) *(Oct 2025)* +- [Day-0 Hugging Face Support](https://github.com/NVIDIA-NeMo/Automodel/discussions/477) *(Sep 2025)* +- [Gemma 3n Multimodal Fine-tuning](https://github.com/NVIDIA-NeMo/Automodel/discussions/494) *(Sep 2025)* -**🔬 NeMo RL** — [On-policy Distillation](https://github.com/orgs/NVIDIA-NeMo/discussions), [FP8 
Quantization](https://github.com/orgs/NVIDIA-NeMo/discussions), [10× MoE Weight Transfer](https://github.com/orgs/NVIDIA-NeMo/discussions) +**🔬 NeMo RL** — [On-policy Distillation](https://github.com/NVIDIA-NeMo/RL/discussions/1445), [FP8 Quantization](https://github.com/NVIDIA-NeMo/RL/discussions/1216), [10× MoE Weight Transfer](https://github.com/NVIDIA-NeMo/RL/discussions/1189) + +**💬 NeMo Speech** — [Fine-tune NeMo models with Granary Data](https://github.com/NVIDIA-NeMo/NeMo/discussions/14758) From 9dcf45d7113ed72fe49b6e178192b0b05180110a Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Thu, 28 May 2026 14:08:14 -0400 Subject: [PATCH 07/18] example Signed-off-by: Lawrence Lane --- .github/workflows/fern-docs-ci.yml | 36 ++ .github/workflows/publish-fern-docs.yml | 42 ++ README.md | 9 +- fern/README.md | 59 +++ fern/components/RepoCatalog.tsx | 236 ++++++++++ fern/components/repos.ts | 241 +++++++++++ fern/docs.yml | 56 +++ fern/docs/pages/community.mdx | 46 ++ fern/docs/pages/getting-started.mdx | 85 ++++ fern/docs/pages/index.mdx | 119 ++++++ fern/docs/pages/libraries.mdx | 146 +++++++ fern/docs/pages/repositories.mdx | 22 + fern/fern.config.json | 4 + fern/tsconfig.json | 14 + nemo-fw-presentation-outline.md | 364 ++++++++++++++++ nemo-fw-product-walkthrough.md | 547 ++++++++++++++++++++++++ profile/README.md | 239 +---------- 17 files changed, 2037 insertions(+), 228 deletions(-) create mode 100644 .github/workflows/fern-docs-ci.yml create mode 100644 .github/workflows/publish-fern-docs.yml create mode 100644 fern/README.md create mode 100644 fern/components/RepoCatalog.tsx create mode 100644 fern/components/repos.ts create mode 100644 fern/docs.yml create mode 100644 fern/docs/pages/community.mdx create mode 100644 fern/docs/pages/getting-started.mdx create mode 100644 fern/docs/pages/index.mdx create mode 100644 fern/docs/pages/libraries.mdx create mode 100644 fern/docs/pages/repositories.mdx create mode 100644 fern/fern.config.json create mode 100644 
fern/tsconfig.json create mode 100644 nemo-fw-presentation-outline.md create mode 100644 nemo-fw-product-walkthrough.md diff --git a/.github/workflows/fern-docs-ci.yml b/.github/workflows/fern-docs-ci.yml new file mode 100644 index 0000000..2552ae0 --- /dev/null +++ b/.github/workflows/fern-docs-ci.yml @@ -0,0 +1,36 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 + +name: Fern docs (check) + +on: + pull_request: + paths: + - 'fern/**' + - '.github/workflows/fern-docs-ci.yml' + +permissions: + contents: read + +jobs: + check: + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v6 + + - name: Setup Node.js + uses: actions/setup-node@v6 + with: + node-version: '22' + + - name: Install Fern CLI + run: npm install -g fern-api@$(jq -r .version fern/fern.config.json) + + - name: Validate Fern configuration + working-directory: ./fern + env: + FERN_TOKEN: ${{ secrets.DOCS_FERN_TOKEN }} + run: | + fern check + fern docs md check diff --git a/.github/workflows/publish-fern-docs.yml b/.github/workflows/publish-fern-docs.yml new file mode 100644 index 0000000..e470f0b --- /dev/null +++ b/.github/workflows/publish-fern-docs.yml @@ -0,0 +1,42 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Publishes the NeMo Framework hub Fern site. 
+# git tag docs/v0.1.0 && git push origin docs/v0.1.0 +# Requires org secret: DOCS_FERN_TOKEN + +name: Publish Fern Docs + +on: + push: + tags: + - 'docs/v*' + workflow_dispatch: {} + +permissions: + contents: read + +concurrency: + group: fern-publish-nemo-framework-hub + cancel-in-progress: true + +jobs: + publish: + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v6 + + - name: Setup Node.js + uses: actions/setup-node@v6 + with: + node-version: '22' + + - name: Install Fern CLI + run: npm install -g fern-api@$(jq -r .version fern/fern.config.json) + + - name: Publish documentation + working-directory: ./fern + env: + FERN_TOKEN: ${{ secrets.DOCS_FERN_TOKEN }} + run: fern generate --docs diff --git a/README.md b/README.md index a46ae92..73bbbd9 100644 --- a/README.md +++ b/README.md @@ -1 +1,8 @@ -# .github \ No newline at end of file +# NVIDIA-NeMo/.github + +GitHub organization profile and NeMo Framework hub documentation. + +- **Org profile** — `profile/README.md` (shown on [github.com/NVIDIA-NeMo](https://github.com/NVIDIA-NeMo)) +- **Hub docs (Fern)** — `fern/` → [docs.nvidia.com/nemo](https://docs.nvidia.com/nemo) when published + +See [fern/README.md](fern/README.md) for local preview and publish steps. diff --git a/fern/README.md b/fern/README.md new file mode 100644 index 0000000..3358a30 --- /dev/null +++ b/fern/README.md @@ -0,0 +1,59 @@ +# NeMo Framework hub documentation (Fern) + +Hub site for the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization. Routes visitors to each library's documentation using the shared NVIDIA Fern global theme from [fern-components](https://github.com/NVIDIA/fern-components). 
+ +## Directory structure + +``` +fern/ +├── fern.config.json +├── docs.yml +├── components/ +│ ├── repos.ts # canonical org repo list +│ └── RepoCatalog.tsx # searchable catalog UI +└── docs/pages/ + ├── index.mdx + ├── getting-started.mdx + ├── repositories.mdx # full org catalog (primary) + ├── libraries.mdx # lifecycle summary + └── community.mdx +``` + +When NVIDIA-NeMo adds or archives a repo, update `components/repos.ts` to match [the org repository list](https://github.com/orgs/NVIDIA-NeMo/repositories?type=all). + +## Local development + +### Prerequisites + +- Node.js 22+ +- Fern CLI (`npm install -g fern-api`) + +### Preview + +```bash +cd fern +fern login # once, for global theme fetch +fern check +fern docs dev +``` + +Open [http://localhost:3000](http://localhost:3000). + +## Publish + +Publishing uses the NVIDIA Fern organization token (`DOCS_FERN_TOKEN` org secret). + +```bash +git tag docs/v0.1.0 && git push origin docs/v0.1.0 +``` + +Or run the **Publish Fern Docs** workflow from the Actions tab. + +Target URLs (configure in `docs.yml`): + +- Preview: `nemo-framework.docs.buildwithfern.com/nemo` +- Production: `docs.nvidia.com/nemo` + +## Theme + +This site uses `global-theme: nvidia`. Theme assets are owned by the fern-components control repo — do not copy logos, CSS, or footer components here. Update branding in fern-components and re-upload the theme. diff --git a/fern/components/RepoCatalog.tsx b/fern/components/RepoCatalog.tsx new file mode 100644 index 0000000..47e4ee6 --- /dev/null +++ b/fern/components/RepoCatalog.tsx @@ -0,0 +1,236 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. 
+ * SPDX-License-Identifier: Apache-2.0 + */ +import { useMemo, useState } from "react"; +import { + NEMO_REPOS, + REPO_CATEGORIES, + categoryLabel, + type NemoRepo, + type RepoCategory, +} from "./repos"; + +const ACCENT = "#76B900"; +const BORDER = "var(--border-default, #dddddd)"; +const MUTED = "var(--grayscale-a11, #666666)"; + +function matchesQuery(repo: NemoRepo, query: string): boolean { + const q = query.trim().toLowerCase(); + if (!q) return true; + const haystack = [ + repo.name, + repo.description, + repo.category, + categoryLabel(repo.category), + ...(repo.tags ?? []), + ] + .join(" ") + .toLowerCase(); + return haystack.includes(q); +} + +function RepoCard({ repo }: { repo: NemoRepo }) { + const primaryHref = repo.docsUrl ?? repo.githubUrl; + return ( +
+
+

+ + {repo.name} + +

+ {repo.status === "archived" ? ( + + Archived + + ) : null} +
+

+ {repo.description} +

+
+ + {categoryLabel(repo.category)} + + {(repo.tags ?? []).slice(0, 3).map((tag) => ( + + {tag} + + ))} +
+
+ {repo.docsUrl ? ( + + Documentation + + ) : null} + + GitHub + + {repo.containerUrl ? ( + + NGC container + + ) : null} +
+
+ ); +} + +export default function RepoCatalog() { + const [query, setQuery] = useState(""); + const [category, setCategory] = useState("all"); + + const filtered = useMemo(() => { + return NEMO_REPOS.filter((repo) => { + if (category !== "all" && repo.category !== category) return false; + return matchesQuery(repo, query); + }).sort((a, b) => a.name.localeCompare(b.name)); + }, [query, category]); + + const counts = useMemo(() => { + const byCat: Record = { all: NEMO_REPOS.length }; + for (const repo of NEMO_REPOS) { + byCat[repo.category] = (byCat[repo.category] ?? 0) + 1; + } + return byCat; + }, []); + + return ( +
+
+ + + View on GitHub → + +
+ +
+ {REPO_CATEGORIES.map(({ id, label }) => { + const active = category === id; + const count = counts[id] ?? 0; + return ( + + ); + })} +
+ +

+ Showing {filtered.length} of {NEMO_REPOS.length} repositories in{" "} + NVIDIA-NeMo. +

+ + {filtered.length === 0 ? ( +

No repositories match your search. Try another filter or clear the search box.

+ ) : ( +
+ {filtered.map((repo) => ( + + ))} +
+ )} +
+ ); +} diff --git a/fern/components/repos.ts b/fern/components/repos.ts new file mode 100644 index 0000000..7d4ae92 --- /dev/null +++ b/fern/components/repos.ts @@ -0,0 +1,241 @@ +/** + * SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + * SPDX-License-Identifier: Apache-2.0 + * + * Canonical list of NVIDIA-NeMo GitHub organization repositories. + * https://github.com/orgs/NVIDIA-NeMo/repositories + */ + +export type RepoCategory = + | "data" + | "training" + | "alignment" + | "evaluation" + | "deployment" + | "infrastructure"; + +export type RepoStatus = "active" | "archived"; + +export interface NemoRepo { + /** GitHub repo name (e.g. Automodel) */ + name: string; + description: string; + category: RepoCategory; + githubUrl: string; + docsUrl?: string; + containerUrl?: string; + status?: RepoStatus; + /** Extra facets for search (e.g. speech, agents) */ + tags?: string[]; +} + +export const REPO_CATEGORIES: { id: RepoCategory | "all"; label: string }[] = [ + { id: "all", label: "All" }, + { id: "data", label: "Data" }, + { id: "training", label: "Training" }, + { id: "alignment", label: "Alignment & agents" }, + { id: "evaluation", label: "Evaluation" }, + { id: "deployment", label: "Deployment & safety" }, + { id: "infrastructure", label: "Infrastructure" }, +]; + +/** 23 repositories as listed on the org page (including .github). 
*/ +export const NEMO_REPOS: NemoRepo[] = [ + // Data + { + name: "Curator", + description: "Scalable data preprocessing and curation for text, image, video, and audio.", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/Curator", + docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator", + tags: ["multimodal", "deduplication"], + }, + { + name: "DataDesigner", + description: "Generate high-quality synthetic data from scratch or from seed data.", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/DataDesigner", + docsUrl: "https://nvidia-nemo.github.io/DataDesigner/latest/", + tags: ["synthetic-data", "mcp"], + }, + { + name: "DataDesignerPlugins", + description: "Plugins extending NeMo Data Designer workflows.", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/DataDesignerPlugins", + tags: ["synthetic-data", "plugins"], + }, + { + name: "Skills", + description: "Reference pipelines for synthetic data generation and evaluation (math, code, science).", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/Skills", + docsUrl: "https://nvidia-nemo.github.io/Skills/", + tags: ["evaluation", "sdg"], + }, + { + name: "Safe-Synthesizer", + description: "Create private, safe versions of sensitive tabular datasets.", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/Safe-Synthesizer", + docsUrl: + "https://docs.nvidia.com/nemo/microservices/latest/generate-private-synthetic-data/", + tags: ["privacy", "tabular"], + }, + { + name: "Anonymizer", + description: "Detect and protect PII through context-aware replacement and rewriting.", + category: "data", + githubUrl: "https://github.com/NVIDIA-NeMo/Anonymizer", + tags: ["pii", "privacy"], + }, + { + name: "SDG-PGMs", + description: "Build probabilistic graphical models (PGMs) for synthetic data generation.", + category: "data", + githubUrl: 
"https://github.com/NVIDIA-NeMo/SDG-PGMs", + tags: ["synthetic-data", "pgm"], + }, + // Training + { + name: "Automodel", + description: "PyTorch distributed training for LLMs/VLMs with day-0 Hugging Face support.", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/Automodel", + docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel", + tags: ["llm", "vlm", "huggingface"], + }, + { + name: "Megatron-Bridge", + description: "Megatron-based training with bidirectional Hugging Face checkpoint conversion.", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/Megatron-Bridge", + docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", + tags: ["llm", "vlm", "megatron"], + }, + { + name: "NeMo", + description: "Speech AI (ASR, TTS) and legacy NeMo toolkit; org focus shifting to modular libs.", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/NeMo", + docsUrl: "https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html", + tags: ["speech", "asr", "tts"], + }, + { + name: "Nemotron", + description: "Developer asset hub — recipes, cookbooks, datasets, and Nemotron reference examples.", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/Nemotron", + docsUrl: "https://github.com/NVIDIA-NeMo/Nemotron#readme", + tags: ["nemotron", "recipes"], + }, + { + name: "Emerging-Optimizers", + description: "Collection of cutting-edge optimizers for large-scale training.", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/Emerging-Optimizers", + docsUrl: "https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html", + }, + { + name: "DFM", + description: "Large-scale diffusion model training and inference (archived).", + category: "training", + githubUrl: "https://github.com/NVIDIA-NeMo/DFM", + 
docsUrl: "https://github.com/NVIDIA-NeMo/DFM/tree/main/docs", + status: "archived", + tags: ["diffusion"], + }, + // Alignment & agents + { + name: "RL", + description: "Scalable post-training — SFT, DPO, GRPO, distillation, and reinforcement learning.", + category: "alignment", + githubUrl: "https://github.com/NVIDIA-NeMo/RL", + docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl", + tags: ["dpo", "grpo", "rlhf"], + }, + { + name: "Gym", + description: "RL environments and benchmarks to evaluate and improve models and agents.", + category: "alignment", + githubUrl: "https://github.com/NVIDIA-NeMo/Gym", + docsUrl: "https://docs.nvidia.com/nemo/gym/latest/index.html", + tags: ["environments", "agents"], + }, + { + name: "ProRL-Agent-Server", + description: "Rollout-as-a-service for multi-turn agent RL (pairs with NeMo RL and Gym).", + category: "alignment", + githubUrl: "https://github.com/NVIDIA-NeMo/ProRL-Agent-Server", + docsUrl: "https://github.com/NVIDIA-NeMo/ProRL-Agent-Server#readme", + tags: ["agents", "rollout", "openhands"], + }, + // Evaluation + { + name: "Evaluator", + description: "Scalable, reproducible evaluation across 100+ benchmarks and harnesses.", + category: "evaluation", + githubUrl: "https://github.com/NVIDIA-NeMo/Evaluator", + docsUrl: "https://docs.nvidia.com/nemo/evaluator/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", + tags: ["benchmarks"], + }, + // Deployment & safety + { + name: "Export-Deploy", + description: "Export NeMo and Hugging Face models to TRT-LLM, vLLM, ONNX, and serving stacks.", + category: "deployment", + githubUrl: "https://github.com/NVIDIA-NeMo/Export-Deploy", + docsUrl: "https://docs.nvidia.com/nemo/export-deploy/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", + tags: ["inference", "triton"], + }, + { + name: "Guardrails", + description: "Programmable 
guardrails for LLM-based conversational systems (Colang).", + category: "deployment", + githubUrl: "https://github.com/NVIDIA-NeMo/Guardrails", + docsUrl: "https://docs.nvidia.com/nemo/guardrails/latest/", + tags: ["safety", "agents"], + }, + { + name: "nemo-platform", + description: "Platform to ship agents that are faster, more accurate, and safer.", + category: "deployment", + githubUrl: "https://github.com/NVIDIA-NeMo/nemo-platform", + tags: ["agents", "platform"], + }, + // Infrastructure + { + name: "Run", + description: "Configure, launch, and manage ML experiments (local, SLURM, Kubernetes).", + category: "infrastructure", + githubUrl: "https://github.com/NVIDIA-NeMo/Run", + docsUrl: "https://docs.nvidia.com/nemo/run/latest/", + containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", + tags: ["experiments", "orchestration"], + }, + { + name: "FW-CI-templates", + description: "CI/CD workflow templates shared across NeMo Framework libraries.", + category: "infrastructure", + githubUrl: "https://github.com/NVIDIA-NeMo/FW-CI-templates", + tags: ["ci", "github-actions"], + }, + { + name: ".github", + description: "Organization profile, this documentation hub, and shared org settings.", + category: "infrastructure", + githubUrl: "https://github.com/NVIDIA-NeMo/.github", + docsUrl: "https://docs.nvidia.com/nemo", + tags: ["org", "hub"], + }, +]; + +export function categoryLabel(category: RepoCategory): string { + return REPO_CATEGORIES.find((c) => c.id === category)?.label ?? 
category; +} diff --git a/fern/docs.yml b/fern/docs.yml new file mode 100644 index 0000000..622abba --- /dev/null +++ b/fern/docs.yml @@ -0,0 +1,56 @@ +# yaml-language-server: $schema=https://schema.buildwithfern.dev/docs-yml.json + +instances: + - url: nemo-framework.docs.buildwithfern.com/nemo + custom-domain: docs.nvidia.com/nemo + +title: NVIDIA NeMo Framework + +global-theme: nvidia + +logo: + href: /nemo + right-text: NeMo Framework + +navbar-links: + - type: github + value: https://github.com/NVIDIA-NeMo + +experimental: + mdx-components: + - ./components + +redirects: + - source: "/nemo/libraries" + destination: "/nemo/repositories" + - source: "/nemo/index.html" + destination: "/nemo" + - source: "/nemo/index" + destination: "/nemo" + - source: "/nemo/:path*/index.html" + destination: "/nemo/:path*" + - source: "/nemo/:path*.html" + destination: "/nemo/:path*" + +navigation: + - section: Overview + contents: + - page: Home + path: docs/pages/index.mdx + icon: fa-duotone fa-house + - page: Getting Started + path: docs/pages/getting-started.mdx + icon: fa-duotone fa-rocket + - section: Libraries + contents: + - page: Repositories + path: docs/pages/repositories.mdx + icon: fa-duotone fa-grid-2 + - page: Libraries (overview) + path: docs/pages/libraries.mdx + icon: fa-duotone fa-books + - section: Community + contents: + - page: Community + path: docs/pages/community.mdx + icon: fa-duotone fa-comments diff --git a/fern/docs/pages/community.mdx b/fern/docs/pages/community.mdx new file mode 100644 index 0000000..614093a --- /dev/null +++ b/fern/docs/pages/community.mdx @@ -0,0 +1,46 @@ +--- +title: Community +subtitle: Discuss, contribute, and stay up to date +--- + +## Get involved + + + + +Questions, ideas, and announcements across the org. + + + +Browse and star projects in the NVIDIA-NeMo organization. + + + +Framework changelog and release history. + + + + +Each repository includes its own `CONTRIBUTING.md` and issue templates. 
Open issues or discussions in the repo that owns the component you are using. + +## Recent highlights + +### AutoModel + +- [Enabling PyTorch native pipeline parallelism for HF models](https://github.com/NVIDIA-NeMo/Automodel/discussions/589) +- [Day-0 Hugging Face support](https://github.com/NVIDIA-NeMo/Automodel/discussions/477) +- [Gemma 3n multimodal fine-tuning](https://github.com/NVIDIA-NeMo/Automodel/discussions/494) + +### NeMo RL + +- [On-policy distillation](https://github.com/NVIDIA-NeMo/RL/discussions/1445) +- [FP8 quantization](https://github.com/NVIDIA-NeMo/RL/discussions/1216) +- [10× MoE weight transfer](https://github.com/NVIDIA-NeMo/RL/discussions/1189) + +### NeMo Speech + +- [Fine-tune NeMo models with Granary data](https://github.com/NVIDIA-NeMo/NeMo/discussions/14758) + +## License + +Apache 2.0. Third-party attributions are documented in each repository. diff --git a/fern/docs/pages/getting-started.mdx b/fern/docs/pages/getting-started.mdx new file mode 100644 index 0000000..3fccf33 --- /dev/null +++ b/fern/docs/pages/getting-started.mdx @@ -0,0 +1,85 @@ +--- +title: Getting Started +subtitle: Pick the right library and recipe for your workload +--- + +## Quick start with AutoModel + +The fastest path to fine-tuning Hugging Face models on NVIDIA GPUs: + +```bash +pip install nemo-automodel +``` + +```python +from nemo_automodel import AutoModelForCausalLM, Trainer + +model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct") +trainer = Trainer(model=model, train_dataset=dataset) +trainer.train() +``` + + +Local workstation and cluster launch options. 
+ + +## Decision guide + +| I want to… | Models | Scale | Library | Documentation | +| --- | --- | --- | --- | --- | +| Train or fine-tune | LLM, VLM | ≤1K GPUs | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| Train at scale | LLM, VLM | 1K+ GPUs | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Align (DPO/GRPO) | LLM, VLM | Any | [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | +| Curate data | — | Any | [Curator](https://github.com/NVIDIA-NeMo/Curator) | [docs](https://docs.nvidia.com/nemo/curator/latest/) | +| Evaluate | Any | — | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | +| Deploy | Any | — | [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| Speech AI | ASR, TTS | Any | [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | + +## Scale and backends + +| GPUs | Installation | Checkpoint conversion | LLM recipes | VLM recipes | +| --- | --- | --- | --- | --- | +| 1–1,000 | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Not required | [Pretrain](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-pre-training), [SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-supervised-fine-tuning-sft), [LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-parameter-efficient-fine-tuning-peft), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py) | [SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-supervised-fine-tuning-sft), 
[LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-parameter-efficient-fine-tuning-peft), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py) | +| 1,000+ | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [HF ↔ Megatron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/README.md) | [Pretrain, SFT, LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py) | [SFT, LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/configs/vlm_grpo_3B_megatron.yaml) | + +## Training recipes by library + +| Library | LLM recipes | VLM recipes | +| --- | --- | --- | +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [Llama](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/llama3_2), [Qwen](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/qwen), [Gemma](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/gemma), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_pretrain/deepseekv3_pretrain.yaml), [Mistral](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/mistral), [Phi](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/phi) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen2_5), [Gemma 3n VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3n) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | 
[Llama](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [Qwen](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/deepseek/deepseek_v3.py), [Gemma 3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma/gemma3.py), [Nemotron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/nemotronh/nemotronh.py) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma3_vl/gemma3_vl.py), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [Qwen3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen3vl.py) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_sft.py) | + +## NGC containers + +Pull optimized containers to get started quickly: + + + + +Megatron-Bridge, Evaluator, Export-Deploy, Run. + + + +PyTorch-native distributed training. + + + +Alignment and reinforcement learning. + + + +Data preprocessing and curation. + + + +Browse all NeMo containers. + + + + +## Experiment tracking + + +Launch and track experiments on local machines, SLURM, and Kubernetes. 
+ diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx new file mode 100644 index 0000000..35006a4 --- /dev/null +++ b/fern/docs/pages/index.mdx @@ -0,0 +1,119 @@ +--- +title: NVIDIA NeMo Framework +subtitle: GPU-accelerated libraries for training, curation, evaluation, alignment, and deployment +slug: "" +--- + +NeMo Framework is NVIDIA's open-source suite for large language models, multimodal models, diffusion, and speech. Scale pretraining, post-training, and reinforcement learning from a single GPU to thousand-node clusters with Hugging Face/PyTorch and Megatron backends. + +The [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization hosts modular libraries and recipes so you can compose only what you need. NeMo Framework is also part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for the AI agent lifecycle. + + +The framework is restructuring from the monolithic NeMo 2.0 repo into focused libraries (AutoModel, RL, Gym, Curator, and more). Speech AI remains in the legacy [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repository. + + +## Choose your path + + + + +Decision guide, installation paths, and training recipes by scale. + + + +Search and filter all 23 NVIDIA-NeMo repos by category. + + + +Discussions, announcements, and how to contribute. + + + + +## Start here + + + + +Fine-tune Hugging Face models on NVIDIA GPUs — the simplest on-ramp for most users. + + + +Large-scale pretraining and SFT at 1,000+ GPUs with Megatron-Core. + + + +SFT, DPO, GRPO, and on-policy distillation for LLMs and VLMs. 
+ + + + +## Pipeline overview + +```mermaid +flowchart LR + subgraph Data + Curator + DataDesigner[Data Designer] + Skills + end + + subgraph Training + AutoModel + MBridge[Megatron-Bridge] + end + + subgraph Alignment + RL[NeMo RL] + end + + subgraph Evaluation + Evaluator + end + + subgraph Deployment + Export[Export-Deploy] + Guardrails + end + + Gym[NeMo Gym] + + Data --> Training + Training --> Alignment + Training --> Evaluation + Alignment --> Evaluation + Evaluation --> Deployment + + Gym -.-> RL + Skills -.-> Evaluator +``` + +## Popular libraries + + + + +Scalable data preprocessing and curation for LLMs and multimodal data. + + + +Model benchmarking across 100+ benchmarks and 18+ harnesses. + + + +Export to vLLM, TensorRT-LLM, ONNX, and production serving. + + + +Programmable safety rails for LLM applications. + + + +Launch experiments on local machines, SLURM, or Kubernetes. + + + +RL environments for model and agent improvement. + + + diff --git a/fern/docs/pages/libraries.mdx b/fern/docs/pages/libraries.mdx new file mode 100644 index 0000000..88a3a86 --- /dev/null +++ b/fern/docs/pages/libraries.mdx @@ -0,0 +1,146 @@ +--- +title: Libraries Overview +subtitle: Lifecycle-oriented summary — use the repository catalog for the full list +--- + +For a searchable catalog of **all 23 repositories** in the organization, see **[Repositories](/repositories)**. + +Each library has its own repository, documentation site, and (where applicable) NGC container. The sections below highlight the main projects by pipeline stage. + +## Data + + + + +Data curation at scale for text, image, video, and audio. + + + +Synthetic data generation from scratch or seed data. + + + +Reference pipelines for synthetic data generation and evaluation. + + + +Privacy-preserving synthetic tabular data. + + + +PII detection and context-aware anonymization.
+ + + + +| Repo | GitHub | Container | +| --- | --- | --- | +| Curator | [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) | +| Data Designer | [NVIDIA-NeMo/DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) | — | +| Skills | [NVIDIA-NeMo/Skills](https://github.com/NVIDIA-NeMo/Skills) | — | + +## Training + + + + +PyTorch distributed training with day-0 Hugging Face support (LLM, VLM, Omni). + + + +Megatron-Core pretraining, SFT, and LoRA with bidirectional HF conversion. + + + +Speech AI (ASR, TTS) — legacy NeMo repo focused on speech. + + + +Diffusion model training on Megatron-Core. + + + +Collection of cutting-edge optimizers. + + + +Developer asset hub for Nemotron models and recipes. + + + + +| Repo | Backend | Models | Container | +| --- | --- | --- | --- | +| AutoModel | PyTorch | LLM, VLM, Omni | [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) | +| Megatron-Bridge | Megatron-core | LLM, VLM | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| NeMo Speech | Megatron-core | Speech | — | + +## Alignment + + + + +SFT, DPO, GRPO, and distillation with Megatron-core and vLLM backends. + + + +RL environments for evaluating and improving models and agents. + + + + +| Repo | Backend | Models | Container | +| --- | --- | --- | --- | +| NeMo RL | Megatron-core, vLLM | LLM, VLM | [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) | +| Gym | — | LLM, VLM | — | + +## Evaluation + + + + +Model benchmarking across 100+ benchmarks and 18+ harnesses. + + + + +| Repo | GitHub | Container | +| --- | --- | --- | +| Evaluator | [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | + +## Deployment and safety + + + + +Export to vLLM, TensorRT-LLM, ONNX, and production serving. 
+ + + +Programmable guardrails for LLM conversational systems. + + + + +| Repo | Inference backends | Container | +| --- | --- | --- | +| Export-Deploy | vLLM, TRT-LLM, ONNX | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | +| Guardrails | — | — | + +## Infrastructure + + + + +Experiment launcher for local, SLURM, and Kubernetes workflows. + + + + +| Repo | GitHub | Container | +| --- | --- | --- | +| Run | [NVIDIA-NeMo/Run](https://github.com/NVIDIA-NeMo/Run) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | + +## Why modular repos? + +The NeMo GitHub organization split the monolithic NeMo 2.0 codebase into focused libraries to improve **composability** (smaller containers, easier discovery) and **customizability** (PyTorch-native training loops in AutoModel, Megatron-Bridge, and RL instead of a single Lightning-centric stack). diff --git a/fern/docs/pages/repositories.mdx b/fern/docs/pages/repositories.mdx new file mode 100644 index 0000000..71a3d3e --- /dev/null +++ b/fern/docs/pages/repositories.mdx @@ -0,0 +1,22 @@ +--- +title: Repositories +subtitle: Every project in the NVIDIA-NeMo GitHub organization +slug: repositories +--- + +Browse all **23 repositories** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) by lifecycle stage. Use search or category filters to find a library, then open its documentation or GitHub repo. 
+ + + +## How repos are grouped + +| Category | What lives here | +| --- | --- | +| **Data** | Curation, synthetic data, PII handling, SDG pipelines | +| **Training** | Pretraining and fine-tuning (AutoModel, Megatron-Bridge, speech, Nemotron) | +| **Alignment & agents** | RL post-training, environments, rollout infrastructure | +| **Evaluation** | Benchmarking and quality measurement | +| **Deployment & safety** | Export, serving, guardrails, agent platform | +| **Infrastructure** | Experiment launchers, shared CI templates, org hub | + +Repos without a published Fern site link to GitHub README or microservice docs until a dedicated docs instance is added. diff --git a/fern/fern.config.json b/fern/fern.config.json new file mode 100644 index 0000000..aacfddd --- /dev/null +++ b/fern/fern.config.json @@ -0,0 +1,4 @@ +{ + "organization": "nvidia", + "version": "5.29.0" +} diff --git a/fern/tsconfig.json b/fern/tsconfig.json new file mode 100644 index 0000000..9576342 --- /dev/null +++ b/fern/tsconfig.json @@ -0,0 +1,14 @@ +{ + "compilerOptions": { + "target": "ES2020", + "lib": ["ES2020", "DOM", "DOM.Iterable"], + "jsx": "react-jsx", + "module": "ESNext", + "moduleResolution": "bundler", + "strict": true, + "skipLibCheck": true, + "noEmit": true, + "isolatedModules": true + }, + "include": ["components/**/*.ts", "components/**/*.tsx"] +} diff --git a/nemo-fw-presentation-outline.md b/nemo-fw-presentation-outline.md new file mode 100644 index 0000000..524ac87 --- /dev/null +++ b/nemo-fw-presentation-outline.md @@ -0,0 +1,364 @@ +# NeMo Framework: Technical Writer Presentation Outline + +> **Audience:** Technical writing team +> **Goal:** Introduce what NeMo Framework is, walk through each repo, and highlight what writers need to know +> **Estimated time:** 45–60 min +> **Diagrams:** See the `assets/` folder for per-product architecture diagrams + +--- + +## 1. What Is NeMo Framework? 
+ +- **One-liner:** An open-source collection of NVIDIA libraries that covers every stage of the generative AI model lifecycle — from data curation, to training, alignment, evaluation, and deployment. +- Supports LLMs, VLMs, Speech, and Diffusion models. +- Scales from a single GPU on a workstation to 10,000+ GPU nodes on SLURM/Kubernetes clusters. +- Day-0 Hugging Face support: users can train virtually any model on the HF Hub without format conversion. +- All repos live under the **[NVIDIA-NeMo](https://github.com/NVIDIA-NeMo)** GitHub org. Apache 2.0 licensed. + +### Key Talking Points + +- NeMo Framework is **not** a single repo — it is an **ecosystem of ~15 focused repositories**. +- Each repo has its own docs site, container, and release cadence. +- Optimized NGC containers are published for the core repos (AutoModel, Megatron-Bridge, RL, Curator, Evaluator, Export-Deploy). + +--- + +## 2. The Pipeline at a Glance + +![NeMo Framework Pipeline](assets/diagram-00-nemo-framework-pipeline.png) + +``` +Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deployment +``` + +| Stage | Primary Repos | +|-------|---------------| +| **Data** | Curator, Data Designer, Skills | +| **Training** | AutoModel, Megatron-Bridge, NeMo Speech, DFM, Emerging-Optimizers | +| **Alignment** | NeMo RL, NeMo Gym | +| **Evaluation** | Evaluator, Skills | +| **Deployment** | Export-Deploy, Guardrails | +| **Infrastructure** | NeMo Run | +| **Models/Recipes** | Nemotron | + +--- + +## 3. Data Stage + +### 3a. NeMo Curator + +![NeMo Curator](assets/diagram-01-curator.png) + +- **Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) — 1,394 stars +- **What it does:** GPU-accelerated data curation at scale for training better AI models. +- **Modalities:** Text, Image, Video, Audio. +- **Highlights for writers:** + - **Text:** 30+ heuristic filters, fuzzy/exact/semantic deduplication (MinHash LSH), language detection, quality classification. 
+ - **Image:** CLIP embeddings, aesthetic filtering, NSFW detection, deduplication. + - **Video:** Scene detection (TransNetV2), clip extraction, motion/aesthetic filtering, GPU H.264 encoding, Cosmos-Embed1 embeddings. + - **Audio:** ASR transcription, WER filtering, quality assessment. + - Powered by **NVIDIA RAPIDS** (cuDF, cuML, cuGraph) + Ray for multi-node scaling. + - Proven results: 16x faster fuzzy dedup on 8 TB dataset; 40% lower TCO vs CPU. +- **Docs:** [docs.nvidia.com/nemo/curator](https://docs.nvidia.com/nemo/curator/latest/) +- **Container:** [NGC NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) + +### 3b. NeMo Data Designer + +![NeMo Data Designer](assets/diagram-02-data-designer.png) + +- **Repo:** [NVIDIA-NeMo/DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) — 698 stars +- **What it does:** Generate high-quality **synthetic datasets** from scratch or from seed data. +- **Highlights for writers:** + - Statistical samplers for controlled distributions (category, numeric, etc.). + - LLM-powered text generation columns with dependency-aware field generation. + - Built-in validators (Python, SQL, custom) and LLM-as-a-judge scoring. + - Preview mode for fast iteration before full-scale generation. + - Supports NVIDIA Build API, OpenAI, OpenRouter, and custom providers. + - CLI for model/provider configuration (`data-designer config`). + - Collects anonymized telemetry on model usage (opt-out available). +- **Docs:** [nvidia-nemo.github.io/DataDesigner](https://nvidia-nemo.github.io/DataDesigner/latest/) + +### 3c. NeMo Skills (Data Side) + +![NeMo Skills](assets/diagram-10-skills.png) + +- **Repo:** [NVIDIA-NeMo/Skills](https://github.com/NVIDIA-NeMo/Skills) — 816 stars +- **What it does:** Synthetic data generation pipelines for math, code, and science datasets. +- **Highlights for writers:** + - End-to-end SDG pipelines: generate → filter → train → evaluate. 
+ - Released major open datasets: OpenMathInstruct-2 (14M pairs), OpenMathReasoning (3.2M CoT solutions), OpenScienceReasoning-2. + - Flexible LLM inference: API providers, local servers, large-scale SLURM jobs. + - Host models with TensorRT-LLM, vLLM, SGLang, or Megatron. +- **Docs:** [nvidia-nemo.github.io/Skills](https://nvidia-nemo.github.io/Skills/) + +--- + +## 4. Training Stage + +### 4a. NeMo AutoModel (Primary — up to ~1K GPUs) + +- **Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) — 288 stars +- **What it does:** PyTorch DTensor-native SPMD training library for LLMs and VLMs with out-of-the-box Hugging Face support. +- **Highlights for writers:** + - **SPMD philosophy:** Same script runs on 1 GPU or 1000+ — parallelism is configuration, not code rewrites. + - **YAML-driven recipes** with CLI overrides for any field. + - **Supported tasks:** Pretraining, SFT, LoRA (PEFT), Knowledge Distillation. + - **Model coverage (LLM):** Llama 3.x, Qwen 2.5/3, DeepSeek V3/V3.2, Gemma 2/3, Mistral/Mixtral, Phi 2/3/4, GPT-OSS, Nemotron, Moonlight, Baichuan, Seed, GLM, MiniMax, Step — essentially *any* HF causal LM. + - **Model coverage (VLM):** Gemma 3 VL, Gemma 3n VL, Qwen2.5 VL, Qwen3 VL 235B, Kimi K2.5 VL. + - **Parallelism:** FSDP2, Tensor Parallel, Context Parallel, Sequence Parallel, Pipeline Parallel (3D parallelism). + - **Performance features:** FP8 via torchao, sequence packing, distributed checkpointing (SafeTensors). + - **Performance numbers:** DeepSeek V3 671B → 250 TFLOPs/GPU on 256 GPUs; GPT-OSS 20B → 279 TFLOPs/GPU. + - Actively developed — new model support weekly (MiniMax-M2, DeepSeek V3.2, Step 3.5-flash in Feb 2026). + - Install: `pip install nemo-automodel` or `uv sync`. + - **Launch options:** `torchrun`, `automodel` CLI (interactive + SLURM), Kubernetes (coming). 
+- **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) +- **Container:** [NGC NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) + +### 4b. NeMo Megatron-Bridge (Scale — 1K+ GPUs) + +- **Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) — 423 stars +- **What it does:** Training library that bridges Hugging Face and [Megatron-Core](https://github.com/NVIDIA/Megatron-LM/tree/main/megatron/core) for maximum throughput at extreme scale. +- **Highlights for writers:** + - **Core capability:** Bidirectional checkpoint conversion between HF and Megatron formats — online, parallelism-aware, memory-efficient. + - **`AutoBridge` API:** Auto-detect model architecture, convert, and materialize Megatron models in a few lines of Python. + - **Supported parallelisms:** TP, PP, VPP, CP, EP, ETP — 6D parallelism for near-linear scaling to thousands of nodes. + - **Training features:** Pretraining, SFT, LoRA/DoRA (PEFT), FP8/BF16/FP4 mixed precision. + - **Model coverage:** Llama 2–3.3, Qwen 2–3 (incl. MoE and VL), DeepSeek V2/V3, Gemma/Gemma 3 VL, Nemotron-H, Nemotron Nano v2/VL, GPT-OSS, GLM-4.5, Mistral/Ministral, Moonlight, OlMoE. + - **PyTorch-native training loop** — refactored from the legacy NeMo training stack for greater flexibility. + - Community adoptions: VeRL, Slime, SkyRL, Mind Lab (trained trillion-parameter GRPO LoRA on 64 H800s). +- **Docs:** [docs.nvidia.com/nemo/megatron-bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- **Container:** [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) + +### 4c. NeMo Speech + +- **Repo:** [NVIDIA-NeMo/NeMo](https://github.com/NVIDIA-NeMo/NeMo) +- **What it does:** Pretraining and SFT for speech AI models (ASR, TTS) using Megatron-Core. +- **Highlights for writers:** + - Covers automatic speech recognition (ASR) and text-to-speech (TTS). + - Built on Megatron-Core backend. 
+ - Part of the original NeMo monorepo, now housing the speech-specific workloads. +- **Docs:** [docs.nvidia.com/nemo-framework (Speech AI)](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) + +### 4d. NeMo DFM (Diffusion Foundation Models) + +- **Repo:** [NVIDIA-NeMo/DFM](https://github.com/NVIDIA-NeMo/DFM) — 29 stars +- **What it does:** Training and inference for diffusion models (video, image, text generation). +- **Highlights for writers:** + - **Dual-path architecture:** Megatron-Bridge path (max scalability) and AutoModel path (easy experimentation). + - **Supported models:** DiT (Diffusion Transformers), Wan 2.1 (video generation). + - Features: Flow Matching, EDM samplers, sequence packing, distributed checkpointing. + - YAML-driven recipes, `uv run` for reproducible environments. +- **Docs:** [github.com/NVIDIA-NeMo/DFM/docs](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) + +### 4e. Emerging-Optimizers + +- **Repo:** [NVIDIA-NeMo/Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) +- **What it does:** Collection of cutting-edge optimizers (e.g., Muon, Dion) for use across training libraries. +- **Docs:** [docs.nvidia.com/nemo/emerging-optimizers](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) + +--- + +## 5. Alignment Stage + +### 5a. NeMo RL + +- **Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) — 1,306 stars +- **What it does:** Scalable post-training library for reinforcement learning on LLMs and VLMs. +- **Highlights for writers:** + - **Algorithms:** GRPO, GSPO, DAPO, DPO, SFT (w/ LoRA), Reward Modeling (RM), On-policy Distillation. + - **Multi-turn RL:** Tool use, games, multi-step environments. + - **Async RL:** Asynchronous rollouts + replay buffers for fully async GRPO. + - **Two training backends:** + - **DTensor** (PyTorch-native FSDP2, TP, CP, SP, PP) — via NeMo AutoModel.
+ - **Megatron-Core** (6D parallelism) — via Megatron-Bridge. + - **Two generation backends:** vLLM and Megatron Inference. + - **End-to-end FP8** training + FP8 vLLM generation. + - **VLM support:** SFT and GRPO for vision-language models. + - Ray-based infrastructure for resource management and worker isolation. + - Used to train [Nemotron-3-Nano-30B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8). + - Latest release: v0.5.0 (Jan 2026) with LoRA support for DTensor and Megatron backends. + - Install: `uv venv && uv run python examples/run_grpo.py` +- **Docs:** [docs.nvidia.com/nemo/rl](https://docs.nvidia.com/nemo/rl/latest/) +- **Container:** [NGC NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) + +### 5b. NeMo Gym + +- **Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) — 637 stars +- **What it does:** Build RL training environments for LLMs — provides infrastructure to develop environments, scale rollout collection, and integrate with training frameworks. +- **Highlights for writers:** + - Scaffolding for multi-step, multi-turn, and user-modeling RL scenarios. + - Growing collection of **resource servers** (training environments): + - **Agent:** Calendar scheduling, Google Search, Workplace Assistant, Math Advanced Calculations. + - **Coding:** Competitive coding (Code Gen), Mini SWE Agent (SWE-bench style). + - **Knowledge:** MCQA (MMLU/GPQA/HLE style). + - **Math:** Math with Judge (OpenMathReasoning dataset). + - **Instruction following:** IFEval/IFBench style + Structured Outputs. + - Each resource server ships with datasets, configs, tests, and a README. + - Integrates with NeMo RL and other training frameworks. + - Responses API-based agent architecture. + - Early development — APIs evolving. +- **Docs:** [docs.nvidia.com/nemo/gym](https://docs.nvidia.com/nemo/gym/latest/index.html) + +--- + +## 6. Evaluation Stage + +### 6a. 
NeMo Evaluator + +- **Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) — 195 stars +- **What it does:** Open-source platform for robust, reproducible, and scalable LLM evaluation across 100+ benchmarks. +- **Highlights for writers:** + - **Two components:** + - `nemo-evaluator` — core engine that manages harness ↔ model interaction. + - `nemo-evaluator-launcher` — CLI/orchestration layer that handles config, environment selection, and container launch. + - **18 evaluation harnesses** with pre-built NGC containers: + - Language: lm-evaluation-harness (MMLU, GSM8K, ARC, BBH, IFEval, etc.), simple-evals (MATH-500, AIME, HumanEval). + - Code: bigcode-evaluation-harness (MBPP, HumanEval+), livecodebench, compute-eval (CUDA), scicode. + - Safety: garak (vulnerability testing), safety-harness (Aegis v2, WildGuard). + - VLM: vlmevalkit (MMMU, ChartQA, MathVista, OCRBench). + - Specialized: BFCL (function calling), MT-Bench, TAU2-Bench, RULER (long-context), CoDec (contamination detection), MTEB (embeddings). + - Works with any **OpenAI-compatible endpoint** — hosted (build.nvidia.com, NIM) or self-hosted (vLLM, TRT-LLM). + - **Reproducibility by default:** All configs, seeds, and software provenance captured automatically. + - **Scale anywhere:** Local machine, SLURM, Lepton AI, cloud-native backends. + - Install: `pip install nemo-evaluator-launcher` +- **Docs:** [docs.nvidia.com/nemo/evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) + +### 6b. NeMo Skills (Evaluation Side) + +- Skills also provides evaluation pipelines across math (AIME 24/25, HMMT), code (SWE-bench, LiveCodeBench), science (HLE, SciCode, GPQA), instruction following (IFBench, IFEval), long-context (RULER), VLM (MMMU-Pro), and more. +- Easy to parallelize evaluations across SLURM jobs and self-host LLM judges. + +--- + +## 7. Deployment Stage + +### 7a. 
NeMo Export-Deploy + +- **Repo:** [NVIDIA-NeMo/Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) — 27 stars +- **What it does:** Export NeMo and HF models to optimized inference backends and deploy for efficient serving. +- **Highlights for writers:** + - **Export targets:** TensorRT-LLM, vLLM, ONNX, TensorRT. + - **Deployment options:** NVIDIA Triton Inference Server (PyTriton) and Ray Serve. + - **Model support:** NeMo LLMs, NeMo Multimodal, Hugging Face models, NIM Embedding, NIM Reranking. + - **Precision:** BF16, FP8, INT8 (PTQ, QAT), FP4 (coming soon). + - **Multi-GPU / Multi-instance** deployment support. + - Serves as the bridge from training to production inference. + - Install: `pip install nemo-export-deploy` (lightweight) or use NeMo Framework container for full features. +- **Docs:** [docs.nvidia.com/nemo/export-deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- **Container:** Included in [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) + +### 7b. NeMo Guardrails + +- **Repo:** [NVIDIA-NeMo/Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) — 5,635 stars (highest in the org!) +- **What it does:** Open-source toolkit for adding programmable guardrails to LLM-based conversational applications. +- **Highlights for writers:** + - **5 types of rails:** Input, Dialog, Retrieval, Execution, Output. + - **Colang language** (Python-like DSL) for defining dialog flows and rails — two versions: 1.0 and 2.0. + - **Built-in guardrails library:** Jailbreak detection, fact-checking, hallucination detection, content moderation (Aegis, ActiveFence), sensitive data masking. + - **Use cases:** RAG fact-checking, domain assistants, LLM endpoint safety, LangChain integration. + - Works with OpenAI, Llama, Falcon, Vicuna, Mosaic, and more. + - CLI: `nemoguardrails chat`, `nemoguardrails server`, `nemoguardrails evaluate`. + - OpenAI-compatible server endpoint at `/v1/chat/completions`. 
+ - Published in EMNLP 2023 — academic paper available. + - Latest version: 0.20.0. +- **Docs:** [docs.nvidia.com/nemo/guardrails](https://docs.nvidia.com/nemo/guardrails) + +--- + +## 8. Infrastructure & Tooling + +### 8a. NeMo Run + +- **Repo:** [NVIDIA-NeMo/Run](https://github.com/NVIDIA-NeMo/Run) — 216 stars +- **What it does:** Configure, launch, and manage ML experiments across computing environments. +- **Highlights for writers:** + - **Three core responsibilities:** Configuration, Execution, Management. + - **Pythonic:** Everything configured in Python — no need for multi-tool workflows. + - **Executors:** LocalExecutor, SlurmExecutor, SkypilotExecutor — set up once, scale easily. + - **Modular:** Decouple task from executor; reuse environment configs across tasks. + - Built on Fiddle (Google), TorchX, Skypilot, XManager. + - Pre-release — API subject to change before v1.0. +- **Docs:** [docs.nvidia.com/nemo/run](https://docs.nvidia.com/nemo/run/latest/) + +### 8b. Nemotron (Models & Recipes) + +- **Repo:** [NVIDIA-NeMo/Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) +- **What it does:** Recipes and scripts for NVIDIA's Nemotron model family. +- Contains training recipes, configuration files, and documentation for reproducing Nemotron models. + +--- + +## 9. 
How the Repos Connect — Interoperability Map + +![NeMo Framework Interoperability Map](assets/diagram-13-interoperability-map.png) + +| Connection | How | +|------------|-----| +| **Curator** → Training | Curated datasets feed into AutoModel / Megatron-Bridge / RL | +| **Data Designer** → Training | Synthetic data generation → SFT / RLHF datasets | +| **Skills** → Training / Eval | SDG pipelines + evaluation benchmarks | +| **AutoModel** ↔ **HF Hub** | Day-0 support — no checkpoint conversion needed | +| **AutoModel** → **NeMo RL** | Checkpoints used directly as starting points for DPO/GRPO | +| **Megatron-Bridge** ↔ **HF Hub** | Bidirectional checkpoint conversion via `AutoBridge` | +| **Megatron-Bridge** → **NeMo RL** | Megatron-Core training backend for RL at scale | +| **NeMo Gym** → **NeMo RL** | RL environments provide training data + reward signals | +| **Training** → **Evaluator** | Evaluate trained models against 100+ benchmarks | +| **Training** → **Export-Deploy** | Export to TRT-LLM / vLLM / ONNX for production | +| **Export-Deploy** → **Guardrails** | Add safety rails on top of deployed models | +| **NeMo Run** → All | Experiment launcher for any library across local/SLURM/K8s | + +--- + +## 10. NGC Containers Quick Reference + +| Container | Key Libraries Included | +|-----------|----------------------| +| [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | Megatron-Bridge, Export-Deploy, Evaluator, Run | +| [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) | AutoModel | +| [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) | NeMo RL (+ vLLM, Megatron) | +| [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) | Curator + RAPIDS | + +--- + +## 11. Key Themes for Technical Writers + +1. **Two training paths:** AutoModel (PyTorch-native, easy HF, up to ~1K GPUs) vs. Megatron-Bridge (Megatron-Core, 6D parallelism, 1K+ GPUs).
Several repos (DFM, RL) support both backends — consistent messaging is important. +2. **Day-0 HF support:** A major selling point. Both AutoModel and Megatron-Bridge work with HF models — but the mechanism differs (DTensor native vs. `AutoBridge` conversion). +3. **YAML-driven recipes:** AutoModel, Megatron-Bridge, RL, and DFM all follow a YAML config + CLI override pattern. Docs should be consistent in how they describe this. +4. **`uv` as the package manager:** Several repos have adopted `uv` for reproducible environments. Writers should document `uv venv`, `uv sync`, `uv run` patterns. +5. **Scale spectrum:** The framework spans from `pip install` on a laptop to 10K+ GPU SLURM clusters. Docs should provide clear paths for each scale. +6. **Multi-repo cross-references:** Many workflows span 2+ repos (e.g., Curator → AutoModel → RL → Evaluator → Export-Deploy). Cross-linking and journey-based docs are critical. +7. **Active development:** Several repos (AutoModel, RL, Gym) are under heavy active development with weekly model additions. Docs need processes for rapid updates. + +--- + +## 12. Suggested Discussion / Q&A Topics + +- How should we handle cross-repo documentation? (Unified getting-started guide vs. per-repo docs) +- What is the versioning/release cadence across repos? +- Which repos are highest priority for documentation investment? +- How do we keep model support tables current? +- Should we have a shared glossary across all NeMo docs sites? +- Container documentation: one page per container or integrated into each repo's docs? 
+ +--- + +## Appendix: Repo Summary Table + +| Repo | Stage | Stars | One-Liner | Docs | +|------|-------|-------|-----------|------| +| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data | 1,394 | GPU-accelerated data curation (text, image, video, audio) | [link](https://docs.nvidia.com/nemo/curator/latest/) | +| [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Data | 698 | Synthetic data generation from scratch or seed data | [link](https://nvidia-nemo.github.io/DataDesigner/latest/) | +| [Skills](https://github.com/NVIDIA-NeMo/Skills) | Data + Eval | 816 | SDG pipelines + evaluation for math, code, science | [link](https://nvidia-nemo.github.io/Skills/) | +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Training | 288 | PyTorch DTensor-native training with HF support | [link](https://docs.nvidia.com/nemo/automodel/latest/) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Training | 423 | Megatron-Core training with bidirectional HF conversion | [link](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| [NeMo (Speech)](https://github.com/NVIDIA-NeMo/NeMo) | Training | — | Speech AI (ASR, TTS) on Megatron-Core | [link](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | +| [DFM](https://github.com/NVIDIA-NeMo/DFM) | Training | 29 | Diffusion model training (video, image) | [link](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | +| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Training | — | Collection of cutting-edge optimizers | [link](https://docs.nvidia.com/nemo/emerging-optimizers/latest/) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Alignment | 1,306 | Scalable post-training (GRPO, DPO, SFT, distillation) | [link](https://docs.nvidia.com/nemo/rl/latest/) | +| [Gym](https://github.com/NVIDIA-NeMo/Gym) | Alignment | 637 | RL environments for LLM training | [link](https://docs.nvidia.com/nemo/gym/latest/) | +| 
[Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Evaluation | 195 | 100+ benchmarks across 18 harnesses | [link](https://docs.nvidia.com/nemo/evaluator/latest/) | +| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Deployment | 27 | Export to TRT-LLM/vLLM/ONNX + Triton serving | [link](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Deployment | 5,635 | Programmable safety rails with Colang DSL | [link](https://docs.nvidia.com/nemo/guardrails) | +| [Run](https://github.com/NVIDIA-NeMo/Run) | Infra | 216 | Experiment launcher (local, SLURM, K8s) | [link](https://docs.nvidia.com/nemo/run/latest/) | +| [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes | — | Nemotron model family recipes | [link](https://github.com/NVIDIA-NeMo/Nemotron#readme) | diff --git a/nemo-fw-product-walkthrough.md b/nemo-fw-product-walkthrough.md new file mode 100644 index 0000000..deeb7a1 --- /dev/null +++ b/nemo-fw-product-walkthrough.md @@ -0,0 +1,547 @@ +# NeMo Framework: Product Walkthrough + +> **Audience:** Technical writing team (all levels) +> **Goal:** Understand what each product is, who it's for, and where it fits +> **Not covered here:** Deep implementation details, parallelism internals, benchmark methodology + +--- + +## What Is NeMo Framework? + +NeMo Framework is NVIDIA's open-source platform for building generative AI models. It covers the **entire model lifecycle** — from preparing training data, to training models, to making them safer and smarter, to evaluating quality, to deploying them in production. + +Think of it as a **toolbox with specialized tools for each stage**, not a single monolithic product. Each tool is a separate project (repo) with its own docs, and they're designed to work together. 
+ +**The lifecycle in plain terms:** + +``` +Prepare Data → Train the Model → Align / Improve → Evaluate Quality → Deploy to Production +``` + +![NeMo Framework Pipeline](assets/diagram-00-nemo-framework-pipeline.png) + +--- + +## Quick Reference + +| # | Product | One-Line Summary | Stage | Docs | +|---|---------|-----------------|-------|------| +| 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | Training | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo/curator/latest/) | +| 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | [docs](https://docs.nvidia.com/nemo/microservices/latest/fine-tune/index.html) | +| 4 | [Data Designer](#4-data-designer) | Generate synthetic training data | Data | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) | +| 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | +| 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo/gym/latest/) | +| 7 | [MCORE](#7-mcore-megatron-core) | Low-level engine for large-scale training | Training (engine) | [docs](https://docs.nvidia.com/Megatron-Core/) | +| 8 | [Megatron-Bridge](#8-megatron-bridge) | Train at massive scale (1,000+ GPUs) | Training | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| 9 | [nvFSDP](#9-nvfsdp) | Memory-efficient training technique inside AutoModel | Training (component) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| 10 | [RL](#10-rl) | Improve models using reinforcement learning | Alignment | [docs](https://docs.nvidia.com/nemo/rl/latest/) | +| 11 | [Toolkit (Speech)](#11-toolkit-speech) | Train speech recognition and text-to-speech models | Training | 
[docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | + +--- + +## Key Terms + +A short glossary for terms that come up repeatedly across products. + +| Term | Plain-English Meaning | +|------|----------------------| +| **Fine-tuning** | Teaching an existing AI model new skills using your own data | +| **SFT (Supervised Fine-Tuning)** | Fine-tuning by showing the model examples of correct answers | +| **LoRA** | A memory-efficient way to fine-tune — updates a small slice of the model instead of the whole thing | +| **RLHF / RL** | Reinforcement Learning — improving a model by giving it feedback (rewards) on its outputs | +| **DPO / GRPO** | Specific post-training techniques for aligning model behavior: DPO learns directly from pairs of preferred and rejected answers, while GRPO is an RL algorithm that learns from reward scores | +| **Pretraining** | Training a model from scratch on a large dataset (expensive, rare) | +| **Inference** | Running a trained model to get predictions / answers | +| **Hugging Face (HF)** | The most popular open platform for sharing AI models — NeMo works with HF models directly | +| **NGC** | NVIDIA's container registry — pre-built Docker images optimized for NVIDIA hardware | +| **NIM** | NVIDIA Inference Microservices — a way to deploy models as ready-to-use API endpoints | +| **Checkpoint** | A saved snapshot of a model's learned knowledge (like a save file in a video game) | + +--- + +## 1. AutoModel + +![NeMo AutoModel](assets/diagram-03-automodel.png) + +**Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) + +### What Is It? + +AutoModel is the **easiest way to fine-tune AI models on NVIDIA GPUs**. You pick a model from Hugging Face, provide your data, and AutoModel handles the distributed training — whether you have 1 GPU or hundreds. + +### How Does It Work? + +1. You choose a model from Hugging Face (Llama, Qwen, DeepSeek, Gemma, Mistral, etc.) +2.
You write a short YAML config file describing your training job +3. You run one command — AutoModel figures out how to split the work across your GPUs + +The key idea: **the same script works on any scale**. You don't rewrite code when you go from a laptop to a cluster. + +### Use Cases + +- Fine-tune a language model on company-specific data (support tickets, legal docs, medical records) +- Train a vision-language model to understand your domain's images +- Run LoRA (lightweight) fine-tuning when GPU memory is limited +- Pretrain a model from scratch on a custom dataset + +### Where It's Positioned + +AutoModel is the **recommended starting point** for most training tasks. It works natively with Hugging Face models — no format conversion needed. If you outgrow it (need 1,000+ GPUs), you move to Megatron-Bridge. + +### Writer Notes + +- Very actively developed — new model support added weekly +- Uses `uv` as its package manager (not just pip) +- Checkpoints from AutoModel plug directly into NeMo RL for alignment + +--- + +## 2. Curator / Video Curator + +![NeMo Curator](assets/diagram-01-curator.png) + +**Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | **Docs:** [docs.nvidia.com/nemo/curator](https://docs.nvidia.com/nemo/curator/latest/) + +### What Is It? + +Curator is a **data cleaning and filtering toolkit** for AI training data. "Garbage in, garbage out" — Curator helps you go in with quality data across text, images, video, and audio. + +### How Does It Work? + +Curator runs GPU-accelerated pipelines that process raw data through stages: + +1. **Ingest** — Load data from files, web crawls, or storage +2. **Filter & Classify** — Remove low-quality, duplicate, or unsafe content +3. **Deduplicate** — Find and remove near-identical content (even fuzzy matches) +4. 
**Output** — Clean, curated dataset ready for training + +It works on four data types: + +| Data Type | What Curator Does | +|-----------|-------------------| +| **Text** | Filters by quality, detects language, removes duplicates, classifies content | +| **Image** | Scores visual quality, detects inappropriate content, removes duplicates | +| **Video** | Detects scene changes, extracts clips, filters by motion/quality, deduplicates | +| **Audio** | Transcribes speech, checks transcription quality, filters by accuracy | + +**Video Curator** is not a separate product — it's the video pipeline within the same Curator library. + +### Use Cases + +- Clean a web-crawled text dataset before training a language model +- Filter a large image collection for training a vision model +- Process raw video footage into training clips for a video generation model +- Prepare audio datasets for speech recognition training + +### Where It's Positioned + +Curator sits at the **very beginning of the pipeline** — before any training happens. Its output feeds into AutoModel, Megatron-Bridge, or any other training tool. It's especially valuable at scale (terabytes of data) where GPU acceleration matters most. + +### Writer Notes + +- Docs are organized by modality: text, image, video, audio — each has its own getting-started guide +- Video Curator is part of the same `nemo-curator` pip package +- Powered by NVIDIA RAPIDS (GPU data processing libraries) + Ray (distributed computing) + +--- + +## 3. Data Designer + +![NeMo Data Designer](assets/diagram-02-data-designer.png) + +**Repo:** [NVIDIA-NeMo/DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) | **Docs:** [nvidia-nemo.github.io/DataDesigner](https://nvidia-nemo.github.io/DataDesigner/latest/) + +### What Is It? + +Data Designer is a **synthetic data generation** tool. 
When you don't have enough real training data — or need data with specific properties — Data Designer creates it for you using a combination of statistical rules and LLM generation. + +### How Does It Work? + +You define a "schema" for the data you want: + +1. **Define columns** — Some are generated by statistical rules (e.g., "pick a random product category"), others are generated by an LLM (e.g., "write a customer review for this category") +2. **Set dependencies** — One column's output can feed into another's prompt (so a review matches its category) +3. **Validate** — Built-in validators check that generated data meets your quality rules +4. **Preview → Generate** — Test with a small sample, then scale up + +### Use Cases + +- Generate training data for a chatbot when you have few real conversations +- Create diverse test datasets for model evaluation +- Build domain-specific instruction-following datasets +- Augment limited real-world data with synthetic examples + +### Where It's Positioned + +Data Designer sits alongside Curator in the **data preparation stage** — but they solve different problems. Curator *cleans existing data*; Data Designer *creates new data*. They're complementary. + +### Writer Notes + +- Docs are hosted on GitHub Pages (`nvidia-nemo.github.io`), not `docs.nvidia.com` +- Supports NVIDIA, OpenAI, and OpenRouter as LLM providers +- Has a CLI for configuration: `data-designer config` + +--- + +## 4. Evaluator + +![NeMo Evaluator](assets/diagram-07-evaluator.png) + +**Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | **Docs:** [docs.nvidia.com/nemo/evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) + +### What Is It? + +Evaluator is a **benchmarking platform** for testing how well AI models perform. It runs your model against 100+ standardized tests (benchmarks) and gives you reproducible scores. + +### How Does It Work? + +1. 
You point Evaluator at your model (any model that exposes a chat/completion API) +2. You pick which benchmarks to run (math, code, safety, general knowledge, etc.) +3. Evaluator pulls the right test container, runs the benchmark, and reports results +4. All configurations and seeds are saved so results are reproducible + +It has two parts: +- **Launcher** (`nemo-evaluator-launcher`) — the CLI you interact with; handles setup and orchestration +- **Engine** (`nemo-evaluator`) — runs the actual benchmark; most users don't touch this directly + +### Use Cases + +- Compare your fine-tuned model against the base model to measure improvement +- Run a standard benchmark suite before releasing a model +- Test for safety issues (jailbreak vulnerability, bias, hallucination) +- Evaluate code generation, math reasoning, or multilingual capabilities + +### Where It's Positioned + +Evaluator sits **after training and alignment** — it answers "how good is this model?" It works with any model that has an OpenAI-compatible API, so it's not locked to NeMo-trained models. + +### Writer Notes + +- Users primarily interact with the launcher CLI, not the engine +- Each benchmark runs in its own Docker container from NGC +- Supports running on local machines, SLURM clusters, and cloud backends + +--- + +## 5. Gym + +![NeMo Gym](assets/diagram-06-nemo-gym.png) + +**Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) | **Docs:** [docs.nvidia.com/nemo/gym](https://docs.nvidia.com/nemo/gym/latest/) + +### What Is It? + +Gym provides **practice environments** where AI models learn through trial and error (reinforcement learning). Think of it like a training simulator — the model takes actions, the environment gives feedback, and the model improves. + +### How Does It Work? 
+ +Gym comes with pre-built **environments** ("resource servers") across different skill domains: + +| Domain | Example Environments | What the Model Practices | +|--------|---------------------|-------------------------| +| **Agent tasks** | Calendar scheduling, web search, workplace assistant | Multi-step tool use | +| **Coding** | Competitive programming, software engineering | Writing and debugging code | +| **Math** | Math problem solving with verification | Mathematical reasoning | +| **Knowledge** | Multiple-choice Q&A | Factual knowledge (like MMLU) | +| **Instruction following** | Format constraints, structured outputs | Following precise instructions | + +Each environment includes a dataset of problems, a way to verify answers, and integration with NeMo RL for training. + +### Use Cases + +- Train a model to use tools (search, calendar, APIs) through practice +- Improve a model's math or coding skills with verifiable feedback +- Build a custom environment for your domain-specific RL training +- Collect training data (rollouts) for use with NeMo RL + +### Where It's Positioned + +Gym is a **companion to NeMo RL**. RL provides the training algorithms; Gym provides the practice environments. Together they handle the "alignment" stage of the pipeline. + +### Writer Notes + +- Early-stage project — APIs are still evolving +- No GPU required for Gym itself (only for the model doing inference) +- Each environment has its own README, config, and dataset + +--- + +## 6. MCORE (Megatron-Core) + +**Repo:** [NVIDIA/Megatron-LM](https://github.com/NVIDIA/Megatron-LM) (inside `megatron/core/`) | **Docs:** [docs.nvidia.com/Megatron-Core](https://docs.nvidia.com/Megatron-Core/) + +### What Is It? + +Megatron-Core (MCORE) is the **engine under the hood** of several NeMo products. It provides the low-level building blocks for training very large models across many GPUs efficiently. 
Most users never interact with MCORE directly — they use it through Megatron-Bridge or NeMo RL. + +### How Does It Work? + +When you train a model that's too large to fit on a single GPU, MCORE splits the work across many GPUs using various strategies: + +- **Split the model** across GPUs (different layers on different GPUs) +- **Split the data** across GPUs (each GPU processes different examples) +- **Split individual layers** across GPUs (for very wide layers) + +It also provides optimized model architectures, efficient checkpointing (saving/loading model snapshots), and mixed-precision training to use less memory. + +### Use Cases + +- You're building a **custom training framework** and need high-performance distributed training primitives +- You're a **framework developer** integrating with Megatron-based systems + +Most end users should use Megatron-Bridge or AutoModel instead. + +### Where It's Positioned + +MCORE is a **dependency**, not a standalone product for most users: + +``` +Users interact with: AutoModel ←or→ Megatron-Bridge ←or→ NeMo RL + ↓ ↓ ↓ +Under the hood: PyTorch MCORE MCORE + (DTensor) (Megatron-Core) (Megatron-Core) +``` + +### Writer Notes + +- Lives in the **NVIDIA/Megatron-LM** repo (different GitHub org than most NeMo repos) +- 15,000+ GitHub stars — one of the most popular NVIDIA open-source projects +- MCORE docs should be written for framework developers, not end users + +--- + +## 7. Megatron-Bridge + +![NeMo Megatron-Bridge](assets/diagram-04-megatron-bridge.png) + +**Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | **Docs:** [docs.nvidia.com/nemo/megatron-bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) + +### What Is It? + +Megatron-Bridge is a **training library for extreme scale** — designed for training runs on 1,000+ GPUs. It connects Hugging Face models with MCORE's powerful distributed training engine. + +### How Does It Work? 
+ +The "bridge" name describes exactly what it does: it connects two worlds: + +- **Hugging Face world:** Where models are shared, easy to use, and widely adopted +- **Megatron world:** Where training is maximally optimized for NVIDIA hardware at massive scale + +Megatron-Bridge converts models between these formats (in both directions) and provides a training loop on top. You can: + +1. Start from a Hugging Face model +2. Convert it to Megatron format (automatically, using the `AutoBridge` API) +3. Train at massive scale with MCORE +4. Convert back to Hugging Face format for sharing or deployment + +### Use Cases + +- Pretraining or fine-tuning at 1,000+ GPU scale where maximum throughput matters +- Organizations that need Megatron-level performance but want Hugging Face compatibility +- Converting between Hugging Face and Megatron checkpoint formats + +### Where It's Positioned + +Megatron-Bridge is the **heavy-duty training option** — complementary to AutoModel: + +| | AutoModel | Megatron-Bridge | +|---|---|---| +| **Best for** | Getting started, rapid iteration | Maximum scale and throughput | +| **GPU sweet spot** | 1 to ~1,000 GPUs | 1,000+ GPUs | +| **Model source** | Hugging Face (native) | Hugging Face (via conversion) | +| **Underlying engine** | PyTorch (DTensor) | Megatron-Core | + +### Writer Notes + +- Refactored from the original NeMo training stack +- Has been adopted by several community projects (VeRL, SkyRL, Slime) +- Recipes for popular models live in `src/megatron/bridge/recipes/` + +--- + +## 9. nvFSDP + +**Location:** Inside [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) + +### What Is It? + +nvFSDP is **not a standalone product** — it's a component inside AutoModel that handles **memory-efficient distributed training**. It automatically splits a model's memory footprint across GPUs so you can train models that are too large to fit on a single GPU.
+ +### How Does It Work? + +In simple terms: instead of every GPU holding a full copy of the model, nvFSDP shards (splits) the model's weights, gradients, and optimizer state across all GPUs. Each GPU holds only a fraction, and they coordinate during training. + +This is NVIDIA's optimized version of PyTorch's FSDP2 (Fully Sharded Data Parallel) technology. + +### Use Cases + +- Training large models on limited GPU memory +- Users don't interact with nvFSDP directly — they benefit from it automatically when using AutoModel + +### Where It's Positioned + +nvFSDP is an **implementation detail** of AutoModel. Users configure it through AutoModel's YAML settings. Writers may encounter the term in code or internal docs, but externally it's usually referred to as "FSDP2." + +### Writer Notes + +- Not a separate repo, pip package, or product — it's part of AutoModel's internals +- The name "nvFSDP" may appear in internal docs; externally prefer "FSDP2" or "FSDP2 Strategy" +- NeMo RL's DTensor backend also uses this under the hood (via AutoModel) + +--- + +## 10. RL + +![NeMo RL](assets/diagram-05-nemo-rl.png) + +**Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) | **Docs:** [docs.nvidia.com/nemo/rl](https://docs.nvidia.com/nemo/rl/latest/) + +### What Is It? + +NeMo RL is a **post-training library** that makes AI models better through reinforcement learning — the model generates outputs, gets feedback on quality, and learns to produce better results over time. + +### How Does It Work? + +The core loop: + +1. **Generate** — The model produces responses to prompts +2. **Score** — A reward system evaluates the responses (correct math? followed instructions? safe output?) +3. **Train** — The model updates its behavior based on the scores +4. 
Repeat + +NeMo RL supports several training techniques: + +| Technique | What It Does | +|-----------|-------------| +| **GRPO** | Reinforcement learning using group-relative scoring — the main RL method | +| **DPO** | Learns from pairs of "preferred vs. rejected" responses | +| **SFT** | Standard supervised fine-tuning (also available here for convenience) | +| **Distillation** | A smaller "student" model learns from a larger "teacher" model | +| **Reward Modeling** | Trains a model to predict human preferences (used as the scorer) | + +### Use Cases + +- Improve a model's math and reasoning abilities using verified solutions +- Align a model with human preferences (helpful, harmless, honest) +- Train a model to use tools through multi-turn interaction +- Distill a large model's capabilities into a smaller, cheaper model + +### Where It's Positioned + +RL covers the **alignment stage** — after initial training (AutoModel / Megatron-Bridge) and before evaluation. It works with: + +- **NeMo Gym** for practice environments and reward signals +- **AutoModel** or **Megatron-Bridge** as its training engine (user picks via config) +- **Evaluator** to measure improvement after alignment + +### Writer Notes + +- Most-starred repo in the NVIDIA-NeMo org (1,300+ stars) +- Used to train NVIDIA's Nemotron-3-Nano-30B model +- Training backend (AutoModel vs. Megatron) is chosen by the user in the YAML config +- Uses Ray for distributed infrastructure + +--- + +## 11. Toolkit (Speech) + +**Repo:** [NVIDIA-NeMo/NeMo](https://github.com/NVIDIA-NeMo/NeMo) | **Docs:** [docs.nvidia.com/nemo-framework (Speech AI)](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) + +### What Is It? + +The NeMo Toolkit is the **speech AI** component of the framework, providing tools for training automatic speech recognition (ASR) and text-to-speech (TTS) models. + +### How Does It Work?
+ +- **ASR (Speech-to-Text):** Train models that convert spoken audio into text — supports multiple languages +- **TTS (Text-to-Speech):** Train models that generate natural-sounding speech from text +- Comes with a large collection of **pretrained models** that can be fine-tuned for specific domains, accents, or languages + +### Use Cases + +- Build a custom speech recognition system for a specific industry (medical, legal, call center) +- Create a text-to-speech voice for a specific language or brand +- Fine-tune an existing speech model on your organization's vocabulary + +### Where It's Positioned + +This is the **original NeMo repository**. Historically, it contained everything (LLM, speech, vision). Over time, LLM training was split into AutoModel and Megatron-Bridge. This repo now **primarily houses the speech workloads**. + +NeMo Curator's audio pipeline can prepare data that feeds into Toolkit (Speech) for training. + +### Writer Notes + +- The repo is named "NeMo" which can be confusing — "NeMo" (the repo) now means "speech," while "NeMo Framework" means the entire ecosystem +- Docs live under the NeMo Framework User Guide, not a standalone site +- This repo is large and historically complex — speech docs should be clearly scoped + +--- + +## How the Products Connect + +![NeMo Framework Interoperability Map](assets/diagram-13-interoperability-map.png) + +Here's how data and models flow between the products: + +| Step | What Happens | Products Involved | +|------|-------------|-------------------| +| 1. **Prepare data** | Clean real data or generate synthetic data | Curator, Data Designer | +| 2. **Train** | Fine-tune or pretrain a model | AutoModel (standard) or Megatron-Bridge (large scale) | +| 3. **Align** | Improve the model with RL or preference learning | RL + Gym | +| 4. **Evaluate** | Benchmark quality across standardized tests | Evaluator | +| 5. 
**Deploy** | Export and serve the model in production | (Export-Deploy, Guardrails — not covered in this deck) | + +**Alternatively:** Use **Customizer** for steps 2–3 via API instead of running open-source code. + +**Under the hood:** MCORE powers Megatron-Bridge and RL's Megatron backend. nvFSDP powers AutoModel and RL's DTensor backend. **NeMo Run** (not covered in this deck) is an experiment launcher that works across all products. + +--- + +## Comparison Cheat Sheet + +### "I want to train a model" — Which product? + +| Situation | Use This | +|-----------|----------| +| Fine-tune a Hugging Face model on my data | **AutoModel** | +| Train at massive scale (1,000+ GPUs) | **Megatron-Bridge** | +| Fine-tune via API without managing infrastructure | **Customizer** | +| Train a speech recognition or TTS model | **Toolkit (Speech)** | + +### "I want to improve a trained model" — Which product? + +| Situation | Use This | +|-----------|----------| +| Align with human preferences (DPO, GRPO) | **RL** | +| Train a model to use tools via practice | **RL + Gym** | +| Distill a large model into a smaller one | **RL** (on-policy distillation) | + +### "I need training data" — Which product? + +| Situation | Use This | +|-----------|----------| +| Clean / filter / deduplicate existing data | **Curator** | +| Generate synthetic data from scratch | **Data Designer** | + +--- + +## Documentation Landscape + +Not all products are documented in the same place: + +| Docs Host | Products | +|-----------|----------| +| `docs.nvidia.com/nemo/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator | +| `docs.nvidia.com/nemo/microservices/...` | Customizer | +| `docs.nvidia.com/Megatron-Core/` | MCORE | +| `nvidia-nemo.github.io/...` | Data Designer, Skills | +| `docs.nvidia.com/nemo-framework/user-guide/...` | Toolkit (Speech) | + +### Open-Source vs. 
Managed + +| Type | Products | +|------|----------| +| **Open-source (GitHub)** | AutoModel, Curator, Data Designer, Evaluator, Gym, MCORE, Megatron-Bridge, RL, Toolkit | +| **Closed-source (microservice)** | Customizer | +| **Internal component** (not a standalone product) | nvFSDP | diff --git a/profile/README.md b/profile/README.md index 074bc68..88954e7 100644 --- a/profile/README.md +++ b/profile/README.md @@ -1,5 +1,5 @@ @@ -7,242 +7,27 @@ SPDX-License-Identifier: Apache-2.0 **Train Llama 3.3 · Qwen 2.5 · Mistral · DeepSeek · Gemma · Nemotron on NVIDIA GPUs** -This GitHub org contains libraries for training, data curation, evaluation, alignment, and deployment. Scale from a single GPU to 10,000+ nodes with day-0 Hugging Face support or Megatron backends for maximum throughput. +GPU-accelerated, open-source libraries for training, data curation, evaluation, alignment, and deployment. Scale from a single GPU to 10,000+ nodes with Hugging Face or Megatron backends. ---- +## Documentation -## Choose Your Path +**[docs.nvidia.com/nemo](https://docs.nvidia.com/nemo)** — framework overview, decision guide, and links to every library's docs. - - - - - - -
- -### Get Started - -**Start with [NeMo AutoModel](https://github.com/NVIDIA-NeMo/Automodel)** – the simplest path to fine-tuning Hugging Face models on NVIDIA GPUs. +| Start here | Docs | +| --- | --- | +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) (fine-tune HF models) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) (1K+ GPUs) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) (DPO / GRPO) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | +| [Curator](https://github.com/NVIDIA-NeMo/Curator) · [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) · [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [All 23 repos →](https://docs.nvidia.com/nemo/repositories) | ```bash pip install nemo-automodel ``` -```python -from nemo_automodel import AutoModelForCausalLM, Trainer - -model = AutoModelForCausalLM.from_pretrained( - "meta-llama/Llama-3.3-70B-Instruct" -) -trainer = Trainer(model=model, train_dataset=dataset) -trainer.train() -``` - -[→ AutoModel Quick Start](https://docs.nvidia.com/nemo/automodel/latest/launcher/local-workstation.html#quick-start-choose-your-job-launch-option) - - - -### Scale Training - -- **< 1,000 GPUs**: [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) -- **1,000+ GPUs**: [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) -- **RLHF/DPO**: [NeMo RL](https://github.com/NVIDIA-NeMo/RL) - -[→ Training Recipes](#training-recipes) - -### Experiment - -[NeMo Run](https://github.com/NVIDIA-NeMo/Run) for launching and tracking experiments across: - -- Local machines -- SLURM clusters -- Kubernetes - -[→ Run Documentation](https://docs.nvidia.com/nemo/run/latest/) - - - -### Explore Libraries - -- [Curator](https://github.com/NVIDIA-NeMo/Curator) – Data curation -- [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) – Model benchmarking -- 
[Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) – Production deployment - -[→ All Libraries](#all-libraries) - -### Use Containers - -Pull optimized containers to get started fast. - -- [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) -- [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) -- [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) -- [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) - -[→ Explore NGC Catalog](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/containers) -
- -
-📋 Decision Guide — Which library should I use? - -| I want to... | Models | Scale | Library | Docs | -|--------------|--------|-------|---------|------| -| **Train/fine-tune** | LLM, VLM | ≤1K GPUs | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| **Train at scale** | LLM, VLM | 1K+ GPUs | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| **Align** (DPO/GRPO) | LLM, VLM | Any | [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | -| **Curate data** | — | Any | [Curator](https://github.com/NVIDIA-NeMo/Curator) | [docs](https://docs.nvidia.com/nemo/curator/latest/) | -| **Evaluate** | Any | — | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | -| **Deploy** | Any | — | [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | -| **Speech AI** | ASR, TTS | Any | [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | [docs](https://docs.nvidia.com/nemo/speech/latest/) | - -
- ---- - -## Training Recipes - -| Library | LLM Recipes | VLM Recipes | -|---------|-------------|-------------| -| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [Llama](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/llama3_2), [Qwen](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/qwen), [Gemma](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/gemma), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_pretrain/deepseekv3_pretrain.yaml), [Mistral](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/mistral), [Phi](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/phi) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen2_5), [Gemma 3n VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3n) | -| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [Llama](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [Qwen](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/deepseek/deepseek_v3.py), [Gemma 3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma/gemma3.py), [Nemotron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/nemotronh/nemotronh.py) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma3_vl/gemma3_vl.py), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [Qwen3 
VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen3vl.py) | -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_sft.py) | - - ---- - -## All Libraries - -### Pipeline Overview - -```mermaid -flowchart LR - subgraph Data - Curator - DataDesigner[Data Designer] - Skills - end - - subgraph Training - AutoModel - MBridge[Megatron-Bridge] - end - - subgraph Alignment - RL[NeMo RL] - end - - subgraph Evaluation - Evaluator - end - - subgraph Deployment - Export[Export-Deploy] - Guardrails - end - - Gym[NeMo Gym] - - Data --> Training - Training --> Alignment - Training --> Evaluation - Alignment --> Evaluation - Evaluation --> Deployment - - Gym -.-> RL - Skills -.-> Evaluator -``` - -### Data - -| Repo | Description | Docs | Container | -|------|-------------|------|-----------| -| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data curation at scale | [docs](https://docs.nvidia.com/nemo/curator/latest/) | [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) | -| [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Synthetic data generation | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) | — | -| [Skills](https://github.com/NVIDIA-NeMo/Skills) | SDG pipelines (math, code, science datasets) | [docs](https://nvidia-nemo.github.io/Skills/) | — | - -### Training - -| Repo | Description | Backend | Models | Docs | Container | -|------|-------------|---------|--------|------|-----------| -| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Pretraining, SFT, LoRA | PyTorch | LLM, VLM, Omni | 
[docs](https://docs.nvidia.com/nemo/automodel/latest/) | [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) | -| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Pretraining, SFT, LoRA | Megatron-core | LLM, VLM | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | Pretraining, SFT | Megatron-core | Speech | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | — | -| [DFM](https://github.com/NVIDIA-NeMo/DFM) | Diffusion training | Megatron-core | Diffusion | [docs](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | — | -| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Collection of optimizers | — | — | [docs](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) | — | - -### Alignment - -| Repo | Description | Backend | Models | Docs | Container | -|------|-------------|---------|--------|------|-----------| -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | SFT, DPO, GRPO | Megatron-core, vLLM | LLM, VLM | [docs](https://docs.nvidia.com/nemo/rl/latest/) | [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) | -| [Gym](https://github.com/NVIDIA-NeMo/Gym) | RL environments | — | LLM, VLM | [docs](https://docs.nvidia.com/nemo/gym/latest/index.html) | — | - -### Evaluation - -| Repo | Description | Docs | Container | -|------|-------------|------|-----------| -| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Model benchmarking | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| [Skills](https://github.com/NVIDIA-NeMo/Skills) | Evaluation pipelines (math, code, science, etc.) 
| [docs](https://nvidia-nemo.github.io/Skills/) | — | - -### Deployment - -| Repo | Description | Backends | Docs | Container | -|------|-------------|----------|------|-----------| -| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Export to production | vLLM, TRT-LLM, ONNX | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Safety rails | — | [docs](https://docs.nvidia.com/nemo/guardrails/latest/) | — | - -### Models and Recipes - -| Repo | Description | Docs | Container | -|------|-------------|------|-----------| -| [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes for Nemotron models | [docs](https://github.com/NVIDIA-NeMo/Nemotron#readme) | — | - -### Infrastructure - -| Repo | Description | Docs | Container | -|------|-------------|------|-----------| -| [Run](https://github.com/NVIDIA-NeMo/Run) | Experiment launcher | [docs](https://docs.nvidia.com/nemo/run/latest/) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | - -### Architecture Reference - -![Framework Architecture](RepoDiagram.png) - -*Architectural layers and dependencies across the NeMo Framework.* - ---- - ## Community - - - - - -
- -### 💬 Get Involved - -**[GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions)** — Questions, ideas, and announcements - -- [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) -- Follow each repos CONTRIBUTING guide to get started -- [Release Notes](https://docs.nvidia.com/nemo-framework/user-guide/latest/changelog.html) - - - -### 📣 Latest - -**🐳 AutoModel** -- [Enabling PyTorch Native Pipeline Parallelism for HF Models](https://github.com/NVIDIA-NeMo/Automodel/discussions/589) *(Oct 2025)* -- [Day-0 Hugging Face Support](https://github.com/NVIDIA-NeMo/Automodel/discussions/477) *(Sep 2025)* -- [Gemma 3n Multimodal Fine-tuning](https://github.com/NVIDIA-NeMo/Automodel/discussions/494) *(Sep 2025)* - -**🔬 NeMo RL** — [On-policy Distillation](https://github.com/NVIDIA-NeMo/RL/discussions/1445), [FP8 Quantization](https://github.com/NVIDIA-NeMo/RL/discussions/1216), [10× MoE Weight Transfer](https://github.com/NVIDIA-NeMo/RL/discussions/1189) - -**💬 NeMo Speech** — [Fine-tune NeMo models with Granary Data](https://github.com/NVIDIA-NeMo/NeMo/discussions/14758) - -
+- [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) +- [All repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) ## License From d85e4fe7160116bec2621799335f90f7036a61fe Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 13:44:12 -0400 Subject: [PATCH 08/18] Restructure NeMo OSS hub IA and add Platform positioning. Migrate the Fern site to canonical About/Get Started/Resources navigation with catalog components, container release pages, Framework vs Platform guidance, and NVIDIA style updates across hub content and the org README. --- .github/workflows/publish-fern-docs.yml | 2 +- GH-TOPICS.MD | 14 + README.md | 2 +- fern/README.md | 95 ++++- fern/components/ContainerCatalog.tsx | 334 ++++++++++++++++++ fern/components/RepoCatalog.tsx | 43 +-- fern/components/StageGuide.tsx | 74 ++++ fern/components/containers.ts | 127 +++++++ fern/components/repos.ts | 167 +++++---- fern/docs.yml | 90 ++++- fern/docs/pages/about/architecture.mdx | 144 ++++++++ fern/docs/pages/about/concepts.mdx | 106 ++++++ fern/docs/pages/about/ecosystem.mdx | 88 +++++ fern/docs/pages/about/libraries.mdx | 26 ++ .../pages/about/release-notes/containers.mdx | 14 + fern/docs/pages/about/release-notes/index.mdx | 40 +++ .../about/release-notes/known-issues.mdx | 143 ++++++++ fern/docs/pages/get-started/data.mdx | 36 ++ fern/docs/pages/get-started/e2e.mdx | 42 +++ fern/docs/pages/get-started/index.mdx | 61 ++++ fern/docs/pages/get-started/inference.mdx | 42 +++ fern/docs/pages/get-started/installation.mdx | 66 ++++ fern/docs/pages/get-started/pretraining.mdx | 38 ++ fern/docs/pages/get-started/quickstart.mdx | 46 +++ fern/docs/pages/get-started/rl.mdx | 37 ++ fern/docs/pages/getting-started.mdx | 85 ----- fern/docs/pages/index.mdx | 114 ++---- fern/docs/pages/libraries.mdx | 146 -------- fern/docs/pages/repositories.mdx | 22 -- fern/docs/pages/{ => resources}/community.mdx | 19 +- profile/README.md | 17 +- 31 files changed, 1791 insertions(+), 489 
deletions(-) create mode 100644 fern/components/ContainerCatalog.tsx create mode 100644 fern/components/StageGuide.tsx create mode 100644 fern/components/containers.ts create mode 100644 fern/docs/pages/about/architecture.mdx create mode 100644 fern/docs/pages/about/concepts.mdx create mode 100644 fern/docs/pages/about/ecosystem.mdx create mode 100644 fern/docs/pages/about/libraries.mdx create mode 100644 fern/docs/pages/about/release-notes/containers.mdx create mode 100644 fern/docs/pages/about/release-notes/index.mdx create mode 100644 fern/docs/pages/about/release-notes/known-issues.mdx create mode 100644 fern/docs/pages/get-started/data.mdx create mode 100644 fern/docs/pages/get-started/e2e.mdx create mode 100644 fern/docs/pages/get-started/index.mdx create mode 100644 fern/docs/pages/get-started/inference.mdx create mode 100644 fern/docs/pages/get-started/installation.mdx create mode 100644 fern/docs/pages/get-started/pretraining.mdx create mode 100644 fern/docs/pages/get-started/quickstart.mdx create mode 100644 fern/docs/pages/get-started/rl.mdx delete mode 100644 fern/docs/pages/getting-started.mdx delete mode 100644 fern/docs/pages/libraries.mdx delete mode 100644 fern/docs/pages/repositories.mdx rename fern/docs/pages/{ => resources}/community.mdx (68%) diff --git a/.github/workflows/publish-fern-docs.yml b/.github/workflows/publish-fern-docs.yml index e470f0b..07594c7 100644 --- a/.github/workflows/publish-fern-docs.yml +++ b/.github/workflows/publish-fern-docs.yml @@ -1,7 +1,7 @@ # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. # SPDX-License-Identifier: Apache-2.0 # -# Publishes the NeMo Framework hub Fern site. +# Publishes the NeMo OSS hub Fern site. 
# git tag docs/v0.1.0 && git push origin docs/v0.1.0 # Requires org secret: DOCS_FERN_TOKEN diff --git a/GH-TOPICS.MD b/GH-TOPICS.MD index 973ab13..86dc1ea 100644 --- a/GH-TOPICS.MD +++ b/GH-TOPICS.MD @@ -10,6 +10,20 @@ | **Backend** | `backend-megatron`, `backend-pytorch`, `backend-vllm`, `backend-tensorrt` | Filter by infrastructure | | **Meta** | `nvidia-nemo` | All repos in the framework | +### README stage mapping + +Org README columns map to catalog **`stage`** filters in `fern/components/repos.ts`. Optional GitHub topics can mirror these for org-wide filtering: + +| README stage | Catalog `stage` | Suggested GitHub topic | +|--------------|-----------------|------------------------| +| Data | `data` | `stage-data` | +| Pretraining | `pretraining` | `stage-training` | +| RL | `rl` | `stage-alignment` | +| Inference | `inference` | `stage-evaluation`, `stage-deployment`, `stage-safety` | +| E2E | `e2e` | — (recipes, pipelines, orchestration) | + +Use **`tags`** in `repos.ts` (and topic facets below) for cross-cutting search — e.g. Skills is E2E in the README but also spans SDG and evaluation. + Pretraining + Pretraining --> RL + Pretraining --> Inference + RL --> Inference + + Run -.-> Pretraining + Skills -.-> Data + Skills -.-> Inference +``` + +Solid arrows are the typical **model lifecycle** (data → train → align → evaluate/deploy). Dotted lines show **orchestration and reference pipelines** (Run, Skills, Nemotron) that span stages. 
+ +## Functional layers + +| Layer | Purpose | +| --- | --- | +| **Data** | Ingest, filter, deduplicate, synthesize, and govern datasets | +| **Pretraining** | Train, fine-tune, and convert checkpoints | +| **RL** | Post-training alignment and agent improvement | +| **Inference** | Benchmark, export, serve, and apply guardrails | +| **E2E** | Launch experiments, ship recipes, share reference assets | + +Which repos sit in each layer changes over time — use [Libraries](/about/libraries) (filtered by stage) as the live catalog, not this table. + +## Training backends + +Two primary training paths coexist: + +```mermaid +flowchart TB + subgraph HF["PyTorch / Hugging Face path"] + AM[AutoModel] + RL1[NeMo RL] + end + + subgraph MCore["Megatron-Core path"] + MB[Megatron-Bridge] + RL2[NeMo RL] + end + + Data2[Curated data] --> AM + Data2 --> MB + AM --> RL1 + MB --> RL2 + RL1 --> Eval[Evaluator / Export-Deploy] + RL2 --> Eval +``` + +- **HF path** — `pip install nemo-automodel`, Hugging Face models and trainers, scales to roughly 1,000 GPUs. +- **Megatron path** — Megatron-Bridge on Megatron-Core, HF checkpoint conversion, scales to thousand-node clusters. + +Both paths can feed the same **RL**, **Evaluator**, and **Export-Deploy** libraries downstream. + +## Agent platform + +[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) sits alongside individual inference libraries: it wires Guardrails, Evaluator, Data Designer, and related services into one CLI and SDK for **agent** evaluate / optimize / deploy loops. It is not the Framework NGC container — refer to [Framework and Platform](/about/concepts#framework-and-platform). + + + + +Setup, CLI reference, and API for agent hardening and evaluation. + + + +Model evaluation, export, guardrails, and when to use Platform or standalone libraries. 
+ + + + +## NGC containers + +For production-style stacks, NGC images bundle tested dependency sets: + +| Image | Scope | +| --- | --- | +| `nvcr.io/nvidia/nemo` | Multi-library **Framework** stack (Megatron-Bridge, Evaluator, Export-Deploy, Run, Speech) | +| `nvcr.io/nvidia/nemo-automodel` | Standalone AutoModel | +| `nvcr.io/nvidia/nemo-rl` | Standalone NeMo RL | +| `nvcr.io/nvidia/nemo-curator` | Standalone Curator | + +Refer to the [container catalog](/about/release-notes/containers) for pull commands and recent Framework tags. + +## Where to go next + + + + +Lifecycle stages, backends, and hub and product docs. + + + +Install paths and stage-specific entry points. + + + +Full repo catalog with docs and GitHub links. + + + diff --git a/fern/docs/pages/about/concepts.mdx b/fern/docs/pages/about/concepts.mdx new file mode 100644 index 0000000..574ecaf --- /dev/null +++ b/fern/docs/pages/about/concepts.mdx @@ -0,0 +1,106 @@ +--- +title: Concepts +subtitle: Terms and ideas used across NeMo OSS +slug: about/concepts +position: 4 +--- + +Key concepts for navigating **NeMo OSS** — the hub, the GitHub org, and the libraries it catalogs. + +## NeMo OSS + +**NeMo OSS** refers to open source libraries in the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization, documented on **docs.nvidia.com/nemo**. It is one part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial offerings not listed in this catalog. 
+ +## Lifecycle stages + +Libraries are grouped into five **lifecycle stages** — the same columns as the [org README](https://github.com/NVIDIA-NeMo): + +| Stage | You typically… | +| --- | --- | +| **Data** | Curate, filter, synthesize, or anonymize training data | +| **Pretraining** | Pretrain, fine-tune, or adapt foundation models | +| **RL** | Align models with SFT, DPO, GRPO, or reinforcement learning | +| **Inference** | Evaluate quality, export checkpoints, deploy, or add guardrails | +| **E2E** | Run multi-step recipes, orchestrate experiments, share reference pipelines | + +Stage-specific get-started guides: [Get Started](/get-started) → **By lifecycle stage**. + +## Hub and library documentation + +| Term | Meaning | +| --- | --- | +| **Hub** | This site — orientation, catalog, containers, known issues, get-started by stage | +| **Library docs** | Each product's Fern or docs site (for example `docs.nvidia.com/nemo/curator`) | +| **Repo** | GitHub source in NVIDIA-NeMo — issues, PRs, and `CONTRIBUTING.md` live there | + +The hub answers *which library and where to click next*. Library docs answer *how to use that library*. + +## What belongs on this hub + +| Keep here | Put in library docs | +| --- | --- | +| Lifecycle stages and pipeline shape | Tutorials, APIs, configuration | +| Choosing AutoModel or Megatron-Bridge | Model-specific recipes and scripts | +| Catalogs (`repos.ts`, container catalog) | Per-release install pins and changelogs | +| Framework container tags and cross-component known issues | Library-only bugs and workarounds | + +Hub content should help a **decision** and have a **long lifespan**. If it goes stale every release, link out instead. 
+ +## Framework and Platform + +Three related terms show up across NeMo OSS docs: + +| Term | Meaning | +| --- | --- | +| **NeMo Framework** (ecosystem) | The open source **library pipeline** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) — composable repos for data through deployment, documented on this hub | +| **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:` bundling a tested multi-library training and inference stack — refer to [Containers and releases](#containers-and-releases) | +| **NeMo Platform** | Product repo [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — **CLI, SDK, and Studio UI** that integrate NeMo libraries to evaluate, harden, tune, and deploy **agents** | + +Use **Framework** when you mean the library ecosystem or the NGC bundle. Use **Platform** when you mean the agent integration layer. Platform docs live on [NeMo Platform documentation](https://nvidia-nemo.github.io/nemo-platform/main/); training and export details stay on each library's site. + +Positioning and when to choose each: [Ecosystem](/about/ecosystem#nemo-framework-and-nemo-platform). 
+ +## Training backends + +| Term | Meaning | +| --- | --- | +| **PyTorch / HF path** | Hugging Face models and trainers with [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) — best default for ≤1K GPUs | +| **Megatron-Core path** | Large-scale training with [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) and HF ↔ Megatron checkpoint conversion | +| **Recipe** | Scripted training or alignment configuration (often in library repos or [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)) | + +## Containers and releases + +| Term | Meaning | +| --- | --- | +| **NeMo Framework container** | Multi-library NGC image `nvcr.io/nvidia/nemo:` | +| **Standalone container** | Single-library images (AutoModel, RL, Curator) | +| **Component versions** | Pinned packages inside a Framework container — refer to [software component versions](https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html) | + +Release metadata: [Release notes](/about/release-notes). + +## Tags and stages + +In the [Libraries](/about/libraries) catalog: + +- **Stage** — primary lifecycle column (one per repo). +- **Tags** — cross-cutting search facets (`speech`, `evaluation`, `agents`, and so on). + +A repo has one stage; tags help you find libraries that span concerns (for example Evaluator tagged for benchmarks). + +## Related pages + + + + +NeMo OSS within NVIDIA NeMo, Framework and Platform, and AutoModel or Megatron-Bridge. + + + +Pipeline diagram, layers, and container bundling. + + + +Searchable catalog of all repos. 
+ + + diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx new file mode 100644 index 0000000..7ac29fd --- /dev/null +++ b/fern/docs/pages/about/ecosystem.mdx @@ -0,0 +1,88 @@ +--- +title: Ecosystem +subtitle: NeMo OSS within the NVIDIA NeMo software suite +slug: about/ecosystem +position: 2 +--- + +**NeMo OSS** is the open source side of [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) — the public GitHub organization [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and the documentation hub you are reading now. Commercial NeMo products, enterprise services, and NIM microservices live outside this catalog. + + +This page explains **where NeMo OSS fits**. For searchable repo listings, use [Libraries](/about/libraries). For how stages connect technically, refer to [Architecture](/about/architecture). + + +## NeMo OSS within NVIDIA NeMo + +NVIDIA NeMo spans data, training, alignment, evaluation, and deployment for generative AI. **NeMo OSS** delivers that pipeline as composable open source libraries you can adopt individually or together: + +| Stage | Role in the pipeline | +| --- | --- | +| **[Data](/get-started/data)** | Prepare and govern training data | +| **[Pretraining](/get-started/pretraining)** | Train and fine-tune models | +| **[RL](/get-started/rl)** | Align and improve models with feedback | +| **[Inference](/get-started/inference)** | Evaluate, export, and safeguard models | +| **[E2E](/get-started/e2e)** | Recipes, orchestration, reference assets | + +Repo names and counts change — [Libraries](/about/libraries) is the searchable catalog. Each library's docs site has install steps and tutorials. 
+ +## NeMo Framework and NeMo Platform + +These names sound similar but point at different layers of NeMo OSS: + +| | **NeMo Framework** | **NeMo Platform** | +| --- | --- | --- | +| **What it is** | The composable **library pipeline** in this org — data, training, RL, evaluation, export, guardrails | An integrated **CLI, Python SDK, and web UI** for shipping **agents** | +| **You adopt it when…** | You want individual libraries (or the multi-library NGC stack) for the **model lifecycle** | You want one local setup to **evaluate, secure, tune, and deploy agents** using those libraries | +| **Typical entry** | Pick a stage → library docs, or pull the Framework container | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | +| **Docs** | This hub + per-library Fern sites | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | + +**NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`) — refer to [Concepts](/about/concepts#framework-and-platform) for how the terms relate. Platform composes libraries such as Guardrails, Evaluator, and Data Designer; it does not replace training libraries like AutoModel or Megatron-Bridge. + +Many teams train with **AutoModel or Megatron-Bridge**, align with **NeMo RL**, benchmark with **Evaluator**, and serve with **Export-Deploy**. Agent builders can start from **NeMo Platform** instead of wiring those pieces manually. 
+ +## Hub docs and per-library docs + +| | **NeMo OSS hub** (this site) | **Per-library docs** | +| --- | --- | --- | +| **Audience** | Choosing a library, comparing stages, container releases | Using one product deeply | +| **Content** | Catalog, get-started by stage, release notes | APIs, tutorials, recipes | +| **Scope** | 22 public GitHub repos in NVIDIA-NeMo | One library at a time | + +## Choosing AutoModel or Megatron-Bridge + +Both train large language models (LLMs) and vision language models (VLMs) on NVIDIA GPUs; they target different scale and stack preferences: + +| | **AutoModel** | **Megatron-Bridge** | +| --- | --- | --- | +| **Stack** | PyTorch / Hugging Face native | Megatron-Core | +| **Typical scale** | 1–1,000 GPUs | 1,000+ GPUs | +| **Checkpoint flow** | Day-0 support for HF models | HF ↔ Megatron conversion | +| **Best for** | Fine-tuning, research, rapid iteration | Large-scale pretraining and SFT | + +Many teams use **AutoModel or Megatron-Bridge for training**, then **NeMo RL for alignment**, **Evaluator for benchmarks**, and **Export-Deploy for serving**. Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly. + +## Related entry points + +
+ + + diff --git a/fern/docs/pages/about/libraries.mdx b/fern/docs/pages/about/libraries.mdx new file mode 100644 index 0000000..ce44128 --- /dev/null +++ b/fern/docs/pages/about/libraries.mdx @@ -0,0 +1,26 @@ +--- +title: Libraries +subtitle: Open source libraries in the NVIDIA-NeMo GitHub organization +slug: about/libraries +position: 5 +--- + +import RepoCatalog from "@/components/RepoCatalog"; + +Browse all **22 open source libraries** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) by lifecycle stage. For how NeMo OSS fits in NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For pipeline structure, refer to [Architecture](/about/architecture). + + + +## How libraries are grouped + +Stages match the [NVIDIA-NeMo org README](https://github.com/NVIDIA-NeMo) lifecycle table. Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. + +| Stage | What lives here | +| --- | --- | +| **Data** | Curation, synthetic data, PII handling, SDG pipelines | +| **Pretraining** | Model training and fine-tuning (AutoModel, Megatron-Bridge, Speech, optimizers) | +| **RL** | Post-training alignment, environments, agent rollout | +| **Inference** | Evaluation, export, serving, guardrails, agent platform | +| **E2E** | Reference pipelines, recipes, experiment orchestration, CI templates | + +Libraries without a published docs site link to GitHub README or microservice docs. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). 
diff --git a/fern/docs/pages/about/release-notes/containers.mdx b/fern/docs/pages/about/release-notes/containers.mdx new file mode 100644 index 0000000..f416c92 --- /dev/null +++ b/fern/docs/pages/about/release-notes/containers.mdx @@ -0,0 +1,14 @@ +--- +title: Container releases +subtitle: NGC container announcements and version metadata +slug: about/release-notes/containers +position: 2 +--- + +import ContainerCatalog from "@/components/ContainerCatalog"; + +NGC containers bundle tested versions of NeMo libraries for training, alignment, evaluation, and deployment. Use the catalog below to compare images, filter by type or lifecycle stage, and pull the right container for your workload. + +Refer to the [release notes overview](/about/release-notes) for how this hub fits together, and [known issues](/about/release-notes/known-issues) for cross-component problems on Framework releases. + + diff --git a/fern/docs/pages/about/release-notes/index.mdx b/fern/docs/pages/about/release-notes/index.mdx new file mode 100644 index 0000000..e180c21 --- /dev/null +++ b/fern/docs/pages/about/release-notes/index.mdx @@ -0,0 +1,40 @@ +--- +title: Release notes +subtitle: Version history and release information for NeMo OSS +slug: about/release-notes +position: 1 +--- + +Release notes and version history for **NeMo OSS** — open source libraries in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) on GitHub. + +This hub publishes **NeMo Framework container** announcements. Per-library release notes and docs live on each project's own site — find them from [Libraries](/about/libraries). + + + + +Latest NGC tags, pull commands, component versions, and release history links. + + + +Cross-component container issues by release tag. + + + +Search all 22 repos — each card links to docs, GitHub, and release notes. + + + + +## Release metadata + + + + +PyTorch, Megatron-Core, Transformer Engine, and bundled library versions per container — canonical for 26.02+. 
+ + + +Cross-component container issues by release tag. + + + diff --git a/fern/docs/pages/about/release-notes/known-issues.mdx b/fern/docs/pages/about/release-notes/known-issues.mdx new file mode 100644 index 0000000..be7b29d --- /dev/null +++ b/fern/docs/pages/about/release-notes/known-issues.mdx @@ -0,0 +1,143 @@ +--- +title: Known issues +subtitle: Cross-component issues for NeMo Framework containers +slug: about/release-notes/known-issues +position: 3 +--- + +Known issues for **NeMo Framework** NGC containers (`nvcr.io/nvidia/nemo`). Find your container tag on [Container releases](/about/release-notes/containers), then check the matching section below. + + + + +Recent container tags and bundled component versions. + + + +Pinned package versions for 26.02+ containers. + + + + +## 26.02 + +See component release notes for library-specific known issues: + +- [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) + +## 25.11 + +See component release notes for library-specific known issues: + +- [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) + +## 25.09 + +### AutoModel + +- Knowledge distillation validation has a known issue. Set `--step_scheduler.val_every_steps=9223372036854775807` to bypass the issue. + +### Megatron-Bridge + +- Pretraining DeepSeek in subchannel FP8 precision is not working. Pretraining DeepSeek with current scaling FP8 is a workaround, but MTP loss does not converge. + +## 25.07 + +- DeepSeek model pretraining has a memory spike at the end of training, after the validation loop and checkpoint saving. 
The memory spike is linked to the cross-entropy layer and may lead to an NCCL error at the end of training. +- When fine-tuning with context parallelism (CP) > 1, you might need to set `calculate_per_token_loss = True`, depending on the dataset. This produces a slightly different loss than before, but both settings lead to model convergence. +- TensorRT-LLM must be installed to run the ONNX export step for LLM embedding models in the Finetuning Llama 3.2 Model into Embedding Model tutorial. Use the [Export-Deploy install instructions](https://github.com/NVIDIA-NeMo/Export-Deploy). +- ONNX export requires `transformers` v4.51, but the container ships with v4.53 by default. Consider downgrading with `uv pip install transformers==4.51.0`. For use cases outside the container, use `pip install transformers==4.51.0`. +- Distributed checkpoint saving fails for Nemotron-H 47B and 56B on GB200. No issues have been observed on H100 or B200. + +## 25.04.02 and 25.04.01 + +- **Tensor-Parallel Communication Overlap:** Functional errors may occur with specific tensor-parallel communication overlap configurations, including AllGather+GEMM overlap and the ring-exchange algorithm when `aggregate=True`. +- **LayerNorm Bias Accuracy:** Training models that use LayerNorm with bias (e.g., StarCoder2) might exhibit accuracy issues. A fix is available in TransformerEngine commit 1569, but it is not yet included in the current NeMo release container. **Workaround:** Manually mount or pip install the latest TransformerEngine version in your container. +- **Large Model Checkpoint NaN Errors (T5 11B, StarCoder2 7B):** Loading trained checkpoints for fine-tuning T5 (11B) and StarCoder2 (7B) models may result in NaN values. This is suspected to be a checkpoint saving/loading error. A potential fix in Megatron Core PR 48cc46f is currently under testing. +- **MXFP8 Memory Usage:** MXFP8 is currently using more memory than expected. A fix is in progress.
+- **FP8 in AutoModel Workflow:** Using FP8 in the AutoModel workflow requires manually setting `use_linear_ce_loss` to `False`. Alternatively, upgrade NeMo to commit `64f0fa`. FP8 support for Mixture of Experts (MoE) models is planned for a future release. +- **HF Export for Llama-3_3-Nemotron-Super-49B-v1:** Hugging Face export is not currently supported for the Llama-3_3-Nemotron-Super-49B-v1 model. + +## 25.04.00 + +- Llama 4 accuracy may degrade slightly due to an issue with the order of sigmoid application in the expert routing logic. This has been fixed in [Megatron-LM](https://github.com/NVIDIA/Megatron-LM), but the fix is not yet included in the current NeMo release container. To apply the fix, manually mount the updated Megatron Core source when building or running your container. +- Resuming from local checkpoints with the `get_global_step_from_global_checkpoint_path` utility function may not work correctly when the path contains auto-inserted metrics. This is fixed in [NeMo#13012](https://github.com/NVIDIA/NeMo/pull/13012), but the fix is not yet included in the current NeMo release container. +- Tensor-parallel communication overlap with AllGather+GEMM overlap and the ring-exchange algorithm with `aggregate=True` may have functional errors. +- In `scripts/vlm/automodel.py`, the `gbs` argument is a string instead of an integer. Additionally, this script must be launched with `torchrun` when using more than one device. +- There might be accuracy issues when training models that use LayerNorm with bias (e.g., StarCoder2). This issue has been addressed in [TransformerEngine](https://github.com/NVIDIA/TransformerEngine), but the fix is not yet included in the current NeMo release container. **Workaround:** Manually mount or pip install the latest TransformerEngine in your container. +- Large configurations of T5 (11B) and StarCoder2 (7B) may produce NaN values when loading trained checkpoints for fine-tuning.
We suspect a checkpoint saving/loading error, which a recent Megatron Core PR is expected to fix; this fix is currently under testing. +- MXFP8 currently uses more memory than expected; a fix is in progress. +- FP8 in the AutoModel workflow requires setting `use_linear_ce_loss` to `False` manually, or upgrading NeMo to commit `64f0fa`. FP8 support for MoE models is scheduled for a future release. +- Hugging Face export is not supported for Llama-3_3-Nemotron-Super-49B-v1. + +## 25.02 + +### AutoModel + +- This is primarily a functional release; performance improvements are planned for future versions. +- For large models (e.g., > 40B) trained with FSDP2, checkpoint saving can take longer than expected. +- Support for long sequences is currently limited, especially for large models (> 30B). +- Models with external dependencies may fail to run if those dependencies are unavailable (e.g., a missing package causes a failed import). +- A small percentage of models available via `AutoModelForCausalLM` may only support inference and have training capabilities explicitly disabled. +- Support for FSDP2 with mixed-weight models (e.g., FP8 + BF16) is scheduled for future releases. +- Context Parallelism with sequence packing and padding between sequences is currently broken (refer to [NeMo#12174](https://github.com/NVIDIA/NeMo/issues/12174)). Use 24.12 or upgrade to TE 2.0+ for working support. +- MoE-based models show training instability. Continue to use 24.12 for MoE training until 25.02 is patched with the MoE fix. + +## 24.12 and earlier + +### Framework and training + +- In 24.12, NeMo switched from `pytorch_lightning` to `lightning.pytorch`. If you have custom code that imports `pytorch_lightning`, replace the import with `lightning.pytorch`. Failing to do so results in `ValueError: Expected a parent`. +- When using a 24.12 container or later with LM Evaluation Harness, upgrade LM Evaluation Harness to include the required commit.
Otherwise you may see `ValueError: You selected an invalid strategy name...`. +- Restoring model context for NeMo 2.0 checkpoints produced using the NeMo 24.09 container fails when building `OptimizerConfig` from `megatron.core.optimizer.optimizer_config`, because `overlap_grad_reduce` and `overlap_param_gather` were moved in Megatron Core. The `update_io_context.py` script works around this by dropping unknown parameters from the checkpoint context so that it is compatible with the latest container. +- Full fine-tuning of Griffin (NeMo 1.0) has checkpoint loading issues; the state dicts do not match between the provided checkpoint and the initialized model. Use the 24.07 container if this model is needed. +- The Gemma 2 27B pretraining recipe requires at least 2 nodes, but its default number of nodes is set to 1. +- The Megatron Core Distributed Optimizer currently lacks memory capacity optimization, resulting in higher model state memory usage at small data-parallel sizes. +- The overlap of the data-parallel parameter AllGather with `optimizer.step` (`overlap_param_gather_with_optimizer=true`) does not work with distributed checkpointing. +- Support for converting models from NeMo 2.0 to 1.0 is not yet available. +- Transformer Engine changed checkpoint metadata after v1.10, which can cause checkpoint incompatibilities. **Workaround:** use `model.dist_ckpt_load_strictness=log_all` when working with Transformer Engine v1.10 or higher. Refer to [software component versions](https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html) for TE versions per container. +- For data preparation of GPT models, use your own dataset or an online dataset legally approved by your organization. +- A race condition in the NeMo experiment manager can occur when multiple processes or threads attempt to access and modify shared resources simultaneously. +- The Mistral and Mixtral tokenizers require a Hugging Face login.
+ +### Export and deployment + +- Exporting Gemma, StarCoder, and Falcon 7B models to TRT-LLM only works with a single GPU. If you attempt to export with multiple GPUs, no descriptive error message is shown. +- Exporting Llama 70B to vLLM causes an out-of-memory error. +- vLLM export does not support LoRA or P-tuning. +- In-framework (PyTorch-level) deployment with 8 GPUs fails with an error. +- The query script `scripts/deploy/nlp/query.py` returns the error `'output_generation_logits'` in the 24.12 container. + +### Notebooks and tutorials + +The following notebooks had functional issues at the time of the 24.12 release: + +- `ASR_with_NeMo.ipynb` +- `ASR_with_Subword_Tokenization.ipynb` +- `AudioTranslationSample.ipynb` +- `Megatron_Synthetic_Tabular_Data_Generation.ipynb` +- `SpellMapper_English_ASR_Customization.ipynb` +- `FastPitch_ChineseTTS_Training.ipynb` +- `NeVA Tutorial.ipynb` +- `NeMo_Forced_Aligner_Tutorial.ipynb` — use the 24.09 container if this notebook is needed. + +### Multimodal + +- LITA tutorial: the data preparation step in `tutorials/multimodal/LITA_Tutorial.ipynb` requires you to manually download the YouMakeup dataset instead of using the provided script. +- Add `exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True` to the NeVA notebook pretraining procedure to ensure an end-to-end workflow. + +### ASR + +- Timestamp misalignment occurs in FastConformer ASR models when using the ASR decoder for diarization. Related issue: [NeMo#8438](https://github.com/NVIDIA/NeMo/issues/8438). + +## Report a new issue + +Open an issue or discussion in the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) org repository that owns the component — use [Libraries](/about/libraries) to find the right repo and docs site. For problems that span a container release, use [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) and note your container tag (for example `nvcr.io/nvidia/nemo:26.02`).
+ +When a new container ships, add cross-component known issues to the matching section above. diff --git a/fern/docs/pages/get-started/data.mdx b/fern/docs/pages/get-started/data.mdx new file mode 100644 index 0000000..603fcb7 --- /dev/null +++ b/fern/docs/pages/get-started/data.mdx @@ -0,0 +1,36 @@ +--- +title: Data +subtitle: Get started with data curation and synthetic data libraries +slug: get-started/data +position: 10 +--- + +import StageGuide from "@/components/StageGuide"; + +Libraries for **Data** — curation, synthetic generation, PII handling, and SDG pipelines. Install and tutorial steps live on each library's docs site; this page lists every active repo in this stage. + +```mermaid +flowchart LR + DATA["Data"] --> PRE["Pretraining"] --> RL["RL"] --> INF["Inference"] + E2E["E2E"] -.-> DATA & PRE & RL & INF + + classDef active fill:#76B900,stroke:#3d5c00,stroke-width:3px,color:#fff + classDef dim fill:#f5f5f5,stroke:#ccc,color:#666 + + class PRE,RL,INF,E2E dim + class DATA active +``` + + + +## Typical workflow + +1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo/curator/latest/) (dedup, filtering, multimodal pipelines). +2. **Generate** synthetic data with [Data Designer](https://nvidia-nemo.github.io/DataDesigner/latest/) or domain SDG tools. +3. **Protect** sensitive fields with [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) before sharing or training. + +Hand off curated datasets to [Pretraining](/get-started/pretraining) or [RL](/get-started/rl) libraries. + + +Pre-built container for large-scale curation jobs. 
+ diff --git a/fern/docs/pages/get-started/e2e.mdx b/fern/docs/pages/get-started/e2e.mdx new file mode 100644 index 0000000..973fb81 --- /dev/null +++ b/fern/docs/pages/get-started/e2e.mdx @@ -0,0 +1,42 @@ +--- +title: E2E +subtitle: Get started with recipes, pipelines, and orchestration +slug: get-started/e2e +position: 14 +--- + +import StageGuide from "@/components/StageGuide"; + +Libraries for **E2E** — reference pipelines, Nemotron recipes, synthetic-data workflows, and experiment launchers that tie the stack together. + +```mermaid +flowchart LR + DATA["Data"] --> PRE["Pretraining"] --> RL["RL"] --> INF["Inference"] + E2E["E2E"] -.-> DATA & PRE & RL & INF + + classDef active fill:#76B900,stroke:#3d5c00,stroke-width:3px,color:#fff + classDef dim fill:#f5f5f5,stroke:#ccc,color:#666 + + class DATA,PRE,RL,INF dim + class E2E active +``` + + + +## Orchestration + + +Configure, launch, and manage experiments on local machines, SLURM, and Kubernetes. + + +## Reference assets + + +Pipelines for synthetic data generation and evaluation (math, code, science). + + + +Cookbooks, datasets, and reference examples for Nemotron models. + + +Connect upstream stages: [Data](/get-started/data) → [Pretraining](/get-started/pretraining) → [RL](/get-started/rl) → [Inference](/get-started/inference). diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx new file mode 100644 index 0000000..b851570 --- /dev/null +++ b/fern/docs/pages/get-started/index.mdx @@ -0,0 +1,61 @@ +--- +title: Get Started +subtitle: Choose your path into NeMo OSS +slug: get-started +position: 1 +--- + +Pick how you want to begin — a fast quickstart, installation options, or guidance by **lifecycle stage** (the same columns as the [org README](https://github.com/NVIDIA-NeMo) and [Libraries](/about/libraries) catalog). + + + + +Fastest paths: AutoModel fine-tuning and NeMo Speech inference. + + + +pip, NGC containers, scale, and backend choice. 
+ + + + +## By lifecycle stage + + + + +Curation, synthetic data, PII handling, and SDG pipelines. + + + +Model training and fine-tuning — AutoModel, Megatron-Bridge, Speech. + + + +Post-training alignment, environments, and agent rollout. + + + +Evaluation, export, serving, guardrails, and NeMo Platform for agents. + + + +Reference pipelines, recipes, and experiment orchestration. + + + + +## Decision guide + +| I want to… | Scale | Start here | +| --- | --- | --- | +| Curate or synthesize data | Any | [Data](/get-started/data) | +| Train or fine-tune | ≤1K GPUs | [Pretraining](/get-started/pretraining) → AutoModel | +| Train at scale | 1K+ GPUs | [Pretraining](/get-started/pretraining) → Megatron-Bridge | +| Align (DPO/GRPO) | Any | [RL](/get-started/rl) | +| Evaluate or deploy a model | Any | [Inference](/get-started/inference) | +| Ship or harden agents | Any | [Inference](/get-started/inference) → [NeMo Platform](https://nvidia-nemo.github.io/nemo-platform/main/) | +| Run end-to-end recipes | Any | [E2E](/get-started/e2e) | +| Speech AI | Any | [Pretraining](/get-started/pretraining) → NeMo Speech | + +Not sure which library owns your workload? Browse all repos on [Libraries](/about/libraries). diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx new file mode 100644 index 0000000..2ba62e1 --- /dev/null +++ b/fern/docs/pages/get-started/inference.mdx @@ -0,0 +1,42 @@ +--- +title: Inference +subtitle: Get started with evaluation, export, and deployment +slug: get-started/inference +position: 13 +--- + +import StageGuide from "@/components/StageGuide"; + +Libraries for **Inference** — benchmarking, export to serving stacks, guardrails, and agent platforms. 
+ +```mermaid +flowchart LR + DATA["Data"] --> PRE["Pretraining"] --> RL["RL"] --> INF["Inference"] + E2E["E2E"] -.-> DATA & PRE & RL & INF + + classDef active fill:#76B900,stroke:#3d5c00,stroke-width:3px,color:#fff + classDef dim fill:#f5f5f5,stroke:#ccc,color:#666 + + class DATA,PRE,RL,E2E dim + class INF active +``` + + + +## Typical workflow (models) + +1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) across 100+ harnesses. +2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/). +3. **Guard** production apps with [Guardrails](https://docs.nvidia.com/nemo/guardrails/latest/). + +Models usually come from [Pretraining](/get-started/pretraining) or [RL](/get-started/rl). For bundled Framework container versions, refer to [Container releases](/about/release-notes/containers). + +## Shipping agents + +If you are building **agents** (not just serving a fine-tuned checkpoint), [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates evaluation, guardrails, tuning, and deployment in one setup — CLI, SDK, and Studio UI. You can adopt Evaluator or Guardrails standalone; Platform is the umbrella when you want those loops wired together. + + +Setup, CLI, and docs — evaluate, secure, and optimize agents with NeMo libraries. + + +For related guidance, refer to [Framework and Platform](/about/concepts#framework-and-platform) and [Ecosystem](/about/ecosystem#nemo-framework-and-nemo-platform). diff --git a/fern/docs/pages/get-started/installation.mdx b/fern/docs/pages/get-started/installation.mdx new file mode 100644 index 0000000..5d163b4 --- /dev/null +++ b/fern/docs/pages/get-started/installation.mdx @@ -0,0 +1,66 @@ +--- +title: Installation +subtitle: pip, containers, and choosing a backend +slug: get-started/installation +position: 3 +--- + +How to install NeMo OSS libraries and pick a backend for your GPU scale. 
For a minimal first run, start with [Quickstart](/get-started/quickstart). + +## pip install (recommended for development) + +| Workload | Install | Docs | +| --- | --- | --- | +| Hugging Face large language model (LLM) and vision language model (VLM) training | `pip install nemo-automodel` | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | +| Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL](https://docs.nvidia.com/nemo/rl/latest/) | +| Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) | +| Data curation | Curator repo | [Curator](https://docs.nvidia.com/nemo/curator/latest/) | + +Each library publishes install extras and version pins in its own documentation. Use [Libraries](/about/libraries) to find the repo and docs site for your stage. + +## NGC containers (recommended for production stacks) + +Pre-built images bundle tested dependency sets. Use the [container catalog](/about/release-notes/containers) for current tags and pull commands. + +The **NeMo Framework** image (`nvcr.io/nvidia/nemo`) is the multi-library training stack. Standalone images exist for AutoModel, RL, and Curator. + +## Scale and backends + +| GPUs | Libraries | Checkpoint conversion | Notes | +| --- | --- | --- | --- | +| 1–1,000 | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Not required for HF-native training | PyTorch / Hugging Face path | +| 1,000+ | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [HF ↔ Megatron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/README.md) | Megatron-Core at scale | + +Stage-specific libraries and entry points: + + + + +Curator, Data Designer, anonymization, SDG. + + + +AutoModel, Megatron-Bridge, Speech, optimizers. + + + +NeMo RL, Gym, agent server. + + + +Evaluator, Export-Deploy, Guardrails. 
+ + + +Skills, Nemotron recipes, NeMo Run. + + + + +## Experiment tracking + + +Launch and track experiments on local machines, SLURM, and Kubernetes. + + +Also listed under [E2E](/get-started/e2e). diff --git a/fern/docs/pages/get-started/pretraining.mdx b/fern/docs/pages/get-started/pretraining.mdx new file mode 100644 index 0000000..62568d0 --- /dev/null +++ b/fern/docs/pages/get-started/pretraining.mdx @@ -0,0 +1,38 @@ +--- +title: Pretraining +subtitle: Get started with model training and fine-tuning +slug: get-started/pretraining +position: 11 +--- + +import StageGuide from "@/components/StageGuide"; + +Libraries for **Pretraining** — from Hugging Face fine-tuning on a single node to thousand-GPU Megatron pretraining and Speech AI. Use AutoModel for large language models (LLMs) and vision language models (VLMs) at modest scale; use Megatron-Bridge for cluster-scale jobs. + +```mermaid +flowchart LR + DATA["Data"] --> PRE["Pretraining"] --> RL["RL"] --> INF["Inference"] + E2E["E2E"] -.-> DATA & PRE & RL & INF + + classDef active fill:#76B900,stroke:#3d5c00,stroke-width:3px,color:#fff + classDef dim fill:#f5f5f5,stroke:#ccc,color:#666 + + class DATA,RL,INF,E2E dim + class PRE active +``` + + + +## Choose a path + +| Goal | GPUs | Library | +| --- | --- | --- | +| Fine-tune Hugging Face LLMs and VLMs | 1–1,000 | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | +| Large-scale pretrain / SFT | 1,000+ | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Speech ASR, TTS, speech-LM | Any | [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) | + +Fastest first run: [Quickstart](/get-started/quickstart). Install details: [Installation](/get-started/installation). 
+ +Model recipes, example configs, and supported architectures live on each library's docs site — for example [AutoModel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples) and [Megatron-Bridge recipes](https://docs.nvidia.com/nemo/megatron-bridge/latest/). + +Post-training alignment: [RL](/get-started/rl). diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx new file mode 100644 index 0000000..4c12ea5 --- /dev/null +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -0,0 +1,46 @@ +--- +title: Quickstart +subtitle: Fast paths to your first result on NVIDIA GPUs +slug: get-started/quickstart +position: 2 +--- + +Minimal steps to validate your setup. For install options and containers, refer to [Installation](/get-started/installation). + +## Fine-tune with AutoModel + +The fastest on-ramp for Hugging Face large language models (LLMs) and vision language models (VLMs) on one or more GPUs. Install and run the current quick start on the AutoModel docs site — model names, scripts, and cluster options change frequently. + + +Local workstation and cluster launch options (canonical, kept up to date by the AutoModel team). + + +More pretraining paths (Megatron-Bridge, recipes, scale): [Pretraining](/get-started/pretraining). + +## Speech inference + +Use the NeMo Speech docs for install extras, model selection, and the current five-minute inference walkthrough. + + +Installation, five-minute inference, model selection, and tutorials. + + +Speech training and full speech-language workflows: [Pretraining](/get-started/pretraining). + +## Next steps + + + + +pip and NGC containers, GPU scale, and backend choice. + + + +Search all 22 repos by stage or tag. + + + +Pull tested NGC images for Framework, AutoModel, RL, and Curator. 
+ + + diff --git a/fern/docs/pages/get-started/rl.mdx b/fern/docs/pages/get-started/rl.mdx new file mode 100644 index 0000000..a767e6e --- /dev/null +++ b/fern/docs/pages/get-started/rl.mdx @@ -0,0 +1,37 @@ +--- +title: RL +subtitle: Get started with alignment and reinforcement learning +slug: get-started/rl +position: 12 +--- + +import StageGuide from "@/components/StageGuide"; + +Libraries for **RL** — SFT, DPO, GRPO, on-policy distillation, RL environments, and agent rollout infrastructure. + +```mermaid +flowchart LR + DATA["Data"] --> PRE["Pretraining"] --> RL["RL"] --> INF["Inference"] + E2E["E2E"] -.-> DATA & PRE & RL & INF + + classDef active fill:#76B900,stroke:#3d5c00,stroke-width:3px,color:#fff + classDef dim fill:#f5f5f5,stroke:#ccc,color:#666 + + class DATA,PRE,INF,E2E dim + class RL active +``` + + + +## Common entry points + +| Technique | Start in docs | Library | +| --- | --- | --- | +| GRPO, DPO, SFT | [NeMo RL examples](https://docs.nvidia.com/nemo/rl/latest/) | NeMo RL | +| RL environments | [NeMo Gym](https://docs.nvidia.com/nemo/gym/latest/index.html) | Gym | + +Train base models first through [Pretraining](/get-started/pretraining), then align here. Evaluate with [Inference](/get-started/inference) libraries. + + +Pre-built alignment and RL container. 
+ diff --git a/fern/docs/pages/getting-started.mdx b/fern/docs/pages/getting-started.mdx deleted file mode 100644 index 3fccf33..0000000 --- a/fern/docs/pages/getting-started.mdx +++ /dev/null @@ -1,85 +0,0 @@ ---- -title: Getting Started -subtitle: Pick the right library and recipe for your workload ---- - -## Quick start with AutoModel - -The fastest path to fine-tuning Hugging Face models on NVIDIA GPUs: - -```bash -pip install nemo-automodel -``` - -```python -from nemo_automodel import AutoModelForCausalLM, Trainer - -model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.3-70B-Instruct") -trainer = Trainer(model=model, train_dataset=dataset) -trainer.train() -``` - - -Local workstation and cluster launch options. - - -## Decision guide - -| I want to… | Models | Scale | Library | Documentation | -| --- | --- | --- | --- | --- | -| Train or fine-tune | LLM, VLM | ≤1K GPUs | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| Train at scale | LLM, VLM | 1K+ GPUs | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| Align (DPO/GRPO) | LLM, VLM | Any | [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | -| Curate data | — | Any | [Curator](https://github.com/NVIDIA-NeMo/Curator) | [docs](https://docs.nvidia.com/nemo/curator/latest/) | -| Evaluate | Any | — | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | -| Deploy | Any | — | [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | -| Speech AI | ASR, TTS | Any | [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | - -## Scale and backends - -| GPUs | Installation | Checkpoint conversion | LLM 
recipes | VLM recipes | -| --- | --- | --- | --- | --- | -| 1–1,000 | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Not required | [Pretrain](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-pre-training), [SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-supervised-fine-tuning-sft), [LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#llm-parameter-efficient-fine-tuning-peft), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py) | [SFT](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-supervised-fine-tuning-sft), [LoRA](https://github.com/NVIDIA-NeMo/Automodel?tab=readme-ov-file#vlm-parameter-efficient-fine-tuning-peft), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py) | -| 1,000+ | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [HF ↔ Megatron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/examples/conversion/README.md) | [Pretrain, SFT, LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py) | [SFT, LoRA](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/configs/vlm_grpo_3B_megatron.yaml) | - -## Training recipes by library - -| Library | LLM recipes | VLM recipes | -| --- | --- | --- | -| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | [Llama](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/llama3_2), [Qwen](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/qwen), 
[Gemma](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/gemma), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Automodel/blob/main/examples/llm_pretrain/deepseekv3_pretrain.yaml), [Mistral](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/mistral), [Phi](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/llm_finetune/phi) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/qwen2_5), [Gemma 3n VL](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples/vlm_finetune/gemma3n) | -| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | [Llama](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/llama/llama3.py), [Qwen](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen/qwen2.py), [DeepSeek V3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/deepseek/deepseek_v3.py), [Gemma 3](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma/gemma3.py), [Nemotron](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/nemotronh/nemotronh.py) | [Gemma 3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/gemma3_vl/gemma3_vl.py), [Qwen2.5 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen25_vl.py), [Qwen3 VL](https://github.com/NVIDIA-NeMo/Megatron-Bridge/blob/main/src/megatron/bridge/recipes/qwen_vl/qwen3vl.py) | -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | [GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_grpo_math.py), [DPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_dpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_sft.py) | 
[GRPO](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_grpo.py), [SFT](https://github.com/NVIDIA-NeMo/RL/blob/main/examples/run_vlm_sft.py) | - -## NGC containers - -Pull optimized containers to get started quickly: - - - - -Megatron-Bridge, Evaluator, Export-Deploy, Run. - - - -PyTorch-native distributed training. - - - -Alignment and reinforcement learning. - - - -Data preprocessing and curation. - - - -Browse all NeMo containers. - - - - -## Experiment tracking - - -Launch and track experiments on local machines, SLURM, and Kubernetes. - diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx index 35006a4..3bb7e3f 100644 --- a/fern/docs/pages/index.mdx +++ b/fern/docs/pages/index.mdx @@ -1,119 +1,55 @@ --- -title: NVIDIA NeMo Framework -subtitle: GPU-accelerated libraries for training, curation, evaluation, alignment, and deployment +title: NeMo OSS +subtitle: Open source libraries from the NVIDIA-NeMo GitHub organization slug: "" --- -NeMo Framework is NVIDIA's open-source suite for large language models, multimodal models, diffusion, and speech. Scale pretraining, post-training, and reinforcement learning from a single GPU to thousand-node clusters with Hugging Face/PyTorch and Megatron backends. +**NeMo OSS** is the hub for NVIDIA's public, open source NeMo libraries — the repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) on GitHub. Scale pretraining, post-training, and reinforcement learning from a single GPU to thousand-node clusters with Hugging Face/PyTorch and Megatron backends. For production **agents**, [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates evaluation, guardrails, and tuning in one CLI and SDK. -The [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization hosts modular libraries and recipes so you can compose only what you need. 
NeMo Framework is also part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite for the AI agent lifecycle. - - -The framework is restructuring from the monolithic NeMo 2.0 repo into focused libraries (AutoModel, RL, Gym, Curator, and more). Speech AI remains in the legacy [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repository. - +These projects are part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial products and services beyond this open source catalog. ## Choose your path - -Decision guide, installation paths, and training recipes by scale. + +NeMo OSS within NVIDIA NeMo and how libraries relate. - -Search and filter all 23 NVIDIA-NeMo repos by category. + +Pipeline layers, backends, and containers. - -Discussions, announcements, and how to contribute. + +Quickstart, installation, and guides by lifecycle stage. - - -## Start here - - - - -Fine-tune Hugging Face models on NVIDIA GPUs — the simplest on-ramp for most users. + +Search and filter all 22 NVIDIA-NeMo libraries by category. - -Large-scale pretraining and SFT at 1,000+ GPUs with Megatron-Core. + +Latest NGC container announcements and version metadata. - -SFT, DPO, GRPO, and on-policy distillation for LLMs and VLMs. - - - - -## Pipeline overview - -```mermaid -flowchart LR - subgraph Data - Curator - DataDesigner[Data Designer] - Skills - end - - subgraph Training - AutoModel - MBridge[Megatron-Bridge] - end - - subgraph Alignment - RL[NeMo RL] - end - - subgraph Evaluation - Evaluator - end - - subgraph Deployment - Export[Export-Deploy] - Guardrails - end - - Gym[NeMo Gym] - - Data --> Training - Training --> Alignment - Training --> Evaluation - Alignment --> Evaluation - Evaluation --> Deployment - - Gym -.-> RL - Skills -.-> Evaluator -``` - -## Popular libraries - - - - -Scalable data preprocessing and curation for LLMs and multimodal data. 
+ +Overview, container releases, and known issues. - -Model benchmarking across 100+ evaluation harnesses. + +Cross-component container issues by release tag. - -Export to vLLM, TensorRT-LLM, ONNX, and production serving. + +Evaluate, harden, tune, and deploy agents — CLI, SDK, and Studio UI. - -Programmable safety rails for LLM applications. + +Discussions, announcements, and how to contribute. - -Launch experiments on local machines, SLURM, or Kubernetes. - + - -RL environments for model and agent improvement. - +## Pipeline overview - +Refer to [Architecture](/about/architecture) for the full pipeline diagram, training backends, and how NGC containers bundle libraries. diff --git a/fern/docs/pages/libraries.mdx b/fern/docs/pages/libraries.mdx deleted file mode 100644 index 88a3a86..0000000 --- a/fern/docs/pages/libraries.mdx +++ /dev/null @@ -1,146 +0,0 @@ ---- -title: Libraries Overview -subtitle: Lifecycle-oriented summary — use the repository catalog for the full list ---- - -For a searchable catalog of **all 23 repositories** in the organization, see **[Repositories](/repositories)**. - -Each library has its own repository, documentation site, and (where applicable) NGC container. The sections below highlight the main projects by pipeline stage. - -## Data - - - - -Data curation at scale for text, image, video, and audio. - - - -Synthetic data generation from scratch or seed data. - - - -Reference pipelines for synthetic data generation and evaluation. - - - -Privacy-preserving synthetic tabular data. - - - -PII detection and context-aware anonymization. 
- - - - -| Repo | GitHub | Container | -| --- | --- | --- | -| Curator | [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | [NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) | -| Data Designer | [NVIDIA-NeMo/DataDesigner](https://github.com/NVIDIA-NeMo/DataDesigner) | — | -| Skills | [NVIDIA-NeMo/Skills](https://github.com/NVIDIA-NeMo/Skills) | — | - -## Training - - - - -PyTorch distributed training with day-0 Hugging Face support (LLM, VLM, Omni). - - - -Megatron-Core pretraining, SFT, and LoRA with bidirectional HF conversion. - - - -Speech AI (ASR, TTS) — legacy NeMo repo focused on speech. - - - -Diffusion model training on Megatron-Core. - - - -Collection of cutting-edge optimizers. - - - -Developer asset hub for Nemotron models and recipes. - - - - -| Repo | Backend | Models | Container | -| --- | --- | --- | --- | -| AutoModel | PyTorch | LLM, VLM, Omni | [NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) | -| Megatron-Bridge | Megatron-core | LLM, VLM | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| NeMo Speech | Megatron-core | Speech | — | - -## Alignment - - - - -SFT, DPO, GRPO, and distillation with Megatron-core and vLLM backends. - - - -RL environments for evaluating and improving models and agents. - - - - -| Repo | Backend | Models | Container | -| --- | --- | --- | --- | -| NeMo RL | Megatron-core, vLLM | LLM, VLM | [NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) | -| Gym | — | LLM, VLM | — | - -## Evaluation - - - - -Model benchmarking across 100+ benchmarks and 18+ harnesses. - - - - -| Repo | GitHub | Container | -| --- | --- | --- | -| Evaluator | [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | - -## Deployment and safety - - - - -Export to vLLM, TensorRT-LLM, ONNX, and production serving. 
- - - -Programmable guardrails for LLM conversational systems. - - - - -| Repo | Inference backends | Container | -| --- | --- | --- | -| Export-Deploy | vLLM, TRT-LLM, ONNX | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | -| Guardrails | — | — | - -## Infrastructure - - - - -Experiment launcher for local, SLURM, and Kubernetes workflows. - - - - -| Repo | GitHub | Container | -| --- | --- | --- | -| Run | [NVIDIA-NeMo/Run](https://github.com/NVIDIA-NeMo/Run) | [NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) | - -## Why modular repos? - -The NeMo GitHub organization split the monolithic NeMo 2.0 codebase into focused libraries to improve **composability** (smaller containers, easier discovery) and **customizability** (PyTorch-native training loops in AutoModel, Megatron-Bridge, and RL instead of a single Lightning-centric stack). diff --git a/fern/docs/pages/repositories.mdx b/fern/docs/pages/repositories.mdx deleted file mode 100644 index 71a3d3e..0000000 --- a/fern/docs/pages/repositories.mdx +++ /dev/null @@ -1,22 +0,0 @@ ---- -title: Repositories -subtitle: Every project in the NVIDIA-NeMo GitHub organization -slug: repositories ---- - -Browse all **23 repositories** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) by lifecycle stage. Use search or category filters to find a library, then open its documentation or GitHub repo. 
- - - -## How repos are grouped - -| Category | What lives here | -| --- | --- | -| **Data** | Curation, synthetic data, PII handling, SDG pipelines | -| **Training** | Pretraining and fine-tuning (AutoModel, Megatron-Bridge, speech, Nemotron) | -| **Alignment & agents** | RL post-training, environments, rollout infrastructure | -| **Evaluation** | Benchmarking and quality measurement | -| **Deployment & safety** | Export, serving, guardrails, agent platform | -| **Infrastructure** | Experiment launchers, shared CI templates, org hub | - -Repos without a published Fern site link to GitHub README or microservice docs until a dedicated docs instance is added. diff --git a/fern/docs/pages/community.mdx b/fern/docs/pages/resources/community.mdx similarity index 68% rename from fern/docs/pages/community.mdx rename to fern/docs/pages/resources/community.mdx index 614093a..63e754b 100644 --- a/fern/docs/pages/community.mdx +++ b/fern/docs/pages/resources/community.mdx @@ -1,6 +1,7 @@ --- title: Community subtitle: Discuss, contribute, and stay up to date +slug: resources/community --- ## Get involved @@ -15,14 +16,28 @@ Questions, ideas, and announcements across the org. Browse and star projects in the NVIDIA-NeMo organization. - -Framework changelog and release history. + +Latest NGC container tags, component versions, and known issues. + + + +Cross-component container issues by release tag. + + + +Container releases and known issues for NeMo Framework. Each repository includes its own `CONTRIBUTING.md` and issue templates. Open issues or discussions in the repo that owns the component you are using. +## NeMo Assist + + +Chat with NeMo documentation and code — try NeMo Assist for guided answers across the open source libraries. 
+ + ## Recent highlights ### AutoModel diff --git a/profile/README.md b/profile/README.md index 88954e7..37d000d 100644 --- a/profile/README.md +++ b/profile/README.md @@ -3,22 +3,19 @@ SPDX-FileCopyrightText: Copyright (c) 2024-2026 NVIDIA CORPORATION & AFFILIATES. SPDX-License-Identifier: Apache-2.0 --> -# NVIDIA NeMo Framework +# NeMo OSS **Train Llama 3.3 · Qwen 2.5 · Mistral · DeepSeek · Gemma · Nemotron on NVIDIA GPUs** -GPU-accelerated, open-source libraries for training, data curation, evaluation, alignment, and deployment. Scale from a single GPU to 10,000+ nodes with Hugging Face or Megatron backends. +Open source GPU libraries for data, training, alignment, evaluation, deployment, and agents. Scale from one GPU to 10,000+ nodes with Hugging Face or Megatron backends. Part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite — this org is the public GitHub home for NeMo OSS. -## Documentation +## Libraries by stage -**[docs.nvidia.com/nemo](https://docs.nvidia.com/nemo)** — framework overview, decision guide, and links to every library's docs. +| Data | Pretraining | RL | Inference | E2E | +| --- | --- | --- | --- | --- | +| [Curator](https://github.com/NVIDIA-NeMo/Curator)<br>[Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer)<br>[Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner)<br>[Safe Synthesizer](https://github.com/NVIDIA-NeMo/Safe-Synthesizer)<br>[SDG-PGMs](https://github.com/NVIDIA-NeMo/SDG-PGMs) | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)<br>[AutoModel](https://github.com/NVIDIA-NeMo/Automodel)<br>[Speech](https://github.com/NVIDIA-NeMo/NeMo)<br>[Emerging Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | [RL](https://github.com/NVIDIA-NeMo/RL)<br>[Gym](https://github.com/NVIDIA-NeMo/Gym)<br>[ProRL-Agent-Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server) | [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails)<br>[Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)<br>[Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy)<br>[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) | [Skills](https://github.com/NVIDIA-NeMo/Skills)<br>[Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)<br>[Run](https://github.com/NVIDIA-NeMo/Run) | -| Start here | Docs | -| --- | --- | -| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) (fine-tune HF models) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) (1K+ GPUs) | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) (DPO / GRPO) | [docs](https://docs.nvidia.com/nemo/rl/latest/) | -| [Curator](https://github.com/NVIDIA-NeMo/Curator) · [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) · [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | [All 23 repos →](https://docs.nvidia.com/nemo/repositories) | +**[docs.nvidia.com/nemo](https://docs.nvidia.com/nemo)** — NeMo OSS hub: decision guide, all libraries, recipes, and community links. ```bash pip install nemo-automodel From ac45e3e11f7be6d84b6b53de1898d5a9e2fc6f13 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 13:59:44 -0400 Subject: [PATCH 09/18] updates Signed-off-by: Lawrence Lane --- fern/README.md | 5 +- fern/TAXONOMY.md | 53 ++++++++++++++++ fern/components/RepoCatalog.tsx | 21 ++++++- fern/components/repos.ts | 45 ++++++++++++- fern/docs/pages/about/architecture.mdx | 61 +++++++++++++++--- fern/docs/pages/about/concepts.mdx | 71 +++++++++++++++------ fern/docs/pages/about/ecosystem.mdx | 77 +++++++++++++++++------ fern/docs/pages/about/libraries.mdx | 23 +++++-- fern/docs/pages/get-started/index.mdx | 18 +++++- fern/docs/pages/get-started/inference.mdx | 4 +- fern/docs/pages/index.mdx | 57 ++++++++++------- profile/README.md | 2 +- 12 files changed, 348 insertions(+), 89 deletions(-) create mode 100644 fern/TAXONOMY.md diff --git a/fern/README.md b/fern/README.md index a38f709..1c84b82 100644 --- a/fern/README.md +++ b/fern/README.md @@ -2,6 +2,8 @@ Hub site for open source [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub repositories.
Routes visitors to each library's documentation using the shared NVIDIA Fern global theme from [fern-components](https://github.com/NVIDIA/fern-components). Commercial NeMo products live outside this catalog — refer to [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). +**Canonical taxonomy:** [TAXONOMY.md](./TAXONOMY.md) — NeMo OSS, Framework, Platform, stages, and repo kinds. + ## Information architecture This hub follows the NVIDIA canonical doc IA from [`tpl-new-site`](https://gitlab-master.nvidia.com/tech-docs/template-library) (`::tpl site`), adapted for an **ecosystem catalog** rather than a single-product manual: @@ -26,7 +28,7 @@ Per-library docs (Curator, AutoModel, Megatron-Bridge, and so on) stay on their **Keep on the hub** when it helps a reader **choose** at the umbrella level and is likely to stay valid for a long time: -- Lifecycle stages, pipeline shape, and hub and library docs (refer to [Concepts](/about/concepts)) +- Lifecycle stages, pipeline shape, Framework vs Platform, and homonyms (refer to [Concepts](/about/concepts) and [TAXONOMY.md](./TAXONOMY.md)) - AutoModel vs Megatron-Bridge and similar **stable forks** - Catalogs driven from `repos.ts` / `containers.ts` (not hand-maintained repo lists) - Container release metadata and cross-component known issues for Framework tags @@ -77,6 +79,7 @@ fern/ When NVIDIA-NeMo adds or archives a repo, update `components/repos.ts`: - Set **`stage`** to match the org README lifecycle column (Data · Pretraining · RL · Inference · E2E). +- Set **`kind`** to `library`, `integration`, `reference`, or `infrastructure` — see [TAXONOMY.md](./TAXONOMY.md). - Add **`tags`** for search facets (modality, technique, role). See `GH-TOPICS.MD` for optional GitHub topic alignment. 
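The `repos.ts` update steps above can be sketched in TypeScript, the language of `components/repos.ts`. This is a minimal sketch under stated assumptions, not the hub's code. It redeclares trimmed copies of `RepoStage`, `RepoKind`, `NemoRepo`, `REPO_KINDS`, and `kindLabel` so it compiles on its own. `Example-Recipes` is a hypothetical repo with a placeholder URL. `filterRepos` is a hypothetical helper that assumes the catalog search is a case-insensitive substring match over name, description, stage, kind, and tags. It approximates `matchesQuery` in `RepoCatalog.tsx` but may not match it exactly.

```typescript
// Trimmed local copies of the catalog types from fern/components/repos.ts,
// redeclared here so the sketch compiles on its own.
type RepoStage = "data" | "pretraining" | "rl" | "inference" | "e2e";
type RepoKind = "library" | "integration" | "reference" | "infrastructure";

interface NemoRepo {
  name: string;
  description: string;
  stage: RepoStage; // lifecycle column from the org README
  kind: RepoKind; // catalog role, see fern/TAXONOMY.md
  githubUrl: string;
  docsUrl?: string;
  tags?: string[]; // free-form search facets
}

const REPO_KINDS: { id: RepoKind; label: string }[] = [
  { id: "library", label: "Library" },
  { id: "integration", label: "Integration" },
  { id: "reference", label: "Reference" },
  { id: "infrastructure", label: "Infrastructure" },
];

function kindLabel(kind: RepoKind): string {
  return REPO_KINDS.find((k) => k.id === kind)?.label ?? kind;
}

// A few existing entries, copied from the catalog diff.
const repos: NemoRepo[] = [
  {
    name: "DataDesigner",
    description: "Generate high-quality synthetic data from scratch or from seed data.",
    stage: "data",
    kind: "library",
    githubUrl: "https://github.com/NVIDIA-NeMo/DataDesigner",
    tags: ["synthetic-data", "mcp"],
  },
  {
    name: "Skills",
    description: "Reference pipelines for synthetic data generation and evaluation (math, code, science).",
    stage: "e2e",
    kind: "reference",
    githubUrl: "https://github.com/NVIDIA-NeMo/Skills",
    tags: ["sdg", "evaluation", "pipelines"],
  },
  {
    name: "Run",
    description: "Configure, launch, and manage ML experiments (local, SLURM, Kubernetes).",
    stage: "e2e",
    kind: "library",
    githubUrl: "https://github.com/NVIDIA-NeMo/Run",
  },
];

// Hypothetical new entry (placeholder name and URL, not a real NVIDIA-NeMo repo).
// The compiler rejects the entry if `stage` or `kind` is missing.
const newRepo: NemoRepo = {
  name: "Example-Recipes",
  description: "Hypothetical reference recipes spanning several stages.",
  stage: "e2e",
  kind: "reference",
  githubUrl: "https://github.com/NVIDIA-NeMo/Example-Recipes",
  tags: ["recipes", "pipelines"],
};

// Hypothetical helper: stage filter plus a case-insensitive substring search
// over name, description, stage, kind, and tags.
function filterRepos(list: NemoRepo[], stage: RepoStage | "all", query: string): NemoRepo[] {
  const q = query.trim().toLowerCase();
  return list.filter((repo) => {
    if (stage !== "all" && repo.stage !== stage) return false;
    if (q === "") return true;
    const haystack = [repo.name, repo.description, repo.stage, repo.kind, kindLabel(repo.kind), ...(repo.tags ?? [])]
      .join(" ")
      .toLowerCase();
    return haystack.includes(q);
  });
}

const catalog: NemoRepo[] = [...repos, newRepo];
console.log(filterRepos(catalog, "e2e", "reference").map((r) => r.name)); // [ 'Skills', 'Example-Recipes' ]
```

Because `kind` is a required field on the interface, a new entry without it fails type-checking, which keeps the catalog's role facet complete as repos are added.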
When a new **NeMo Framework** NGC container ships: diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md new file mode 100644 index 0000000..1d92f21 --- /dev/null +++ b/fern/TAXONOMY.md @@ -0,0 +1,53 @@ +# NeMo OSS taxonomy + +Canonical vocabulary for the Fern hub (`docs.nvidia.com/nemo`), org README, and `components/repos.ts`. When copy disagrees, this file wins. + +## Top-level map + +``` +NVIDIA NeMo (commercial suite — OSS + microservices + NIM + services) +└── NeMo OSS (GitHub org NVIDIA-NeMo + this hub) + ├── NeMo Framework — model lifecycle (libraries + optional NGC bundle) + ├── NeMo Platform — agent integration product (CLI, SDK, Studio) + └── 22 catalog repos (libraries, integration, reference, infrastructure) +``` + +## Terms + +| Term | Meaning | +| --- | --- | +| **NVIDIA NeMo** | Full software suite. Includes commercial products not listed on this hub. | +| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo**. Discovery layer — not a single product. | +| **NeMo Framework** | Named **model-lifecycle** stack: composable libraries from data through deployment. **Not one codebase.** | +| **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:`. Bundles Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech. | +| **NeMo Platform** | [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — CLI, SDK, and Studio for **agent** evaluate / secure / tune / deploy. Composes libraries; not a pipeline stage. | +| **Library** | A focused repo with its own docs and release cadence (Curator, AutoModel, RL, …). | +| **NeMo Speech** | The [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo — speech AI only. Do not use “NeMo” alone for the whole ecosystem. 
| + +## Repo `kind` (catalog metadata) + +| Kind | Role | Examples | +| --- | --- | --- | +| `library` | Default product repo for a lifecycle stage | Curator, AutoModel, RL, Evaluator | +| `integration` | Composes multiple libraries into one product surface | NeMo Platform | +| `reference` | Recipes, cookbooks, reference pipelines | Skills, Nemotron | +| `infrastructure` | Shared CI or meta repos | FW-CI-templates | + +Every repo still has one **stage** (org README columns). **Kind** clarifies role when stage alone is misleading (for example Platform listed under Inference for discoverability). + +## Lifecycle stages + +Data · Pretraining · RL · Inference · E2E — same columns as [profile/README.md](../profile/README.md). + +## Hub page roles + +| Page | Job | +| --- | --- | +| **Concepts** | Glossary | +| **Ecosystem** | Positioning and choices (Framework vs Platform, commercial boundary) | +| **Architecture** | Structure — pipeline, backends, containers, Platform overlay | +| **Libraries** | Inventory from `repos.ts` | + +## Out of scope for this hub + +Customizer, NIM, and other commercial NeMo microservices — link from Ecosystem only; do not duplicate product docs. diff --git a/fern/components/RepoCatalog.tsx b/fern/components/RepoCatalog.tsx index 17b0e04..3ec8165 100644 --- a/fern/components/RepoCatalog.tsx +++ b/fern/components/RepoCatalog.tsx @@ -8,6 +8,7 @@ import { useMemo, useState } from "react"; import { NEMO_REPOS, REPO_STAGES, + kindLabel, stageLabel, type NemoRepo, type RepoStage, @@ -25,6 +26,8 @@ function matchesQuery(repo: NemoRepo, query: string): boolean { repo.description, repo.stage, stageLabel(repo.stage), + repo.kind, + kindLabel(repo.kind), ...(repo.tags ?? []), ] .join(" ") @@ -86,6 +89,20 @@ function RepoCard({ repo }: { repo: NemoRepo }) { > {stageLabel(repo.stage)} + {repo.kind !== "library" ? ( + + {kindLabel(repo.kind)} + + ) : null} {(repo.tags ?? []).slice(0, 3).map((tag) => (

- Showing {filtered.length} of {NEMO_REPOS.length} libraries in{" "} + Showing {filtered.length} of {NEMO_REPOS.length} repositories in{" "} NVIDIA-NeMo. Stages match the{" "} - org README; tags add cross-cutting search facets. + org README; kind and tags add role and cross-cutting facets.

{filtered.length === 0 ? ( diff --git a/fern/components/repos.ts b/fern/components/repos.ts index a40543c..0f2b9ff 100644 --- a/fern/components/repos.ts +++ b/fern/components/repos.ts @@ -5,8 +5,9 @@ * Canonical list of NVIDIA-NeMo GitHub organization repositories. * https://github.com/orgs/NVIDIA-NeMo/repositories * - * Taxonomy (two layers): + * Taxonomy (three layers): * - `stage` — README lifecycle column (Data · Pretraining · RL · Inference · E2E). Drives catalog filters. + * - `kind` — repo role (library · integration · reference · infrastructure). See fern/TAXONOMY.md. * - `tags` — Search facets (modality, technique, role). Use for cross-cutting discovery in the search box. * * GitHub topic strategy (see GH-TOPICS.MD) can map `stage-*` topics to these stages when applied on repos. @@ -20,12 +21,17 @@ export type RepoStage = "data" | "pretraining" | "rl" | "inference" | "e2e"; export type RepoStatus = "active" | "archived"; +/** Catalog role — see fern/TAXONOMY.md */ +export type RepoKind = "library" | "integration" | "reference" | "infrastructure"; + export interface NemoRepo { /** GitHub repo name (e.g. Automodel) */ name: string; description: string; /** Primary lifecycle stage (README columns) */ stage: RepoStage; + /** Catalog role when stage alone is misleading */ + kind: RepoKind; githubUrl: string; docsUrl?: string; containerUrl?: string; @@ -44,6 +50,13 @@ export const REPO_STAGES: { id: RepoStage | "all"; label: string }[] = [ { id: "e2e", label: "E2E" }, ]; +export const REPO_KINDS: { id: RepoKind; label: string }[] = [ + { id: "library", label: "Library" }, + { id: "integration", label: "Integration" }, + { id: "reference", label: "Reference" }, + { id: "infrastructure", label: "Infrastructure" }, +]; + /** 22 open-source libraries in the NVIDIA-NeMo org (excludes the .github meta repo). 
*/ export const NEMO_REPOS: NemoRepo[] = [ // Data @@ -51,6 +64,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Curator", description: "Scalable data preprocessing and curation for text, image, video, and audio.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Curator", docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator", @@ -60,6 +74,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "DataDesigner", description: "Generate high-quality synthetic data from scratch or from seed data.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/DataDesigner", docsUrl: "https://nvidia-nemo.github.io/DataDesigner/latest/", tags: ["synthetic-data", "mcp"], @@ -68,6 +83,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "DataDesignerPlugins", description: "Plugins extending NeMo Data Designer workflows.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/DataDesignerPlugins", tags: ["synthetic-data", "plugins"], }, @@ -75,6 +91,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Anonymizer", description: "Detect and protect PII through context-aware replacement and rewriting.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Anonymizer", tags: ["pii", "privacy"], }, @@ -82,6 +99,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Safe-Synthesizer", description: "Create private, safe versions of sensitive tabular datasets.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Safe-Synthesizer", docsUrl: "https://docs.nvidia.com/nemo/microservices/latest/generate-private-synthetic-data/", @@ -91,6 +109,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "SDG-PGMs", description: "Build probabilistic graphical models (PGMs) for synthetic data generation.", stage: "data", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/SDG-PGMs", tags: 
["synthetic-data", "pgm"], }, @@ -99,6 +118,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Automodel", description: "PyTorch distributed training for LLMs/VLMs with day-0 Hugging Face support.", stage: "pretraining", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Automodel", docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel", @@ -108,15 +128,17 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Megatron-Bridge", description: "Megatron-based training with bidirectional Hugging Face checkpoint conversion.", stage: "pretraining", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Megatron-Bridge", docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["llm", "vlm", "megatron"], }, { - name: "NeMo", - description: "Speech AI (ASR, TTS) training and inference.", + name: "NeMo Speech", + description: "Speech AI (ASR, TTS) training and inference — the NeMo GitHub repo.", stage: "pretraining", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/NeMo", docsUrl: NEMO_SPEECH_DOCS_URL, tags: ["speech", "asr", "tts"], @@ -125,6 +147,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Emerging-Optimizers", description: "Collection of cutting-edge optimizers for large-scale training.", stage: "pretraining", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Emerging-Optimizers", docsUrl: "https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html", tags: ["optimizers"], @@ -133,6 +156,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "DFM", description: "Large-scale diffusion model training and inference (archived).", stage: "pretraining", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/DFM", docsUrl: "https://github.com/NVIDIA-NeMo/DFM/tree/main/docs", status: "archived", @@ -143,6 +167,7 @@ export const NEMO_REPOS: 
NemoRepo[] = [ name: "RL", description: "Scalable post-training — SFT, DPO, GRPO, distillation, and reinforcement learning.", stage: "rl", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/RL", docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl", @@ -152,6 +177,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Gym", description: "RL environments and benchmarks to evaluate and improve models and agents.", stage: "rl", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Gym", docsUrl: "https://docs.nvidia.com/nemo/gym/latest/index.html", tags: ["environments", "agents"], @@ -160,6 +186,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "ProRL-Agent-Server", description: "Rollout-as-a-service for multi-turn agent RL (pairs with NeMo RL and Gym).", stage: "rl", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/ProRL-Agent-Server", docsUrl: "https://github.com/NVIDIA-NeMo/ProRL-Agent-Server#readme", tags: ["agents", "rollout"], @@ -169,6 +196,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Evaluator", description: "Scalable, reproducible evaluation across 100+ benchmarks and harnesses.", stage: "inference", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Evaluator", docsUrl: "https://docs.nvidia.com/nemo/evaluator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", @@ -178,6 +206,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Export-Deploy", description: "Export NeMo and Hugging Face models to TRT-LLM, vLLM, ONNX, and serving stacks.", stage: "inference", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Export-Deploy", docsUrl: "https://docs.nvidia.com/nemo/export-deploy/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", @@ -187,6 +216,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Guardrails", description: "Programmable guardrails for LLM-based 
conversational systems (Colang).", stage: "inference", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Guardrails", docsUrl: "https://docs.nvidia.com/nemo/guardrails/latest/", tags: ["safety", "agents"], @@ -196,6 +226,7 @@ export const NEMO_REPOS: NemoRepo[] = [ description: "CLI, SDK, and web UI to evaluate, harden, tune, and deploy production agents using NeMo libraries.", stage: "inference", + kind: "integration", githubUrl: "https://github.com/NVIDIA-NeMo/nemo-platform", docsUrl: "https://nvidia-nemo.github.io/nemo-platform/main/", tags: ["agents", "platform", "deployment"], @@ -205,6 +236,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Skills", description: "Reference pipelines for synthetic data generation and evaluation (math, code, science).", stage: "e2e", + kind: "reference", githubUrl: "https://github.com/NVIDIA-NeMo/Skills", docsUrl: "https://nvidia-nemo.github.io/Skills/", tags: ["sdg", "evaluation", "pipelines"], @@ -213,6 +245,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Nemotron", description: "Developer asset hub — recipes, cookbooks, datasets, and Nemotron reference examples.", stage: "e2e", + kind: "reference", githubUrl: "https://github.com/NVIDIA-NeMo/Nemotron", docsUrl: "https://github.com/NVIDIA-NeMo/Nemotron#readme", tags: ["nemotron", "recipes"], @@ -221,6 +254,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "Run", description: "Configure, launch, and manage ML experiments (local, SLURM, Kubernetes).", stage: "e2e", + kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Run", docsUrl: "https://docs.nvidia.com/nemo/run/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", @@ -230,6 +264,7 @@ export const NEMO_REPOS: NemoRepo[] = [ name: "FW-CI-templates", description: "CI/CD workflow templates shared across NeMo open-source libraries.", stage: "e2e", + kind: "infrastructure", githubUrl: "https://github.com/NVIDIA-NeMo/FW-CI-templates", tags: ["ci", "github-actions"], }, @@ 
-238,3 +273,7 @@ export const NEMO_REPOS: NemoRepo[] = [ export function stageLabel(stage: RepoStage): string { return REPO_STAGES.find((s) => s.id === stage)?.label ?? stage; } + +export function kindLabel(kind: RepoKind): string { + return REPO_KINDS.find((k) => k.id === kind)?.label ?? kind; +} diff --git a/fern/docs/pages/about/architecture.mdx b/fern/docs/pages/about/architecture.mdx index 349e80c..1570476 100644 --- a/fern/docs/pages/about/architecture.mdx +++ b/fern/docs/pages/about/architecture.mdx @@ -5,13 +5,44 @@ slug: about/architecture position: 3 --- -NeMo OSS is a **composable pipeline** of libraries — not a single monolithic framework. Data flows from curation through training and alignment to evaluation and deployment, with optional orchestration across stages. +**NeMo OSS** is not one monolithic codebase. It is an organization of focused repos you can use individually or together. Two named stacks sit on top of those repos: + +- **NeMo Framework** — the **model lifecycle** (data → train → align → evaluate → deploy). +- **NeMo Platform** — **agent integration** (evaluate, secure, tune, and deploy agents using selected libraries). -For positioning within NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For definitions of terms below, refer to [Concepts](/about/concepts). +For positioning within NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For term definitions, refer to [Concepts](/about/concepts). 
-## High-level pipeline +## Three layers + +| Layer | What it is | How you use it | +| --- | --- | --- | +| **Libraries** | 22 repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) — one product per repo | `pip install`, per-library docs, or standalone NGC images | +| **NeMo Framework** | Model-lifecycle stack — the pipeline below | Pick libraries by stage, or pull the multi-library Framework container | +| **NeMo Platform** | Agent product — CLI, SDK, Studio | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | + +```mermaid +flowchart TB + subgraph oss ["NeMo OSS"] + LIBS["Libraries — à la carte"] + subgraph fw ["NeMo Framework — model lifecycle"] + PIPE["Data → Pretraining → RL → Inference"] + end + subgraph plat ["NeMo Platform — agents"] + PCLI["CLI · SDK · Studio"] + end + end + + LIBS -.-> fw + PCLI -.-> PIPE +``` + +Platform **composes** libraries (Guardrails, Evaluator, Data Designer, and others). It is **not** a box in the Framework pipeline — it is a cross-cutting integration layer for agent workflows. Training still starts with Framework libraries such as AutoModel or Megatron-Bridge. + +## NeMo Framework pipeline + +The diagram shows representative libraries. The live list is [Libraries](/about/libraries) filtered by stage. ```mermaid flowchart LR @@ -53,7 +84,7 @@ flowchart LR Skills -.-> Inference ``` -Solid arrows are the typical **model lifecycle** (data → train → align → evaluate/deploy). Dotted lines show **orchestration and reference pipelines** (Run, Skills, Nemotron) that span stages. +Solid arrows are the typical **model lifecycle**. Dotted lines are **orchestration and reference assets** (Run, Skills, Nemotron) that span stages. 
## Functional layers @@ -65,7 +96,7 @@ Solid arrows are the typical **model lifecycle** (data → train → align → e | **Inference** | Benchmark, export, serve, and apply guardrails | | **E2E** | Launch experiments, ship recipes, share reference assets | -Which repos sit in each layer changes over time — use [Libraries](/about/libraries) (filtered by stage) as the live catalog, not this table. +Which repos sit in each layer changes over time — use [Libraries](/about/libraries) (filtered by stage) as the live catalog. ## Training backends @@ -96,9 +127,19 @@ flowchart TB Both paths can feed the same **RL**, **Evaluator**, and **Export-Deploy** libraries downstream. -## Agent platform +## NeMo Platform overlay + +[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates NeMo libraries for **agent** workflows — not for large-scale pretraining today. + +| Platform capability | Libraries and services involved | +| --- | --- | +| Secure agents | Guardrails, Anonymizer, Auditor | +| Evaluate agents | Evaluator, Harbor-backed suites | +| Tune agents | Skill optimization, Switchyard routing | +| Build agents | NeMo Agent Toolkit (NAT), Inference Gateway | +| Synthetic data | Data Designer | -[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) sits alongside individual inference libraries: it wires Guardrails, Evaluator, Data Designer, and related services into one CLI and SDK for **agent** evaluate / optimize / deploy loops. It is not the Framework NGC container — refer to [Framework and Platform](/about/concepts#framework-and-platform). +Platform is **not** the Framework NGC container. Refer to [Framework and Platform](/about/concepts#framework-and-platform) for naming and [Ecosystem](/about/ecosystem#choose-framework-or-platform) for when to start here. @@ -107,14 +148,14 @@ Setup, CLI reference, and API for agent hardening and evaluation.
-Model evaluation, export, guardrails, and when to use Platform or standalone libraries. +Model evaluation, export, guardrails, and Platform entry points. ## NGC containers -For production-style stacks, NGC images bundle tested dependency sets: +NGC images bundle tested dependency sets. They are **delivery mechanisms** for Framework libraries — not separate products. | Image | Scope | | --- | --- | @@ -130,7 +171,7 @@ Refer to the [container catalog](/about/release-notes/containers) for pull comma -Lifecycle stages, backends, and hub and product docs. +Glossary — Framework, Platform, stages, and homonyms. diff --git a/fern/docs/pages/about/concepts.mdx b/fern/docs/pages/about/concepts.mdx index 574ecaf..1d337e9 100644 --- a/fern/docs/pages/about/concepts.mdx +++ b/fern/docs/pages/about/concepts.mdx @@ -5,11 +5,43 @@ slug: about/concepts position: 4 --- -Key concepts for navigating **NeMo OSS** — the hub, the GitHub org, and the libraries it catalogs. +Key concepts for navigating **NeMo OSS** — the hub, the GitHub org, and the catalogs it maintains. ## NeMo OSS -**NeMo OSS** refers to open source libraries in the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization, documented on **docs.nvidia.com/nemo**. It is one part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial offerings not listed in this catalog. +**NeMo OSS** is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documented on **docs.nvidia.com/nemo**. It is the **discovery layer** — this hub orients you; each library owns its own manual. 
+ +NeMo OSS contains two major stacks: + +| Stack | Focus | Entry | +| --- | --- | --- | +| **NeMo Framework** | **Model lifecycle** — data through deployment | [Get Started by stage](/get-started) or Framework NGC container | +| **NeMo Platform** | **Agent lifecycle** — evaluate, secure, tune, deploy agents | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | + +NeMo OSS is one part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Commercial products (Customizer, NIM, microservices) live outside this catalog. + +## Framework and Platform + +| Term | Meaning | +| --- | --- | +| **NeMo Framework** (ecosystem) | The open source **model-lifecycle** library stack in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). Composable repos from data through deployment. **Named “Framework” but not one monolithic codebase.** | +| **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:` bundling Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech — refer to [Containers and releases](#containers-and-releases) | +| **NeMo Platform** | Product repo [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — **CLI, SDK, and Studio UI** that integrate NeMo libraries for **agents**. Cross-cutting; not a pipeline stage. | + +Use **Framework** for models and the training/eval/deploy pipeline. Use **Platform** for agent integration. Platform docs: [NeMo Platform documentation](https://nvidia-nemo.github.io/nemo-platform/main/). + +When to choose each: [Ecosystem](/about/ecosystem#choose-framework-or-platform). How they connect: [Architecture](/about/architecture#three-layers). 
+ +## Homonyms + +| Name | Means | Does not mean | +| --- | --- | --- | +| **NeMo Framework** | The library ecosystem **or** the multi-library NGC container | A single GitHub repo | +| **NeMo** ([NeMo repo](https://github.com/NVIDIA-NeMo/NeMo)) | **NeMo Speech** — ASR, TTS, speech-language models | The whole OSS org or Platform | +| **NeMo Platform** | Agent integration product (`nemo-platform`) | The Framework container or NVIDIA NeMo commercial suite | +| **NeMo OSS** | This org + hub | Every NVIDIA product with “NeMo” in the name | + +Always say **NeMo Speech** when referring to the `NeMo` repository in user-facing copy. ## Lifecycle stages @@ -23,7 +55,20 @@ Libraries are grouped into five **lifecycle stages** — the same columns as the | **Inference** | Evaluate quality, export checkpoints, deploy, or add guardrails | | **E2E** | Run multi-step recipes, orchestrate experiments, share reference pipelines | -Stage-specific get-started guides: [Get Started](/get-started) → **By lifecycle stage**. +Stage guides: [Get Started](/get-started) → **By lifecycle stage**. + +## Repo kind + +In the [Libraries](/about/libraries) catalog, each repo has a **stage** (README column) and a **kind** (role): + +| Kind | Role | Examples | +| --- | --- | --- | +| **Library** | Focused product for a stage | Curator, AutoModel, RL, Evaluator | +| **Integration** | Composes libraries into one product | NeMo Platform | +| **Reference** | Recipes, cookbooks, example pipelines | Skills, Nemotron | +| **Infrastructure** | Shared CI or meta repos | FW-CI-templates | + +Kind explains repos that do not fit neatly into one pipeline box (for example Platform under Inference for discoverability). ## Hub and library documentation @@ -39,27 +84,13 @@ The hub answers *which library and where to click next*. 
Library docs answer *ho | Keep here | Put in library docs | | --- | --- | -| Lifecycle stages and pipeline shape | Tutorials, APIs, configuration | +| Lifecycle stages, pipeline shape, Framework vs Platform | Tutorials, APIs, configuration | | Choosing AutoModel or Megatron-Bridge | Model-specific recipes and scripts | | Catalogs (`repos.ts`, container catalog) | Per-release install pins and changelogs | | Framework container tags and cross-component known issues | Library-only bugs and workarounds | Hub content should help a **decision** and have a **long lifespan**. If it goes stale every release, link out instead. -## Framework and Platform - -Three related terms show up across NeMo OSS docs: - -| Term | Meaning | -| --- | --- | -| **NeMo Framework** (ecosystem) | The open source **library pipeline** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) — composable repos for data through deployment, documented on this hub | -| **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:` bundling a tested multi-library training and inference stack — refer to [Containers and releases](#containers-and-releases) | -| **NeMo Platform** | Product repo [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — **CLI, SDK, and Studio UI** that integrate NeMo libraries to evaluate, harden, tune, and deploy **agents** | - -Use **Framework** when you mean the library ecosystem or the NGC bundle. Use **Platform** when you mean the agent integration layer. Platform docs live on [NeMo Platform documentation](https://nvidia-nemo.github.io/nemo-platform/main/); training and export details stay on each library's site. - -Positioning and when to choose each: [Ecosystem](/about/ecosystem#nemo-framework-and-nemo-platform). - ## Training backends | Term | Meaning | @@ -92,11 +123,11 @@ A repo has one stage; tags help you find libraries that span concerns (for examp -NeMo OSS within NVIDIA NeMo, Framework and Platform, and AutoModel or Megatron-Bridge. 
+NeMo OSS within NVIDIA NeMo and Framework or Platform choices. -Pipeline diagram, layers, and container bundling. +Three layers, pipeline diagram, backends, and containers. diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx index 7ac29fd..b587307 100644 --- a/fern/docs/pages/about/ecosystem.mdx +++ b/fern/docs/pages/about/ecosystem.mdx @@ -8,12 +8,40 @@ position: 2 **NeMo OSS** is the open source side of [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) — the public GitHub organization [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and the documentation hub you are reading now. Commercial NeMo products, enterprise services, and NIM microservices live outside this catalog. -This page explains **where NeMo OSS fits**. For searchable repo listings, use [Libraries](/about/libraries). For how stages connect technically, refer to [Architecture](/about/architecture). +This page explains **where NeMo OSS fits** and **what to choose**. For definitions, refer to [Concepts](/about/concepts). For structure, refer to [Architecture](/about/architecture). -## NeMo OSS within NVIDIA NeMo +## What NeMo OSS includes + +NeMo OSS is not a single product. 
It is an **organization of repos** plus two named ways to adopt them: + +```mermaid +flowchart TB + subgraph nemooss ["NeMo OSS"] + subgraph framework ["NeMo Framework — models"] + STAGES["Data · Pretraining · RL · Inference · E2E"] + end + subgraph platform ["NeMo Platform — agents"] + AGENT["CLI · SDK · Studio"] + end + REPOS["22 catalog repos"] + end + COMM["Commercial NVIDIA NeMo"] + + REPOS --> framework + AGENT --> REPOS + nemooss -.-> COMM +``` + +| You are building… | Start with… | +| --- | --- | +| **Models** (train, align, evaluate, deploy checkpoints) | **NeMo Framework** — libraries by stage or Framework NGC container | +| **Agents** (evaluate, secure, tune, deploy in production) | **NeMo Platform** — or individual inference libraries à la carte | +| **Not sure yet** | [Libraries](/about/libraries) catalog or [decision guide](/get-started#decision-guide) | + +## Framework lifecycle stages -NVIDIA NeMo spans data, training, alignment, evaluation, and deployment for generative AI. **NeMo OSS** delivers that pipeline as composable open source libraries you can adopt individually or together: +**NeMo Framework** delivers the model pipeline as composable open source libraries: | Stage | Role in the pipeline | | --- | --- | @@ -25,28 +53,39 @@ NVIDIA NeMo spans data, training, alignment, evaluation, and deployment for gene Repo names and counts change — [Libraries](/about/libraries) is the searchable catalog. Each library's docs site has install steps and tutorials. 
-## NeMo Framework and NeMo Platform +## Choose Framework or Platform -These names sound similar but point at different layers of NeMo OSS: +These names sound similar but address different jobs: | | **NeMo Framework** | **NeMo Platform** | | --- | --- | --- | -| **What it is** | The composable **library pipeline** in this org — data, training, RL, evaluation, export, guardrails | An integrated **CLI, Python SDK, and web UI** for shipping **agents** | -| **You adopt it when…** | You want individual libraries (or the multi-library NGC stack) for the **model lifecycle** | You want one local setup to **evaluate, secure, tune, and deploy agents** using those libraries | -| **Typical entry** | Pick a stage → library docs, or pull the Framework container | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | +| **What it is** | Model-lifecycle **libraries** — data, training, RL, evaluation, export, guardrails | Agent **integration product** — CLI, Python SDK, and web UI | +| **You adopt it when…** | You are training, aligning, evaluating, or deploying **models** | You are shipping **agents** and want evaluate / secure / tune / deploy in one setup | +| **Typical entry** | Stage guide → library docs, or Framework NGC container | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | | **Docs** | This hub + per-library Fern sites | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| **Relationship** | Libraries are the building blocks | Composes Guardrails, Evaluator, Data Designer, and others — does not replace AutoModel or Megatron-Bridge for training | -**NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`) — refer to [Concepts](/about/concepts#framework-and-platform) for how the terms relate. Platform composes libraries such as Guardrails, Evaluator, and Data Designer; it does not replace training libraries like AutoModel or Megatron-Bridge. 
+**NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`). Refer to [Concepts](/about/concepts#framework-and-platform) for all uses of “Framework.” Many teams train with **AutoModel or Megatron-Bridge**, align with **NeMo RL**, benchmark with **Evaluator**, and serve with **Export-Deploy**. Agent builders can start from **NeMo Platform** instead of wiring those pieces manually. +## Commercial NeMo boundary + +| In NeMo OSS hub | Outside this catalog | +| --- | --- | +| 22 public GitHub repos in NVIDIA-NeMo | Customizer, NIM, enterprise services | +| Framework container release notes | Per-tenant managed offerings | +| Open source docs on docs.nvidia.com/nemo | Microservice docs under `docs.nvidia.com/nemo/microservices` | + +For the full suite, refer to [NVIDIA NeMo (commercial)](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). + ## What this hub covers and product docs | | **NeMo OSS hub** (this site) | **Per-library docs** | | --- | --- | --- | -| **Audience** | Choosing a library, comparing stages, container releases | Using one product deeply | +| **Audience** | Choosing Framework vs Platform, stage, or library | Using one product deeply | | **Content** | Catalog, get-started by stage, release notes | APIs, tutorials, recipes | -| **Scope** | 22 public GitHub repos in NVIDIA-NeMo | One library at a time | +| **Scope** | Orientation across NeMo OSS | One library or Platform at a time | ## Choosing AutoModel or Megatron-Bridge @@ -59,18 +98,22 @@ Both train large language models (LLMs) and vision language models (VLMs) on NVI | **Checkpoint flow** | HF models day-0 | HF ↔ Megatron conversion | | **Best for** | Fine-tuning, research, rapid iteration | Large-scale pretraining and SFT | -Many teams use **AutoModel or Megatron-Bridge for training**, then **NeMo RL for alignment**, **Evaluator for benchmarks**, and **Export-Deploy for serving**. 
Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly. +Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. ## Related entry points - -Search and filter all 22 NVIDIA-NeMo repositories. + +Glossary — OSS, Framework, Platform, homonyms. -How lifecycle stages and backends fit together. +Three layers, pipeline, backends, containers. + + + +Search and filter all 22 NVIDIA-NeMo repositories. @@ -81,8 +124,4 @@ Quickstart, installation, and guides by stage. Agent evaluate, harden, tune, and deploy — CLI, SDK, and Studio. - -Broader NeMo software suite beyond this OSS catalog. - - diff --git a/fern/docs/pages/about/libraries.mdx b/fern/docs/pages/about/libraries.mdx index ce44128..d7711d9 100644 --- a/fern/docs/pages/about/libraries.mdx +++ b/fern/docs/pages/about/libraries.mdx @@ -7,20 +7,31 @@ position: 5 import RepoCatalog from "@/components/RepoCatalog"; -Browse all **22 open source libraries** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) by lifecycle stage. For how NeMo OSS fits in NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For pipeline structure, refer to [Architecture](/about/architecture). +Browse **22 open source repositories** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). Most are **NeMo Framework** libraries for the model lifecycle; [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) is an **integration** product that composes several of them for agents. + +For positioning, refer to [Ecosystem](/about/ecosystem). For pipeline structure, refer to [Architecture](/about/architecture). -## How libraries are grouped +## How repos are grouped + +Each card shows a **stage** (org README lifecycle column) and a **kind** (role in the catalog): -Stages match the [NVIDIA-NeMo org README](https://github.com/NVIDIA-NeMo) lifecycle table. 
Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. +| Kind | Role | Examples | +| --- | --- | --- | +| **Library** | Focused product for a stage | Curator, AutoModel, RL, Evaluator | +| **Integration** | Composes libraries into one product | NeMo Platform | +| **Reference** | Recipes, cookbooks, pipelines | Skills, Nemotron | +| **Infrastructure** | Shared CI or meta repos | FW-CI-templates | | Stage | What lives here | | --- | --- | | **Data** | Curation, synthetic data, PII handling, SDG pipelines | -| **Pretraining** | Model training and fine-tuning (AutoModel, Megatron-Bridge, Speech, optimizers) | +| **Pretraining** | Model training and fine-tuning (AutoModel, Megatron-Bridge, NeMo Speech, optimizers) | | **RL** | Post-training alignment, environments, agent rollout | -| **Inference** | Evaluation, export, serving, guardrails, agent platform | -| **E2E** | Reference pipelines, recipes, experiment orchestration, CI templates | +| **Inference** | Evaluation, export, serving, guardrails (and Platform for agent integration) | +| **E2E** | Reference pipelines, recipes, experiment orchestration | + +Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. Libraries without a published docs site link to GitHub README or microservice docs. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx index b851570..846ef32 100644 --- a/fern/docs/pages/get-started/index.mdx +++ b/fern/docs/pages/get-started/index.mdx @@ -5,7 +5,21 @@ slug: get-started position: 1 --- -Pick how you want to begin — a fast quickstart, installation options, or guidance by **lifecycle stage** (the same columns as the [org README](https://github.com/NVIDIA-NeMo) and [Libraries](/about/libraries) catalog). +Pick how you want to begin. 
**NeMo Framework** paths cover the model lifecycle; **NeMo Platform** covers agents. Stages below match the [org README](https://github.com/NVIDIA-NeMo) and [Libraries](/about/libraries) catalog. + + + + +Quickstart and installation for model training and deployment. + + + +Agent evaluate, secure, tune, and deploy — CLI, SDK, and Studio. + + + + +## Install and quickstart @@ -19,7 +33,7 @@ pip, NGC containers, scale, and backend choice. -## By lifecycle stage +## By lifecycle stage (Framework) diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx index 2ba62e1..1c9caf4 100644 --- a/fern/docs/pages/get-started/inference.mdx +++ b/fern/docs/pages/get-started/inference.mdx @@ -7,7 +7,7 @@ position: 13 import StageGuide from "@/components/StageGuide"; -Libraries for **Inference** — benchmarking, export to serving stacks, guardrails, and agent platforms. +Libraries for **Inference** in the **NeMo Framework** model lifecycle — benchmarking, export, serving, and guardrails. For **agents**, see [NeMo Platform](#shipping-agents) below or start at [Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/). ```mermaid flowchart LR @@ -39,4 +39,4 @@ If you are building **agents** (not just serving a fine-tuned checkpoint), [NeMo Setup, CLI, and docs — evaluate, secure, and optimize agents with NeMo libraries. -For related guidance, refer to [Framework and Platform](/about/concepts#framework-and-platform) and [Ecosystem](/about/ecosystem#nemo-framework-and-nemo-platform). +For related guidance, refer to [Framework and Platform](/about/concepts#framework-and-platform) and [Ecosystem](/about/ecosystem#choose-framework-or-platform). 
diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx index 3bb7e3f..7ab8511 100644 --- a/fern/docs/pages/index.mdx +++ b/fern/docs/pages/index.mdx @@ -4,44 +4,59 @@ subtitle: Open source libraries from the NVIDIA-NeMo GitHub organization slug: "" --- -**NeMo OSS** is the hub for NVIDIA's public, open source NeMo libraries — the repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) on GitHub. Scale pretraining, post-training, and reinforcement learning from a single GPU to thousand-node clusters with Hugging Face/PyTorch and Megatron backends. For production **agents**, [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates evaluation, guardrails, and tuning in one CLI and SDK. +**NeMo OSS** is the hub for NVIDIA's public, open source NeMo work — the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo**. It is a discovery layer, not a single product: focused libraries you can adopt individually, plus two named stacks: -These projects are part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial products and services beyond this open source catalog. +- **NeMo Framework** — model lifecycle (data → train → align → evaluate → deploy) +- **NeMo Platform** — agent integration (evaluate, secure, tune, deploy agents) -## Choose your path +These projects are part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial products beyond this catalog. + +## Choose your stack - -NeMo OSS within NVIDIA NeMo and how libraries relate. + +Train and deploy models — quickstart, install, and guides by lifecycle stage. - -Pipeline layers, backends, and containers. + +Ship agents — CLI, SDK, and Studio UI over NeMo libraries. - -Quickstart, installation, and guides by lifecycle stage. + +Browse all 22 repos — search by stage, kind, or tag. 
- -Search and filter all 22 NVIDIA-NeMo libraries by category. + + +## Understand NeMo OSS + + + + +Glossary — Framework, Platform, stages, homonyms. - -Latest NGC container announcements and version metadata. + +Where NeMo OSS fits in NVIDIA NeMo and what to choose. - -Overview, container releases, and known issues. + +Three layers, pipeline diagram, backends, and NGC containers. - -Cross-component container issues by release tag. + + +## Releases and community + + + + +Framework NGC tags, pull commands, and component versions. - -Evaluate, harden, tune, and deploy agents — CLI, SDK, and Studio UI. + +Cross-component Framework container issues by tag. @@ -49,7 +64,3 @@ Discussions, announcements, and how to contribute. - -## Pipeline overview - -Refer to [Architecture](/about/architecture) for the full pipeline diagram, training backends, and how NGC containers bundle libraries. diff --git a/profile/README.md b/profile/README.md index 37d000d..a22df82 100644 --- a/profile/README.md +++ b/profile/README.md @@ -7,7 +7,7 @@ SPDX-License-Identifier: Apache-2.0 **Train Llama 3.3 · Qwen 2.5 · Mistral · DeepSeek · Gemma · Nemotron on NVIDIA GPUs** -Open source GPU libraries for data, training, alignment, evaluation, deployment, and agents. Scale from one GPU to 10,000+ nodes with Hugging Face or Megatron backends. Part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite — this org is the public GitHub home for NeMo OSS. +Open source GPU libraries for data, training, alignment, evaluation, deployment, and agents. Scale from one GPU to 10,000+ nodes with Hugging Face or Megatron backends. Part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite — this org is the public GitHub home for **NeMo OSS** (Framework libraries + [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) for agents). 
## Libraries by stage From e717987af191e2e15d6452cab6fc802bc84c2da5 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 14:53:25 -0400 Subject: [PATCH 10/18] Rework NeMo OSS docs orientation --- fern/README.md | 182 ++++++++-------- fern/TAXONOMY.md | 33 ++- fern/docs.yml | 22 +- fern/docs/pages/about/architecture.mdx | 42 ++-- fern/docs/pages/about/concepts.mdx | 137 ------------ .../concepts/containers-and-installs.mdx | 62 ++++++ .../about/concepts/documentation-surfaces.mdx | 53 +++++ .../about/concepts/framework-and-platform.mdx | 79 +++++++ fern/docs/pages/about/concepts/index.mdx | 104 +++++++++ .../pages/about/concepts/lifecycle-stages.mdx | 66 ++++++ .../about/concepts/repository-catalog.mdx | 63 ++++++ .../training-backends-and-checkpoints.mdx | 62 ++++++ fern/docs/pages/about/ecosystem.mdx | 44 ++-- fern/docs/pages/about/libraries.mdx | 40 +++- .../pages/about/release-notes/containers.mdx | 6 +- fern/docs/pages/about/release-notes/index.mdx | 18 +- .../about/release-notes/known-issues.mdx | 46 +++- fern/docs/pages/get-started/data.mdx | 6 +- fern/docs/pages/get-started/e2e.mdx | 10 +- fern/docs/pages/get-started/index.mdx | 26 ++- fern/docs/pages/get-started/inference.mdx | 14 +- fern/docs/pages/get-started/installation.mdx | 27 ++- fern/docs/pages/get-started/pretraining.mdx | 6 +- fern/docs/pages/get-started/quickstart.mdx | 16 +- fern/docs/pages/get-started/rl.mdx | 6 +- .../pages/get-started/runtime-chooser.mdx | 78 +++++++ fern/docs/pages/get-started/task-map.mdx | 51 +++++ fern/docs/pages/index.mdx | 52 +++-- fern/docs/pages/resources/community.mdx | 28 ++- .../pages/resources/external-learning.mdx | 95 ++++++++ fern/docs/pages/resources/glossary.mdx | 203 ++++++++++++++++++ fern/docs/pages/resources/index.mdx | 31 +++ profile/README.md | 42 ++-- 33 files changed, 1381 insertions(+), 369 deletions(-) delete mode 100644 fern/docs/pages/about/concepts.mdx create mode 100644 fern/docs/pages/about/concepts/containers-and-installs.mdx 
create mode 100644 fern/docs/pages/about/concepts/documentation-surfaces.mdx create mode 100644 fern/docs/pages/about/concepts/framework-and-platform.mdx create mode 100644 fern/docs/pages/about/concepts/index.mdx create mode 100644 fern/docs/pages/about/concepts/lifecycle-stages.mdx create mode 100644 fern/docs/pages/about/concepts/repository-catalog.mdx create mode 100644 fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx create mode 100644 fern/docs/pages/get-started/runtime-chooser.mdx create mode 100644 fern/docs/pages/get-started/task-map.mdx create mode 100644 fern/docs/pages/resources/external-learning.mdx create mode 100644 fern/docs/pages/resources/glossary.mdx create mode 100644 fern/docs/pages/resources/index.mdx diff --git a/fern/README.md b/fern/README.md index 1c84b82..539abea 100644 --- a/fern/README.md +++ b/fern/README.md @@ -1,137 +1,139 @@ -# NeMo OSS hub documentation (Fern) +# NeMo OSS Hub -Hub site for open source [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub repositories. Routes visitors to each library's documentation using the shared NVIDIA Fern global theme from [fern-components](https://github.com/NVIDIA/fern-components). Commercial NeMo products live outside this catalog — refer to [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). +Fern source for the **NeMo OSS** documentation hub: the lightweight entry point for open source repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). -**Canonical taxonomy:** [TAXONOMY.md](./TAXONOMY.md) — NeMo OSS, Framework, Platform, stages, and repo kinds. +The hub helps readers answer three questions: -## Information architecture +- **What should I use?** Task map, lifecycle stages, Framework vs Platform, and the library catalog. +- **How should I run it?** Runtime chooser, install overview, container catalog, and release notes. +- **Where do I go next?** Per-library docs, GitHub repos, community links, glossary, and external learning resources. 
-This hub follows the NVIDIA canonical doc IA from [`tpl-new-site`](https://gitlab-master.nvidia.com/tech-docs/template-library) (`::tpl site`), adapted for an **ecosystem catalog** rather than a single-product manual: +Per-library docs own commands, APIs, tutorials, model support, and version-specific behavior. This hub owns orientation, routing, stable concepts, and cross-component release metadata. -| Canonical section | Hub page | URL | +## Quick Links + +| Need | Source | Published Route | | --- | --- | --- | -| **About → overview** | Home (section index) | `/` | -| **About → ecosystem** | Ecosystem | `/about/ecosystem` | -| **About → architecture** | Architecture | `/about/architecture` | -| **About → concepts** | Concepts | `/about/concepts` | -| **About → libraries** | Libraries catalog | `/about/libraries` | -| **About → release-notes → index** | Release notes overview | `/about/release-notes` | -| **About → release-notes → containers** | Container releases | `/about/release-notes/containers` | -| **About → release-notes → known-issues** | Known issues | `/about/release-notes/known-issues` | -| **Get Started** | Hub, quickstart, install | `/get-started`, `/get-started/quickstart`, `/get-started/installation` | -| **Get Started → stage** | Data · Pretraining · RL · Inference · E2E | `/get-started/data`, … | -| **Resources → community** | Community | `/resources/community` | +| Home | [docs/pages/index.mdx](./docs/pages/index.mdx) | `/` | +| Task-first routing | [docs/pages/get-started/task-map.mdx](./docs/pages/get-started/task-map.mdx) | `/get-started/task-map` | +| Runtime choice | [docs/pages/get-started/runtime-chooser.mdx](./docs/pages/get-started/runtime-chooser.mdx) | `/get-started/runtime-chooser` | +| Library catalog page | [docs/pages/about/libraries.mdx](./docs/pages/about/libraries.mdx) | `/about/libraries` | +| Library catalog data | [components/repos.ts](./components/repos.ts) | Catalog component source | +| Container catalog page | 
[docs/pages/about/release-notes/containers.mdx](./docs/pages/about/release-notes/containers.mdx) | `/about/release-notes/containers` | +| Container catalog data | [components/containers.ts](./components/containers.ts) | Catalog component source | +| Concepts | [docs/pages/about/concepts/index.mdx](./docs/pages/about/concepts/index.mdx) | `/about/concepts` | +| Glossary | [docs/pages/resources/glossary.mdx](./docs/pages/resources/glossary.mdx) | `/resources/glossary` | +| External learning | [docs/pages/resources/external-learning.mdx](./docs/pages/resources/external-learning.mdx) | `/resources/external-learning` | +| Taxonomy | [TAXONOMY.md](./TAXONOMY.md) | Maintainer reference | +| GitHub topics guidance | [GH-TOPICS.MD](./GH-TOPICS.MD) | Maintainer reference | +| Navigation and redirects | [docs.yml](./docs.yml) | Site config | + +Published targets are configured in [docs.yml](./docs.yml): -Per-library docs (Curator, AutoModel, Megatron-Bridge, and so on) stay on their own Fern sites. This hub orients readers and links out — it does not duplicate product manuals. +- Preview: `nemo-framework.docs.buildwithfern.com/nemo` +- Production: `docs.nvidia.com/nemo` -## Content policy +## Site Shape -**Keep on the hub** when it helps a reader **choose** at the umbrella level and is likely to stay valid for a long time: +The hub follows the NVIDIA template-library IA, adapted for an ecosystem catalog: -- Lifecycle stages, pipeline shape, Framework vs Platform, and homonyms (refer to [Concepts](/about/concepts) and [TAXONOMY.md](./TAXONOMY.md)) -- AutoModel vs Megatron-Bridge and similar **stable forks** -- Catalogs driven from `repos.ts` / `containers.ts` (not hand-maintained repo lists) -- Container release metadata and cross-component known issues for Framework tags +- **About**: overview, ecosystem, architecture, concepts, libraries, release notes. +- **Get Started**: task map, quickstart, installation, runtime chooser, and stage guides. 
+- **Resources**: glossary, external learning, and community. -**Push downstream** to a library's own docs when it is: +Concept pages explain durable relationships. The glossary is lookup-oriented. The task map routes user intent to the owning library. The runtime chooser routes setup decisions to containers, installs, source checkout, or Platform setup. -- Install steps, API usage, tutorials, or model-specific recipes -- Version-pinned commands, example scripts, or benchmark numbers -- Anything that must track frequent product releases +## Content Rules -When in doubt: one paragraph plus a link beats copying content that library teams already own. +Keep content here when it is stable and helps a reader choose: -## Directory structure +- Framework vs Platform, lifecycle stages, and repo roles. +- Task-to-library routing. +- Runtime path decisions. +- Catalog metadata from `repos.ts` and `containers.ts`. +- Framework container release metadata and cross-component known issues. +- Terminology and curated external learning links. 
-``` -fern/ -├── fern.config.json -├── docs.yml -├── components/ -│ ├── repos.ts # canonical org repo list -│ ├── RepoCatalog.tsx # searchable library catalog UI -│ ├── containers.ts # NGC container list + Framework recent releases -│ ├── ContainerCatalog.tsx # searchable container catalog UI -│ └── StageGuide.tsx # per-stage library cards for Get Started pages -└── docs/pages/ - ├── index.mdx # About → overview - ├── about/ - │ ├── ecosystem.mdx - │ ├── architecture.mdx - │ ├── concepts.mdx - │ ├── libraries.mdx # searchable repo catalog - │ └── release-notes/ - │ ├── index.mdx # release-notes overview - │ ├── containers.mdx # NGC container announcements - │ └── known-issues.mdx # cross-component container issues - ├── get-started/ - │ ├── index.mdx # Get Started hub - │ ├── quickstart.mdx - │ ├── installation.mdx - │ ├── data.mdx # lifecycle stage guides (+ StageGuide) - │ ├── pretraining.mdx - │ ├── rl.mdx - │ ├── inference.mdx - │ └── e2e.mdx - └── resources/ - └── community.mdx # Resources → community -``` +Send content downstream when it changes quickly: -When NVIDIA-NeMo adds or archives a repo, update `components/repos.ts`: -- Set **`stage`** to match the org README lifecycle column (Data · Pretraining · RL · Inference · E2E). -- Set **`kind`** to `library`, `integration`, `reference`, or `infrastructure` — see [TAXONOMY.md](./TAXONOMY.md). -- Add **`tags`** for search facets (modality, technique, role). See `GH-TOPICS.MD` for optional GitHub topic alignment. +- Install commands, API usage, tutorials, and examples. +- Model-specific recipes, benchmarks, and support matrices. +- Version pins and library-only workarounds. -When a new **NeMo Framework** NGC container ships: +Rule of thumb: add one orienting paragraph and a link instead of copying a library team's material. -1. Update `latestTag` and `FRAMEWORK_RECENT_RELEASES` in `components/containers.ts` (keep the last three releases). -2. 
Add any cross-component known issues to `about/release-notes/known-issues.mdx`. +## Common Updates -Add new standalone NGC images to `NEMO_CONTAINERS` in `components/containers.ts` when they publish. +### Add or Update a Repo -Custom React components (e.g. `RepoCatalog`, `ContainerCatalog`) must be **imported** in MDX — bare JSX tags are not auto-registered: +Update [components/repos.ts](./components/repos.ts). -```mdx -import RepoCatalog from "@/components/RepoCatalog"; +- Set `stage` to the org README lifecycle column: `data`, `pretraining`, `rl`, `inference`, or `e2e`. +- Set `kind` to `library`, `integration`, `reference`, or `infrastructure`. +- Add durable `tags` for search facets such as modality, technique, or role. +- Add `docsUrl` and `containerUrl` only when stable public targets exist. - -``` +See [TAXONOMY.md](./TAXONOMY.md) for the canonical vocabulary. + +### Add or Update a Container + +Update [components/containers.ts](./components/containers.ts). + +- Keep `latestTag` current. +- Keep `FRAMEWORK_RECENT_RELEASES` to a small recent set. +- Add standalone images to `NEMO_CONTAINERS` when they publish. +- Add cross-component release notes in [known-issues.mdx](./docs/pages/about/release-notes/known-issues.mdx). + +### Add a Concept + +Add a page under [docs/pages/about/concepts](./docs/pages/about/concepts). + +Concept pages should explain relationships, tradeoffs, or decision models. Put short term definitions in [glossary.mdx](./docs/pages/resources/glossary.mdx). -## Local development +### Add External Learning -### Prerequisites +Add durable third-party resources to [external-learning.mdx](./docs/pages/resources/external-learning.mdx). + +Prefer sources that reveal how external users frame tasks or confusion. Avoid copying commands or version-specific steps. 
+ +## Local Development + +Prerequisites: - Node.js 22+ -- Fern CLI (`npm install -g fern-api`) +- Fern CLI: `npm install -g fern-api` -### Preview +Run checks and preview: ```bash cd fern -fern login # once, for global theme fetch +fern login fern check fern docs dev ``` Open [http://localhost:3000](http://localhost:3000). +Custom React components must be imported in MDX: + +```mdx +import RepoCatalog from "@/components/RepoCatalog"; + + +``` + ## Publish -Publishing uses the NVIDIA Fern organization token (`DOCS_FERN_TOKEN` org secret). +Publishing uses the NVIDIA Fern organization token: `DOCS_FERN_TOKEN`. ```bash git tag docs/v0.1.0 && git push origin docs/v0.1.0 ``` -Or run the **Publish Fern Docs** workflow from the Actions tab. - -Target URLs (configure in `docs.yml`): - -- Preview: `nemo-framework.docs.buildwithfern.com/nemo` -- Production: `docs.nvidia.com/nemo` +You can also run the **Publish Fern Docs** workflow from GitHub Actions. ## Theme -This site uses `global-theme: nvidia`. Theme assets are owned by the fern-components control repo — do not copy logos, CSS, or footer components here. Update branding in fern-components and re-upload the theme. +This site uses `global-theme: nvidia`. Theme assets are owned by [fern-components](https://github.com/NVIDIA/fern-components); keep logos, global CSS, and footer changes there. -**GitHub org link:** `navbar-links` in `docs.yml` points to [github.com/NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) (top-right header button). `footer-links.github` duplicates it in the footer. After publish, verify the header link renders — the NVIDIA global theme owns `navbar-links` and may override child values; if missing, add the org URL to the theme or request a site-specific override from the docs platform team. +The GitHub org link is configured in `navbar-links` and `footer-links` in [docs.yml](./docs.yml). After publishing, verify that the NVIDIA global theme renders the header link. 
diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md index 1d92f21..3947fe7 100644 --- a/fern/TAXONOMY.md +++ b/fern/TAXONOMY.md @@ -16,13 +16,13 @@ NVIDIA NeMo (commercial suite — OSS + microservices + NIM + services) | Term | Meaning | | --- | --- | -| **NVIDIA NeMo** | Full software suite. Includes commercial products not listed on this hub. | -| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo**. Discovery layer — not a single product. | -| **NeMo Framework** | Named **model-lifecycle** stack: composable libraries from data through deployment. **Not one codebase.** | +| **NVIDIA NeMo** | Full software suite spanning open source libraries, commercial products, NIM, microservices, and services. | +| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo**. Entry point for choosing a stack, stage, library, or container. | +| **NeMo Framework** | Named **model-lifecycle** stack: composable libraries from data through deployment, each with its own source and docs. | | **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:`. Bundles Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech. | -| **NeMo Platform** | [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — CLI, SDK, and Studio for **agent** evaluate / secure / tune / deploy. Composes libraries; not a pipeline stage. | +| **NeMo Platform** | [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — CLI, SDK, and Studio for **agent** evaluate / secure / tune / deploy. Composes libraries into an agent integration experience. | | **Library** | A focused repo with its own docs and release cadence (Curator, AutoModel, RL, …). | -| **NeMo Speech** | The [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo — speech AI only. Do not use “NeMo” alone for the whole ecosystem. 
| +| **NeMo Speech** | The [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo for speech AI; call it “NeMo Speech” in user-facing copy. | ## Repo `kind` (catalog metadata) @@ -43,11 +43,28 @@ Data · Pretraining · RL · Inference · E2E — same columns as [profile/READM | Page | Job | | --- | --- | -| **Concepts** | Glossary | +| **Concepts** | Core mental models and relationships. Use concept pages for explanatory topics, not term lookup. | | **Ecosystem** | Positioning and choices (Framework vs Platform, commercial boundary) | | **Architecture** | Structure — pipeline, backends, containers, Platform overlay | | **Libraries** | Inventory from `repos.ts` | +| **Task map** | Task-first routing from user intent to library, runtime path, and owning docs | +| **Runtime chooser** | Setup-path decision guide for containers, pip/source installs, and Platform setup | +| **Glossary** | Lookup-oriented definitions for terms, acronyms, and product names | +| **External learning** | Curated third-party blogs, videos, and partner examples with freshness caveats | -## Out of scope for this hub +## Concepts section -Customizer, NIM, and other commercial NeMo microservices — link from Ecosystem only; do not duplicate product docs. +Concepts is a directory, not a glossary.
Keep pages focused on stable relationships that help readers reason across repos: + +| Concept page | Job | +| --- | --- | +| **Framework and Platform** | Distinguish model-lifecycle work from integrated agent workflows | +| **Lifecycle stages** | Explain Data, Pretraining, RL, Inference, and E2E as workflow stages | +| **Repository catalog model** | Explain stage, kind, and tags | +| **Training backends and checkpoints** | Explain AutoModel, Megatron-Bridge, and checkpoint flow at a decision level | +| **Containers and installs** | Explain Framework container, standalone containers, and library installs | +| **Documentation surfaces** | Explain what the hub, library docs, repos, release notes, and glossary own | + +## Broader suite references + +Customizer, NIM, and other commercial NeMo microservices have their own product documentation. Link to them from Ecosystem when they help readers understand the full suite. diff --git a/fern/docs.yml b/fern/docs.yml index b832e37..f292015 100644 --- a/fern/docs.yml +++ b/fern/docs.yml @@ -63,14 +63,16 @@ navigation: - page: Architecture path: docs/pages/about/architecture.mdx icon: fa-duotone fa-diagram-project - - page: Concepts - path: docs/pages/about/concepts.mdx + - folder: ./docs/pages/about/concepts + title: Concepts + slug: about/concepts icon: fa-duotone fa-lightbulb + title-source: frontmatter - page: Libraries path: docs/pages/about/libraries.mdx icon: fa-duotone fa-grid-2 - folder: ./docs/pages/about/release-notes - title: Release notes + title: Release Notes slug: about/release-notes icon: fa-duotone fa-tag title-source: frontmatter @@ -78,12 +80,18 @@ navigation: path: docs/pages/get-started/index.mdx icon: fa-duotone fa-rocket contents: + - page: Task Map + path: docs/pages/get-started/task-map.mdx + icon: fa-duotone fa-map - page: Quickstart path: docs/pages/get-started/quickstart.mdx icon: fa-duotone fa-bolt - page: Installation path: docs/pages/get-started/installation.mdx icon: fa-duotone fa-download + - 
page: Runtime Chooser + path: docs/pages/get-started/runtime-chooser.mdx + icon: fa-duotone fa-box - section: By lifecycle stage contents: - page: Data @@ -102,7 +110,15 @@ navigation: path: docs/pages/get-started/e2e.mdx icon: fa-duotone fa-diagram-project - section: Resources + path: docs/pages/resources/index.mdx + icon: fa-duotone fa-book-open contents: + - page: Glossary + path: docs/pages/resources/glossary.mdx + icon: fa-duotone fa-book + - page: External Learning + path: docs/pages/resources/external-learning.mdx + icon: fa-duotone fa-arrow-up-right-from-square - page: Community path: docs/pages/resources/community.mdx icon: fa-duotone fa-comments diff --git a/fern/docs/pages/about/architecture.mdx b/fern/docs/pages/about/architecture.mdx index 1570476..6d5f937 100644 --- a/fern/docs/pages/about/architecture.mdx +++ b/fern/docs/pages/about/architecture.mdx @@ -1,24 +1,26 @@ --- title: Architecture -subtitle: How NeMo OSS libraries fit together +subtitle: How NeMo OSS Libraries Fit Together slug: about/architecture position: 3 --- -**NeMo OSS** is not one monolithic codebase. It is an organization of focused repos you can use individually or together. Two named stacks sit on top of those repos: +**NeMo OSS** is an ecosystem of focused repositories that you can use individually or together. Two named stacks organize those repos around common workflows: - **NeMo Framework** — the **model lifecycle** (data → train → align → evaluate → deploy). - **NeMo Platform** — **agent integration** (evaluate, secure, tune, and deploy agents using selected libraries). -For positioning within NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For term definitions, refer to [Concepts](/about/concepts). +For positioning within NVIDIA NeMo, refer to [Ecosystem](/about/ecosystem). For the organizing model, refer to [Concepts](/about/concepts). 
-## Three layers + +## Three Layers + +The architecture groups NeMo OSS into three layers: libraries, the Framework stack, and the Platform integration layer. | Layer | What it is | How you use it | | --- | --- | --- | -| **Libraries** | 22 repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) — one product per repo | `pip install`, per-library docs, or standalone NGC images | +| **Libraries** | 22 repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), each with its own source, releases, and docs | `pip install`, per-library docs, or standalone NGC images | | **NeMo Framework** | Model-lifecycle stack — the pipeline below | Pick libraries by stage, or pull the multi-library Framework container | | **NeMo Platform** | Agent product — CLI, SDK, Studio | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | @@ -38,9 +40,9 @@ flowchart TB PCLI -.-> PIPE ``` -Platform **composes** libraries (Guardrails, Evaluator, Data Designer, and others). It is **not** a box in the Framework pipeline — it is a cross-cutting integration layer for agent workflows. Training still starts with Framework libraries such as AutoModel or Megatron-Bridge. +Platform **composes** libraries such as Guardrails, Evaluator, and Data Designer into agent workflows. For model training, start with Framework libraries such as AutoModel or Megatron-Bridge. -## NeMo Framework pipeline +## NeMo Framework Pipeline The diagram shows representative libraries. The live list is [Libraries](/about/libraries) filtered by stage. @@ -86,7 +88,9 @@ flowchart LR Solid arrows are the typical **model lifecycle**. Dotted lines are **orchestration and reference assets** (Run, Skills, Nemotron) that span stages. -## Functional layers +## Functional Layers + +Functional layers describe the role each stage plays in model development. | Layer | Purpose | | --- | --- | @@ -98,7 +102,7 @@ Solid arrows are the typical **model lifecycle**.
Dotted lines are **orchestrati Which repos sit in each layer changes over time — use [Libraries](/about/libraries) (filtered by stage) as the live catalog. -## Training backends +## Training Backends Two primary training paths coexist: @@ -127,9 +131,9 @@ flowchart TB Both paths can feed the same **RL**, **Evaluator**, and **Export-Deploy** libraries downstream. -## NeMo Platform overlay +## NeMo Platform Overlay -[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates NeMo libraries for **agent** workflows — not for large-scale pretraining today. +[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates NeMo libraries for **agent** workflows. Use Framework libraries such as AutoModel, Megatron-Bridge, and NeMo RL for model training and alignment. | Platform capability | Libraries and services involved | | --- | --- | @@ -139,23 +143,23 @@ Both paths can feed the same **RL**, **Evaluator**, and **Export-Deploy** librar | Build agents | NeMo Agent Toolkit (NAT), Inference Gateway | | Synthetic data | Data Designer | -Platform is **not** the Framework NGC container. Refer to [Framework and Platform](/about/concepts#framework-and-platform) for naming and [Ecosystem](/about/ecosystem#choose-framework-or-platform) for when to start here. +Platform has its own CLI, SDK, and Studio experience. Refer to [Framework and Platform](/about/concepts/framework-and-platform) for naming and [Ecosystem](/about/ecosystem#choose-framework-or-platform) for when to start here. - + Setup, CLI reference, and API for agent hardening and evaluation. - + Model evaluation, export, guardrails, and Platform entry points. -## NGC containers +## NGC Containers -NGC images bundle tested dependency sets. They are **delivery mechanisms** for Framework libraries — not separate products. +NGC images bundle tested dependency sets for Framework libraries. | Image | Scope | | --- | --- | @@ -166,12 +170,14 @@ NGC images bundle tested dependency sets. 
They are **delivery mechanisms** for F Refer to the [container catalog](/about/release-notes/containers) for pull commands and recent Framework tags. -## Where to go next +## Where to Go Next + +Use these pages when you are ready to choose an entry point or browse the catalog. -Glossary — Framework, Platform, stages, and homonyms. +How Framework, Platform, stages, libraries, and containers fit together. diff --git a/fern/docs/pages/about/concepts.mdx b/fern/docs/pages/about/concepts.mdx deleted file mode 100644 index 1d337e9..0000000 --- a/fern/docs/pages/about/concepts.mdx +++ /dev/null @@ -1,137 +0,0 @@ ---- -title: Concepts -subtitle: Terms and ideas used across NeMo OSS -slug: about/concepts -position: 4 ---- - -Key concepts for navigating **NeMo OSS** — the hub, the GitHub org, and the catalogs it maintains. - -## NeMo OSS - -**NeMo OSS** is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documented on **docs.nvidia.com/nemo**. It is the **discovery layer** — this hub orients you; each library owns its own manual. - -NeMo OSS contains two major stacks: - -| Stack | Focus | Entry | -| --- | --- | --- | -| **NeMo Framework** | **Model lifecycle** — data through deployment | [Get Started by stage](/get-started) or Framework NGC container | -| **NeMo Platform** | **Agent lifecycle** — evaluate, secure, tune, deploy agents | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | - -NeMo OSS is one part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Commercial products (Customizer, NIM, microservices) live outside this catalog. - -## Framework and Platform - -| Term | Meaning | -| --- | --- | -| **NeMo Framework** (ecosystem) | The open source **model-lifecycle** library stack in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). Composable repos from data through deployment. 
**Named “Framework” but not one monolithic codebase.** | -| **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:` bundling Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech — refer to [Containers and releases](#containers-and-releases) | -| **NeMo Platform** | Product repo [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — **CLI, SDK, and Studio UI** that integrate NeMo libraries for **agents**. Cross-cutting; not a pipeline stage. | - -Use **Framework** for models and the training/eval/deploy pipeline. Use **Platform** for agent integration. Platform docs: [NeMo Platform documentation](https://nvidia-nemo.github.io/nemo-platform/main/). - -When to choose each: [Ecosystem](/about/ecosystem#choose-framework-or-platform). How they connect: [Architecture](/about/architecture#three-layers). - -## Homonyms - -| Name | Means | Does not mean | -| --- | --- | --- | -| **NeMo Framework** | The library ecosystem **or** the multi-library NGC container | A single GitHub repo | -| **NeMo** ([NeMo repo](https://github.com/NVIDIA-NeMo/NeMo)) | **NeMo Speech** — ASR, TTS, speech-language models | The whole OSS org or Platform | -| **NeMo Platform** | Agent integration product (`nemo-platform`) | The Framework container or NVIDIA NeMo commercial suite | -| **NeMo OSS** | This org + hub | Every NVIDIA product with “NeMo” in the name | - -Always say **NeMo Speech** when referring to the `NeMo` repository in user-facing copy. 
- -## Lifecycle stages - -Libraries are grouped into five **lifecycle stages** — the same columns as the [org README](https://github.com/NVIDIA-NeMo): - -| Stage | You typically… | -| --- | --- | -| **Data** | Curate, filter, synthesize, or anonymize training data | -| **Pretraining** | Pretrain, fine-tune, or adapt foundation models | -| **RL** | Align models with SFT, DPO, GRPO, or reinforcement learning | -| **Inference** | Evaluate quality, export checkpoints, deploy, or add guardrails | -| **E2E** | Run multi-step recipes, orchestrate experiments, share reference pipelines | - -Stage guides: [Get Started](/get-started) → **By lifecycle stage**. - -## Repo kind - -In the [Libraries](/about/libraries) catalog, each repo has a **stage** (README column) and a **kind** (role): - -| Kind | Role | Examples | -| --- | --- | --- | -| **Library** | Focused product for a stage | Curator, AutoModel, RL, Evaluator | -| **Integration** | Composes libraries into one product | NeMo Platform | -| **Reference** | Recipes, cookbooks, example pipelines | Skills, Nemotron | -| **Infrastructure** | Shared CI or meta repos | FW-CI-templates | - -Kind explains repos that do not fit neatly into one pipeline box (for example Platform under Inference for discoverability). - -## Hub and library documentation - -| Term | Meaning | -| --- | --- | -| **Hub** | This site — orientation, catalog, containers, known issues, get-started by stage | -| **Library docs** | Each product's Fern or docs site (for example `docs.nvidia.com/nemo/curator`) | -| **Repo** | GitHub source in NVIDIA-NeMo — issues, PRs, and `CONTRIBUTING.md` live there | - -The hub answers *which library and where to click next*. Library docs answer *how to use that library*. 
- -## What belongs on this hub - -| Keep here | Put in library docs | -| --- | --- | -| Lifecycle stages, pipeline shape, Framework vs Platform | Tutorials, APIs, configuration | -| Choosing AutoModel or Megatron-Bridge | Model-specific recipes and scripts | -| Catalogs (`repos.ts`, container catalog) | Per-release install pins and changelogs | -| Framework container tags and cross-component known issues | Library-only bugs and workarounds | - -Hub content should help a **decision** and have a **long lifespan**. If it goes stale every release, link out instead. - -## Training backends - -| Term | Meaning | -| --- | --- | -| **PyTorch / HF path** | Hugging Face models and trainers with [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) — best default for ≤1K GPUs | -| **Megatron-Core path** | Large-scale training with [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) and HF ↔ Megatron checkpoint conversion | -| **Recipe** | Scripted training or alignment configuration (often in library repos or [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)) | - -## Containers and releases - -| Term | Meaning | -| --- | --- | -| **NeMo Framework container** | Multi-library NGC image `nvcr.io/nvidia/nemo:` | -| **Standalone container** | Single-library images (AutoModel, RL, Curator) | -| **Component versions** | Pinned packages inside a Framework container — refer to [software component versions](https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html) | - -Release metadata: [Release notes](/about/release-notes). - -## Tags and stages - -In the [Libraries](/about/libraries) catalog: - -- **Stage** — primary lifecycle column (one per repo). -- **Tags** — cross-cutting search facets (`speech`, `evaluation`, `agents`, and so on). - -A repo has one stage; tags help you find libraries that span concerns (for example Evaluator tagged for benchmarks). 
- -## Related pages - - - - -NeMo OSS within NVIDIA NeMo and Framework or Platform choices. - - - -Three layers, pipeline diagram, backends, and containers. - - - -Searchable catalog of all repos. - - - diff --git a/fern/docs/pages/about/concepts/containers-and-installs.mdx b/fern/docs/pages/about/concepts/containers-and-installs.mdx new file mode 100644 index 0000000..7f3e9a8 --- /dev/null +++ b/fern/docs/pages/about/concepts/containers-and-installs.mdx @@ -0,0 +1,62 @@ +--- +title: Containers and Installs +subtitle: Runtime Choices for NeMo OSS Workflows +slug: about/concepts/containers-and-installs +position: 5 +--- + +NeMo OSS supports several runtime paths. Choose the path based on the workflow you want to run, then follow the linked library docs for current commands. + +## Runtime Choices + +Choose the runtime path that matches your workflow scope and development mode. + +| Runtime path | Use it for | +| --- | --- | +| **Framework container** | Multi-library workflows with `nvcr.io/nvidia/nemo:`. | +| **Standalone containers** | Single-library workflows such as AutoModel, RL, or Curator. | +| **Per-library installs** | Lightweight local development when a library recommends pip or source install. | +| **Source checkout** | Contributing, debugging, or running examples that require repository files. | + +```mermaid +flowchart TB + Goal["Workflow goal"] --> Multi["Cross-library Framework workflow"] + Goal --> Single["One library"] + Goal --> Dev["Local development"] + Multi --> FWC["Framework container"] + Single --> Standalone["Standalone container or library install"] + Dev --> Source["Source checkout"] +``` + +## Why Containers Cause Confusion + +The name **NeMo Framework** can mean the model-lifecycle stack or the `nvcr.io/nvidia/nemo` container. The container packages a tested set of components for selected Framework workflows. Standalone images and per-library installs remain the better fit for many single-library workflows. 
+ +Library docs cover the details for each standalone image or install: extras, optional dependencies, local development, and release-specific compatibility. + +## Where to Check Versions + +Use the release or library page for details that change by version. + +| Need | Go to | +| --- | --- | +| Recent Framework container tags | [Container releases](/about/release-notes/containers) | +| Cross-component Framework container known issues | [Known issues](/about/release-notes/known-issues) | +| Library-specific install commands | The library docs linked from [Libraries](/about/libraries) | +| GitHub source and issues | The repo linked from [Libraries](/about/libraries) | + +## Continue + +Use these pages when you are ready to turn runtime choice into setup steps. + + + + +Compare the main setup paths. + + + +See Framework tags and component versions. + + + diff --git a/fern/docs/pages/about/concepts/documentation-surfaces.mdx b/fern/docs/pages/about/concepts/documentation-surfaces.mdx new file mode 100644 index 0000000..7ae6d4f --- /dev/null +++ b/fern/docs/pages/about/concepts/documentation-surfaces.mdx @@ -0,0 +1,53 @@ +--- +title: Where to Find Information +subtitle: Setup, Releases, Examples, Terminology, and Support +slug: about/concepts/documentation-surfaces +position: 6 +--- + +NeMo OSS information lives in a few places. Use this page to jump to the right source for the question you have. + +## Information Map + +Each source is best for a different kind of question. + +| Source | Use it for | +| --- | --- | +| **NeMo OSS Hub** | Choose a stack, library, stage, or container. | +| **Get Started** | Start by task, lifecycle stage, or runtime. | +| **Library Docs** | Learn install details, APIs, tutorials, configuration, examples, and current support. | +| **Release Notes** | Check Framework container tags, component versions, and cross-component known issues. | +| **GitHub** | Read source, file issues, open PRs, and follow contribution instructions.
| +| **Glossary** | Look up terms, acronyms, and product names. | +| **External Learning** | Find community writeups and videos, then verify current steps in library docs. | + +## Quick Routing + +Use the first form of your question to choose a starting point. + +| If your question starts with... | Start here | +| --- | --- | +| "Which NeMo repo should I use?" | [Libraries](/about/libraries) | +| "I know the task, not the repo name." | [Task map](/get-started/task-map) | +| "Do I need Framework or Platform?" | [Framework and Platform](/about/concepts/framework-and-platform) | +| "Which stage am I in?" | [Lifecycle stages](/about/concepts/lifecycle-stages) | +| "Which container or install path?" | [Containers and installs](/about/concepts/containers-and-installs) | +| "What does this term mean?" | [Glossary](/resources/glossary) | +| "Has someone written or recorded an example?" | [External learning](/resources/external-learning) | +| "How do I run this exact example?" | The linked library docs or GitHub repo | + +## Continue + +Use these resources when your next step is lookup, community context, or examples. + + + + +Community links, glossary, releases, and support entry points. + + + +Look up product names, repo roles, and common terms. + + + diff --git a/fern/docs/pages/about/concepts/framework-and-platform.mdx b/fern/docs/pages/about/concepts/framework-and-platform.mdx new file mode 100644 index 0000000..c69a8f5 --- /dev/null +++ b/fern/docs/pages/about/concepts/framework-and-platform.mdx @@ -0,0 +1,79 @@ +--- +title: Framework and Platform +subtitle: Two Adoption Paths for Model and Agent Work +slug: about/concepts/framework-and-platform +position: 1 +--- + +Most NeMo OSS users start from one of two paths: **NeMo Framework** for model lifecycle work, or **NeMo Platform** for integrated agent workflows. They share libraries, but they answer different developer questions. 
+ +## The Main Distinction + +Choose the path based on whether your work centers on models or integrated agents. + +| Your goal | Start with | Why | +| --- | --- | --- | +| Train, fine-tune, align, evaluate, export, or deploy **models** | **NeMo Framework** | It organizes libraries around the model lifecycle from data through deployment. | +| Evaluate, secure, tune, or deploy **agents** | **NeMo Platform** | It composes selected libraries behind a CLI, SDK, and Studio UI for agent workflows. | +| Explore before choosing | [Libraries](/about/libraries) | The catalog lets you filter by stage, kind, and tags. | + +**NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and guardrails. It is also the name of the multi-library NGC image (`nvcr.io/nvidia/nemo:`). When the distinction matters, these docs use **Framework libraries** for the stack and **Framework container** for the image. + +**NeMo Platform** is the agent integration product in the [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) repo. It brings together libraries such as Guardrails, Evaluator, and Data Designer for agent use cases. + +```mermaid +flowchart TB + subgraph Framework["NeMo Framework - model lifecycle"] + Data["Data"] + Train["Pretraining"] + Align["RL"] + Deploy["Inference"] + Data --> Train --> Align --> Deploy + end + + subgraph Platform["NeMo Platform - agent workflows"] + CLI["CLI"] + SDK["SDK"] + Studio["Studio"] + end + + Platform -. uses .-> Deploy + Platform -. uses .-> Data +``` + +## When the Paths Meet + +The two paths often meet downstream. A team might curate data with Curator, train with AutoModel or Megatron-Bridge, align with NeMo RL, benchmark with Evaluator, and then use Platform when the output becomes part of an agent workflow. + +Framework describes the model pipeline. Platform describes an integrated agent experience built from selected pieces of that pipeline. 
+ +## Common Naming Traps + +These names are close enough that readers often need a quick disambiguation. + +| Name | Meaning | +| --- | --- | +| **NeMo OSS** | The GitHub organization, documentation, and related OSS containers. | +| **NeMo Framework** | Model lifecycle libraries and, in container contexts, the `nvcr.io/nvidia/nemo` image. | +| **NeMo Platform** | Agent integration product with CLI, SDK, and Studio. | +| **NeMo Speech** | Speech AI docs and source in the `NVIDIA-NeMo/NeMo` repo. | + +## Continue + +Use these pages when you need broader positioning or structural context. + + + + +Follow the model pipeline from data through inference. + + + +See how OSS, Platform, and the broader NVIDIA NeMo suite relate. + + + +See the structural diagram and library layers. + + + diff --git a/fern/docs/pages/about/concepts/index.mdx b/fern/docs/pages/about/concepts/index.mdx new file mode 100644 index 0000000..b2e19a0 --- /dev/null +++ b/fern/docs/pages/about/concepts/index.mdx @@ -0,0 +1,104 @@ +--- +title: Concepts +subtitle: Mental Models for Working Across NeMo OSS +slug: about/concepts +position: 4 +--- + +Use these pages to understand how **NeMo OSS** is organized: how repos are grouped, how workflows move across libraries, and how containers relate to install paths. + +For short term definitions and acronyms, use the [Glossary](/resources/glossary). + +## Concept Map + +NeMo OSS is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documentation on **docs.nvidia.com/nemo**, and the NGC containers that package common runtime paths. + +Use this site to choose a stack, stage, library, or container, then follow the linked library docs for usage details. 
+ +```mermaid +flowchart TB + Hub["NeMo OSS"] + Catalog["Libraries"] + Framework["NeMo Framework\nmodel lifecycle"] + Platform["NeMo Platform\nagent workflows"] + LibraryDocs["Library docs"] + Repos["GitHub"] + Containers["NGC containers"] + + Hub --> Catalog + Catalog --> Framework + Catalog --> Platform + Framework --> LibraryDocs + Platform --> LibraryDocs + LibraryDocs --> Repos + Framework --> Containers +``` + +NeMo OSS is one part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, alongside commercial products such as Customizer, NIM, and NeMo microservices. + +## Pages + +These concept pages split the main mental model into focused explanations. + + + + +Choose between the model-lifecycle stack and the agent integration product. + + + +Understand Data, Pretraining, RL, Inference, and E2E as workflow stages. + + + +Read stage, kind, and tags without treating repo placement as a rigid boundary. + + + +Compare the PyTorch / Hugging Face and Megatron-Core paths, including checkpoint flow. + + + +Choose between Framework containers, standalone images, and per-library installs. + + + +Choose the right page for setup, releases, examples, terminology, and support. + + + + +## What These Pages Clarify + +Use the Concepts section when you need durable context before following a library-specific guide: + +| Question | Start with | +| --- | --- | +| Which name, stack, repo, or container should I start with? | [Framework and Platform](/about/concepts/framework-and-platform), [Containers and installs](/about/concepts/containers-and-installs) | +| How do workflows move across libraries and stages? | [Lifecycle stages](/about/concepts/lifecycle-stages), [Repository catalog model](/about/concepts/repository-catalog) | +| How do training backend and checkpoint choices affect downstream work?
| [Training backends and checkpoints](/about/concepts/training-backends-and-checkpoints) | +| Where should I go for setup, releases, examples, or support? | [Where to find information](/about/concepts/documentation-surfaces) | + +## Related Pages + +These pages provide the surrounding orientation, structure, catalog, and terminology. + + + + +Choose Framework, Platform, or broader NVIDIA NeMo entry points. + + + +See the layers, pipeline diagram, backends, and containers. + + + +Search and filter all open source repos. + + + +Look up terms, acronyms, and product names. + + + diff --git a/fern/docs/pages/about/concepts/lifecycle-stages.mdx b/fern/docs/pages/about/concepts/lifecycle-stages.mdx new file mode 100644 index 0000000..d00b357 --- /dev/null +++ b/fern/docs/pages/about/concepts/lifecycle-stages.mdx @@ -0,0 +1,66 @@ +--- +title: Lifecycle Stages +subtitle: How NeMo Framework Groups Model Work +slug: about/concepts/lifecycle-stages +position: 2 +--- + +NeMo Framework groups libraries by the stage of model development they usually support. Stages help you find the right entry point, especially when a workflow spans several repos. + +## Stage Model + +Stages describe the usual position of each library in the model lifecycle. + +| Stage | You typically... 
| Examples | +| --- | --- | --- | +| **Data** | Curate, filter, synthesize, anonymize, or govern datasets | Curator, Data Designer, Anonymizer | +| **Pretraining** | Pretrain, fine-tune, adapt models, or convert training formats | AutoModel, Megatron-Bridge, NeMo Speech | +| **RL** | Align models with SFT, DPO, GRPO, distillation, or reinforcement learning | NeMo RL, Gym, ProRL Agent Server | +| **Inference** | Evaluate quality, export checkpoints, deploy, or add guardrails | Evaluator, Export-Deploy, Guardrails | +| **E2E** | Run multi-step recipes, orchestrate experiments, or share reference pipelines | Run, Skills, Nemotron | + +```mermaid +flowchart LR + Data["Data"] --> Pretraining["Pretraining"] + Pretraining --> RL["RL"] + RL --> Inference["Inference"] + Pretraining --> Inference + E2E["E2E recipes and orchestration"] -. spans .-> Data + E2E -. spans .-> Pretraining + E2E -. spans .-> RL + E2E -. spans .-> Inference +``` + +The arrows show a common model lifecycle, not a required sequence. You can use Evaluator without training a model in NeMo, or use Curator for data that feeds a non-NeMo training stack. + +## Stage Guides and Cross-Stage Work + +Use stage guides when you know the part of the lifecycle you are working on: + +- [Data](/get-started/data) +- [Pretraining](/get-started/pretraining) +- [RL](/get-started/rl) +- [Inference](/get-started/inference) +- [E2E](/get-started/e2e) + +Use [Libraries](/about/libraries) when your work crosses stages. Agent evaluation, deployment, speech, and checkpoint conversion often involve several repos even when one repo is the main starting point. + +## What Stays Stable + +Stage groupings change rarely, so stage pages remain a reliable way to choose an entry point. Supported models, commands, and detailed recipes change between releases, so follow the linked library docs for those details. + +## Continue + +Use these pages when you need to connect stage selection to repo selection or training flow. + + + + +Learn how stage, kind, and tags work together.
+ + + +Choose a training path and understand checkpoint flow. + + + diff --git a/fern/docs/pages/about/concepts/repository-catalog.mdx b/fern/docs/pages/about/concepts/repository-catalog.mdx new file mode 100644 index 0000000..5395c83 --- /dev/null +++ b/fern/docs/pages/about/concepts/repository-catalog.mdx @@ -0,0 +1,63 @@ +--- +title: Repository Catalog Model +subtitle: How Stage, Kind, and Tags Describe Each Repo +slug: about/concepts/repository-catalog +position: 3 +--- + +The [Libraries](/about/libraries) catalog uses three signals to describe each GitHub repo: **stage**, **kind**, and **tags**. Together, they answer where a repo fits, what role it plays, and which cross-cutting workflows it supports. + +## Three Catalog Signals + +The catalog uses three signals so a repo can be found by lifecycle position, adoption role, and cross-cutting topic. + +| Signal | What it answers | Example | +| --- | --- | --- | +| **Stage** | Where the repo usually sits in the lifecycle | Data, Pretraining, RL, Inference, E2E | +| **Kind** | The role the repo plays | Library, integration, reference, infrastructure | +| **Tags** | Cross-cutting concerns | `speech`, `evaluation`, `agents`, `deployment` | + +Stages match the NVIDIA-NeMo org README columns. Kind keeps the catalog honest when stage alone would be misleading. Tags help users find workflows that do not fit neatly in one lifecycle stage. + +## Kind + +Kind explains the adoption role of a repo when stage alone is too broad. 
+ +| Kind | Role | Examples | +| --- | --- | --- | +| **Library** | Focused product for a lifecycle stage | Curator, AutoModel, RL, Evaluator | +| **Integration** | Composes libraries into one product surface | NeMo Platform | +| **Reference** | Recipes, cookbooks, datasets, and example pipelines | Skills, Nemotron | +| **Infrastructure** | Shared project infrastructure | FW-CI-templates | + +For example, Platform appears under Inference because agent builders often arrive through evaluation, guardrails, and deployment. Its kind still tells you that it is an integration product. + +## Tags + +Tags are search facets, not a second taxonomy. They help connect repos by theme: + +| Tag theme | Why it helps | +| --- | --- | +| **Modality** | Find speech, vision-language, tabular, or multimodal work. | +| **Technique** | Find GRPO, DPO, synthetic data, curation, or checkpoint conversion work. | +| **Role** | Find evaluation, deployment, guardrails, agents, privacy, or orchestration work. | + +## When Repo Placement Looks Surprising + +Some repos support more than one stage. The catalog places each repo where most readers are likely to begin, then uses kind and tags to show its broader role. + +## Continue + +Use these pages when you need to move from repo placement to a specific next step. + + + + +Search and filter the live catalog. + + + +Find setup, releases, examples, terminology, and support links. + + + diff --git a/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx new file mode 100644 index 0000000..6a6a72a --- /dev/null +++ b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx @@ -0,0 +1,62 @@ +--- +title: Training Backends and Checkpoints +subtitle: How the Main Training Paths Connect Downstream +slug: about/concepts/training-backends-and-checkpoints +position: 4 +--- + +NeMo Framework supports two primary training paths. 
The right path depends on your scale, source model format, and checkpoint flow. + +## Two Training Paths + +Choose the training path based on scale, source model format, and downstream checkpoint needs. + +| Path | Start with | Typical fit | +| --- | --- | --- | +| **PyTorch / Hugging Face** | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | Fine-tuning, research iteration, and training up to roughly 1,000 GPUs. | +| **Megatron-Core** | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | Large-scale pretraining, SFT, and Hugging Face to Megatron checkpoint conversion. | + +Both paths can feed downstream RL, evaluation, and export workflows. + +```mermaid +flowchart TB + Data["Curated data"] --> AM["AutoModel\nPyTorch / HF"] + Data --> MB["Megatron-Bridge\nMegatron-Core"] + AM --> RL["NeMo RL"] + MB --> RL + AM --> Eval["Evaluator"] + MB --> Eval + RL --> Eval + Eval --> Export["Export-Deploy"] +``` + +## Checkpoint Flow + +Checkpoint format affects the rest of the workflow. Check whether downstream RL, evaluation, or deployment expects Hugging Face, Megatron, NeMo, or another serving format before choosing a training path. + +| Question | Good next page | +| --- | --- | +| Which training library should I start with? | [Pretraining guide](/get-started/pretraining) | +| How do I convert between Hugging Face and Megatron checkpoints? | [Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Can my trained model run through RL or evaluation? | [RL guide](/get-started/rl), [Inference guide](/get-started/inference) | +| How do I export or serve a model? | [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | + +## Next Step + +After choosing a path, use the linked library docs for model support, conversion scripts, commands, and compatibility notes. + +## Continue + +Use these pages when you are ready to choose a training entry point or runtime. 
+ + + + +Choose AutoModel, Megatron-Bridge, NeMo Speech, or optimizer libraries. + + + +Choose a runtime path before running examples. + + + diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx index b587307..97f7d3d 100644 --- a/fern/docs/pages/about/ecosystem.mdx +++ b/fern/docs/pages/about/ecosystem.mdx @@ -1,19 +1,19 @@ --- title: Ecosystem -subtitle: NeMo OSS within the NVIDIA NeMo software suite +subtitle: NeMo OSS Within the NVIDIA NeMo Software Suite slug: about/ecosystem position: 2 --- -**NeMo OSS** is the open source side of [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) — the public GitHub organization [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and the documentation hub you are reading now. Commercial NeMo products, enterprise services, and NIM microservices live outside this catalog. +**NeMo OSS** is the public, open source entry point for [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/): the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization, and the libraries and containers that support model and agent development. -This page explains **where NeMo OSS fits** and **what to choose**. For definitions, refer to [Concepts](/about/concepts). For structure, refer to [Architecture](/about/architecture). +This page explains **where NeMo OSS fits** and **what to choose**. For the organizing model, refer to [Concepts](/about/concepts). For structure, refer to [Architecture](/about/architecture). -## What NeMo OSS includes +## What You Can Build -NeMo OSS is not a single product. 
It is an **organization of repos** plus two named ways to adopt them: +NeMo OSS is organized around two adoption paths: ```mermaid flowchart TB @@ -39,9 +39,9 @@ flowchart TB | **Agents** (evaluate, secure, tune, deploy in production) | **NeMo Platform** — or individual inference libraries à la carte | | **Not sure yet** | [Libraries](/about/libraries) catalog or [decision guide](/get-started#decision-guide) | -## Framework lifecycle stages +## Framework Lifecycle Stages -**NeMo Framework** delivers the model pipeline as composable open source libraries: +NeMo Framework delivers the model pipeline as composable open source libraries: | Stage | Role in the pipeline | | --- | --- | @@ -55,23 +55,25 @@ Repo names and counts change — [Libraries](/about/libraries) is the searchable ## Choose Framework or Platform -These names sound similar but address different jobs: +These names sound similar because they share libraries, but they serve different developer jobs: | | **NeMo Framework** | **NeMo Platform** | | --- | --- | --- | | **What it is** | Model-lifecycle **libraries** — data, training, RL, evaluation, export, guardrails | Agent **integration product** — CLI, Python SDK, and web UI | | **You adopt it when…** | You are training, aligning, evaluating, or deploying **models** | You are shipping **agents** and want evaluate / secure / tune / deploy in one setup | | **Typical entry** | Stage guide → library docs, or Framework NGC container | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | -| **Docs** | This hub + per-library Fern sites | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | -| **Relationship** | Libraries are the building blocks | Composes Guardrails, Evaluator, Data Designer, and others — does not replace AutoModel or Megatron-Bridge for training | +| **Docs** | NeMo OSS pages and library docs | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| **Relationship** | 
Libraries are the building blocks | Composes Guardrails, Evaluator, Data Designer, and others for agent workflows | -**NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`). Refer to [Concepts](/about/concepts#framework-and-platform) for all uses of “Framework.” +**NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`). Refer to [Framework and Platform](/about/concepts/framework-and-platform) for naming details. Many teams train with **AutoModel or Megatron-Bridge**, align with **NeMo RL**, benchmark with **Evaluator**, and serve with **Export-Deploy**. Agent builders can start from **NeMo Platform** instead of wiring those pieces manually. -## Commercial NeMo boundary +## Open Source and Commercial Entry Points -| In NeMo OSS hub | Outside this catalog | +NeMo OSS focuses on open source repositories, OSS documentation, and Framework container releases. The broader NVIDIA NeMo suite also includes commercial products, enterprise services, and NIM microservices with their own product documentation. + +| NeMo OSS | Broader NVIDIA NeMo docs | | --- | --- | | 22 public GitHub repos in NVIDIA-NeMo | Customizer, NIM, enterprise services | | Framework container release notes | Per-tenant managed offerings | @@ -79,12 +81,14 @@ Many teams train with **AutoModel or Megatron-Bridge**, align with **NeMo RL**, For the full suite, refer to [NVIDIA NeMo (commercial)](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). -## What this hub covers and product docs +## OSS and Product Docs + +Use this comparison to choose between open source entry points and product documentation. 
-| | **NeMo OSS hub** (this site) | **Per-library docs** | +| | **NeMo OSS** | **Library docs** | | --- | --- | --- | -| **Audience** | Choosing Framework vs Platform, stage, or library | Using one product deeply | -| **Content** | Catalog, get-started by stage, release notes | APIs, tutorials, recipes | +| **Audience** | Choose a stack, stage, or library | Use one library or product deeply | +| **Content** | Catalog, get-started paths, release notes | APIs, tutorials, recipes | | **Scope** | Orientation across NeMo OSS | One library or Platform at a time | ## Choosing AutoModel or Megatron-Bridge @@ -100,12 +104,14 @@ Both train large language models (LLMs) and vision language models (VLMs) on NVI Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. -## Related entry points +## Related Entry Points + +Use these links to move from ecosystem positioning to concepts, structure, catalog, and setup. -Glossary — OSS, Framework, Platform, homonyms. +How NeMo OSS organizes stacks, stages, libraries, and docs. diff --git a/fern/docs/pages/about/libraries.mdx b/fern/docs/pages/about/libraries.mdx index d7711d9..9cce256 100644 --- a/fern/docs/pages/about/libraries.mdx +++ b/fern/docs/pages/about/libraries.mdx @@ -1,28 +1,56 @@ --- title: Libraries -subtitle: Open source libraries in the NVIDIA-NeMo GitHub organization +subtitle: Open Source Libraries in the NVIDIA-NeMo GitHub Organization slug: about/libraries position: 5 --- import RepoCatalog from "@/components/RepoCatalog"; -Browse **22 open source repositories** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). Most are **NeMo Framework** libraries for the model lifecycle; [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) is an **integration** product that composes several of them for agents. +Browse **22 open source repositories** in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo). 
Most are **NeMo Framework** libraries for the model lifecycle; [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) composes several libraries into an agent integration experience. For positioning, refer to [Ecosystem](/about/ecosystem). For pipeline structure, refer to [Architecture](/about/architecture). +## Common Routes + +Use these routes when you know the work you want to do before you know the repo name. + + + + +Curator, Data Designer, Anonymizer, Safe-Synthesizer, and Skills. + + + +AutoModel for PyTorch / Hugging Face workflows, Megatron-Bridge for Megatron-Core scale. + + + +NeMo RL, Gym, and ProRL Agent Server. + + + +Evaluator, Guardrails, Export-Deploy, and NeMo Platform. + + + +Framework container, standalone container, pip/source install, or Platform setup. + + + + -## How repos are grouped +## How Repos Are Grouped -Each card shows a **stage** (org README lifecycle column) and a **kind** (role in the catalog): +Each card shows a **stage** and a **kind**: | Kind | Role | Examples | | --- | --- | --- | | **Library** | Focused product for a stage | Curator, AutoModel, RL, Evaluator | | **Integration** | Composes libraries into one product | NeMo Platform | | **Reference** | Recipes, cookbooks, pipelines | Skills, Nemotron | -| **Infrastructure** | Shared CI or meta repos | FW-CI-templates | +| **Infrastructure** | Shared project infrastructure | FW-CI-templates | | Stage | What lives here | | --- | --- | @@ -34,4 +62,4 @@ Each card shows a **stage** (org README lifecycle column) and a **kind** (role i Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. -Libraries without a published docs site link to GitHub README or microservice docs. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). +Some cards link directly to library docs; others link to the best available README or product documentation. 
Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). diff --git a/fern/docs/pages/about/release-notes/containers.mdx b/fern/docs/pages/about/release-notes/containers.mdx index f416c92..69c68da 100644 --- a/fern/docs/pages/about/release-notes/containers.mdx +++ b/fern/docs/pages/about/release-notes/containers.mdx @@ -1,6 +1,6 @@ --- -title: Container releases -subtitle: NGC container announcements and version metadata +title: Container Releases +subtitle: NGC Container Announcements and Version Metadata slug: about/release-notes/containers position: 2 --- @@ -9,6 +9,6 @@ import ContainerCatalog from "@/components/ContainerCatalog"; NGC containers bundle tested versions of NeMo libraries for training, alignment, evaluation, and deployment. Use the catalog below to compare images, filter by type or lifecycle stage, and pull the right container for your workload. -Refer to the [release notes overview](/about/release-notes) for how this hub fits together, and [known issues](/about/release-notes/known-issues) for cross-component problems on Framework releases. +Refer to the [release notes overview](/about/release-notes) for release links, and [known issues](/about/release-notes/known-issues) for cross-component problems on Framework releases. diff --git a/fern/docs/pages/about/release-notes/index.mdx b/fern/docs/pages/about/release-notes/index.mdx index e180c21..9b5ed48 100644 --- a/fern/docs/pages/about/release-notes/index.mdx +++ b/fern/docs/pages/about/release-notes/index.mdx @@ -1,21 +1,21 @@ --- -title: Release notes -subtitle: Version history and release information for NeMo OSS +title: Release Notes +subtitle: Version History and Release Information for NeMo OSS slug: about/release-notes position: 1 --- Release notes and version history for **NeMo OSS** — open source libraries in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) on GitHub. -This hub publishes **NeMo Framework container** announcements. 
Per-library release notes and docs live on each project's own site — find them from [Libraries](/about/libraries). +This section publishes **NeMo Framework container** announcements. Library-specific release notes and docs are linked from [Libraries](/about/libraries). - + Latest NGC tags, pull commands, component versions, and release history links. - + Cross-component container issues by release tag. @@ -25,15 +25,17 @@ Search all 22 repos — each card links to docs, GitHub, and release notes. -## Release metadata +## Release Metadata + +Use these links when you need component versions or cross-component issue notes for a Framework container release. - + PyTorch, Megatron-Core, Transformer Engine, and bundled library versions per container — canonical for 26.02+. - + Cross-component container issues by release tag. diff --git a/fern/docs/pages/about/release-notes/known-issues.mdx b/fern/docs/pages/about/release-notes/known-issues.mdx index be7b29d..c766a8b 100644 --- a/fern/docs/pages/about/release-notes/known-issues.mdx +++ b/fern/docs/pages/about/release-notes/known-issues.mdx @@ -1,6 +1,6 @@ --- -title: Known issues -subtitle: Cross-component issues for NeMo Framework containers +title: Known Issues +subtitle: Cross-Component Issues for NeMo Framework Containers slug: about/release-notes/known-issues position: 3 --- @@ -9,11 +9,11 @@ Known issues for **NeMo Framework** NGC containers (`nvcr.io/nvidia/nemo`). Find - + Recent container tags and bundled component versions. - + Pinned package versions for 26.02+ containers. @@ -39,16 +39,24 @@ See component release notes for library-specific known issues: ## 25.09 +The following issues apply to the 25.09 Framework container release. + ### AutoModel +These notes apply to AutoModel workflows in this release. + - Knowledge distillation validation has a known issue. Set `--step_scheduler.val_every_steps=9223372036854775807` to bypass the issue. 
### Megatron-Bridge +These notes apply to Megatron-Bridge workflows in this release. + - Pretraining DeepSeek in subchannel FP8 precision is not working. Pretraining DeepSeek with current scaling FP8 is a workaround, but MTP loss does not converge. ## 25.07 +The following issues apply to the 25.07 Framework container release. + - DeepSeek model pretraining has a memory spike at the end of training, after the validation loop and checkpoint saving. The memory spike is linked to the cross-entropy layer. This may lead to an NCCL error at the end of training. - When fine-tuning with CP > 1, you might need to set `calculate_per_token_loss = True` for some cases. It depends on the dataset you choose. Note that this will result in slightly different loss from before, but both will lead to model convergence. - TensorRT-LLM has to be installed in order to run the ONNX export tutorial for LLM embedding models in the Finetuning Llama 3.2 Model into Embedding Model tutorial. Use the [Export-Deploy install instructions](https://github.com/NVIDIA-NeMo/Export-Deploy). @@ -57,6 +65,8 @@ See component release notes for library-specific known issues: ## 25.04.02 and 25.04.01 +The following issues apply to the 25.04.02 and 25.04.01 Framework container releases. + - **Tensor-Parallel Communication Overlap:** Functional errors may occur with specific tensor-parallel communication overlap configurations, including AllGather+GEMM overlap and the ring-exchange algorithm when `aggregate=True`. - **LayerNorm Bias Accuracy:** Training models using LayerNorm with bias (e.g., StarCoder2) might exhibit accuracy issues. A fix is available in TransformerEngine commit 1569. This fix is not yet included in the current NeMo release container. **Workaround:** Manually mount or pip install the latest TransformerEngine version in your container. 
- **Large Model Checkpoint NaN Errors (T5 11B, StarCoder2 7B):** Loading trained checkpoints for fine-tuning T5 (11B) and StarCoder2 (7B) models may result in NaN values. This is suspected to be a checkpoint saving/loading error. A potential fix is in Megatron Core PR 48cc46f. This fix is currently under testing. @@ -66,6 +76,8 @@ See component release notes for library-specific known issues: ## 25.04.00 +The following issues apply to the 25.04.00 Framework container release. + - Llama 4 accuracy may degrade slightly due to an issue with the order of sigmoid application in the expert routing logic. This has been fixed in [Megatron-LM](https://github.com/NVIDIA/Megatron-LM). However, the fix is not yet included in the current NeMo release container. To apply the fix, manually mount the updated Megatron Core source when building or running your container. - Resuming from local checkpoints using the `get_global_step_from_global_checkpoint_path` utility function may face challenges with auto-inserted metrics in the path. This is fixed in [NeMo#13012](https://github.com/NVIDIA/NeMo/pull/13012). However, the fix is not yet included in the current NeMo release container. - Tensor-parallel communication overlap with AllGather+GEMM overlap and the ring-exchange algorithm with `aggregate=True` may have functional errors. @@ -78,8 +90,12 @@ See component release notes for library-specific known issues: ## 25.02 +The following issues apply to the 25.02 Framework container release. + ### AutoModel +These notes apply to AutoModel workflows in this release. + - Primarily a functional release; performance improvements are planned for future versions. - For large models (e.g., > 40B) trained with FSDP2, checkpoint saving can take longer than expected. - Support for long sequences is currently limited, especially for large models > 30B. 
@@ -89,9 +105,13 @@ See component release notes for library-specific known issues: - Support for Context Parallelism with sequence packing + padding between sequences is currently broken (see [NeMo#12174](https://github.com/NVIDIA/NeMo/issues/12174)). Use 24.12 or upgrade to TE 2.0+ for working support. - MoE based models are seeing instability with training. Please continue to use 24.12 for MoE training until 25.02 is patched with the fix for MoE. -## 24.12 and earlier +## 24.12 and Earlier + +The following issues apply to 24.12 and earlier Framework container releases. -### Framework and training +### Framework and Training + +These notes apply to framework-level training behavior in older releases. - In 24.12, NeMo switched from `pytorch_lightning` to `lightning.pytorch`. If you have custom code that imports `pytorch_lightning`, replace the import with `lightning.pytorch`. Failing to do so results in `ValueError: Expected a parent`. - When using a 24.12 container or later with LM Evaluation Harness, upgrade LM Evaluation Harness to include the required commit. Otherwise you may see `ValueError: You selected an invalid strategy name...`. @@ -106,7 +126,9 @@ See component release notes for library-specific known issues: - A race condition in the NeMo experiment manager can occur when multiple processes or threads attempt to access and modify shared resources simultaneously. - The Mistral and Mixtral tokenizers require a Hugging Face login. -### Export and deployment +### Export and Deployment + +These notes apply to export and deployment behavior in older releases. - Exporting Gemma, Starcoder, and Falcon 7B models to TRT-LLM only works with a single GPU. If you attempt to export with multiple GPUs, no descriptive error message is shown. - Export Llama70B vLLM causes an out-of-memory issue. @@ -114,7 +136,7 @@ See component release notes for library-specific known issues: - In-framework (PyTorch level) deployment with 8 GPUs is encountering an error. 
- Query script under `scripts/deploy/nlp/query.py` returns the error `'output_generation_logits'` in the 24.12 container. -### Notebooks and tutorials +### Notebooks and Tutorials The following notebooks had functional issues at the time of the 24.12 release: @@ -129,15 +151,17 @@ The following notebooks had functional issues at the time of the 24.12 release: ### Multimodal +These notes apply to multimodal tutorials and workflows in older releases. + - LITA tutorial: the data preparation part in `tutorials/multimodal/LITA_Tutorial.ipynb` requires you to manually download the YouMakeup dataset instead of using the provided script. - Add `exp_manager.checkpoint_callback_params.save_nemo_on_train_end=True` to the NeVA notebook pretraining procedure to ensure an end-to-end workflow. ### ASR +These notes apply to ASR workflows in older releases. + - Timestamp misalignment occurs in FastConformer ASR models when using the ASR decoder for diarization. Related issue: [NeMo#8438](https://github.com/NVIDIA/NeMo/issues/8438). -## Report a new issue +## Report a New Issue Open an issue or discussion in the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) repository that owns the component — use [Libraries](/about/libraries) to find the right repo and docs site. For problems that span a container release, use [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) and note your container tag (for example `nvcr.io/nvidia/nemo:26.02`). - -When a new container ships, add cross-component known issues to the matching section above. 
diff --git a/fern/docs/pages/get-started/data.mdx b/fern/docs/pages/get-started/data.mdx index 603fcb7..aeca868 100644 --- a/fern/docs/pages/get-started/data.mdx +++ b/fern/docs/pages/get-started/data.mdx @@ -1,6 +1,6 @@ --- title: Data -subtitle: Get started with data curation and synthetic data libraries +subtitle: Get Started With Data Curation and Synthetic Data Libraries slug: get-started/data position: 10 --- @@ -23,7 +23,9 @@ flowchart LR -## Typical workflow +## Typical Workflow + +Most data workflows move from raw inputs to curated, protected, or synthetic datasets that downstream training and RL libraries can consume. 1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo/curator/latest/) (dedup, filtering, multimodal pipelines). 2. **Generate** synthetic data with [Data Designer](https://nvidia-nemo.github.io/DataDesigner/latest/) or domain SDG tools. diff --git a/fern/docs/pages/get-started/e2e.mdx b/fern/docs/pages/get-started/e2e.mdx index 973fb81..500db4c 100644 --- a/fern/docs/pages/get-started/e2e.mdx +++ b/fern/docs/pages/get-started/e2e.mdx @@ -1,6 +1,6 @@ --- title: E2E -subtitle: Get started with recipes, pipelines, and orchestration +subtitle: Get Started With Recipes, Pipelines, and Orchestration slug: get-started/e2e position: 14 --- @@ -25,17 +25,21 @@ flowchart LR ## Orchestration +Use orchestration libraries when you need repeatable launches across local machines, SLURM, or Kubernetes. + Configure, launch, and manage experiments on local machines, SLURM, and Kubernetes. -## Reference assets +## Reference Assets + +Use reference assets when you want complete recipes, cookbooks, datasets, or example pipelines instead of starting from a single library API. Pipelines for synthetic data generation and evaluation (math, code, science). - + Cookbooks, datasets, and reference examples for Nemotron models. 
diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx index 846ef32..7bf0d5d 100644 --- a/fern/docs/pages/get-started/index.mdx +++ b/fern/docs/pages/get-started/index.mdx @@ -1,14 +1,24 @@ --- title: Get Started -subtitle: Choose your path into NeMo OSS +subtitle: Choose Your Path Into NeMo OSS slug: get-started position: 1 --- -Pick how you want to begin. **NeMo Framework** paths cover the model lifecycle; **NeMo Platform** covers agents. Stages below match the [org README](https://github.com/NVIDIA-NeMo) and [Libraries](/about/libraries) catalog. +Pick how you want to begin. Start with [Task map](/get-started/task-map) if you know the work you want to do, or [Runtime chooser](/get-started/runtime-chooser) if you already know the library and need the right setup path. + +**NeMo Framework** paths cover the model lifecycle; **NeMo Platform** covers agents. Stages below match the [org README](https://github.com/NVIDIA-NeMo) and [Libraries](/about/libraries) catalog. + +Route tasks such as curation, fine-tuning, RL, evaluation, guardrails, and deployment to the right library. + + + +Choose Framework container, standalone image, pip/source install, or Platform setup. + + Quickstart and installation for model training and deployment. @@ -19,7 +29,9 @@ Agent evaluate, secure, tune, and deploy — CLI, SDK, and Studio. -## Install and quickstart +## Install and Quickstart + +Use these pages when you are ready to install a library or run a first example. @@ -33,7 +45,9 @@ pip, NGC containers, scale, and backend choice. -## By lifecycle stage (Framework) +## By Lifecycle Stage (Framework) + +Use stage pages when you know which part of the model lifecycle you are working in. @@ -59,7 +73,9 @@ Reference pipelines, recipes, and experiment orchestration. -## Decision guide +## Decision Guide + +For a fuller routing table, use [Task map](/get-started/task-map). 
| I want to… | Scale | Start here | | --- | --- | --- | diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx index 1c9caf4..2aad0fb 100644 --- a/fern/docs/pages/get-started/inference.mdx +++ b/fern/docs/pages/get-started/inference.mdx @@ -1,6 +1,6 @@ --- title: Inference -subtitle: Get started with evaluation, export, and deployment +subtitle: Get Started With Evaluation, Export, and Deployment slug: get-started/inference position: 13 --- @@ -23,7 +23,9 @@ flowchart LR -## Typical workflow (models) +## Typical Workflow (Models) + +Most model inference workflows move from evaluation to export or deployment, with guardrails added where application behavior needs control. 1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) across 100+ harnesses. 2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/). @@ -31,12 +33,14 @@ flowchart LR Models usually come from [Pretraining](/get-started/pretraining) or [RL](/get-started/rl). For bundled Framework container versions, refer to [Container releases](/about/release-notes/containers). -## Shipping agents +## Shipping Agents + +Agent workflows usually start with Platform when you want an integrated CLI, SDK, and Studio experience. -If you are building **agents** (not just serving a fine-tuned checkpoint), [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) integrates evaluation, guardrails, tuning, and deployment in one setup — CLI, SDK, and Studio UI. You can adopt Evaluator or Guardrails standalone; Platform is the umbrella when you want those loops wired together. +If you are building **agents**, [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) brings evaluation, guardrails, tuning, and deployment into one workflow with a CLI, SDK, and Studio UI. 
Use Evaluator or Guardrails directly when you want one library; start with Platform when you want those loops wired together. Setup, CLI, and docs — evaluate, secure, and optimize agents with NeMo libraries. -For related guidance, refer to [Framework and Platform](/about/concepts#framework-and-platform) and [Ecosystem](/about/ecosystem#choose-framework-or-platform). +For related guidance, refer to [Framework and Platform](/about/concepts/framework-and-platform) and [Ecosystem](/about/ecosystem#choose-framework-or-platform). diff --git a/fern/docs/pages/get-started/installation.mdx b/fern/docs/pages/get-started/installation.mdx index 5d163b4..5313a47 100644 --- a/fern/docs/pages/get-started/installation.mdx +++ b/fern/docs/pages/get-started/installation.mdx @@ -1,13 +1,15 @@ --- title: Installation -subtitle: pip, containers, and choosing a backend +subtitle: Pip, Containers, and Choosing a Backend slug: get-started/installation position: 3 --- -How to install NeMo OSS libraries and pick a backend for your GPU scale. For a minimal first run, start with [Quickstart](/get-started/quickstart). +How to install NeMo OSS libraries and pick a backend for your GPU scale. For a minimal first run, start with [Quickstart](/get-started/quickstart). For a setup decision before commands, use [Runtime chooser](/get-started/runtime-chooser). -## pip install (recommended for development) +## Pip Install (Recommended for Development) + +Use package installs for local development, notebooks, and lightweight experiments when the library docs recommend them. | Workload | Install | Docs | | --- | --- | --- | @@ -18,13 +20,24 @@ How to install NeMo OSS libraries and pick a backend for your GPU scale. For a m Each library publishes install extras and version pins in its own documentation. Use [Libraries](/about/libraries) to find the repo and docs site for your stage. 
-## NGC containers (recommended for production stacks) +## NGC Containers (Recommended for Production Stacks) Pre-built images bundle tested dependency sets. Use the [container catalog](/about/release-notes/containers) for current tags and pull commands. The **NeMo Framework** image (`nvcr.io/nvidia/nemo`) is the multi-library training stack. Standalone images exist for AutoModel, RL, and Curator. -## Scale and backends +| Runtime path | Typical fit | +| --- | --- | +| **Framework container** | Cross-library Framework workflows and tested component sets. | +| **Standalone container** | Focused library workflows with dedicated images. | +| **pip or source install** | Local development, notebooks, and contribution workflows. | +| **Platform setup** | Integrated agent workflows with CLI, SDK, and Studio. | + +See [Runtime chooser](/get-started/runtime-chooser) for setup guidance. + +## Scale and Backends + +Use scale and checkpoint needs to choose between the Hugging Face-native path and the Megatron-Core path. | GPUs | Libraries | Checkpoint conversion | Notes | | --- | --- | --- | --- | @@ -57,7 +70,9 @@ Skills, Nemotron recipes, NeMo Run. -## Experiment tracking +## Experiment Tracking + +Use NeMo Run when setup choices need to carry into repeatable experiment launch and tracking. Launch and track experiments on local machines, SLURM, and Kubernetes. 
diff --git a/fern/docs/pages/get-started/pretraining.mdx b/fern/docs/pages/get-started/pretraining.mdx index 62568d0..ab8e291 100644 --- a/fern/docs/pages/get-started/pretraining.mdx +++ b/fern/docs/pages/get-started/pretraining.mdx @@ -1,6 +1,6 @@ --- title: Pretraining -subtitle: Get started with model training and fine-tuning +subtitle: Get Started With Model Training and Fine-Tuning slug: get-started/pretraining position: 11 --- @@ -23,7 +23,9 @@ flowchart LR -## Choose a path +## Choose a Path + +Choose the pretraining path based on model format, target scale, and whether your workflow is Hugging Face-native or Megatron-based. | Goal | GPUs | Library | | --- | --- | --- | diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx index 4c12ea5..9f71a46 100644 --- a/fern/docs/pages/get-started/quickstart.mdx +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -1,33 +1,35 @@ --- title: Quickstart -subtitle: Fast paths to your first result on NVIDIA GPUs +subtitle: Fast Paths to Your First Result on NVIDIA GPUs slug: get-started/quickstart position: 2 --- Minimal steps to validate your setup. For install options and containers, refer to [Installation](/get-started/installation). -## Fine-tune with AutoModel +## Fine-Tune With AutoModel The fastest on-ramp for Hugging Face large language models (LLMs) and vision language models (VLMs) on one or more GPUs. Install and run the current quick start on the AutoModel docs site — model names, scripts, and cluster options change frequently. - + Local workstation and cluster launch options (canonical, kept up to date by the AutoModel team). More pretraining paths (Megatron-Bridge, recipes, scale): [Pretraining](/get-started/pretraining). -## Speech inference +## Speech Inference Use the NeMo Speech docs for install extras, model selection, and the current five-minute inference walkthrough. - + Installation, five-minute inference, model selection, and tutorials. 
Speech training and full speech-language workflows: [Pretraining](/get-started/pretraining). -## Next steps +## Next Steps + +Use these links when you are ready to move from a first run to setup, routing, or stage-specific docs. @@ -39,7 +41,7 @@ pip and NGC containers, GPU scale, and backend choice. Search all 22 repos by stage or tag. - + Pull tested NGC images for Framework, AutoModel, RL, and Curator. diff --git a/fern/docs/pages/get-started/rl.mdx b/fern/docs/pages/get-started/rl.mdx index a767e6e..8106b04 100644 --- a/fern/docs/pages/get-started/rl.mdx +++ b/fern/docs/pages/get-started/rl.mdx @@ -1,6 +1,6 @@ --- title: RL -subtitle: Get started with alignment and reinforcement learning +subtitle: Get Started With Alignment and Reinforcement Learning slug: get-started/rl position: 12 --- @@ -23,7 +23,9 @@ flowchart LR -## Common entry points +## Common Entry Points + +Use these entry points to choose between post-training algorithms, RL environments, and agent rollout infrastructure. | Technique | Start in docs | Library | | --- | --- | --- | diff --git a/fern/docs/pages/get-started/runtime-chooser.mdx b/fern/docs/pages/get-started/runtime-chooser.mdx new file mode 100644 index 0000000..26ff7b8 --- /dev/null +++ b/fern/docs/pages/get-started/runtime-chooser.mdx @@ -0,0 +1,78 @@ +--- +title: Runtime Chooser +subtitle: Pick the Setup Path Before Running Examples +slug: get-started/runtime-chooser +position: 4 +--- + +Choose a runtime after you know the library or task. Use this page to choose the setup family, then follow the linked library docs for commands. 
+ +| Runtime path | Best for | Check first | +| --- | --- | --- | +| **Framework container** (`nvcr.io/nvidia/nemo:`) | Cross-library Framework workflows, Megatron-Bridge, Evaluator, Export-Deploy, Run, and Speech workflows packaged together | [Container releases](/about/release-notes/containers) | +| **Standalone container** | Focused workflows for libraries with dedicated images, such as AutoModel, RL, or Curator | [Container catalog](/about/release-notes/containers) and the library docs | +| **pip or package install** | Local development, notebooks, small experiments, and integrations with existing Python environments | The linked library install page | +| **Source checkout** | Contributing, debugging, running repo-local examples, or using unreleased code | The GitHub repo and `CONTRIBUTING.md` | +| **Platform setup** | Integrated agent workflows with CLI, SDK, and Studio | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | + +## Runtime Flow + +Start with the task you want to run. If you already know the library, start there instead. The next question is whether the workflow is cross-library, single-library, source-based, or an integrated Platform workflow. + +```mermaid +flowchart TB + Start["Task you want to run"] + Known["Library"] + Agent["Integrated agent workflow?"] + Cross["Cross-library Framework workflow?"] + Source["Need repo files or unreleased code?"] + Dedicated["Dedicated library container available?"] + Platform["Platform setup"] + Framework["Framework container"] + Checkout["Source checkout"] + Standalone["Standalone container"] + Package["pip or package install"] + + Start --> Known + Known --> Agent + Agent -->|Yes| Platform + Agent -->|No| Cross + Cross -->|Yes| Framework + Cross -->|No| Source + Source -->|Yes| Checkout + Source -->|No| Dedicated + Dedicated -->|Yes| Standalone + Dedicated -->|No| Package +``` + +## Practical Checks + +Run through these checks before following commands from a guide or example. 
+ +| Before you run | Why it matters | +| --- | --- | +| Check the linked library docs | Install extras, optional dependencies, and support matrices can vary by library and release. | +| Check container release notes | Framework container component versions and cross-component known issues are listed by release. | +| Check whether the example needs repo files | Some examples assume a source checkout, even when the package can be installed from pip. | +| Check checkpoint format | AutoModel, Megatron-Bridge, RL, Evaluator, and Export-Deploy workflows can depend on Hugging Face, Megatron, or serving-specific formats. | +| Check data and model credentials | Hugging Face tokens, NGC access, S3 credentials, and cluster credentials are often required outside the Python package install. | + +## Related Concepts + +Use these pages when you need deeper context on installs, task routing, or setup commands. + + + + +Understand how runtime paths fit into the NeMo OSS model. + + + +Start from the work you want to do. + + + +See the main install families and backend scale guidance. + + + diff --git a/fern/docs/pages/get-started/task-map.mdx b/fern/docs/pages/get-started/task-map.mdx new file mode 100644 index 0000000..fd65a9b --- /dev/null +++ b/fern/docs/pages/get-started/task-map.mdx @@ -0,0 +1,51 @@ +--- +title: Task Map +subtitle: Start From the Work You Want to Do +slug: get-started/task-map +position: 1 +--- + +Use this map when you know the task but not the repo name. Each row points to a starting library, a runtime path to check first, and the docs with detailed commands. + +| I want to... | Stage | Start with | Runtime path | Next docs | +| --- | --- | --- | --- | --- | +| Curate text, image, video, or audio data | Data | Curator | Curator install or Curator container | [Curator docs](https://docs.nvidia.com/nemo/curator/latest/) | +| Generate synthetic data | Data / E2E | Data Designer or Skills | Library install | [Data Designer docs](https://nvidia-nemo.github.io/DataDesigner/latest/), [Skills docs](https://nvidia-nemo.github.io/Skills/) | +| Protect or anonymize sensitive data | Data | Anonymizer or Safe-Synthesizer | Library docs | [Libraries](/about/libraries) | +| Fine-tune a Hugging Face LLM or VLM | Pretraining | AutoModel | AutoModel container or pip | [AutoModel docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| Train at Megatron scale | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Convert Hugging Face and Megatron checkpoints | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge conversion docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Build speech AI workflows | Pretraining | NeMo Speech | NeMo Speech install | [NeMo Speech docs](https://docs.nvidia.com/nemo/speech/nightly/) | +| Run SFT, DPO, GRPO, distillation, or RL | RL | NeMo RL | NeMo RL container or source checkout | [NeMo RL docs](https://docs.nvidia.com/nemo/rl/latest/) | +| Build RL environments for models or agents | RL | Gym | Gym install | [Gym docs](https://docs.nvidia.com/nemo/gym/latest/index.html) | +| Run multi-turn agent rollouts | RL | ProRL Agent Server | Source checkout | [ProRL Agent Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server#readme) | +| Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo/evaluator/latest/) | +| Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo/guardrails/latest/) | +| Build integrated agent workflows | Inference | NeMo Platform | Platform setup | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo/run/latest/) | +| Follow reference recipes and cookbooks | E2E | Skills or Nemotron | Repo README and recipe docs | [Skills docs](https://nvidia-nemo.github.io/Skills/), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron#readme) | + +## Choosing the Runtime + +After you pick a library, check [Runtime Chooser](/get-started/runtime-chooser). Runtime choice is often the difference between a quick local experiment, a containerized Framework workflow, and a source checkout. + +## If the Row Is Close but Not Exact + +Use [Libraries](/about/libraries) to search by stage, kind, and tag. Use [Where to Find Information](/about/concepts/documentation-surfaces) for setup, releases, examples, terminology, and support links. + + + + +Pick Framework container, standalone container, pip/source install, or Platform setup. + + + +Search all catalog repos by stage, kind, and tag. + + + +Find third-party writeups and videos for additional perspective.
+ + + diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx index 7ab8511..8bb311a 100644 --- a/fern/docs/pages/index.mdx +++ b/fern/docs/pages/index.mdx @@ -1,26 +1,28 @@ --- title: NeMo OSS -subtitle: Open source libraries from the NVIDIA-NeMo GitHub organization +subtitle: Open Source Libraries From the NVIDIA-NeMo GitHub Organization slug: "" --- -**NeMo OSS** is the hub for NVIDIA's public, open source NeMo work — the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo**. It is a discovery layer, not a single product: focused libraries you can adopt individually, plus two named stacks: +**NeMo OSS** brings NVIDIA's open source NeMo work together in one place: the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo**. Use these pages to find the right library, stack, or container for your workflow: -- **NeMo Framework** — model lifecycle (data → train → align → evaluate → deploy) -- **NeMo Platform** — agent integration (evaluate, secure, tune, deploy agents) +- **NeMo Framework** — build and run the model lifecycle: data, training, alignment, evaluation, and deployment. +- **NeMo Platform** — integrate agent workflows: evaluate, secure, tune, and deploy agents. -These projects are part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite, which also includes commercial products beyond this catalog. +These open source projects are part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. -## Choose your stack +## Start From Your Task + +Use these entry points when you know the job you want to do and need the right repo, runtime, or catalog view. - -Train and deploy models — quickstart, install, and guides by lifecycle stage. + +Find the right repo from the work you want to do: curate data, train, align, evaluate, deploy, or build agents. 
- -Ship agents — CLI, SDK, and Studio UI over NeMo libraries. + +Pick Framework container, standalone container, pip/source install, or Platform setup. @@ -29,16 +31,34 @@ Browse all 22 repos — search by stage, kind, or tag. +## Choose Your Stack + +Use these cards when you already know whether you are building model workflows or agent workflows. + + + + +Train and deploy models — quickstart, install, and guides by lifecycle stage. + + + +Ship agents — CLI, SDK, and Studio UI over NeMo libraries. + + + + ## Understand NeMo OSS +Use these pages when you need the mental model, ecosystem position, or architecture before choosing a library. + -Glossary — Framework, Platform, stages, homonyms. +How Framework, Platform, libraries, and containers fit together. -Where NeMo OSS fits in NVIDIA NeMo and what to choose. +How NeMo OSS fits in NVIDIA NeMo and which path to choose. @@ -47,15 +67,17 @@ Three layers, pipeline diagram, backends, and NGC containers. -## Releases and community +## Releases and Community + +Use these pages to check releases, known issues, and community entry points. - + Framework NGC tags, pull commands, and component versions. - + Cross-component Framework container issues by tag. diff --git a/fern/docs/pages/resources/community.mdx b/fern/docs/pages/resources/community.mdx index 63e754b..d879c41 100644 --- a/fern/docs/pages/resources/community.mdx +++ b/fern/docs/pages/resources/community.mdx @@ -1,10 +1,12 @@ --- title: Community -subtitle: Discuss, contribute, and stay up to date +subtitle: Discuss, Contribute, and Stay Up to Date slug: resources/community --- -## Get involved +## Get Involved + +Use these links to find discussions, repositories, releases, and contribution paths. @@ -12,19 +14,19 @@ slug: resources/community Questions, ideas, and announcements across the org. - + Browse and star projects in the NVIDIA-NeMo organization. - + Latest NGC container tags, component versions, and known issues. 
- + Cross-component container issues by release tag. - + Container releases and known issues for NeMo Framework. @@ -34,28 +36,40 @@ Each repository includes its own `CONTRIBUTING.md` and issue templates. Open iss ## NeMo Assist +Use NeMo Assist when you want a guided answer across documentation and code. + Chat with NeMo documentation and code — try NeMo Assist for guided answers across the open source libraries. -## Recent highlights +## Recent Highlights + +Use these links to catch up on recent discussions and announcements from selected libraries. ### AutoModel +These AutoModel highlights point to recent training and Hugging Face workflow discussions. + - [Enabling PyTorch native pipeline parallelism for HF models](https://github.com/NVIDIA-NeMo/Automodel/discussions/589) - [Day-0 Hugging Face support](https://github.com/NVIDIA-NeMo/Automodel/discussions/477) - [Gemma 3n multimodal fine-tuning](https://github.com/NVIDIA-NeMo/Automodel/discussions/494) ### NeMo RL +These NeMo RL highlights point to recent post-training and optimization discussions. + - [On-policy distillation](https://github.com/NVIDIA-NeMo/RL/discussions/1445) - [FP8 quantization](https://github.com/NVIDIA-NeMo/RL/discussions/1216) - [10× MoE weight transfer](https://github.com/NVIDIA-NeMo/RL/discussions/1189) ### NeMo Speech +These NeMo Speech highlights point to recent speech fine-tuning discussions. + - [Fine-tune NeMo models with Granary data](https://github.com/NVIDIA-NeMo/NeMo/discussions/14758) ## License +This section summarizes the top-level license expectation for the OSS repos. + Apache 2.0. Third-party attributions are documented in each repository. 
diff --git a/fern/docs/pages/resources/external-learning.mdx b/fern/docs/pages/resources/external-learning.mdx new file mode 100644 index 0000000..dd34a9c --- /dev/null +++ b/fern/docs/pages/resources/external-learning.mdx @@ -0,0 +1,95 @@ +--- +title: External Learning +subtitle: Community Writeups, Videos, and Partner Examples +slug: resources/external-learning +--- + +These third-party resources can help you see how developers approach NeMo OSS in the wild. Treat commands and version pins as examples, and verify current setup steps in the linked library docs. + +## Guardrails + +These resources show how external authors frame Guardrails for safety, RAG, and application control. + + + + +Practical walkthrough of safety rails, deterministic dialogue, RAG, and tool use. + + + +Cloud integration example for adding Guardrails around LLM applications. + + + +Short video listing for a hands-on Guardrails demo. + + + + +Common themes: Colang, rail types, LangChain/RAG integration, provider setup, latency and cost, and how Guardrails compares with model-based classifiers such as Llama Guard. + +## Data Curation + +These resources show how external authors approach Curator pipelines, modality setup, and training-data preparation. + + + + +Walkthrough of a practical text curation flow with NeMo Curator. + + + +Task-oriented view of web-data curation, deduplication, and training-ready shards. + + + +Generated map of Curator tutorial structure and source-linked examples. + + + + +Common themes: modality-specific setup, Ray execution, GPU and FFmpeg prerequisites, cluster scale, and the move from raw data to training-ready outputs. + +## Training, RL, and Checkpoints + +These resources show how external authors approach post-training, backend choice, rollout infrastructure, and checkpoint flow. + + + + +Beginner-oriented framing of SFT, DPO, GRPO, and RL for LLMs. + + + +Cloud launch example for NeMo RL training jobs. 
+ + + +Large-scale example using Megatron-Bridge and verl for LoRA RL. + + + +Example of using NeMo Gym environments with TRL. + + + + +Common themes: backend choice, rollout and training separation, Hugging Face tokens, checkpoint formats, and multi-node orchestration. + +## Hugging Face Ecosystem + +These resources show how AutoModel appears in Hugging Face-native workflows. + + + + +Shows AutoModel as a Hugging Face-compatible training path for LLMs and VLMs. + + + +Shows AutoModel for diffusion fine-tuning with Diffusers-format models. + + + + +Common themes: Hugging Face-native workflows, no checkpoint conversion for AutoModel paths, model coverage, data preprocessing, and distributed launch configuration. diff --git a/fern/docs/pages/resources/glossary.mdx b/fern/docs/pages/resources/glossary.mdx new file mode 100644 index 0000000..d12fa6c --- /dev/null +++ b/fern/docs/pages/resources/glossary.mdx @@ -0,0 +1,203 @@ +--- +title: Glossary +subtitle: Terms Used Across NeMo OSS Documentation +slug: resources/glossary +--- + +Use this glossary for quick lookup of NeMo OSS terminology. For relationship-level explanations, use [Concepts](/about/concepts). + +## A + +Terms in this section begin with A. + +AutoModel +: PyTorch distributed training library for LLMs and VLMs with Hugging Face-native workflows. Use it when you want a PyTorch / Hugging Face training path. + +## C + +Terms in this section begin with C. + +Catalog +: The searchable library inventory on [Libraries](/about/libraries), organized by stage, kind, and tags. + +Checkpoint conversion +: The process of moving model weights between formats, such as Hugging Face and Megatron. Megatron-Bridge provides the primary Hugging Face to Megatron conversion path. + +Colang +: The modeling language used by NeMo Guardrails to define conversational flows, rails, and actions. + +Container +: An NGC image that packages a tested runtime. 
NeMo OSS uses both the multi-library Framework container and standalone containers for selected libraries. + +## D + +Terms in this section begin with D. + +Data +: The lifecycle stage for curating, filtering, synthesizing, anonymizing, or governing datasets before training or evaluation. + +Dask +: A distributed Python computing framework. Older Curator workflows used Dask; current Curator docs describe the Ray-based pipeline architecture. + +## E + +Terms in this section begin with E. + +E2E +: End-to-end. The lifecycle stage for recipes, orchestration, experiment launch, and reference assets that span multiple stages. + +Evaluator +: Library for scalable, reproducible model and agent evaluation across benchmarks and harnesses. + +Export-Deploy +: Library for exporting and deploying NeMo and Hugging Face models to serving stacks such as TensorRT-LLM, vLLM, and ONNX-based paths. + +## F + +Terms in this section begin with F. + +Framework container +: The multi-library NGC image `nvcr.io/nvidia/nemo:`. It packages selected Framework components for tested workflows. + +Framework libraries +: Libraries that participate in the NeMo Framework model lifecycle from data through deployment. + +## G + +Terms in this section begin with G. + +GRPO +: Group Relative Policy Optimization, a reinforcement learning method used in post-training workflows. + +Guardrails +: Library for programmable guardrails for LLM-based conversational systems. + +Gym +: Library for RL environments and benchmarks used to evaluate and improve models and agents. + +## H + +Terms in this section begin with H. + +Hugging Face format +: Model artifacts that follow Hugging Face `transformers` or related ecosystem conventions, such as configs, tokenizers, and safetensors weights. AutoModel uses Hugging Face-native paths; Megatron-Bridge converts between Hugging Face and Megatron formats. + +## I + +Terms in this section begin with I. 
+ +Inference +: The lifecycle stage for evaluation, export, serving, deployment, and guardrails. + +Integration +: A repo kind for products that compose several libraries behind one product surface. NeMo Platform is the primary example. + +## K + +Terms in this section begin with K. + +Kind +: The role a repo plays in the catalog: library, integration, reference, or infrastructure. + +## L + +Terms in this section begin with L. + +Library +: A focused open source repo with its own source, docs, releases, install paths, and issue tracker. + +Lifecycle stage +: The primary workflow stage assigned to a repo: Data, Pretraining, RL, Inference, or E2E. + +## M + +Terms in this section begin with M. + +Megatron-Bridge +: Megatron-Core training library with bidirectional Hugging Face checkpoint conversion. + +Megatron format +: Model artifacts organized for Megatron-Core training and parallelism. Use Megatron-Bridge when a workflow needs to move between Hugging Face and Megatron formats. + +## N + +Terms in this section begin with N. + +NeMo Framework +: The model-lifecycle stack for data, training, RL, evaluation, export, and deployment. In container contexts, the name can also refer to the `nvcr.io/nvidia/nemo` Framework container. + +NeMo OSS +: The public open source side of NVIDIA NeMo: the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization, documentation, and related OSS containers. + +NeMo Platform +: Agent integration product with CLI, SDK, and Studio for evaluating, securing, tuning, and deploying agents using selected NeMo libraries. + +NeMo Speech +: Speech AI documentation and source in the [NVIDIA-NeMo/NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo. + +NGC +: NVIDIA GPU Cloud, the catalog where NVIDIA publishes container images and model assets. + +## P + +Terms in this section begin with P. + +Pretraining +: The lifecycle stage for training, fine-tuning, adapting models, and converting training formats. 
+ +## R + +Terms in this section begin with R. + +RAG +: Retrieval-augmented generation. A pattern where an application retrieves relevant context before generating an answer. + +Rail +: A programmable Guardrails behavior that can guide, block, transform, retrieve, or execute logic around an LLM interaction. + +Ray +: A distributed execution framework used by current Curator and RL workflows for scaling Python workloads. + +Reference +: A repo kind for recipes, cookbooks, datasets, and example pipelines. Skills and Nemotron are examples. + +Repo +: A GitHub repository in the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) organization. + +Reward model +: A model or scoring function used to evaluate candidate outputs during alignment or reinforcement learning workflows. + +RL +: Reinforcement learning. In NeMo OSS, the RL stage includes post-training workflows such as SFT, DPO, GRPO, distillation, and agent improvement. + +Rollout +: Model-generated trajectories or interactions used by RL training workflows, especially for agent or environment-based training. + +Runtime path +: The setup family used to run a workflow: Framework container, standalone container, pip/source install, source checkout, or Platform setup. + +## S + +Terms in this section begin with S. + +Stage +: The primary lifecycle bucket used by the library catalog. + +Standalone container +: A library-specific NGC image, such as AutoModel, RL, or Curator, used for workflows that fit a focused runtime. + +## T + +Terms in this section begin with T. + +Tags +: Catalog search facets for modality, technique, or role, such as `speech`, `agents`, `evaluation`, or `deployment`. + +## Related Documentation + +Use these pages when a term needs more context than a glossary definition. 
+ +- [Concepts](/about/concepts) +- [Ecosystem](/about/ecosystem) +- [Architecture](/about/architecture) +- [Libraries](/about/libraries) diff --git a/fern/docs/pages/resources/index.mdx b/fern/docs/pages/resources/index.mdx new file mode 100644 index 0000000..383a851 --- /dev/null +++ b/fern/docs/pages/resources/index.mdx @@ -0,0 +1,31 @@ +--- +title: Resources +subtitle: Reference Links, Community Paths, and Terminology +slug: resources +--- + +Use Resources when you need lookup material or a next step outside the conceptual flow. + + + + +Lookup definitions for NeMo OSS terms, acronyms, and product names. + + + +Third-party blogs, videos, partner examples, and recurring themes. + + + +GitHub discussions, repositories, release links, and contribution entry points. + + + +Search the catalog of NVIDIA-NeMo repositories. + + + +Framework container releases, component versions, and known issues. + + + diff --git a/profile/README.md b/profile/README.md index a22df82..1a031aa 100644 --- a/profile/README.md +++ b/profile/README.md @@ -5,27 +5,45 @@ SPDX-License-Identifier: Apache-2.0 # NeMo OSS -**Train Llama 3.3 · Qwen 2.5 · Mistral · DeepSeek · Gemma · Nemotron on NVIDIA GPUs** +Open source NVIDIA NeMo libraries for building models and agents on NVIDIA GPUs: data curation, training, alignment, evaluation, deployment, guardrails, and end-to-end recipes. -Open source GPU libraries for data, training, alignment, evaluation, deployment, and agents. Scale from one GPU to 10,000+ nodes with Hugging Face or Megatron backends. Part of the [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite — this org is the public GitHub home for **NeMo OSS** (Framework libraries + [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) for agents). +NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. 
Use this GitHub organization for source code and issues; use the [NeMo OSS docs](https://docs.nvidia.com/nemo) to choose the right library, runtime, or workflow. -## Libraries by stage +## Start Here -| Data | Pretraining | RL | Inference | E2E | -| --- | --- | --- | --- | --- | -| [Curator](https://github.com/NVIDIA-NeMo/Curator)<br>[Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer)<br>[Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner)<br>[Safe Synthesizer](https://github.com/NVIDIA-NeMo/Safe-Synthesizer)<br>[SDG-PGMs](https://github.com/NVIDIA-NeMo/SDG-PGMs) | [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge)<br>[AutoModel](https://github.com/NVIDIA-NeMo/Automodel)<br>[Speech](https://github.com/NVIDIA-NeMo/NeMo)<br>[Emerging Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | [RL](https://github.com/NVIDIA-NeMo/RL)<br>[Gym](https://github.com/NVIDIA-NeMo/Gym)<br>[ProRL-Agent-Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server) | [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails)<br>[Evaluator](https://github.com/NVIDIA-NeMo/Evaluator)<br>[Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy)<br>[NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) | [Skills](https://github.com/NVIDIA-NeMo/Skills)<br>[Nemotron](https://github.com/NVIDIA-NeMo/Nemotron)<br>[Run](https://github.com/NVIDIA-NeMo/Run) | +| If you want to... | Go to | +| --- | --- | +| Find the right repo for a task | [Task Map](https://docs.nvidia.com/nemo/get-started/task-map) | +| Choose between containers, pip, source, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo/get-started/runtime-chooser) | +| Browse all NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo/about/libraries) | +| Understand Framework, Platform, stages, and containers | [Concepts](https://docs.nvidia.com/nemo/about/concepts) | +| Check Framework container releases and known issues | [Release Notes](https://docs.nvidia.com/nemo/about/release-notes) | +| Ask questions or follow community updates | [Community](https://docs.nvidia.com/nemo/resources/community) | -**[docs.nvidia.com/nemo](https://docs.nvidia.com/nemo)** — NeMo OSS hub: decision guide, all libraries, recipes, and community links. +## Choose by Workflow -```bash -pip install nemo-automodel -``` +| Workflow | Use for | Starting Points | +| --- | --- | --- | +| **Data** | Curate, synthesize, anonymize, and prepare datasets | [Curator](https://github.com/NVIDIA-NeMo/Curator), [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner), [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) | +| **Pretraining** | Train, fine-tune, adapt, and convert model checkpoints | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | +| **RL** | Run SFT, DPO, GRPO, RL, environments, and rollouts | [NeMo RL](https://github.com/NVIDIA-NeMo/RL), [Gym](https://github.com/NVIDIA-NeMo/Gym), [ProRL Agent Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server) | +| **Inference** | Evaluate, export, serve, and add guardrails | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator), [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy), [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | +| **Agents** | Evaluate, secure, tune, and deploy agent workflows | [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) | +| **E2E** | Launch experiments, follow recipes, and use reference assets | [Run](https://github.com/NVIDIA-NeMo/Run), [Skills](https://github.com/NVIDIA-NeMo/Skills), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | + +## NeMo Framework and NeMo Platform + +- **NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and deployment libraries. +- **NeMo Platform** is the agent workflow entry point: CLI, SDK, and Studio for evaluating, securing, tuning, and deploying agents. + +For naming, runtime, and workflow guidance, see [Concepts](https://docs.nvidia.com/nemo/about/concepts) and [Get Started](https://docs.nvidia.com/nemo/get-started). ## Community - [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) -- [All repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) +- [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) +- [External Learning](https://docs.nvidia.com/nemo/resources/external-learning) ## License -Apache 2.0. Third-party attributions in each repository. +Apache 2.0. Third-party attributions are documented in each repository. From 65665251df8bc75929aa3b158d1c6813c6284cc0 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 14:57:22 -0400 Subject: [PATCH 11/18] Apply NVIDIA style to profile README --- profile/README.md | 34 ++++++++++++++++++++++------------ 1 file changed, 22 insertions(+), 12 deletions(-) diff --git a/profile/README.md b/profile/README.md index 1a031aa..fb90c66 100644 --- a/profile/README.md +++ b/profile/README.md @@ -5,33 +5,39 @@ SPDX-License-Identifier: Apache-2.0 # NeMo OSS -Open source NVIDIA NeMo libraries for building models and agents on NVIDIA GPUs: data curation, training, alignment, evaluation, deployment, guardrails, and end-to-end recipes.
+Build generative AI models and agents on NVIDIA GPUs with open source NVIDIA NeMo libraries for data curation, training, alignment, evaluation, deployment, guardrails, and end-to-end recipes. -NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Use this GitHub organization for source code and issues; use the [NeMo OSS docs](https://docs.nvidia.com/nemo) to choose the right library, runtime, or workflow. +NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Use this GitHub organization for source code and issues, and use the [NeMo OSS documentation](https://docs.nvidia.com/nemo) to choose the right library, workflow, or runtime path. -## Start Here +## Get Started -| If you want to... | Go to | +Use these links to choose where to start and move into the full documentation set. + +| Goal | Resource | | --- | --- | -| Find the right repo for a task | [Task Map](https://docs.nvidia.com/nemo/get-started/task-map) | -| Choose between containers, pip, source, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo/get-started/runtime-chooser) | -| Browse all NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo/about/libraries) | -| Understand Framework, Platform, stages, and containers | [Concepts](https://docs.nvidia.com/nemo/about/concepts) | +| Find the best starting repository for a task | [Task Map](https://docs.nvidia.com/nemo/get-started/task-map) | +| Choose a runtime path: container, pip, source checkout, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo/get-started/runtime-chooser) | +| Browse NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo/about/libraries) | +| Learn how Framework, Platform, stages, and containers fit together | [Concepts](https://docs.nvidia.com/nemo/about/concepts) | | Check Framework container releases and known issues | [Release Notes](https://docs.nvidia.com/nemo/about/release-notes) | -| Ask questions or follow community updates | [Community](https://docs.nvidia.com/nemo/resources/community) | +| Ask questions and follow community updates | [Community](https://docs.nvidia.com/nemo/resources/community) | ## Choose by Workflow -| Workflow | Use for | Starting Points | +Use this table to move from a workflow area to a starting repository. + +| Workflow | Use For | Start With | | --- | --- | --- | | **Data** | Curate, synthesize, anonymize, and prepare datasets | [Curator](https://github.com/NVIDIA-NeMo/Curator), [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner), [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) | | **Pretraining** | Train, fine-tune, adapt, and convert model checkpoints | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | | **RL** | Run SFT, DPO, GRPO, RL, environments, and rollouts | [NeMo RL](https://github.com/NVIDIA-NeMo/RL), [Gym](https://github.com/NVIDIA-NeMo/Gym), [ProRL Agent Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server) | | **Inference** | Evaluate, export, serve, and add guardrails | [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator), [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy), [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | | **Agents** | Evaluate, secure, tune, and deploy agent workflows | [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) | -| **E2E** | Launch experiments, follow recipes, and use reference assets | [Run](https://github.com/NVIDIA-NeMo/Run), [Skills](https://github.com/NVIDIA-NeMo/Skills), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | +| **End-to-End** | Launch experiments, follow recipes, and use reference assets | [Run](https://github.com/NVIDIA-NeMo/Run), [Skills](https://github.com/NVIDIA-NeMo/Skills), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | + +## Choose NeMo Framework or NeMo Platform -## NeMo Framework and NeMo Platform +Use NeMo Framework for model lifecycle work, and use NeMo Platform for integrated agent workflows. - **NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and deployment libraries. - **NeMo Platform** is the agent workflow entry point: CLI, SDK, and Studio for evaluating, securing, tuning, and deploying agents. @@ -40,10 +46,14 @@ For naming, runtime, and workflow guidance, see [Concepts](https://docs.nvidia.c ## Community +Use these links to ask questions, browse repositories, and find community learning resources. + - [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) - [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) - [External Learning](https://docs.nvidia.com/nemo/resources/external-learning) ## License +Review the license terms before using or contributing to NeMo OSS repositories. + Apache 2.0. Third-party attributions are documented in each repository.
From 866626a323a2165ad09ddbcce51f6e2e2550f449 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:00:46 -0400 Subject: [PATCH 12/18] Update NeMo OSS documentation URLs --- README.md | 4 +- fern/README.md | 2 +- fern/TAXONOMY.md | 4 +- fern/components/containers.ts | 10 ++--- fern/components/repos.ts | 24 +++++------ fern/docs.yml | 2 +- fern/docs/pages/about/concepts/index.mdx | 2 +- .../training-backends-and-checkpoints.mdx | 8 ++-- fern/docs/pages/about/ecosystem.mdx | 4 +- fern/docs/pages/about/libraries.mdx | 2 +- fern/docs/pages/about/release-notes/index.mdx | 2 +- .../about/release-notes/known-issues.mdx | 20 +++++----- fern/docs/pages/get-started/data.mdx | 2 +- fern/docs/pages/get-started/e2e.mdx | 2 +- fern/docs/pages/get-started/inference.mdx | 6 +-- fern/docs/pages/get-started/installation.mdx | 10 ++--- fern/docs/pages/get-started/pretraining.mdx | 8 ++-- fern/docs/pages/get-started/quickstart.mdx | 4 +- fern/docs/pages/get-started/rl.mdx | 4 +- fern/docs/pages/get-started/task-map.mdx | 22 +++++----- fern/docs/pages/index.mdx | 2 +- nemo-fw-presentation-outline.md | 40 +++++++++---------- nemo-fw-product-walkthrough.md | 34 ++++++++-------- profile/README.md | 18 ++++----- 24 files changed, 118 insertions(+), 118 deletions(-) diff --git a/README.md b/README.md index 14bc4f5..73496db 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # NVIDIA-NeMo/.github -GitHub organization profile and NeMo OSS hub documentation (`docs.nvidia.com/nemo`). +GitHub organization profile and NeMo OSS hub documentation (`docs.nvidia.com/nemo-oss`). - **Org profile** — `profile/README.md` (shown on [github.com/NVIDIA-NeMo](https://github.com/NVIDIA-NeMo)) -- **Hub docs (Fern)** — `fern/` → [docs.nvidia.com/nemo](https://docs.nvidia.com/nemo) when published +- **Hub docs (Fern)** — `fern/` → [docs.nvidia.com/nemo-oss](https://docs.nvidia.com/nemo-oss) when published See [fern/README.md](fern/README.md) for local preview and publish steps. 
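A bulk host rename like this one is easy to get wrong. A plain string replace run twice turns `nemo-oss` into `nemo-oss-oss`. It also moves every path under the old host, including paths that may not be migrating, such as `microservices`. The following is a minimal TypeScript sketch of an idempotent rewrite. It assumes the old host is always spelled `docs.nvidia.com/nemo` and that callers list any paths to leave alone. `migrateDocsUrls` and `keepPrefixes` are hypothetical names and are not part of any NeMo repository.

```typescript
// Hypothetical migration helper; the regex and the keepPrefixes option are assumptions.
// Matches "docs.nvidia.com/nemo" only when it is NOT already followed by "-" or a word
// character (so "nemo-oss" is skipped), plus an optional path up to whitespace,
// ")", quotes, or a backtick (common terminators in Markdown, YAML, and TS sources).
const OLD_DOCS_HOST = /docs\.nvidia\.com\/nemo(?![-\w])(\/[^\s)"'`]*)?/g;

function migrateDocsUrls(text: string, keepPrefixes: string[] = []): string {
  return text.replace(OLD_DOCS_HOST, (match: string, path?: string) => {
    const rest = path ?? "";
    // Leave paths that should stay on the old host untouched.
    if (keepPrefixes.some((prefix) => rest.startsWith(prefix))) {
      return match;
    }
    return `docs.nvidia.com/nemo-oss${rest}`;
  });
}
```

Running the helper a second time leaves already-migrated links unchanged. Pass `["/microservices"]` when those docs should keep their current location.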
diff --git a/fern/README.md b/fern/README.md index 539abea..135b577 100644 --- a/fern/README.md +++ b/fern/README.md @@ -31,7 +31,7 @@ Per-library docs own commands, APIs, tutorials, model support, and version-speci Published targets are configured in [docs.yml](./docs.yml): - Preview: `nemo-framework.docs.buildwithfern.com/nemo` -- Production: `docs.nvidia.com/nemo` +- Production: `docs.nvidia.com/nemo-oss` ## Site Shape diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md index 3947fe7..8a488e3 100644 --- a/fern/TAXONOMY.md +++ b/fern/TAXONOMY.md @@ -1,6 +1,6 @@ # NeMo OSS taxonomy -Canonical vocabulary for the Fern hub (`docs.nvidia.com/nemo`), org README, and `components/repos.ts`. When copy disagrees, this file wins. +Canonical vocabulary for the Fern hub (`docs.nvidia.com/nemo-oss`), org README, and `components/repos.ts`. When copy disagrees, this file wins. ## Top-level map @@ -17,7 +17,7 @@ NVIDIA NeMo (commercial suite — OSS + microservices + NIM + services) | Term | Meaning | | --- | --- | | **NVIDIA NeMo** | Full software suite spanning open source libraries, commercial products, NIM, microservices, and services. | -| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo**. Entry point for choosing a stack, stage, library, or container. | +| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo-oss**. Entry point for choosing a stack, stage, library, or container. | | **NeMo Framework** | Named **model-lifecycle** stack: composable libraries from data through deployment, each with its own source and docs. | | **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:`. Bundles Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech. | | **NeMo Platform** | [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — CLI, SDK, and Studio for **agent** evaluate / secure / tune / deploy. 
Composes libraries into an agent integration experience. | diff --git a/fern/components/containers.ts b/fern/components/containers.ts index 998015d..212938a 100644 --- a/fern/components/containers.ts +++ b/fern/components/containers.ts @@ -66,7 +66,7 @@ export const FRAMEWORK_RECENT_RELEASES: FrameworkRelease[] = [ ]; export const SOFTWARE_VERSIONS_URL = - "https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html"; + "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/releases/software-versions.html"; export const NGC_NEMO_TEAM_URL = "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/containers"; @@ -82,7 +82,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ kind: "multi-library", stages: ["pretraining", "rl", "inference", "e2e"], latestTag: "26.02", - docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/", bundledLibraries: ["Megatron-Bridge", "Evaluator", "Export-Deploy", "Run", "NeMo Speech"], tags: ["llm", "vlm", "speech", "megatron"], }, @@ -93,7 +93,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ description: "PyTorch-native distributed training for LLMs and VLMs with Hugging Face day-0 support.", kind: "standalone", stages: ["pretraining"], - docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/automodel/latest/", tags: ["llm", "vlm", "huggingface", "pytorch"], }, { @@ -103,7 +103,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ description: "Alignment and reinforcement learning — SFT, DPO, GRPO, and distillation.", kind: "standalone", stages: ["rl"], - docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/rl/latest/", tags: ["dpo", "grpo", "alignment"], }, { @@ -113,7 +113,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ description: "Data preprocessing and curation for text, image, video, and audio at scale.", kind: 
"standalone", stages: ["data"], - docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/curator/latest/", tags: ["curation", "multimodal"], }, ]; diff --git a/fern/components/repos.ts b/fern/components/repos.ts index 0f2b9ff..8833453 100644 --- a/fern/components/repos.ts +++ b/fern/components/repos.ts @@ -14,7 +14,7 @@ */ /** NeMo Speech docs — use /latest/ when published; /nightly/ is current. */ -export const NEMO_SPEECH_DOCS_URL = "https://docs.nvidia.com/nemo/speech/nightly/"; +export const NEMO_SPEECH_DOCS_URL = "https://docs.nvidia.com/nemo-oss/speech/nightly/"; /** Lifecycle stage — matches profile/README.md "Libraries by stage" columns. */ export type RepoStage = "data" | "pretraining" | "rl" | "inference" | "e2e"; @@ -66,7 +66,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "data", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Curator", - docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/curator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator", tags: ["multimodal", "curation"], }, @@ -102,7 +102,7 @@ export const NEMO_REPOS: NemoRepo[] = [ kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Safe-Synthesizer", docsUrl: - "https://docs.nvidia.com/nemo/microservices/latest/generate-private-synthetic-data/", + "https://docs.nvidia.com/nemo-oss/microservices/latest/generate-private-synthetic-data/", tags: ["privacy", "tabular"], }, { @@ -120,7 +120,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Automodel", - docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/automodel/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel", tags: ["llm", "vlm", "huggingface", "pytorch"], }, @@ -130,7 +130,7 @@ export const NEMO_REPOS: 
NemoRepo[] = [ stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Megatron-Bridge", - docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["llm", "vlm", "megatron"], }, @@ -149,7 +149,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Emerging-Optimizers", - docsUrl: "https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html", + docsUrl: "https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/index.html", tags: ["optimizers"], }, { @@ -169,7 +169,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "rl", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/RL", - docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/rl/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl", tags: ["dpo", "grpo", "alignment", "agents"], }, @@ -179,7 +179,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "rl", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Gym", - docsUrl: "https://docs.nvidia.com/nemo/gym/latest/index.html", + docsUrl: "https://docs.nvidia.com/nemo-oss/gym/latest/index.html", tags: ["environments", "agents"], }, { @@ -198,7 +198,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Evaluator", - docsUrl: "https://docs.nvidia.com/nemo/evaluator/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/evaluator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["evaluation", "benchmarks"], }, @@ -208,7 +208,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Export-Deploy", - docsUrl: 
"https://docs.nvidia.com/nemo/export-deploy/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/export-deploy/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["deployment", "serving", "vllm"], }, @@ -218,7 +218,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Guardrails", - docsUrl: "https://docs.nvidia.com/nemo/guardrails/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/guardrails/latest/", tags: ["safety", "agents"], }, { @@ -256,7 +256,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "e2e", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Run", - docsUrl: "https://docs.nvidia.com/nemo/run/latest/", + docsUrl: "https://docs.nvidia.com/nemo-oss/run/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["orchestration", "experiments"], }, diff --git a/fern/docs.yml b/fern/docs.yml index f292015..0dfa64d 100644 --- a/fern/docs.yml +++ b/fern/docs.yml @@ -2,7 +2,7 @@ instances: - url: nemo-framework.docs.buildwithfern.com/nemo - custom-domain: docs.nvidia.com/nemo + custom-domain: docs.nvidia.com/nemo-oss title: NeMo OSS diff --git a/fern/docs/pages/about/concepts/index.mdx b/fern/docs/pages/about/concepts/index.mdx index b2e19a0..c0fa1ef 100644 --- a/fern/docs/pages/about/concepts/index.mdx +++ b/fern/docs/pages/about/concepts/index.mdx @@ -11,7 +11,7 @@ For short term definitions and acronyms, use the [Glossary](/resources/glossary) ## Concept Map -NeMo OSS is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documentation on **docs.nvidia.com/nemo**, and the NGC containers that package common runtime paths. +NeMo OSS is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documentation on **docs.nvidia.com/nemo-oss**, and the NGC containers that package common runtime paths. 
Use this site to choose a stack, stage, library, or container, then follow the linked library docs for usage details. diff --git a/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx index 6a6a72a..32b89fb 100644 --- a/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx +++ b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx @@ -13,8 +13,8 @@ Choose the training path based on scale, source model format, and downstream che | Path | Start with | Typical fit | | --- | --- | --- | -| **PyTorch / Hugging Face** | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | Fine-tuning, research iteration, and training up to roughly 1,000 GPUs. | -| **Megatron-Core** | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | Large-scale pretraining, SFT, and Hugging Face to Megatron checkpoint conversion. | +| **PyTorch / Hugging Face** | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | Fine-tuning, research iteration, and training up to roughly 1,000 GPUs. | +| **Megatron-Core** | [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | Large-scale pretraining, SFT, and Hugging Face to Megatron checkpoint conversion. | Both paths can feed downstream RL, evaluation, and export workflows. @@ -37,9 +37,9 @@ Checkpoint format affects the rest of the workflow. Check whether downstream RL, | Question | Good next page | | --- | --- | | Which training library should I start with? | [Pretraining guide](/get-started/pretraining) | -| How do I convert between Hugging Face and Megatron checkpoints? | [Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| How do I convert between Hugging Face and Megatron checkpoints? | [Megatron-Bridge docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | | Can my trained model run through RL or evaluation? 
| [RL guide](/get-started/rl), [Inference guide](/get-started/inference) | -| How do I export or serve a model? | [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| How do I export or serve a model? | [Export-Deploy docs](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) | ## Next Step diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx index 97f7d3d..4fd4bbe 100644 --- a/fern/docs/pages/about/ecosystem.mdx +++ b/fern/docs/pages/about/ecosystem.mdx @@ -77,7 +77,7 @@ NeMo OSS focuses on open source repositories, OSS documentation, and Framework c | --- | --- | | 22 public GitHub repos in NVIDIA-NeMo | Customizer, NIM, enterprise services | | Framework container release notes | Per-tenant managed offerings | -| Open source docs on docs.nvidia.com/nemo | Microservice docs under `docs.nvidia.com/nemo/microservices` | +| Open source docs on docs.nvidia.com/nemo-oss | Microservice docs under `docs.nvidia.com/nemo-oss/microservices` | For the full suite, refer to [NVIDIA NeMo (commercial)](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). @@ -102,7 +102,7 @@ Both train large language models (LLMs) and vision language models (VLMs) on NVI | **Checkpoint flow** | HF models day-0 | HF ↔ Megatron conversion | | **Best for** | Fine-tuning, research, rapid iteration | Large-scale pretraining and SFT | -Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. +Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. 
## Related Entry Points diff --git a/fern/docs/pages/about/libraries.mdx b/fern/docs/pages/about/libraries.mdx index 9cce256..75b6ffe 100644 --- a/fern/docs/pages/about/libraries.mdx +++ b/fern/docs/pages/about/libraries.mdx @@ -62,4 +62,4 @@ Each card shows a **stage** and a **kind**: Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. -Some cards link directly to library docs; others link to the best available README or product documentation. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). +Some cards link directly to library docs; others link to the best available README or product documentation. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/). diff --git a/fern/docs/pages/about/release-notes/index.mdx b/fern/docs/pages/about/release-notes/index.mdx index 9b5ed48..30ecec7 100644 --- a/fern/docs/pages/about/release-notes/index.mdx +++ b/fern/docs/pages/about/release-notes/index.mdx @@ -31,7 +31,7 @@ Use these links when you need component versions or cross-component issue notes - + PyTorch, Megatron-Core, Transformer Engine, and bundled library versions per container — canonical for 26.02+. diff --git a/fern/docs/pages/about/release-notes/known-issues.mdx b/fern/docs/pages/about/release-notes/known-issues.mdx index c766a8b..6743de7 100644 --- a/fern/docs/pages/about/release-notes/known-issues.mdx +++ b/fern/docs/pages/about/release-notes/known-issues.mdx @@ -13,7 +13,7 @@ Known issues for **NeMo Framework** NGC containers (`nvcr.io/nvidia/nemo`). Find Recent container tags and bundled component versions. - + Pinned package versions for 26.02+ containers. @@ -23,19 +23,19 @@ Pinned package versions for 26.02+ containers. 
See component release notes for library-specific known issues: -- [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) -- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) -- [Run](https://docs.nvidia.com/nemo/run/latest/) -- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) +- [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo-oss/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) ## 25.11 See component release notes for library-specific known issues: -- [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) -- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) -- [Run](https://docs.nvidia.com/nemo/run/latest/) -- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) +- [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo-oss/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) ## 25.09 @@ -121,7 +121,7 @@ These notes apply to framework-level training behavior in older releases. - The Megatron Core Distributed Optimizer currently lacks memory capacity optimization, resulting in higher model state memory usage at small data parallel sizes. - The overlap of the data-parallel parameter AllGather with `optimizer.step` (`overlap_param_gather_with_optimizer=true`) does not work with distributed checkpointing. - Support for converting models from NeMo 2.0 to 1.0 is not yet available. -- Transformer Engine changed checkpoint metadata after v1.10, which can cause checkpoint incompatibilities. **Workaround:** use `model.dist_ckpt_load_strictness=log_all` when working with Transformer Engine v1.10 or higher. 
See [software component versions](https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html) for TE versions per container. +- Transformer Engine changed checkpoint metadata after v1.10, which can cause checkpoint incompatibilities. **Workaround:** use `model.dist_ckpt_load_strictness=log_all` when working with Transformer Engine v1.10 or higher. See [software component versions](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/releases/software-versions.html) for TE versions per container. - For data preparation of GPT models, use your own dataset or an online dataset legally approved by your organization. - A race condition in the NeMo experiment manager can occur when multiple processes or threads attempt to access and modify shared resources simultaneously. - The Mistral and Mixtral tokenizers require a Hugging Face login. diff --git a/fern/docs/pages/get-started/data.mdx b/fern/docs/pages/get-started/data.mdx index aeca868..907857c 100644 --- a/fern/docs/pages/get-started/data.mdx +++ b/fern/docs/pages/get-started/data.mdx @@ -27,7 +27,7 @@ flowchart LR Most data workflows move from raw inputs to curated, protected, or synthetic datasets that downstream training and RL libraries can consume. -1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo/curator/latest/) (dedup, filtering, multimodal pipelines). +1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo-oss/curator/latest/) (dedup, filtering, multimodal pipelines). 2. **Generate** synthetic data with [Data Designer](https://nvidia-nemo.github.io/DataDesigner/latest/) or domain SDG tools. 3. **Protect** sensitive fields with [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) before sharing or training. 
diff --git a/fern/docs/pages/get-started/e2e.mdx b/fern/docs/pages/get-started/e2e.mdx index 500db4c..870ef4a 100644 --- a/fern/docs/pages/get-started/e2e.mdx +++ b/fern/docs/pages/get-started/e2e.mdx @@ -27,7 +27,7 @@ flowchart LR Use orchestration libraries when you need repeatable launches across local machines, SLURM, or Kubernetes. - + Configure, launch, and manage experiments on local machines, SLURM, and Kubernetes. diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx index 2aad0fb..8bcf376 100644 --- a/fern/docs/pages/get-started/inference.mdx +++ b/fern/docs/pages/get-started/inference.mdx @@ -27,9 +27,9 @@ flowchart LR Most model inference workflows move from evaluation to export or deployment, with guardrails added where application behavior needs control. -1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) across 100+ harnesses. -2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/). -3. **Guard** production apps with [Guardrails](https://docs.nvidia.com/nemo/guardrails/latest/). +1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/) across 100+ harnesses. +2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/). +3. **Guard** production apps with [Guardrails](https://docs.nvidia.com/nemo-oss/guardrails/latest/). Models usually come from [Pretraining](/get-started/pretraining) or [RL](/get-started/rl). For bundled Framework container versions, refer to [Container releases](/about/release-notes/containers). 
diff --git a/fern/docs/pages/get-started/installation.mdx b/fern/docs/pages/get-started/installation.mdx index 5313a47..19bbc8d 100644 --- a/fern/docs/pages/get-started/installation.mdx +++ b/fern/docs/pages/get-started/installation.mdx @@ -13,10 +13,10 @@ Use package installs for local development, notebooks, and lightweight experimen | Workload | Install | Docs | | --- | --- | --- | -| Hugging Face large language model (LLM) and vision language model (VLM) training | `pip install nemo-automodel` | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | -| Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL](https://docs.nvidia.com/nemo/rl/latest/) | -| Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) | -| Data curation | Curator repo | [Curator](https://docs.nvidia.com/nemo/curator/latest/) | +| Hugging Face large language model (LLM) and vision language model (VLM) training | `pip install nemo-automodel` | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL](https://docs.nvidia.com/nemo-oss/rl/latest/) | +| Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) | +| Data curation | Curator repo | [Curator](https://docs.nvidia.com/nemo-oss/curator/latest/) | Each library publishes install extras and version pins in its own documentation. Use [Libraries](/about/libraries) to find the repo and docs site for your stage. @@ -74,7 +74,7 @@ Skills, Nemotron recipes, NeMo Run. Use NeMo Run when setup choices need to carry into repeatable experiment launch and tracking. - + Launch and track experiments on local machines, SLURM, and Kubernetes. 
diff --git a/fern/docs/pages/get-started/pretraining.mdx b/fern/docs/pages/get-started/pretraining.mdx index ab8e291..3deb9db 100644 --- a/fern/docs/pages/get-started/pretraining.mdx +++ b/fern/docs/pages/get-started/pretraining.mdx @@ -29,12 +29,12 @@ Choose the pretraining path based on model format, target scale, and whether you | Goal | GPUs | Library | | --- | --- | --- | -| Fine-tune Hugging Face LLMs and VLMs | 1–1,000 | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | -| Large-scale pretrain / SFT | 1,000+ | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| Speech ASR, TTS, speech-LM | Any | [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) | +| Fine-tune Hugging Face LLMs and VLMs | 1–1,000 | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| Large-scale pretrain / SFT | 1,000+ | [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | +| Speech ASR, TTS, speech-LM | Any | [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) | Fastest first run: [Quickstart](/get-started/quickstart). Install details: [Installation](/get-started/installation). -Model recipes, example configs, and supported architectures live on each library's docs site — for example [AutoModel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples) and [Megatron-Bridge recipes](https://docs.nvidia.com/nemo/megatron-bridge/latest/). +Model recipes, example configs, and supported architectures live on each library's docs site — for example [AutoModel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples) and [Megatron-Bridge recipes](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/). Post-training alignment: [RL](/get-started/rl). 
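The pretraining table chooses a library based on workload and GPU count. The following minimal TypeScript sketch implements that decision. It assumes only the three rows shown and treats the 1,000-GPU boundary as approximate, because the table lists both "1–1,000" and "1,000+". `chooseTrainingLibrary` is a hypothetical helper. The real choice also depends on checkpoint format and on whether the goal is fine-tuning or large-scale pretraining.

```typescript
// Hypothetical helper mirroring the pretraining table; not a NeMo API.
type Workload = "llm" | "vlm" | "speech";

interface TrainingChoice {
  library: string;
  docsUrl: string;
}

function chooseTrainingLibrary(workload: Workload, gpus: number): TrainingChoice {
  if (!Number.isInteger(gpus) || gpus < 1) {
    throw new RangeError(`gpus must be a positive integer, got ${gpus}`);
  }
  if (workload === "speech") {
    // Speech ASR, TTS, and speech-LM start in NeMo Speech at any scale.
    return { library: "NeMo Speech", docsUrl: "https://docs.nvidia.com/nemo-oss/speech/nightly/" };
  }
  // The table overlaps at 1,000 GPUs; this sketch keeps AutoModel up to and including 1,000.
  if (gpus <= 1000) {
    return { library: "AutoModel", docsUrl: "https://docs.nvidia.com/nemo-oss/automodel/latest/" };
  }
  return { library: "Megatron-Bridge", docsUrl: "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/" };
}
```

For example, `chooseTrainingLibrary("llm", 8)` selects AutoModel, and `chooseTrainingLibrary("vlm", 2048)` selects Megatron-Bridge.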
diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx index 9f71a46..9212136 100644 --- a/fern/docs/pages/get-started/quickstart.mdx +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -11,7 +11,7 @@ Minimal steps to validate your setup. For install options and containers, refer The fastest on-ramp for Hugging Face large language models (LLMs) and vision language models (VLMs) on one or more GPUs. Install and run the current quick start on the AutoModel docs site — model names, scripts, and cluster options change frequently. - + Local workstation and cluster launch options (canonical, kept up to date by the AutoModel team). @@ -21,7 +21,7 @@ More pretraining paths (Megatron-Bridge, recipes, scale): [Pretraining](/get-sta Use the NeMo Speech docs for install extras, model selection, and the current five-minute inference walkthrough. - + Installation, five-minute inference, model selection, and tutorials. diff --git a/fern/docs/pages/get-started/rl.mdx b/fern/docs/pages/get-started/rl.mdx index 8106b04..a1b1bf4 100644 --- a/fern/docs/pages/get-started/rl.mdx +++ b/fern/docs/pages/get-started/rl.mdx @@ -29,8 +29,8 @@ Use these entry points to choose between post-training algorithms, RL environmen | Technique | Start in docs | Library | | --- | --- | --- | -| GRPO, DPO, SFT | [NeMo RL examples](https://docs.nvidia.com/nemo/rl/latest/) | NeMo RL | -| RL environments | [NeMo Gym](https://docs.nvidia.com/nemo/gym/latest/index.html) | Gym | +| GRPO, DPO, SFT | [NeMo RL examples](https://docs.nvidia.com/nemo-oss/rl/latest/) | NeMo RL | +| RL environments | [NeMo Gym](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) | Gym | Train base models first through [Pretraining](/get-started/pretraining), then align here. Evaluate with [Inference](/get-started/inference) libraries. 
diff --git a/fern/docs/pages/get-started/task-map.mdx b/fern/docs/pages/get-started/task-map.mdx index fd65a9b..bc9894c 100644 --- a/fern/docs/pages/get-started/task-map.mdx +++ b/fern/docs/pages/get-started/task-map.mdx @@ -9,21 +9,21 @@ Use this map when you know the task but not the repo name. Each row points to a | I want to... | Stage | Start with | Runtime path | Next docs | | --- | --- | --- | --- | --- | -| Curate text, image, video, or audio data | Data | Curator | Curator install or Curator container | [Curator docs](https://docs.nvidia.com/nemo/curator/latest/) | +| Curate text, image, video, or audio data | Data | Curator | Curator install or Curator container | [Curator docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | | Generate synthetic data | Data / E2E | Data Designer or Skills | Library install | [Data Designer docs](https://nvidia-nemo.github.io/DataDesigner/latest/), [Skills docs](https://nvidia-nemo.github.io/Skills/) | | Protect or anonymize sensitive data | Data | Anonymizer or Safe-Synthesizer | Library docs | [Libraries](/about/libraries) | -| Fine-tune a Hugging Face LLM or VLM | Pretraining | AutoModel | AutoModel container or pip | [AutoModel docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| Train at Megatron scale | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| Convert Hugging Face and Megatron checkpoints | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge conversion docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| Build speech AI workflows | Pretraining | NeMo Speech | NeMo Speech install | [NeMo Speech docs](https://docs.nvidia.com/nemo/speech/nightly/) | -| Run SFT, DPO, GRPO, distillation, or RL | RL | NeMo RL | NeMo RL container or source checkout | [NeMo RL docs](https://docs.nvidia.com/nemo/rl/latest/) | -| Build RL environments for models or agents | RL | Gym | Gym install | [Gym 
docs](https://docs.nvidia.com/nemo/gym/latest/index.html) | +| Fine-tune a Hugging Face LLM or VLM | Pretraining | AutoModel | AutoModel container or pip | [AutoModel docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| Train at Megatron scale | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | +| Convert Hugging Face and Megatron checkpoints | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge conversion docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | +| Build speech AI workflows | Pretraining | NeMo Speech | NeMo Speech install | [NeMo Speech docs](https://docs.nvidia.com/nemo-oss/speech/nightly/) | +| Run SFT, DPO, GRPO, distillation, or RL | RL | NeMo RL | NeMo RL container or source checkout | [NeMo RL docs](https://docs.nvidia.com/nemo-oss/rl/latest/) | +| Build RL environments for models or agents | RL | Gym | Gym install | [Gym docs](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) | | Run multi-turn agent rollouts | RL | ProRL Agent Server | Source checkout | [ProRL Agent Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server#readme) | -| Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo/evaluator/latest/) | -| Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | -| Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo/guardrails/latest/) | +| Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | +| Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) 
| +| Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo-oss/guardrails/latest/) | | Build integrated agent workflows | Inference | NeMo Platform | Platform setup | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | -| Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo/run/latest/) | +| Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo-oss/run/latest/) | | Follow reference recipes and cookbooks | E2E | Skills or Nemotron | Repo README and recipe docs | [Skills docs](https://nvidia-nemo.github.io/Skills/), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron#readme) | ## Choosing the Runtime diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx index 8bb311a..fa769b0 100644 --- a/fern/docs/pages/index.mdx +++ b/fern/docs/pages/index.mdx @@ -4,7 +4,7 @@ subtitle: Open Source Libraries From the NVIDIA-NeMo GitHub Organization slug: "" --- -**NeMo OSS** brings NVIDIA's open source NeMo work together in one place: the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo**. Use these pages to find the right library, stack, or container for your workflow: +**NeMo OSS** brings NVIDIA's open source NeMo work together in one place: the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo-oss**. Use these pages to find the right library, stack, or container for your workflow: - **NeMo Framework** — build and run the model lifecycle: data, training, alignment, evaluation, and deployment. - **NeMo Platform** — integrate agent workflows: evaluate, secure, tune, and deploy agents. 
diff --git a/nemo-fw-presentation-outline.md b/nemo-fw-presentation-outline.md index 524ac87..e1b7c07 100644 --- a/nemo-fw-presentation-outline.md +++ b/nemo-fw-presentation-outline.md @@ -59,7 +59,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Audio:** ASR transcription, WER filtering, quality assessment. - Powered by **NVIDIA RAPIDS** (cuDF, cuML, cuGraph) + Ray for multi-node scaling. - Proven results: 16x faster fuzzy dedup on 8 TB dataset; 40% lower TCO vs CPU. -- **Docs:** [docs.nvidia.com/nemo/curator](https://docs.nvidia.com/nemo/curator/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/curator](https://docs.nvidia.com/nemo-oss/curator/latest/) - **Container:** [NGC NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) ### 3b. NeMo Data Designer @@ -111,7 +111,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Actively developed — new model support weekly (MiniMax-M2, DeepSeek V3.2, Step 3.5-flash in Feb 2026). - Install: `pip install nemo-automodel` or `uv sync`. - **Launch options:** `torchrun`, `automodel` CLI (interactive + SLURM), Kubernetes (coming). -- **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/) - **Container:** [NGC NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) ### 4b. NeMo Megatron-Bridge (Scale — 1K+ GPUs) @@ -126,7 +126,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Model coverage:** Llama 2–3.3, Qwen 2–3 (incl. MoE and VL), DeepSeek V2/V3, Gemma/Gemma 3 VL, Nemotron-H, Nemotron Nano v2/VL, GPT-OSS, GLM-4.5, Mistral/Ministral, Moonlight, OlMoE. - **PyTorch-native training loop** — refactored from the legacy NeMo training stack for greater flexibility. - Community adoptions: VeRL, Slime, SkyRL, Mind Lab (trained trillion-parameter GRPO LoRA on 64 H800s). 
-- **Docs:** [docs.nvidia.com/nemo/megatron-bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/megatron-bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) - **Container:** [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) ### 4c. NeMo Speech @@ -154,7 +154,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Repo:** [NVIDIA-NeMo/Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) - **What it does:** Collection of cutting-edge optimizers (e.g., Muon, Dion) for use across training libraries. -- **Docs:** [docs.nvidia.com/nemo/emerging-optimizers](https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html) +- **Docs:** [docs.nvidia.com/nemo-oss/emerging-optimizers](https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/index.html) --- @@ -178,7 +178,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Used to train [Nemotron-3-Nano-30B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8). - Latest release: v0.5.0 (Jan 2026) with LoRA support for DTensor and Megatron backends. - Install: `uv venv && uv run python examples/run_grpo.py` -- **Docs:** [docs.nvidia.com/nemo/rl](https://docs.nvidia.com/nemo/rl/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/rl](https://docs.nvidia.com/nemo-oss/rl/latest/) - **Container:** [NGC NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) ### 5b. NeMo Gym @@ -197,7 +197,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Integrates with NeMo RL and other training frameworks. - Responses API-based agent architecture. - Early development — APIs evolving. 
-- **Docs:** [docs.nvidia.com/nemo/gym](https://docs.nvidia.com/nemo/gym/latest/index.html) +- **Docs:** [docs.nvidia.com/nemo-oss/gym](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) --- @@ -221,7 +221,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Reproducibility by default:** All configs, seeds, and software provenance captured automatically. - **Scale anywhere:** Local machine, SLURM, Lepton AI, cloud-native backends. - Install: `pip install nemo-evaluator-launcher` -- **Docs:** [docs.nvidia.com/nemo/evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/) ### 6b. NeMo Skills (Evaluation Side) @@ -244,7 +244,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Multi-GPU / Multi-instance** deployment support. - Serves as the bridge from training to production inference. - Install: `pip install nemo-export-deploy` (lightweight) or use NeMo Framework container for full features. -- **Docs:** [docs.nvidia.com/nemo/export-deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/export-deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) - **Container:** Included in [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) ### 7b. NeMo Guardrails @@ -261,7 +261,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - OpenAI-compatible server endpoint at `/v1/chat/completions`. - Published in EMNLP 2023 — academic paper available. - Latest version: 0.20.0. -- **Docs:** [docs.nvidia.com/nemo/guardrails](https://docs.nvidia.com/nemo/guardrails) +- **Docs:** [docs.nvidia.com/nemo-oss/guardrails](https://docs.nvidia.com/nemo-oss/guardrails) --- @@ -278,7 +278,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Modular:** Decouple task from executor; reuse environment configs across tasks. 
- Built on Fiddle (Google), TorchX, Skypilot, XManager. - Pre-release — API subject to change before v1.0. -- **Docs:** [docs.nvidia.com/nemo/run](https://docs.nvidia.com/nemo/run/latest/) +- **Docs:** [docs.nvidia.com/nemo-oss/run](https://docs.nvidia.com/nemo-oss/run/latest/) ### 8b. Nemotron (Models & Recipes) @@ -347,18 +347,18 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo | Repo | Stage | Stars | One-Liner | Docs | |------|-------|-------|-----------|------| -| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data | 1,394 | GPU-accelerated data curation (text, image, video, audio) | [link](https://docs.nvidia.com/nemo/curator/latest/) | +| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data | 1,394 | GPU-accelerated data curation (text, image, video, audio) | [link](https://docs.nvidia.com/nemo-oss/curator/latest/) | | [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Data | 698 | Synthetic data generation from scratch or seed data | [link](https://nvidia-nemo.github.io/DataDesigner/latest/) | | [Skills](https://github.com/NVIDIA-NeMo/Skills) | Data + Eval | 816 | SDG pipelines + evaluation for math, code, science | [link](https://nvidia-nemo.github.io/Skills/) | -| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Training | 288 | PyTorch DTensor-native training with HF support | [link](https://docs.nvidia.com/nemo/automodel/latest/) | -| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Training | 423 | Megatron-Core training with bidirectional HF conversion | [link](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Training | 288 | PyTorch DTensor-native training with HF support | [link](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Training | 423 | Megatron-Core training with bidirectional HF conversion | 
[link](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | | [NeMo (Speech)](https://github.com/NVIDIA-NeMo/NeMo) | Training | — | Speech AI (ASR, TTS) on Megatron-Core | [link](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | | [DFM](https://github.com/NVIDIA-NeMo/DFM) | Training | 29 | Diffusion model training (video, image) | [link](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) | -| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Training | — | Collection of cutting-edge optimizers | [link](https://docs.nvidia.com/nemo/emerging-optimizers/latest/) | -| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Alignment | 1,306 | Scalable post-training (GRPO, DPO, SFT, distillation) | [link](https://docs.nvidia.com/nemo/rl/latest/) | -| [Gym](https://github.com/NVIDIA-NeMo/Gym) | Alignment | 637 | RL environments for LLM training | [link](https://docs.nvidia.com/nemo/gym/latest/) | -| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Evaluation | 195 | 100+ benchmarks across 18 harnesses | [link](https://docs.nvidia.com/nemo/evaluator/latest/) | -| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Deployment | 27 | Export to TRT-LLM/vLLM/ONNX + Triton serving | [link](https://docs.nvidia.com/nemo/export-deploy/latest/) | -| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Deployment | 5,635 | Programmable safety rails with Colang DSL | [link](https://docs.nvidia.com/nemo/guardrails) | -| [Run](https://github.com/NVIDIA-NeMo/Run) | Infra | 216 | Experiment launcher (local, SLURM, K8s) | [link](https://docs.nvidia.com/nemo/run/latest/) | +| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Training | — | Collection of cutting-edge optimizers | [link](https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/) | +| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Alignment | 1,306 | Scalable post-training (GRPO, DPO, SFT, distillation) | 
[link](https://docs.nvidia.com/nemo-oss/rl/latest/) | +| [Gym](https://github.com/NVIDIA-NeMo/Gym) | Alignment | 637 | RL environments for LLM training | [link](https://docs.nvidia.com/nemo-oss/gym/latest/) | +| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Evaluation | 195 | 100+ benchmarks across 18 harnesses | [link](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | +| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Deployment | 27 | Export to TRT-LLM/vLLM/ONNX + Triton serving | [link](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) | +| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Deployment | 5,635 | Programmable safety rails with Colang DSL | [link](https://docs.nvidia.com/nemo-oss/guardrails) | +| [Run](https://github.com/NVIDIA-NeMo/Run) | Infra | 216 | Experiment launcher (local, SLURM, K8s) | [link](https://docs.nvidia.com/nemo-oss/run/latest/) | | [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes | — | Nemotron model family recipes | [link](https://github.com/NVIDIA-NeMo/Nemotron#readme) | diff --git a/nemo-fw-product-walkthrough.md b/nemo-fw-product-walkthrough.md index deeb7a1..a3d4f1b 100644 --- a/nemo-fw-product-walkthrough.md +++ b/nemo-fw-product-walkthrough.md @@ -26,16 +26,16 @@ Prepare Data → Train the Model → Align / Improve → Evaluate Quality → De | # | Product | One-Line Summary | Stage | Docs | |---|---------|-----------------|-------|------| -| 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | Training | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo/curator/latest/) | -| 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | [docs](https://docs.nvidia.com/nemo/microservices/latest/fine-tune/index.html) | +| 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | 
Training | [docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | +| 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | [docs](https://docs.nvidia.com/nemo-oss/microservices/latest/fine-tune/index.html) | | 4 | [Data Designer](#4-data-designer) | Generate synthetic training data | Data | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) | -| 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo/evaluator/latest/) | -| 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo/gym/latest/) | +| 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | +| 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo-oss/gym/latest/) | | 7 | [MCORE](#7-mcore-megatron-core) | Low-level engine for large-scale training | Training (engine) | [docs](https://docs.nvidia.com/Megatron-Core/) | -| 8 | [Megatron-Bridge](#8-megatron-bridge) | Train at massive scale (1,000+ GPUs) | Training | [docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | -| 9 | [nvFSDP](#9-nvfsdp) | Memory-efficient training technique inside AutoModel | Training (component) | [docs](https://docs.nvidia.com/nemo/automodel/latest/) | -| 10 | [RL](#10-rl) | Improve models using reinforcement learning | Alignment | [docs](https://docs.nvidia.com/nemo/rl/latest/) | +| 8 | [Megatron-Bridge](#8-megatron-bridge) | Train at massive scale (1,000+ GPUs) | Training | [docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | +| 9 | [nvFSDP](#9-nvfsdp) | Memory-efficient training technique inside AutoModel | Training (component) | 
[docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) | +| 10 | [RL](#10-rl) | Improve models using reinforcement learning | Alignment | [docs](https://docs.nvidia.com/nemo-oss/rl/latest/) | | 11 | [Toolkit (Speech)](#11-toolkit-speech) | Train speech recognition and text-to-speech models | Training | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) | --- @@ -64,7 +64,7 @@ A short glossary for terms that come up repeatedly across products. ![NeMo AutoModel](assets/diagram-03-automodel.png) -**Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) +**Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/) ### What Is It? @@ -101,7 +101,7 @@ AutoModel is the **recommended starting point** for most training tasks. It work ![NeMo Curator](assets/diagram-01-curator.png) -**Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | **Docs:** [docs.nvidia.com/nemo/curator](https://docs.nvidia.com/nemo/curator/latest/) +**Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | **Docs:** [docs.nvidia.com/nemo-oss/curator](https://docs.nvidia.com/nemo-oss/curator/latest/) ### What Is It? @@ -188,7 +188,7 @@ Data Designer sits alongside Curator in the **data preparation stage** — but t ![NeMo Evaluator](assets/diagram-07-evaluator.png) -**Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | **Docs:** [docs.nvidia.com/nemo/evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) +**Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | **Docs:** [docs.nvidia.com/nemo-oss/evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/) ### What Is It? 
@@ -228,7 +228,7 @@ Evaluator sits **after training and alignment** — it answers "how good is this ![NeMo Gym](assets/diagram-06-nemo-gym.png) -**Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) | **Docs:** [docs.nvidia.com/nemo/gym](https://docs.nvidia.com/nemo/gym/latest/) +**Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) | **Docs:** [docs.nvidia.com/nemo-oss/gym](https://docs.nvidia.com/nemo-oss/gym/latest/) ### What Is It? @@ -315,7 +315,7 @@ Under the hood: PyTorch MCORE MCORE ![NeMo Megatron-Bridge](assets/diagram-04-megatron-bridge.png) -**Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | **Docs:** [docs.nvidia.com/nemo/megatron-bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +**Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | **Docs:** [docs.nvidia.com/nemo-oss/megatron-bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) ### What Is It? @@ -362,7 +362,7 @@ Megatron-Bridge is the **heavy-duty training option** — complementary to AutoM ## 9. nvFSDP -**Location:** Inside [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/automodel](https://docs.nvidia.com/nemo/automodel/latest/) +**Location:** Inside [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/) ### What Is It? @@ -395,7 +395,7 @@ nvFSDP is an **implementation detail** of AutoModel. Users configure it through ![NeMo RL](assets/diagram-05-nemo-rl.png) -**Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) | **Docs:** [docs.nvidia.com/nemo/rl](https://docs.nvidia.com/nemo/rl/latest/) +**Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) | **Docs:** [docs.nvidia.com/nemo-oss/rl](https://docs.nvidia.com/nemo-oss/rl/latest/) ### What Is It? 
@@ -532,8 +532,8 @@ Not all products are documented in the same place: | Docs Host | Products | |-----------|----------| -| `docs.nvidia.com/nemo/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator | -| `docs.nvidia.com/nemo/microservices/...` | Customizer | +| `docs.nvidia.com/nemo-oss/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator | +| `docs.nvidia.com/nemo-oss/microservices/...` | Customizer | | `docs.nvidia.com/Megatron-Core/` | MCORE | | `nvidia-nemo.github.io/...` | Data Designer, Skills | | `docs.nvidia.com/nemo-framework/user-guide/...` | Toolkit (Speech) | diff --git a/profile/README.md b/profile/README.md index fb90c66..ea9e21d 100644 --- a/profile/README.md +++ b/profile/README.md @@ -7,7 +7,7 @@ SPDX-License-Identifier: Apache-2.0 Build generative AI models and agents on NVIDIA GPUs with open source NVIDIA NeMo libraries for data curation, training, alignment, evaluation, deployment, guardrails, and end-to-end recipes. -NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Use this GitHub organization for source code and issues, and use the [NeMo OSS documentation](https://docs.nvidia.com/nemo) to choose the right library, workflow, or runtime path. +NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Use this GitHub organization for source code and issues, and use the [NeMo OSS documentation](https://docs.nvidia.com/nemo-oss) to choose the right library, workflow, or runtime path. 
## Get Started @@ -15,12 +15,12 @@ Use these links to choose where to start and move into the full documentation se | Goal | Resource | | --- | --- | -| Find the best starting repository for a task | [Task Map](https://docs.nvidia.com/nemo/get-started/task-map) | -| Choose a runtime path: container, pip, source checkout, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo/get-started/runtime-chooser) | -| Browse NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo/about/libraries) | -| Learn how Framework, Platform, stages, and containers fit together | [Concepts](https://docs.nvidia.com/nemo/about/concepts) | -| Check Framework container releases and known issues | [Release Notes](https://docs.nvidia.com/nemo/about/release-notes) | -| Ask questions and follow community updates | [Community](https://docs.nvidia.com/nemo/resources/community) | +| Find the best starting repository for a task | [Task Map](https://docs.nvidia.com/nemo-oss/get-started/task-map) | +| Choose a runtime path: container, pip, source checkout, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo-oss/get-started/runtime-chooser) | +| Browse NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo-oss/about/libraries) | +| Learn how Framework, Platform, stages, and containers fit together | [Concepts](https://docs.nvidia.com/nemo-oss/about/concepts) | +| Check Framework container releases and known issues | [Release Notes](https://docs.nvidia.com/nemo-oss/about/release-notes) | +| Ask questions and follow community updates | [Community](https://docs.nvidia.com/nemo-oss/resources/community) | ## Choose by Workflow @@ -42,7 +42,7 @@ Use NeMo Framework for model lifecycle work, and use NeMo Platform for integrate - **NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and deployment libraries. 
- **NeMo Platform** is the agent workflow entry point: CLI, SDK, and Studio for evaluating, securing, tuning, and deploying agents. -For naming, runtime, and workflow guidance, see [Concepts](https://docs.nvidia.com/nemo/about/concepts) and [Get Started](https://docs.nvidia.com/nemo/get-started). +For naming, runtime, and workflow guidance, see [Concepts](https://docs.nvidia.com/nemo-oss/about/concepts) and [Get Started](https://docs.nvidia.com/nemo-oss/get-started). ## Community @@ -50,7 +50,7 @@ Use these links to ask questions, browse repositories, and find community learni - [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) - [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) -- [External Learning](https://docs.nvidia.com/nemo/resources/external-learning) +- [External Learning](https://docs.nvidia.com/nemo-oss/resources/external-learning) ## License From 49ea81dd345223175ec87adfd635301382893e46 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:11:09 -0400 Subject: [PATCH 13/18] Focus profile README on repositories --- profile/README.md | 31 ++++++++----------------------- 1 file changed, 8 insertions(+), 23 deletions(-) diff --git a/profile/README.md b/profile/README.md index ea9e21d..d0503b8 100644 --- a/profile/README.md +++ b/profile/README.md @@ -7,26 +7,13 @@ SPDX-License-Identifier: Apache-2.0 Build generative AI models and agents on NVIDIA GPUs with open source NVIDIA NeMo libraries for data curation, training, alignment, evaluation, deployment, guardrails, and end-to-end recipes. -NeMo OSS is part of the broader [NVIDIA NeMo](https://www.nvidia.com/en-us/ai-data-science/products/nemo/) software suite. Use this GitHub organization for source code and issues, and use the [NeMo OSS documentation](https://docs.nvidia.com/nemo-oss) to choose the right library, workflow, or runtime path. +NeMo OSS is part of the broader NVIDIA NeMo software suite. 
Use this GitHub organization for source code, examples, issue tracking, and repository-specific documentation. -## Get Started +## Start With a Repository -Use these links to choose where to start and move into the full documentation set. +Use this table to choose a starting repository based on your workflow. -| Goal | Resource | -| --- | --- | -| Find the best starting repository for a task | [Task Map](https://docs.nvidia.com/nemo-oss/get-started/task-map) | -| Choose a runtime path: container, pip, source checkout, or Platform setup | [Runtime Chooser](https://docs.nvidia.com/nemo-oss/get-started/runtime-chooser) | -| Browse NeMo OSS libraries | [Library Catalog](https://docs.nvidia.com/nemo-oss/about/libraries) | -| Learn how Framework, Platform, stages, and containers fit together | [Concepts](https://docs.nvidia.com/nemo-oss/about/concepts) | -| Check Framework container releases and known issues | [Release Notes](https://docs.nvidia.com/nemo-oss/about/release-notes) | -| Ask questions and follow community updates | [Community](https://docs.nvidia.com/nemo-oss/resources/community) | - -## Choose by Workflow - -Use this table to move from a workflow area to a starting repository. - -| Workflow | Use For | Start With | +| Workflow | Use For | Repositories | | --- | --- | --- | | **Data** | Curate, synthesize, anonymize, and prepare datasets | [Curator](https://github.com/NVIDIA-NeMo/Curator), [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner), [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) | | **Pretraining** | Train, fine-tune, adapt, and convert model checkpoints | [AutoModel](https://github.com/NVIDIA-NeMo/Automodel), [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge), [NeMo Speech](https://github.com/NVIDIA-NeMo/NeMo) | @@ -39,18 +26,16 @@ Use this table to move from a workflow area to a starting repository. Use NeMo Framework for model lifecycle work, and use NeMo Platform for integrated agent workflows. 
-- **NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and deployment libraries. -- **NeMo Platform** is the agent workflow entry point: CLI, SDK, and Studio for evaluating, securing, tuning, and deploying agents. - -For naming, runtime, and workflow guidance, see [Concepts](https://docs.nvidia.com/nemo-oss/about/concepts) and [Get Started](https://docs.nvidia.com/nemo-oss/get-started). +- **NeMo Framework** is the model-lifecycle stack: data, training, RL, evaluation, export, and deployment libraries across the repositories listed above. +- **NeMo Platform** is the agent workflow entry point: [CLI, SDK, and Studio](https://github.com/NVIDIA-NeMo/nemo-platform) for evaluating, securing, tuning, and deploying agents. ## Community -Use these links to ask questions, browse repositories, and find community learning resources. +Use these links to ask questions, browse repositories, and follow open work. - [GitHub Discussions](https://github.com/orgs/NVIDIA-NeMo/discussions) - [All Repositories](https://github.com/orgs/NVIDIA-NeMo/repositories) -- [External Learning](https://docs.nvidia.com/nemo-oss/resources/external-learning) +- [Open Issues](https://github.com/search?q=org%3ANVIDIA-NeMo+is%3Aissue+is%3Aopen&type=issues) ## License From 72e8e8d55f078d94ebbf3a43912ee6d5a47f1f80 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:12:35 -0400 Subject: [PATCH 14/18] Add NeMo Platform to setup pages --- fern/docs/pages/get-started/index.mdx | 4 ++-- fern/docs/pages/get-started/installation.mdx | 19 ++++++++++++++++++- fern/docs/pages/get-started/quickstart.mdx | 10 ++++++++++ 3 files changed, 30 insertions(+), 3 deletions(-) diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx index 7bf0d5d..73c6083 100644 --- a/fern/docs/pages/get-started/index.mdx +++ b/fern/docs/pages/get-started/index.mdx @@ -36,11 +36,11 @@ Use these pages when you are ready to install a library or run a first 
example. -Fastest paths: AutoModel fine-tuning and NeMo Speech inference. +Fastest paths: AutoModel fine-tuning, NeMo Speech inference, and NeMo Platform agents. -pip, NGC containers, scale, and backend choice. +pip, NGC containers, Platform setup, scale, and backend choice. diff --git a/fern/docs/pages/get-started/installation.mdx b/fern/docs/pages/get-started/installation.mdx index 19bbc8d..2107220 100644 --- a/fern/docs/pages/get-started/installation.mdx +++ b/fern/docs/pages/get-started/installation.mdx @@ -5,7 +5,7 @@ slug: get-started/installation position: 3 --- -How to install NeMo OSS libraries and pick a backend for your GPU scale. For a minimal first run, start with [Quickstart](/get-started/quickstart). For a setup decision before commands, use [Runtime chooser](/get-started/runtime-chooser). +How to install NeMo OSS libraries, choose Platform setup for agents, and pick a backend for your GPU scale. For a minimal first run, start with [Quickstart](/get-started/quickstart). For a setup decision before commands, use [Runtime chooser](/get-started/runtime-chooser). ## Pip Install (Recommended for Development) @@ -17,6 +17,7 @@ Use package installs for local development, notebooks, and lightweight experimen | Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL](https://docs.nvidia.com/nemo-oss/rl/latest/) | | Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) | | Data curation | Curator repo | [Curator](https://docs.nvidia.com/nemo-oss/curator/latest/) | +| Integrated agent workflows | NeMo Platform repo | [NeMo Platform](https://nvidia-nemo.github.io/nemo-platform/main/) | Each library publishes install extras and version pins in its own documentation. Use [Libraries](/about/libraries) to find the repo and docs site for your stage. 
@@ -35,6 +36,22 @@ The **NeMo Framework** image (`nvcr.io/nvidia/nemo`) is the multi-library traini See [Runtime chooser](/get-started/runtime-chooser) for setup guidance. +## NeMo Platform Setup + +Use NeMo Platform when you want an integrated agent workflow with CLI, SDK, and Studio rather than one standalone library. + + + + +Follow the current setup path for agent evaluation, guardrails, tuning, and deployment. + + + +Browse source, examples, issues, and release notes for the Platform entry point. + + + + ## Scale and Backends Use scale and checkpoint needs to choose between the Hugging Face-native path and the Megatron-Core path. diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx index 9212136..41e16e6 100644 --- a/fern/docs/pages/get-started/quickstart.mdx +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -27,6 +27,16 @@ Installation, five-minute inference, model selection, and tutorials. Speech training and full speech-language workflows: [Pretraining](/get-started/pretraining). +## Agent Workflows With NeMo Platform + +Use NeMo Platform when your first result is an agent workflow rather than a single model-lifecycle library. + + +CLI, SDK, and Studio setup for evaluating, securing, tuning, and deploying agents. + + +For direct library paths inside agent workflows, see [Inference](/get-started/inference). + ## Next Steps Use these links when you are ready to move from a first run to setup, routing, or stage-specific docs. 
From ac26402ee87523000b6f10dfd0db656d43a2ce21 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:16:27 -0400 Subject: [PATCH 15/18] Reframe quickstart by user outcome --- fern/docs/pages/get-started/index.mdx | 2 +- fern/docs/pages/get-started/quickstart.mdx | 46 ++++++++++++++-------- 2 files changed, 30 insertions(+), 18 deletions(-) diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx index 73c6083..f4a949a 100644 --- a/fern/docs/pages/get-started/index.mdx +++ b/fern/docs/pages/get-started/index.mdx @@ -36,7 +36,7 @@ Use these pages when you are ready to install a library or run a first example. -Fastest paths: AutoModel fine-tuning, NeMo Speech inference, and NeMo Platform agents. +Fastest paths by outcome: training, agents, data, RL, inference, speech, and experiments. diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx index 41e16e6..b238883 100644 --- a/fern/docs/pages/get-started/quickstart.mdx +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -5,37 +5,49 @@ slug: get-started/quickstart position: 2 --- -Minimal steps to validate your setup. For install options and containers, refer to [Installation](/get-started/installation). +Choose the shortest path to a first useful result. For install options and containers, refer to [Installation](/get-started/installation). -## Fine-Tune With AutoModel +## Choose a First Result -The fastest on-ramp for Hugging Face large language models (LLMs) and vision language models (VLMs) on one or more GPUs. Install and run the current quick start on the AutoModel docs site — model names, scripts, and cluster options change frequently. +Use this table to start from the outcome you want, then follow the linked docs for current commands and examples. - -Local workstation and cluster launch options (canonical, kept up to date by the AutoModel team). 
- +| First Result | Start Here | Good Fit | +| --- | --- | --- | +| Fine-tune an LLM or VLM | [AutoModel quick start](https://docs.nvidia.com/nemo-oss/automodel/latest/launcher/local-workstation.html#quick-start-choose-your-job-launch-option) | Hugging Face models on local workstations or clusters. | +| Build or ship an agent workflow | [NeMo Platform quick start](https://nvidia-nemo.github.io/nemo-platform/main/) | Agent evaluation, guardrails, tuning, and deployment with CLI, SDK, and Studio. | +| Curate or prepare training data | [Curator docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | Text, image, video, or audio data pipelines. | +| Run post-training or RL | [NeMo RL docs](https://docs.nvidia.com/nemo-oss/rl/latest/) | SFT, DPO, GRPO, RL, and distillation workflows. | +| Evaluate or deploy a model | [Inference](/get-started/inference) | Evaluation, export, serving, and guardrails across the inference libraries. | +| Work on speech AI | [NeMo Speech docs](https://docs.nvidia.com/nemo-oss/speech/nightly/) | ASR, TTS, speech language models, and speech tutorials. | +| Launch repeatable experiments | [NeMo Run docs](https://docs.nvidia.com/nemo-oss/run/latest/) | Local, SLURM, and Kubernetes experiment orchestration. | -More pretraining paths (Megatron-Bridge, recipes, scale): [Pretraining](/get-started/pretraining). +## Follow a Stage Path -## Speech Inference +Use the stage pages when your first result spans multiple repositories or you need more context before choosing a library. -Use the NeMo Speech docs for install extras, model selection, and the current five-minute inference walkthrough. + - -Installation, five-minute inference, model selection, and tutorials. + +Curator, Data Designer, anonymization, and synthetic data generation. -Speech training and full speech-language workflows: [Pretraining](/get-started/pretraining). + +AutoModel, Megatron-Bridge, NeMo Speech, optimizers, and scale choices. 
+ -## Agent Workflows With NeMo Platform + +NeMo RL, Gym, agent environments, and rollout infrastructure. + -Use NeMo Platform when your first result is an agent workflow rather than a single model-lifecycle library. + +Evaluator, Export-Deploy, Guardrails, and NeMo Platform for agents. + - -CLI, SDK, and Studio setup for evaluating, securing, tuning, and deploying agents. + +NeMo Run, Skills, Nemotron recipes, and reference workflows. -For direct library paths inside agent workflows, see [Inference](/get-started/inference). + ## Next Steps From 53dbb7870fca7186ca7f81653ade2dad1941bf48 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:33:06 -0400 Subject: [PATCH 16/18] Verify setup links for quickstart paths --- fern/TAXONOMY.md | 2 +- fern/components/repos.ts | 4 +--- fern/docs/pages/about/ecosystem.mdx | 4 ++-- fern/docs/pages/get-started/index.mdx | 4 ++-- fern/docs/pages/get-started/inference.mdx | 4 ++-- fern/docs/pages/get-started/installation.mdx | 14 +++++++------- fern/docs/pages/get-started/quickstart.mdx | 15 ++++++++------- fern/docs/pages/get-started/runtime-chooser.mdx | 2 +- fern/docs/pages/get-started/task-map.mdx | 2 +- nemo-fw-product-walkthrough.md | 3 +-- 10 files changed, 26 insertions(+), 28 deletions(-) diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md index 8a488e3..a3ccc06 100644 --- a/fern/TAXONOMY.md +++ b/fern/TAXONOMY.md @@ -67,4 +67,4 @@ Concepts is a directory, not a glossary. Keep pages focused on stable relationsh ## Broader suite references -Customizer, NIM, and other commercial NeMo microservices have their own product documentation. Link to them from Ecosystem when they help readers understand the full suite. +Customizer, NIM, and other commercial NeMo microservices have their own product documentation. Mention them only when needed to explain the broader suite; do not use commercial microservice docs as OSS setup destinations. 
diff --git a/fern/components/repos.ts b/fern/components/repos.ts index 8833453..125cd7e 100644 --- a/fern/components/repos.ts +++ b/fern/components/repos.ts @@ -101,8 +101,6 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "data", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Safe-Synthesizer", - docsUrl: - "https://docs.nvidia.com/nemo-oss/microservices/latest/generate-private-synthetic-data/", tags: ["privacy", "tabular"], }, { @@ -228,7 +226,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "integration", githubUrl: "https://github.com/NVIDIA-NeMo/nemo-platform", - docsUrl: "https://nvidia-nemo.github.io/nemo-platform/main/", + docsUrl: "https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/", tags: ["agents", "platform", "deployment"], }, // E2E diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx index 4fd4bbe..eb71fb1 100644 --- a/fern/docs/pages/about/ecosystem.mdx +++ b/fern/docs/pages/about/ecosystem.mdx @@ -73,11 +73,11 @@ Many teams train with **AutoModel or Megatron-Bridge**, align with **NeMo RL**, NeMo OSS focuses on open source repositories, OSS documentation, and Framework container releases. The broader NVIDIA NeMo suite also includes commercial products, enterprise services, and NIM microservices with their own product documentation. -| NeMo OSS | Broader NVIDIA NeMo docs | +| NeMo OSS | Broader NVIDIA NeMo Docs | | --- | --- | | 22 public GitHub repos in NVIDIA-NeMo | Customizer, NIM, enterprise services | | Framework container release notes | Per-tenant managed offerings | -| Open source docs on docs.nvidia.com/nemo-oss | Microservice docs under `docs.nvidia.com/nemo-oss/microservices` | +| Open source hub and library documentation | Product documentation for commercial offerings | For the full suite, refer to [NVIDIA NeMo (commercial)](https://www.nvidia.com/en-us/ai-data-science/products/nemo/). 
diff --git a/fern/docs/pages/get-started/index.mdx b/fern/docs/pages/get-started/index.mdx index f4a949a..5010dd6 100644 --- a/fern/docs/pages/get-started/index.mdx +++ b/fern/docs/pages/get-started/index.mdx @@ -23,7 +23,7 @@ Choose Framework container, standalone image, pip/source install, or Platform se Quickstart and installation for model training and deployment. - + Agent evaluate, secure, tune, and deploy — CLI, SDK, and Studio. @@ -84,7 +84,7 @@ For a fuller routing table, use [Task map](/get-started/task-map). | Train at scale | 1K+ GPUs | [Pretraining](/get-started/pretraining) → Megatron-Bridge | | Align (DPO/GRPO) | Any | [RL](/get-started/rl) | | Evaluate or deploy a model | Any | [Inference](/get-started/inference) | -| Ship or harden agents | Any | [Inference](/get-started/inference) → [NeMo Platform](https://nvidia-nemo.github.io/nemo-platform/main/) | +| Ship or harden agents | Any | [Inference](/get-started/inference) → [NeMo Platform](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | | Run end-to-end recipes | Any | [E2E](/get-started/e2e) | | Speech AI | Any | [Pretraining](/get-started/pretraining) → NeMo Speech | diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx index 8bcf376..3025e2d 100644 --- a/fern/docs/pages/get-started/inference.mdx +++ b/fern/docs/pages/get-started/inference.mdx @@ -7,7 +7,7 @@ position: 13 import StageGuide from "@/components/StageGuide"; -Libraries for **Inference** in the **NeMo Framework** model lifecycle — benchmarking, export, serving, and guardrails. For **agents**, see [NeMo Platform](#shipping-agents) below or start at [Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/). +Libraries for **Inference** in the **NeMo Framework** model lifecycle — benchmarking, export, serving, and guardrails. 
For **agents**, see [NeMo Platform](#shipping-agents) below or start at [Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/). ```mermaid flowchart LR @@ -39,7 +39,7 @@ Agent workflows usually start with Platform when you want an integrated CLI, SDK If you are building **agents**, [NeMo Platform](https://github.com/NVIDIA-NeMo/nemo-platform) brings evaluation, guardrails, tuning, and deployment into one workflow with a CLI, SDK, and Studio UI. Use Evaluator or Guardrails directly when you want one library; start with Platform when you want those loops wired together. - + Setup, CLI, and docs — evaluate, secure, and optimize agents with NeMo libraries. diff --git a/fern/docs/pages/get-started/installation.mdx b/fern/docs/pages/get-started/installation.mdx index 2107220..6a215e2 100644 --- a/fern/docs/pages/get-started/installation.mdx +++ b/fern/docs/pages/get-started/installation.mdx @@ -13,11 +13,11 @@ Use package installs for local development, notebooks, and lightweight experimen | Workload | Install | Docs | | --- | --- | --- | -| Hugging Face large language model (LLM) and vision language model (VLM) training | `pip install nemo-automodel` | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | -| Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL](https://docs.nvidia.com/nemo-oss/rl/latest/) | -| Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) | -| Data curation | Curator repo | [Curator](https://docs.nvidia.com/nemo-oss/curator/latest/) | -| Integrated agent workflows | NeMo Platform repo | [NeMo Platform](https://nvidia-nemo.github.io/nemo-platform/main/) | +| Hugging Face large language model (LLM) and vision language model (VLM) training | `pip install nemo-automodel` | [AutoModel installation](https://docs.nvidia.com/nemo/automodel/latest/guides/installation.html) | +| Alignment (DPO, GRPO, SFT) | NeMo RL repo | [NeMo RL 
installation](https://docs.nvidia.com/nemo/rl/latest/about/installation.html) | +| Speech ASR/TTS | `pip install nemo_toolkit[asr,tts]` | [NeMo Speech installation](https://docs.nvidia.com/nemo/speech/nightly/starthere/install.html) | +| Data curation | Curator repo | [Curator installation](https://docs.nvidia.com/nemo/curator/get-started/installation) | +| Integrated agent workflows | NeMo Platform repo | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | Each library publishes install extras and version pins in its own documentation. Use [Libraries](/about/libraries) to find the repo and docs site for your stage. @@ -42,7 +42,7 @@ Use NeMo Platform when you want an integrated agent workflow with CLI, SDK, and - + Follow the current setup path for agent evaluation, guardrails, tuning, and deployment. @@ -91,7 +91,7 @@ Skills, Nemotron recipes, NeMo Run. Use NeMo Run when setup choices need to carry into repeatable experiment launch and tracking. - + Launch and track experiments on local machines, SLURM, and Kubernetes. diff --git a/fern/docs/pages/get-started/quickstart.mdx b/fern/docs/pages/get-started/quickstart.mdx index b238883..c8d4654 100644 --- a/fern/docs/pages/get-started/quickstart.mdx +++ b/fern/docs/pages/get-started/quickstart.mdx @@ -13,13 +13,14 @@ Use this table to start from the outcome you want, then follow the linked docs f | First Result | Start Here | Good Fit | | --- | --- | --- | -| Fine-tune an LLM or VLM | [AutoModel quick start](https://docs.nvidia.com/nemo-oss/automodel/latest/launcher/local-workstation.html#quick-start-choose-your-job-launch-option) | Hugging Face models on local workstations or clusters. | -| Build or ship an agent workflow | [NeMo Platform quick start](https://nvidia-nemo.github.io/nemo-platform/main/) | Agent evaluation, guardrails, tuning, and deployment with CLI, SDK, and Studio. 
| -| Curate or prepare training data | [Curator docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | Text, image, video, or audio data pipelines. | -| Run post-training or RL | [NeMo RL docs](https://docs.nvidia.com/nemo-oss/rl/latest/) | SFT, DPO, GRPO, RL, and distillation workflows. | -| Evaluate or deploy a model | [Inference](/get-started/inference) | Evaluation, export, serving, and guardrails across the inference libraries. | -| Work on speech AI | [NeMo Speech docs](https://docs.nvidia.com/nemo-oss/speech/nightly/) | ASR, TTS, speech language models, and speech tutorials. | -| Launch repeatable experiments | [NeMo Run docs](https://docs.nvidia.com/nemo-oss/run/latest/) | Local, SLURM, and Kubernetes experiment orchestration. | +| Fine-tune an LLM or VLM | [AutoModel quick start](https://docs.nvidia.com/nemo/automodel/latest/launcher/local-workstation.html#quick-start-choose-your-job-launch-option) | Hugging Face models on local workstations or clusters. | +| Build or ship an agent workflow | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | Agent evaluation, guardrails, tuning, and deployment with CLI, SDK, and Studio. | +| Curate or prepare training data | [Curator getting started](https://docs.nvidia.com/nemo/curator/get-started/) | Text, image, video, or audio data pipelines. | +| Run post-training or RL | [NeMo RL quick start](https://docs.nvidia.com/nemo/rl/latest/about/quick-start.html) | SFT, DPO, GRPO, RL, and distillation workflows. | +| Evaluate a model | [NeMo Evaluator Launcher quickstart](https://docs.nvidia.com/nemo/evaluator/latest/get-started/quickstart/launcher.html) | Fast benchmark runs with the Evaluator launcher. | +| Export or deploy a model | [Export-Deploy install and overview](https://docs.nvidia.com/nemo/export-deploy/latest/index.html) | Export paths, serving backends, and deployment setup. 
| +| Work on speech AI | [NeMo Speech five-minute inference](https://docs.nvidia.com/nemo/speech/nightly/starthere/ten_minutes.html) | ASR, TTS, speech language models, and speech tutorials. | +| Launch repeatable experiments | [NeMo Run quickstart](https://docs.nvidia.com/nemo/run/nightly/guides/quickstart.html) | Local, SLURM, and Kubernetes experiment orchestration. | ## Follow a Stage Path diff --git a/fern/docs/pages/get-started/runtime-chooser.mdx b/fern/docs/pages/get-started/runtime-chooser.mdx index 26ff7b8..17b385d 100644 --- a/fern/docs/pages/get-started/runtime-chooser.mdx +++ b/fern/docs/pages/get-started/runtime-chooser.mdx @@ -13,7 +13,7 @@ Choose a runtime after you know the library or task. Use this page to choose the | **Standalone container** | Focused workflows for libraries with dedicated images, such as AutoModel, RL, or Curator | [Container catalog](/about/release-notes/containers) and the library docs | | **pip or package install** | Local development, notebooks, small experiments, and integrations with existing Python environments | The linked library install page | | **Source checkout** | Contributing, debugging, running repo-local examples, or using unreleased code | The GitHub repo and `CONTRIBUTING.md` | -| **Platform setup** | Integrated agent workflows with CLI, SDK, and Studio | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| **Platform setup** | Integrated agent workflows with CLI, SDK, and Studio | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | ## Runtime Flow diff --git a/fern/docs/pages/get-started/task-map.mdx b/fern/docs/pages/get-started/task-map.mdx index bc9894c..83c41a4 100644 --- a/fern/docs/pages/get-started/task-map.mdx +++ b/fern/docs/pages/get-started/task-map.mdx @@ -22,7 +22,7 @@ Use this map when you know the task but not the repo name. 
Each row points to a | Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | | Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) | | Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo-oss/guardrails/latest/) | -| Build integrated agent workflows | Inference | NeMo Platform | Platform setup | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| Build integrated agent workflows | Inference | NeMo Platform | Platform setup | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | | Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo-oss/run/latest/) | | Follow reference recipes and cookbooks | E2E | Skills or Nemotron | Repo README and recipe docs | [Skills docs](https://nvidia-nemo.github.io/Skills/), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron#readme) | diff --git a/nemo-fw-product-walkthrough.md b/nemo-fw-product-walkthrough.md index a3d4f1b..5f325b6 100644 --- a/nemo-fw-product-walkthrough.md +++ b/nemo-fw-product-walkthrough.md @@ -28,7 +28,7 @@ Prepare Data → Train the Model → Align / Improve → Evaluate Quality → De |---|---------|-----------------|-------|------| | 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | Training | [docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) | | 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | -| 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | 
[docs](https://docs.nvidia.com/nemo-oss/microservices/latest/fine-tune/index.html) | +| 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | Product docs | | 4 | [Data Designer](#4-data-designer) | Generate synthetic training data | Data | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) | | 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | | 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo-oss/gym/latest/) | @@ -533,7 +533,6 @@ Not all products are documented in the same place: | Docs Host | Products | |-----------|----------| | `docs.nvidia.com/nemo-oss/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator | -| `docs.nvidia.com/nemo-oss/microservices/...` | Customizer | | `docs.nvidia.com/Megatron-Core/` | MCORE | | `nvidia-nemo.github.io/...` | Data Designer, Skills | | `docs.nvidia.com/nemo-framework/user-guide/...` | Toolkit (Speech) | From 74349ef2e07760465cc7020b29f7ab850b4a0ad6 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:44:00 -0400 Subject: [PATCH 17/18] Audit NeMo OSS hub accuracy --- README.md | 4 +-- fern/TAXONOMY.md | 4 +-- fern/components/containers.ts | 12 ++++----- fern/components/repos.ts | 26 +++++++++---------- fern/docs/pages/about/architecture.mdx | 4 +-- fern/docs/pages/about/concepts/index.mdx | 2 +- .../training-backends-and-checkpoints.mdx | 8 +++--- fern/docs/pages/about/ecosystem.mdx | 10 +++---- fern/docs/pages/about/libraries.mdx | 2 +- fern/docs/pages/about/release-notes/index.mdx | 2 +- .../about/release-notes/known-issues.mdx | 20 +++++++------- fern/docs/pages/get-started/data.mdx | 2 +- fern/docs/pages/get-started/e2e.mdx | 2 +- fern/docs/pages/get-started/inference.mdx | 6 ++--- fern/docs/pages/get-started/pretraining.mdx | 8 +++--- fern/docs/pages/get-started/rl.mdx | 4 +-- 
fern/docs/pages/get-started/task-map.mdx | 22 ++++++++-------- fern/docs/pages/index.mdx | 4 +-- 18 files changed, 71 insertions(+), 71 deletions(-) diff --git a/README.md b/README.md index 73496db..4fcfc4d 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # NVIDIA-NeMo/.github -GitHub organization profile and NeMo OSS hub documentation (`docs.nvidia.com/nemo-oss`). +GitHub organization profile and staged NeMo OSS hub documentation. - **Org profile** — `profile/README.md` (shown on [github.com/NVIDIA-NeMo](https://github.com/NVIDIA-NeMo)) -- **Hub docs (Fern)** — `fern/` → [docs.nvidia.com/nemo-oss](https://docs.nvidia.com/nemo-oss) when published +- **Hub docs (Fern)** — `fern/` → planned `docs.nvidia.com/nemo-oss` publication target See [fern/README.md](fern/README.md) for local preview and publish steps. diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md index a3ccc06..6dfee67 100644 --- a/fern/TAXONOMY.md +++ b/fern/TAXONOMY.md @@ -1,6 +1,6 @@ # NeMo OSS taxonomy -Canonical vocabulary for the Fern hub (`docs.nvidia.com/nemo-oss`), org README, and `components/repos.ts`. When copy disagrees, this file wins. +Canonical vocabulary for the staged Fern hub (`docs.nvidia.com/nemo-oss`), org README, and `components/repos.ts`. When copy disagrees, this file wins. ## Top-level map @@ -17,7 +17,7 @@ NVIDIA NeMo (commercial suite — OSS + microservices + NIM + services) | Term | Meaning | | --- | --- | | **NVIDIA NeMo** | Full software suite spanning open source libraries, commercial products, NIM, microservices, and services. | -| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) and documentation on **docs.nvidia.com/nemo-oss**. Entry point for choosing a stack, stage, library, or container. | +| **NeMo OSS** | Public open source in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), staged hub orientation, library documentation, and NGC containers. Entry point for choosing a stack, stage, library, or container. 
| | **NeMo Framework** | Named **model-lifecycle** stack: composable libraries from data through deployment, each with its own source and docs. | | **NeMo Framework container** | NGC image `nvcr.io/nvidia/nemo:`. Bundles Megatron-Bridge, Evaluator, Export-Deploy, Run, and NeMo Speech. | | **NeMo Platform** | [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform) — CLI, SDK, and Studio for **agent** evaluate / secure / tune / deploy. Composes libraries into an agent integration experience. | diff --git a/fern/components/containers.ts b/fern/components/containers.ts index 212938a..6bc3da9 100644 --- a/fern/components/containers.ts +++ b/fern/components/containers.ts @@ -66,7 +66,7 @@ export const FRAMEWORK_RECENT_RELEASES: FrameworkRelease[] = [ ]; export const SOFTWARE_VERSIONS_URL = - "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/releases/software-versions.html"; + "https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html"; export const NGC_NEMO_TEAM_URL = "https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/containers"; @@ -82,7 +82,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ kind: "multi-library", stages: ["pretraining", "rl", "inference", "e2e"], latestTag: "26.02", - docsUrl: "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/", + docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", bundledLibraries: ["Megatron-Bridge", "Evaluator", "Export-Deploy", "Run", "NeMo Speech"], tags: ["llm", "vlm", "speech", "megatron"], }, @@ -90,10 +90,10 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ name: "NeMo AutoModel", image: "nvcr.io/nvidia/nemo-automodel", ngcUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel", - description: "PyTorch-native distributed training for LLMs and VLMs with Hugging Face day-0 support.", + description: "PyTorch-native distributed training for LLMs and VLMs with Hugging Face support.", kind: "standalone", stages: ["pretraining"], - docsUrl: 
"https://docs.nvidia.com/nemo-oss/automodel/latest/", + docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", tags: ["llm", "vlm", "huggingface", "pytorch"], }, { @@ -103,7 +103,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ description: "Alignment and reinforcement learning — SFT, DPO, GRPO, and distillation.", kind: "standalone", stages: ["rl"], - docsUrl: "https://docs.nvidia.com/nemo-oss/rl/latest/", + docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", tags: ["dpo", "grpo", "alignment"], }, { @@ -113,7 +113,7 @@ export const NEMO_CONTAINERS: NemoContainer[] = [ description: "Data preprocessing and curation for text, image, video, and audio at scale.", kind: "standalone", stages: ["data"], - docsUrl: "https://docs.nvidia.com/nemo-oss/curator/latest/", + docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", tags: ["curation", "multimodal"], }, ]; diff --git a/fern/components/repos.ts b/fern/components/repos.ts index 125cd7e..455b9ff 100644 --- a/fern/components/repos.ts +++ b/fern/components/repos.ts @@ -14,7 +14,7 @@ */ /** NeMo Speech docs — use /latest/ when published; /nightly/ is current. */ -export const NEMO_SPEECH_DOCS_URL = "https://docs.nvidia.com/nemo-oss/speech/nightly/"; +export const NEMO_SPEECH_DOCS_URL = "https://docs.nvidia.com/nemo/speech/nightly/"; /** Lifecycle stage — matches profile/README.md "Libraries by stage" columns. 
*/ export type RepoStage = "data" | "pretraining" | "rl" | "inference" | "e2e"; @@ -66,7 +66,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "data", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Curator", - docsUrl: "https://docs.nvidia.com/nemo-oss/curator/latest/", + docsUrl: "https://docs.nvidia.com/nemo/curator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator", tags: ["multimodal", "curation"], }, @@ -114,11 +114,11 @@ export const NEMO_REPOS: NemoRepo[] = [ // Pretraining { name: "Automodel", - description: "PyTorch distributed training for LLMs/VLMs with day-0 Hugging Face support.", + description: "PyTorch distributed training for LLMs/VLMs with Hugging Face support.", stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Automodel", - docsUrl: "https://docs.nvidia.com/nemo-oss/automodel/latest/", + docsUrl: "https://docs.nvidia.com/nemo/automodel/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel", tags: ["llm", "vlm", "huggingface", "pytorch"], }, @@ -128,7 +128,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Megatron-Bridge", - docsUrl: "https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/", + docsUrl: "https://docs.nvidia.com/nemo/megatron-bridge/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["llm", "vlm", "megatron"], }, @@ -147,7 +147,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "pretraining", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Emerging-Optimizers", - docsUrl: "https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/index.html", + docsUrl: "https://docs.nvidia.com/nemo/emerging-optimizers/latest/index.html", tags: ["optimizers"], }, { @@ -167,7 +167,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "rl", kind: "library", githubUrl: 
"https://github.com/NVIDIA-NeMo/RL", - docsUrl: "https://docs.nvidia.com/nemo-oss/rl/latest/", + docsUrl: "https://docs.nvidia.com/nemo/rl/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl", tags: ["dpo", "grpo", "alignment", "agents"], }, @@ -177,7 +177,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "rl", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Gym", - docsUrl: "https://docs.nvidia.com/nemo-oss/gym/latest/index.html", + docsUrl: "https://docs.nvidia.com/nemo/gym/main/about/", tags: ["environments", "agents"], }, { @@ -192,11 +192,11 @@ export const NEMO_REPOS: NemoRepo[] = [ // Inference { name: "Evaluator", - description: "Scalable, reproducible evaluation across 100+ benchmarks and harnesses.", + description: "Scalable, reproducible evaluation across benchmark suites and harnesses.", stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Evaluator", - docsUrl: "https://docs.nvidia.com/nemo-oss/evaluator/latest/", + docsUrl: "https://docs.nvidia.com/nemo/evaluator/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["evaluation", "benchmarks"], }, @@ -206,7 +206,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Export-Deploy", - docsUrl: "https://docs.nvidia.com/nemo-oss/export-deploy/latest/", + docsUrl: "https://docs.nvidia.com/nemo/export-deploy/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["deployment", "serving", "vllm"], }, @@ -216,7 +216,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "inference", kind: "library", githubUrl: "https://github.com/NVIDIA-NeMo/Guardrails", - docsUrl: "https://docs.nvidia.com/nemo-oss/guardrails/latest/", + docsUrl: "https://docs.nvidia.com/nemo/guardrails/latest/", tags: ["safety", "agents"], }, { @@ -254,7 +254,7 @@ export const NEMO_REPOS: NemoRepo[] = [ stage: "e2e", kind: 
"library", githubUrl: "https://github.com/NVIDIA-NeMo/Run", - docsUrl: "https://docs.nvidia.com/nemo-oss/run/latest/", + docsUrl: "https://docs.nvidia.com/nemo/run/latest/", containerUrl: "https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo", tags: ["orchestration", "experiments"], }, diff --git a/fern/docs/pages/about/architecture.mdx b/fern/docs/pages/about/architecture.mdx index 6d5f937..60411b5 100644 --- a/fern/docs/pages/about/architecture.mdx +++ b/fern/docs/pages/about/architecture.mdx @@ -22,7 +22,7 @@ The architecture is easiest to read as libraries, the Framework stack, and the P | --- | --- | --- | | **Libraries** | 22 repos in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), each with its own source, releases, and docs | `pip install`, per-library docs, or standalone NGC images | | **NeMo Framework** | Model-lifecycle stack — the pipeline below | Pick libraries by stage, or pull the multi-library Framework container | -| **NeMo Platform** | Agent product — CLI, SDK, Studio | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | +| **NeMo Platform** | Agent product — CLI, SDK, Studio | Follow [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | ```mermaid flowchart TB @@ -147,7 +147,7 @@ Platform has its own CLI, SDK, and Studio experience. Refer to [Framework and Pl - + Setup, CLI reference, and API for agent hardening and evaluation. 
diff --git a/fern/docs/pages/about/concepts/index.mdx b/fern/docs/pages/about/concepts/index.mdx index c0fa1ef..9283925 100644 --- a/fern/docs/pages/about/concepts/index.mdx +++ b/fern/docs/pages/about/concepts/index.mdx @@ -11,7 +11,7 @@ For short term definitions and acronyms, use the [Glossary](/resources/glossary) ## Concept Map -NeMo OSS is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), documentation on **docs.nvidia.com/nemo-oss**, and the NGC containers that package common runtime paths. +NeMo OSS is the public open source side of NVIDIA NeMo: repositories in [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo), library documentation, and the NGC containers that package common runtime paths. Use this site to choose a stack, stage, library, or container, then follow the linked library docs for usage details. diff --git a/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx index 32b89fb..6a6a72a 100644 --- a/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx +++ b/fern/docs/pages/about/concepts/training-backends-and-checkpoints.mdx @@ -13,8 +13,8 @@ Choose the training path based on scale, source model format, and downstream che | Path | Start with | Typical fit | | --- | --- | --- | -| **PyTorch / Hugging Face** | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | Fine-tuning, research iteration, and training up to roughly 1,000 GPUs. | -| **Megatron-Core** | [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | Large-scale pretraining, SFT, and Hugging Face to Megatron checkpoint conversion. | +| **PyTorch / Hugging Face** | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | Fine-tuning, research iteration, and training up to roughly 1,000 GPUs. 
| +| **Megatron-Core** | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | Large-scale pretraining, SFT, and Hugging Face to Megatron checkpoint conversion. | Both paths can feed downstream RL, evaluation, and export workflows. @@ -37,9 +37,9 @@ Checkpoint format affects the rest of the workflow. Check whether downstream RL, | Question | Good next page | | --- | --- | | Which training library should I start with? | [Pretraining guide](/get-started/pretraining) | -| How do I convert between Hugging Face and Megatron checkpoints? | [Megatron-Bridge docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | +| How do I convert between Hugging Face and Megatron checkpoints? | [Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | | Can my trained model run through RL or evaluation? | [RL guide](/get-started/rl), [Inference guide](/get-started/inference) | -| How do I export or serve a model? | [Export-Deploy docs](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) | +| How do I export or serve a model? 
| [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | ## Next Step diff --git a/fern/docs/pages/about/ecosystem.mdx b/fern/docs/pages/about/ecosystem.mdx index eb71fb1..9d3cebc 100644 --- a/fern/docs/pages/about/ecosystem.mdx +++ b/fern/docs/pages/about/ecosystem.mdx @@ -61,8 +61,8 @@ These names sound similar because they share libraries, but they serve different | --- | --- | --- | | **What it is** | Model-lifecycle **libraries** — data, training, RL, evaluation, export, guardrails | Agent **integration product** — CLI, Python SDK, and web UI | | **You adopt it when…** | You are training, aligning, evaluating, or deploying **models** | You are shipping **agents** and want evaluate / secure / tune / deploy in one setup | -| **Typical entry** | Stage guide → library docs, or Framework NGC container | Clone [nemo-platform](https://github.com/NVIDIA-NeMo/nemo-platform), run `nemo setup` | -| **Docs** | NeMo OSS pages and library docs | [NeMo Platform docs](https://nvidia-nemo.github.io/nemo-platform/main/) | +| **Typical entry** | Stage guide → library docs, or Framework NGC container | Follow [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | +| **Docs** | NeMo OSS pages and library docs | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | | **Relationship** | Libraries are the building blocks | Composes Guardrails, Evaluator, Data Designer, and others for agent workflows | **NeMo Framework** also names the multi-library NGC container (`nvcr.io/nvidia/nemo`). Refer to [Framework and Platform](/about/concepts/framework-and-platform) for naming details. 
@@ -99,10 +99,10 @@ Both train large language models (LLMs) and vision language models (VLMs) on NVI | --- | --- | --- | | **Stack** | PyTorch / Hugging Face native | Megatron-Core | | **Typical scale** | 1–1,000 GPUs | 1,000+ GPUs | -| **Checkpoint flow** | HF models day-0 | HF ↔ Megatron conversion | +| **Checkpoint flow** | Hugging Face checkpoints | HF ↔ Megatron conversion | | **Best for** | Fine-tuning, research, rapid iteration | Large-scale pretraining and SFT | -Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. +Speech workloads often start with [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) directly — the [NeMo](https://github.com/NVIDIA-NeMo/NeMo) repo is speech-only today. ## Related Entry Points @@ -126,7 +126,7 @@ Search and filter all 22 NVIDIA-NeMo repositories. Quickstart, installation, and guides by stage. - + Agent evaluate, harden, tune, and deploy — CLI, SDK, and Studio. diff --git a/fern/docs/pages/about/libraries.mdx b/fern/docs/pages/about/libraries.mdx index 75b6ffe..9cce256 100644 --- a/fern/docs/pages/about/libraries.mdx +++ b/fern/docs/pages/about/libraries.mdx @@ -62,4 +62,4 @@ Each card shows a **stage** and a **kind**: Use **tags** on each card (or the search box) for cross-cutting facets like `speech`, `evaluation`, or `agents`. -Some cards link directly to library docs; others link to the best available README or product documentation. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/). +Some cards link directly to library docs; others link to the best available README or product documentation. Speech AI documentation is at [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/). 
diff --git a/fern/docs/pages/about/release-notes/index.mdx b/fern/docs/pages/about/release-notes/index.mdx index 30ecec7..9b5ed48 100644 --- a/fern/docs/pages/about/release-notes/index.mdx +++ b/fern/docs/pages/about/release-notes/index.mdx @@ -31,7 +31,7 @@ Use these links when you need component versions or cross-component issue notes - + PyTorch, Megatron-Core, Transformer Engine, and bundled library versions per container — canonical for 26.02+. diff --git a/fern/docs/pages/about/release-notes/known-issues.mdx b/fern/docs/pages/about/release-notes/known-issues.mdx index 6743de7..c766a8b 100644 --- a/fern/docs/pages/about/release-notes/known-issues.mdx +++ b/fern/docs/pages/about/release-notes/known-issues.mdx @@ -13,7 +13,7 @@ Known issues for **NeMo Framework** NGC containers (`nvcr.io/nvidia/nemo`). Find Recent container tags and bundled component versions. - + Pinned package versions for 26.02+ containers. @@ -23,19 +23,19 @@ Pinned package versions for 26.02+ containers. See component release notes for library-specific known issues: -- [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) -- [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) -- [Run](https://docs.nvidia.com/nemo-oss/run/latest/) -- [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) +- [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) ## 25.11 See component release notes for library-specific known issues: -- [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) -- [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) -- [Run](https://docs.nvidia.com/nemo-oss/run/latest/) -- [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) +- 
[Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) +- [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/) +- [Run](https://docs.nvidia.com/nemo/run/latest/) +- [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) ## 25.09 @@ -121,7 +121,7 @@ These notes apply to framework-level training behavior in older releases. - The Megatron Core Distributed Optimizer currently lacks memory capacity optimization, resulting in higher model state memory usage at small data parallel sizes. - The overlap of the data-parallel parameter AllGather with `optimizer.step` (`overlap_param_gather_with_optimizer=true`) does not work with distributed checkpointing. - Support for converting models from NeMo 2.0 to 1.0 is not yet available. -- Transformer Engine changed checkpoint metadata after v1.10, which can cause checkpoint incompatibilities. **Workaround:** use `model.dist_ckpt_load_strictness=log_all` when working with Transformer Engine v1.10 or higher. See [software component versions](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/releases/software-versions.html) for TE versions per container. +- Transformer Engine changed checkpoint metadata after v1.10, which can cause checkpoint incompatibilities. **Workaround:** use `model.dist_ckpt_load_strictness=log_all` when working with Transformer Engine v1.10 or higher. See [software component versions](https://docs.nvidia.com/nemo/megatron-bridge/latest/releases/software-versions.html) for TE versions per container. - For data preparation of GPT models, use your own dataset or an online dataset legally approved by your organization. - A race condition in the NeMo experiment manager can occur when multiple processes or threads attempt to access and modify shared resources simultaneously. - The Mistral and Mixtral tokenizers require a Hugging Face login. 
diff --git a/fern/docs/pages/get-started/data.mdx b/fern/docs/pages/get-started/data.mdx index 907857c..aeca868 100644 --- a/fern/docs/pages/get-started/data.mdx +++ b/fern/docs/pages/get-started/data.mdx @@ -27,7 +27,7 @@ flowchart LR Most data workflows move from raw inputs to curated, protected, or synthetic datasets that downstream training and RL libraries can consume. -1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo-oss/curator/latest/) (dedup, filtering, multimodal pipelines). +1. **Curate** raw corpora with [Curator](https://docs.nvidia.com/nemo/curator/latest/) (dedup, filtering, multimodal pipelines). 2. **Generate** synthetic data with [Data Designer](https://nvidia-nemo.github.io/DataDesigner/latest/) or domain SDG tools. 3. **Protect** sensitive fields with [Anonymizer](https://github.com/NVIDIA-NeMo/Anonymizer) before sharing or training. diff --git a/fern/docs/pages/get-started/e2e.mdx b/fern/docs/pages/get-started/e2e.mdx index 870ef4a..500db4c 100644 --- a/fern/docs/pages/get-started/e2e.mdx +++ b/fern/docs/pages/get-started/e2e.mdx @@ -27,7 +27,7 @@ flowchart LR Use orchestration libraries when you need repeatable launches across local machines, SLURM, or Kubernetes. - + Configure, launch, and manage experiments on local machines, SLURM, and Kubernetes. diff --git a/fern/docs/pages/get-started/inference.mdx b/fern/docs/pages/get-started/inference.mdx index 3025e2d..efd7c77 100644 --- a/fern/docs/pages/get-started/inference.mdx +++ b/fern/docs/pages/get-started/inference.mdx @@ -27,9 +27,9 @@ flowchart LR Most model inference workflows move from evaluation to export or deployment, with guardrails added where application behavior needs control. -1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/) across 100+ harnesses. -2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/). -3. 
**Guard** production apps with [Guardrails](https://docs.nvidia.com/nemo-oss/guardrails/latest/). +1. **Evaluate** with [Evaluator](https://docs.nvidia.com/nemo/evaluator/latest/) across benchmark suites and harnesses. +2. **Export** to vLLM, TensorRT-LLM, or ONNX with [Export-Deploy](https://docs.nvidia.com/nemo/export-deploy/latest/). +3. **Guard** production apps with [Guardrails](https://docs.nvidia.com/nemo/guardrails/latest/). Models usually come from [Pretraining](/get-started/pretraining) or [RL](/get-started/rl). For bundled Framework container versions, refer to [Container releases](/about/release-notes/containers). diff --git a/fern/docs/pages/get-started/pretraining.mdx b/fern/docs/pages/get-started/pretraining.mdx index 3deb9db..ab8e291 100644 --- a/fern/docs/pages/get-started/pretraining.mdx +++ b/fern/docs/pages/get-started/pretraining.mdx @@ -29,12 +29,12 @@ Choose the pretraining path based on model format, target scale, and whether you | Goal | GPUs | Library | | --- | --- | --- | -| Fine-tune Hugging Face LLMs and VLMs | 1–1,000 | [AutoModel](https://docs.nvidia.com/nemo-oss/automodel/latest/) | -| Large-scale pretrain / SFT | 1,000+ | [Megatron-Bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | -| Speech ASR, TTS, speech-LM | Any | [NeMo Speech](https://docs.nvidia.com/nemo-oss/speech/nightly/) | +| Fine-tune Hugging Face LLMs and VLMs | 1–1,000 | [AutoModel](https://docs.nvidia.com/nemo/automodel/latest/) | +| Large-scale pretrain / SFT | 1,000+ | [Megatron-Bridge](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Speech ASR, TTS, speech-LM | Any | [NeMo Speech](https://docs.nvidia.com/nemo/speech/nightly/) | Fastest first run: [Quickstart](/get-started/quickstart). Install details: [Installation](/get-started/installation). 
-Model recipes, example configs, and supported architectures live on each library's docs site — for example [AutoModel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples) and [Megatron-Bridge recipes](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/). +Model recipes, example configs, and supported architectures live on each library's docs site — for example [AutoModel examples](https://github.com/NVIDIA-NeMo/Automodel/tree/main/examples) and [Megatron-Bridge recipes](https://docs.nvidia.com/nemo/megatron-bridge/latest/). Post-training alignment: [RL](/get-started/rl). diff --git a/fern/docs/pages/get-started/rl.mdx b/fern/docs/pages/get-started/rl.mdx index a1b1bf4..46b46bc 100644 --- a/fern/docs/pages/get-started/rl.mdx +++ b/fern/docs/pages/get-started/rl.mdx @@ -29,8 +29,8 @@ Use these entry points to choose between post-training algorithms, RL environmen | Technique | Start in docs | Library | | --- | --- | --- | -| GRPO, DPO, SFT | [NeMo RL examples](https://docs.nvidia.com/nemo-oss/rl/latest/) | NeMo RL | -| RL environments | [NeMo Gym](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) | Gym | +| GRPO, DPO, SFT | [NeMo RL examples](https://docs.nvidia.com/nemo/rl/latest/) | NeMo RL | +| RL environments | [NeMo Gym](https://docs.nvidia.com/nemo/gym/main/about/) | Gym | Train base models first through [Pretraining](/get-started/pretraining), then align here. Evaluate with [Inference](/get-started/inference) libraries. diff --git a/fern/docs/pages/get-started/task-map.mdx b/fern/docs/pages/get-started/task-map.mdx index 83c41a4..0baf041 100644 --- a/fern/docs/pages/get-started/task-map.mdx +++ b/fern/docs/pages/get-started/task-map.mdx @@ -9,21 +9,21 @@ Use this map when you know the task but not the repo name. Each row points to a | I want to... 
| Stage | Start with | Runtime path | Next docs | | --- | --- | --- | --- | --- | -| Curate text, image, video, or audio data | Data | Curator | Curator install or Curator container | [Curator docs](https://docs.nvidia.com/nemo-oss/curator/latest/) | +| Curate text, image, video, or audio data | Data | Curator | Curator install or Curator container | [Curator docs](https://docs.nvidia.com/nemo/curator/latest/) | | Generate synthetic data | Data / E2E | Data Designer or Skills | Library install | [Data Designer docs](https://nvidia-nemo.github.io/DataDesigner/latest/), [Skills docs](https://nvidia-nemo.github.io/Skills/) | | Protect or anonymize sensitive data | Data | Anonymizer or Safe-Synthesizer | Library docs | [Libraries](/about/libraries) | -| Fine-tune a Hugging Face LLM or VLM | Pretraining | AutoModel | AutoModel container or pip | [AutoModel docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) | -| Train at Megatron scale | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | -| Convert Hugging Face and Megatron checkpoints | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge conversion docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) | -| Build speech AI workflows | Pretraining | NeMo Speech | NeMo Speech install | [NeMo Speech docs](https://docs.nvidia.com/nemo-oss/speech/nightly/) | -| Run SFT, DPO, GRPO, distillation, or RL | RL | NeMo RL | NeMo RL container or source checkout | [NeMo RL docs](https://docs.nvidia.com/nemo-oss/rl/latest/) | -| Build RL environments for models or agents | RL | Gym | Gym install | [Gym docs](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) | +| Fine-tune a Hugging Face LLM or VLM | Pretraining | AutoModel | AutoModel container or pip | [AutoModel docs](https://docs.nvidia.com/nemo/automodel/latest/) | +| Train at Megatron scale | Pretraining | Megatron-Bridge | Framework container | 
[Megatron-Bridge docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Convert Hugging Face and Megatron checkpoints | Pretraining | Megatron-Bridge | Framework container | [Megatron-Bridge conversion docs](https://docs.nvidia.com/nemo/megatron-bridge/latest/) | +| Build speech AI workflows | Pretraining | NeMo Speech | NeMo Speech install | [NeMo Speech docs](https://docs.nvidia.com/nemo/speech/nightly/) | +| Run SFT, DPO, GRPO, distillation, or RL | RL | NeMo RL | NeMo RL container or source checkout | [NeMo RL docs](https://docs.nvidia.com/nemo/rl/latest/) | +| Build RL environments for models or agents | RL | Gym | Gym install | [Gym docs](https://docs.nvidia.com/nemo/gym/main/about/) | | Run multi-turn agent rollouts | RL | ProRL Agent Server | Source checkout | [ProRL Agent Server](https://github.com/NVIDIA-NeMo/ProRL-Agent-Server#readme) | -| Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) | -| Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) | -| Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo-oss/guardrails/latest/) | +| Evaluate a model or agent | Inference | Evaluator | Framework container or library install | [Evaluator docs](https://docs.nvidia.com/nemo/evaluator/latest/) | +| Export or serve a model | Inference | Export-Deploy | Framework container | [Export-Deploy docs](https://docs.nvidia.com/nemo/export-deploy/latest/) | +| Add guardrails to an app or agent | Inference | Guardrails | Guardrails install | [Guardrails docs](https://docs.nvidia.com/nemo/guardrails/latest/) | | Build integrated agent workflows | Inference | NeMo Platform | Platform setup | [NeMo Platform setup](https://nvidia-nemo.github.io/nemo-platform/main/get-started/setup/) | 
-| Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo-oss/run/latest/) | +| Launch experiments on local, SLURM, or Kubernetes | E2E | NeMo Run | Framework container or library install | [NeMo Run docs](https://docs.nvidia.com/nemo/run/latest/) | | Follow reference recipes and cookbooks | E2E | Skills or Nemotron | Repo README and recipe docs | [Skills docs](https://nvidia-nemo.github.io/Skills/), [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron#readme) | ## Choosing the Runtime diff --git a/fern/docs/pages/index.mdx b/fern/docs/pages/index.mdx index fa769b0..20b684a 100644 --- a/fern/docs/pages/index.mdx +++ b/fern/docs/pages/index.mdx @@ -4,7 +4,7 @@ subtitle: Open Source Libraries From the NVIDIA-NeMo GitHub Organization slug: "" --- -**NeMo OSS** brings NVIDIA's open source NeMo work together in one place: the [NVIDIA-NeMo](https://github.com/NVIDIA-NeMo) GitHub organization and **docs.nvidia.com/nemo-oss**. Use these pages to find the right library, stack, or container for your workflow: +**NeMo OSS** brings NVIDIA's open source NeMo work together in one staged hub: GitHub repositories, library documentation, NGC containers, and orientation pages for choosing the right workflow. Use these pages to find the right library, stack, or container: - **NeMo Framework** — build and run the model lifecycle: data, training, alignment, evaluation, and deployment. - **NeMo Platform** — integrate agent workflows: evaluate, secure, tune, and deploy agents. @@ -41,7 +41,7 @@ Use these cards when you already know whether you are building model workflows o Train and deploy models — quickstart, install, and guides by lifecycle stage. - + Ship agents — CLI, SDK, and Studio UI over NeMo libraries. 
From 991b6587a9db3ef83f1fb167356316583bcdb337 Mon Sep 17 00:00:00 2001 From: Lawrence Lane Date: Fri, 29 May 2026 15:47:58 -0400 Subject: [PATCH 18/18] Update NeMo OSS Fern URLs --- README.md | 2 +- fern/README.md | 4 +-- fern/TAXONOMY.md | 2 +- fern/docs.yml | 54 ++++++++++++++++----------------- nemo-fw-presentation-outline.md | 40 ++++++++++++------------ nemo-fw-product-walkthrough.md | 30 +++++++++--------- 6 files changed, 66 insertions(+), 66 deletions(-) diff --git a/README.md b/README.md index 4fcfc4d..b67130c 100644 --- a/README.md +++ b/README.md @@ -3,6 +3,6 @@ GitHub organization profile and staged NeMo OSS hub documentation. - **Org profile** — `profile/README.md` (shown on [github.com/NVIDIA-NeMo](https://github.com/NVIDIA-NeMo)) -- **Hub docs (Fern)** — `fern/` → planned `docs.nvidia.com/nemo-oss` publication target +- **Hub docs (Fern)** — `fern/` → planned `docs.nvidia.com/nemo/oss` publication target See [fern/README.md](fern/README.md) for local preview and publish steps. diff --git a/fern/README.md b/fern/README.md index 135b577..b213472 100644 --- a/fern/README.md +++ b/fern/README.md @@ -30,8 +30,8 @@ Per-library docs own commands, APIs, tutorials, model support, and version-speci Published targets are configured in [docs.yml](./docs.yml): -- Preview: `nemo-framework.docs.buildwithfern.com/nemo` -- Production: `docs.nvidia.com/nemo-oss` +- Preview: `nemo.docs.buildwithfern.com/oss` +- Production: `docs.nvidia.com/nemo/oss` ## Site Shape diff --git a/fern/TAXONOMY.md b/fern/TAXONOMY.md index 6dfee67..a06babf 100644 --- a/fern/TAXONOMY.md +++ b/fern/TAXONOMY.md @@ -1,6 +1,6 @@ # NeMo OSS taxonomy -Canonical vocabulary for the staged Fern hub (`docs.nvidia.com/nemo-oss`), org README, and `components/repos.ts`. When copy disagrees, this file wins. +Canonical vocabulary for the staged Fern hub (`docs.nvidia.com/nemo/oss`), org README, and `components/repos.ts`. When copy disagrees, this file wins. 
## Top-level map diff --git a/fern/docs.yml b/fern/docs.yml index 0dfa64d..5fe02a6 100644 --- a/fern/docs.yml +++ b/fern/docs.yml @@ -1,15 +1,15 @@ # yaml-language-server: $schema=https://schema.buildwithfern.dev/docs-yml.json instances: - - url: nemo-framework.docs.buildwithfern.com/nemo - custom-domain: docs.nvidia.com/nemo-oss + - url: nemo.docs.buildwithfern.com/oss + custom-domain: docs.nvidia.com/nemo/oss title: NeMo OSS global-theme: nvidia logo: - href: /nemo + href: /oss right-text: NeMo OSS # GitHub org link in the top-right header (Fern `github` navbar button). @@ -27,30 +27,30 @@ experimental: - ./components redirects: - - source: "/nemo/repositories" - destination: "/nemo/libraries" - - source: "/nemo/libraries" - destination: "/nemo/about/libraries" - - source: "/nemo/containers" - destination: "/nemo/about/release-notes/containers" - - source: "/nemo/about/containers" - destination: "/nemo/about/release-notes/containers" - - source: "/nemo/release-notes" - destination: "/nemo/about/release-notes" - - source: "/nemo/releases" - destination: "/nemo/about/release-notes" - - source: "/nemo/getting-started" - destination: "/nemo/get-started" - - source: "/nemo/community" - destination: "/nemo/resources/community" - - source: "/nemo/index.html" - destination: "/nemo" - - source: "/nemo/index" - destination: "/nemo" - - source: "/nemo/:path*/index.html" - destination: "/nemo/:path*" - - source: "/nemo/:path*.html" - destination: "/nemo/:path*" + - source: "/oss/repositories" + destination: "/oss/libraries" + - source: "/oss/libraries" + destination: "/oss/about/libraries" + - source: "/oss/containers" + destination: "/oss/about/release-notes/containers" + - source: "/oss/about/containers" + destination: "/oss/about/release-notes/containers" + - source: "/oss/release-notes" + destination: "/oss/about/release-notes" + - source: "/oss/releases" + destination: "/oss/about/release-notes" + - source: "/oss/getting-started" + destination: "/oss/get-started" + - 
source: "/oss/community" + destination: "/oss/resources/community" + - source: "/oss/index.html" + destination: "/oss" + - source: "/oss/index" + destination: "/oss" + - source: "/oss/:path*/index.html" + destination: "/oss/:path*" + - source: "/oss/:path*.html" + destination: "/oss/:path*" navigation: - section: About diff --git a/nemo-fw-presentation-outline.md b/nemo-fw-presentation-outline.md index e1b7c07..9c6454e 100644 --- a/nemo-fw-presentation-outline.md +++ b/nemo-fw-presentation-outline.md @@ -59,7 +59,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Audio:** ASR transcription, WER filtering, quality assessment. - Powered by **NVIDIA RAPIDS** (cuDF, cuML, cuGraph) + Ray for multi-node scaling. - Proven results: 16x faster fuzzy dedup on 8 TB dataset; 40% lower TCO vs CPU. -- **Docs:** [docs.nvidia.com/nemo-oss/curator](https://docs.nvidia.com/nemo-oss/curator/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/curator](https://docs.nvidia.com/nemo/oss/curator/latest/) - **Container:** [NGC NeMo Curator](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-curator) ### 3b. NeMo Data Designer @@ -111,7 +111,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Actively developed — new model support weekly (MiniMax-M2, DeepSeek V3.2, Step 3.5-flash in Feb 2026). - Install: `pip install nemo-automodel` or `uv sync`. - **Launch options:** `torchrun`, `automodel` CLI (interactive + SLURM), Kubernetes (coming). -- **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/automodel](https://docs.nvidia.com/nemo/oss/automodel/latest/) - **Container:** [NGC NeMo AutoModel](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-automodel) ### 4b. NeMo Megatron-Bridge (Scale — 1K+ GPUs) @@ -126,7 +126,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Model coverage:** Llama 2–3.3, Qwen 2–3 (incl. 
MoE and VL), DeepSeek V2/V3, Gemma/Gemma 3 VL, Nemotron-H, Nemotron Nano v2/VL, GPT-OSS, GLM-4.5, Mistral/Ministral, Moonlight, OlMoE. - **PyTorch-native training loop** — refactored from the legacy NeMo training stack for greater flexibility. - Community adoptions: VeRL, Slime, SkyRL, Mind Lab (trained trillion-parameter GRPO LoRA on 64 H800s). -- **Docs:** [docs.nvidia.com/nemo-oss/megatron-bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/megatron-bridge](https://docs.nvidia.com/nemo/oss/megatron-bridge/latest/) - **Container:** [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) ### 4c. NeMo Speech @@ -154,7 +154,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Repo:** [NVIDIA-NeMo/Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) - **What it does:** Collection of cutting-edge optimizers (e.g., Muon, Dion) for use across training libraries. -- **Docs:** [docs.nvidia.com/nemo-oss/emerging-optimizers](https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/index.html) +- **Docs:** [docs.nvidia.com/nemo/oss/emerging-optimizers](https://docs.nvidia.com/nemo/oss/emerging-optimizers/latest/index.html) --- @@ -178,7 +178,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Used to train [Nemotron-3-Nano-30B](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8). - Latest release: v0.5.0 (Jan 2026) with LoRA support for DTensor and Megatron backends. - Install: `uv venv && uv run python examples/run_grpo.py` -- **Docs:** [docs.nvidia.com/nemo-oss/rl](https://docs.nvidia.com/nemo-oss/rl/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/rl](https://docs.nvidia.com/nemo/oss/rl/latest/) - **Container:** [NGC NeMo RL](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo-rl) ### 5b. NeMo Gym @@ -197,7 +197,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - Integrates with NeMo RL and other training frameworks. 
- Responses API-based agent architecture. - Early development — APIs evolving. -- **Docs:** [docs.nvidia.com/nemo-oss/gym](https://docs.nvidia.com/nemo-oss/gym/latest/index.html) +- **Docs:** [docs.nvidia.com/nemo/oss/gym](https://docs.nvidia.com/nemo/oss/gym/latest/index.html) --- @@ -221,7 +221,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Reproducibility by default:** All configs, seeds, and software provenance captured automatically. - **Scale anywhere:** Local machine, SLURM, Lepton AI, cloud-native backends. - Install: `pip install nemo-evaluator-launcher` -- **Docs:** [docs.nvidia.com/nemo-oss/evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/evaluator](https://docs.nvidia.com/nemo/oss/evaluator/latest/) ### 6b. NeMo Skills (Evaluation Side) @@ -244,7 +244,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - **Multi-GPU / Multi-instance** deployment support. - Serves as the bridge from training to production inference. - Install: `pip install nemo-export-deploy` (lightweight) or use NeMo Framework container for full features. -- **Docs:** [docs.nvidia.com/nemo-oss/export-deploy](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) +- **Docs:** [docs.nvidia.com/nemo/oss/export-deploy](https://docs.nvidia.com/nemo/oss/export-deploy/latest/) - **Container:** Included in [NGC NeMo Framework](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo) ### 7b. NeMo Guardrails @@ -261,7 +261,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo - OpenAI-compatible server endpoint at `/v1/chat/completions`. - Published in EMNLP 2023 — academic paper available. - Latest version: 0.20.0. 
-- **Docs:** [docs.nvidia.com/nemo-oss/guardrails](https://docs.nvidia.com/nemo-oss/guardrails)
+- **Docs:** [docs.nvidia.com/nemo/oss/guardrails](https://docs.nvidia.com/nemo/oss/guardrails)
 
 ---
 
@@ -278,7 +278,7 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo
 - **Modular:** Decouple task from executor; reuse environment configs across tasks.
 - Built on Fiddle (Google), TorchX, Skypilot, XManager.
 - Pre-release — API subject to change before v1.0.
-- **Docs:** [docs.nvidia.com/nemo-oss/run](https://docs.nvidia.com/nemo-oss/run/latest/)
+- **Docs:** [docs.nvidia.com/nemo/oss/run](https://docs.nvidia.com/nemo/oss/run/latest/)
 
 ### 8b. Nemotron (Models & Recipes)
 
@@ -347,18 +347,18 @@ Data ──▶ Training ──▶ Alignment ──▶ Evaluation ──▶ Deplo
 
 | Repo | Stage | Stars | One-Liner | Docs |
 |------|-------|-------|-----------|------|
-| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data | 1,394 | GPU-accelerated data curation (text, image, video, audio) | [link](https://docs.nvidia.com/nemo-oss/curator/latest/) |
+| [Curator](https://github.com/NVIDIA-NeMo/Curator) | Data | 1,394 | GPU-accelerated data curation (text, image, video, audio) | [link](https://docs.nvidia.com/nemo/oss/curator/latest/) |
 | [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) | Data | 698 | Synthetic data generation from scratch or seed data | [link](https://nvidia-nemo.github.io/DataDesigner/latest/) |
 | [Skills](https://github.com/NVIDIA-NeMo/Skills) | Data + Eval | 816 | SDG pipelines + evaluation for math, code, science | [link](https://nvidia-nemo.github.io/Skills/) |
-| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Training | 288 | PyTorch DTensor-native training with HF support | [link](https://docs.nvidia.com/nemo-oss/automodel/latest/) |
-| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Training | 423 | Megatron-Core training with bidirectional HF conversion | [link](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) |
+| [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | Training | 288 | PyTorch DTensor-native training with HF support | [link](https://docs.nvidia.com/nemo/oss/automodel/latest/) |
+| [Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | Training | 423 | Megatron-Core training with bidirectional HF conversion | [link](https://docs.nvidia.com/nemo/oss/megatron-bridge/latest/) |
 | [NeMo (Speech)](https://github.com/NVIDIA-NeMo/NeMo) | Training | — | Speech AI (ASR, TTS) on Megatron-Core | [link](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) |
 | [DFM](https://github.com/NVIDIA-NeMo/DFM) | Training | 29 | Diffusion model training (video, image) | [link](https://github.com/NVIDIA-NeMo/DFM/tree/main/docs) |
-| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Training | — | Collection of cutting-edge optimizers | [link](https://docs.nvidia.com/nemo-oss/emerging-optimizers/latest/) |
-| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Alignment | 1,306 | Scalable post-training (GRPO, DPO, SFT, distillation) | [link](https://docs.nvidia.com/nemo-oss/rl/latest/) |
-| [Gym](https://github.com/NVIDIA-NeMo/Gym) | Alignment | 637 | RL environments for LLM training | [link](https://docs.nvidia.com/nemo-oss/gym/latest/) |
-| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Evaluation | 195 | 100+ benchmarks across 18 harnesses | [link](https://docs.nvidia.com/nemo-oss/evaluator/latest/) |
-| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Deployment | 27 | Export to TRT-LLM/vLLM/ONNX + Triton serving | [link](https://docs.nvidia.com/nemo-oss/export-deploy/latest/) |
-| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Deployment | 5,635 | Programmable safety rails with Colang DSL | [link](https://docs.nvidia.com/nemo-oss/guardrails) |
-| [Run](https://github.com/NVIDIA-NeMo/Run) | Infra | 216 | Experiment launcher (local, SLURM, K8s) | [link](https://docs.nvidia.com/nemo-oss/run/latest/) |
+| [Emerging-Optimizers](https://github.com/NVIDIA-NeMo/Emerging-Optimizers) | Training | — | Collection of cutting-edge optimizers | [link](https://docs.nvidia.com/nemo/oss/emerging-optimizers/latest/) |
+| [NeMo RL](https://github.com/NVIDIA-NeMo/RL) | Alignment | 1,306 | Scalable post-training (GRPO, DPO, SFT, distillation) | [link](https://docs.nvidia.com/nemo/oss/rl/latest/) |
+| [Gym](https://github.com/NVIDIA-NeMo/Gym) | Alignment | 637 | RL environments for LLM training | [link](https://docs.nvidia.com/nemo/oss/gym/latest/) |
+| [Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | Evaluation | 195 | 100+ benchmarks across 18 harnesses | [link](https://docs.nvidia.com/nemo/oss/evaluator/latest/) |
+| [Export-Deploy](https://github.com/NVIDIA-NeMo/Export-Deploy) | Deployment | 27 | Export to TRT-LLM/vLLM/ONNX + Triton serving | [link](https://docs.nvidia.com/nemo/oss/export-deploy/latest/) |
+| [Guardrails](https://github.com/NVIDIA-NeMo/Guardrails) | Deployment | 5,635 | Programmable safety rails with Colang DSL | [link](https://docs.nvidia.com/nemo/oss/guardrails) |
+| [Run](https://github.com/NVIDIA-NeMo/Run) | Infra | 216 | Experiment launcher (local, SLURM, K8s) | [link](https://docs.nvidia.com/nemo/oss/run/latest/) |
 | [Nemotron](https://github.com/NVIDIA-NeMo/Nemotron) | Recipes | — | Nemotron model family recipes | [link](https://github.com/NVIDIA-NeMo/Nemotron#readme) |
diff --git a/nemo-fw-product-walkthrough.md b/nemo-fw-product-walkthrough.md
index 5f325b6..5965bb3 100644
--- a/nemo-fw-product-walkthrough.md
+++ b/nemo-fw-product-walkthrough.md
@@ -26,16 +26,16 @@ Prepare Data → Train the Model → Align / Improve → Evaluate Quality → De
 
 | # | Product | One-Line Summary | Stage | Docs |
 |---|---------|-----------------|-------|------|
-| 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | Training | [docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) |
-| 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo-oss/curator/latest/) |
+| 1 | [AutoModel](#1-automodel) | Fine-tune AI models with minimal setup | Training | [docs](https://docs.nvidia.com/nemo/oss/automodel/latest/) |
+| 2 | [Curator](#2-curator--video-curator) | Clean and filter training data at scale | Data | [docs](https://docs.nvidia.com/nemo/oss/curator/latest/) |
 | 3 | [Customizer](#3-customizer) | Fine-tune models via API (managed service) | Training | Product docs |
 | 4 | [Data Designer](#4-data-designer) | Generate synthetic training data | Data | [docs](https://nvidia-nemo.github.io/DataDesigner/latest/) |
-| 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo-oss/evaluator/latest/) |
-| 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo-oss/gym/latest/) |
+| 5 | [Evaluator](#5-evaluator) | Benchmark model quality across 100+ tests | Evaluation | [docs](https://docs.nvidia.com/nemo/oss/evaluator/latest/) |
+| 6 | [Gym](#6-gym) | Build practice environments for RL training | Alignment | [docs](https://docs.nvidia.com/nemo/oss/gym/latest/) |
 | 7 | [MCORE](#7-mcore-megatron-core) | Low-level engine for large-scale training | Training (engine) | [docs](https://docs.nvidia.com/Megatron-Core/) |
-| 8 | [Megatron-Bridge](#8-megatron-bridge) | Train at massive scale (1,000+ GPUs) | Training | [docs](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/) |
-| 9 | [nvFSDP](#9-nvfsdp) | Memory-efficient training technique inside AutoModel | Training (component) | [docs](https://docs.nvidia.com/nemo-oss/automodel/latest/) |
-| 10 | [RL](#10-rl) | Improve models using reinforcement learning | Alignment | [docs](https://docs.nvidia.com/nemo-oss/rl/latest/) |
+| 8 | [Megatron-Bridge](#8-megatron-bridge) | Train at massive scale (1,000+ GPUs) | Training | [docs](https://docs.nvidia.com/nemo/oss/megatron-bridge/latest/) |
+| 9 | [nvFSDP](#9-nvfsdp) | Memory-efficient training technique inside AutoModel | Training (component) | [docs](https://docs.nvidia.com/nemo/oss/automodel/latest/) |
+| 10 | [RL](#10-rl) | Improve models using reinforcement learning | Alignment | [docs](https://docs.nvidia.com/nemo/oss/rl/latest/) |
 | 11 | [Toolkit (Speech)](#11-toolkit-speech) | Train speech recognition and text-to-speech models | Training | [docs](https://docs.nvidia.com/nemo-framework/user-guide/latest/speech_ai/index.html) |
 
 ---
@@ -64,7 +64,7 @@ A short glossary for terms that come up repeatedly across products.
 
 ![NeMo AutoModel](assets/diagram-03-automodel.png)
 
-**Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/)
+**Repo:** [NVIDIA-NeMo/Automodel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/oss/automodel](https://docs.nvidia.com/nemo/oss/automodel/latest/)
 
 ### What Is It?
 
@@ -101,7 +101,7 @@ AutoModel is the **recommended starting point** for most training tasks. It work
 
 ![NeMo Curator](assets/diagram-01-curator.png)
 
-**Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | **Docs:** [docs.nvidia.com/nemo-oss/curator](https://docs.nvidia.com/nemo-oss/curator/latest/)
+**Repo:** [NVIDIA-NeMo/Curator](https://github.com/NVIDIA-NeMo/Curator) | **Docs:** [docs.nvidia.com/nemo/oss/curator](https://docs.nvidia.com/nemo/oss/curator/latest/)
 
 ### What Is It?
 
@@ -188,7 +188,7 @@ Data Designer sits alongside Curator in the **data preparation stage** — but t
 
 ![NeMo Evaluator](assets/diagram-07-evaluator.png)
 
-**Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | **Docs:** [docs.nvidia.com/nemo-oss/evaluator](https://docs.nvidia.com/nemo-oss/evaluator/latest/)
+**Repo:** [NVIDIA-NeMo/Evaluator](https://github.com/NVIDIA-NeMo/Evaluator) | **Docs:** [docs.nvidia.com/nemo/oss/evaluator](https://docs.nvidia.com/nemo/oss/evaluator/latest/)
 
 ### What Is It?
 
@@ -228,7 +228,7 @@ Evaluator sits **after training and alignment** — it answers "how good is this
 
 ![NeMo Gym](assets/diagram-06-nemo-gym.png)
 
-**Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) | **Docs:** [docs.nvidia.com/nemo-oss/gym](https://docs.nvidia.com/nemo-oss/gym/latest/)
+**Repo:** [NVIDIA-NeMo/Gym](https://github.com/NVIDIA-NeMo/Gym) | **Docs:** [docs.nvidia.com/nemo/oss/gym](https://docs.nvidia.com/nemo/oss/gym/latest/)
 
 ### What Is It?
 
@@ -315,7 +315,7 @@ Under the hood: PyTorch MCORE MCORE
 
 ![NeMo Megatron-Bridge](assets/diagram-04-megatron-bridge.png)
 
-**Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | **Docs:** [docs.nvidia.com/nemo-oss/megatron-bridge](https://docs.nvidia.com/nemo-oss/megatron-bridge/latest/)
+**Repo:** [NVIDIA-NeMo/Megatron-Bridge](https://github.com/NVIDIA-NeMo/Megatron-Bridge) | **Docs:** [docs.nvidia.com/nemo/oss/megatron-bridge](https://docs.nvidia.com/nemo/oss/megatron-bridge/latest/)
 
 ### What Is It?
 
@@ -362,7 +362,7 @@ Megatron-Bridge is the **heavy-duty training option** — complementary to AutoM
 
 ## 9. nvFSDP
 
-**Location:** Inside [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo-oss/automodel](https://docs.nvidia.com/nemo-oss/automodel/latest/)
+**Location:** Inside [AutoModel](https://github.com/NVIDIA-NeMo/Automodel) | **Docs:** [docs.nvidia.com/nemo/oss/automodel](https://docs.nvidia.com/nemo/oss/automodel/latest/)
 
 ### What Is It?
 
@@ -395,7 +395,7 @@ nvFSDP is an **implementation detail** of AutoModel. Users configure it through
 
 ![NeMo RL](assets/diagram-05-nemo-rl.png)
 
-**Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) | **Docs:** [docs.nvidia.com/nemo-oss/rl](https://docs.nvidia.com/nemo-oss/rl/latest/)
+**Repo:** [NVIDIA-NeMo/RL](https://github.com/NVIDIA-NeMo/RL) | **Docs:** [docs.nvidia.com/nemo/oss/rl](https://docs.nvidia.com/nemo/oss/rl/latest/)
 
 ### What Is It?
 
@@ -532,7 +532,7 @@ Not all products are documented in the same place:
 
 | Docs Host | Products |
 |-----------|----------|
-| `docs.nvidia.com/nemo-oss/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator |
+| `docs.nvidia.com/nemo/oss/...` | AutoModel, Megatron-Bridge, RL, Gym, Evaluator, Curator |
 | `docs.nvidia.com/Megatron-Core/` | MCORE |
 | `nvidia-nemo.github.io/...` | Data Designer, Skills |
 | `docs.nvidia.com/nemo-framework/user-guide/...` | Toolkit (Speech) |
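Every hunk shown here makes the same mechanical substitution. The docs path `docs.nvidia.com/nemo-oss/` becomes `docs.nvidia.com/nemo/oss/`, and the change applies to both link text and URLs. Below is a minimal Python sketch of that substitution, plus a check for stale links. The function names are hypothetical and do not come from any NeMo repository. The sketch also assumes the old prefix never appears anywhere it should be kept.

```python
import re
from pathlib import Path

# Prefixes taken from the "-" and "+" lines of this patch.
OLD_PREFIX = "docs.nvidia.com/nemo-oss/"
NEW_PREFIX = "docs.nvidia.com/nemo/oss/"

# Matches a full http(s) URL that still uses the old prefix; stops at whitespace or ")".
STALE_URL = re.compile(r"https?://docs\.nvidia\.com/nemo-oss/[^\s)]*")


def migrate_docs_links(text: str) -> str:
    """Rewrite the old docs prefix to the new one, in link text and URLs alike."""
    return text.replace(OLD_PREFIX, NEW_PREFIX)


def find_stale_links(text: str) -> list[str]:
    """Return every URL that still points at the old nemo-oss path."""
    return STALE_URL.findall(text)


def migrate_file(path: Path) -> int:
    """Rewrite one Markdown file in place and return how many prefixes were replaced."""
    original = path.read_text(encoding="utf-8")
    count = original.count(OLD_PREFIX)
    if count:
        path.write_text(migrate_docs_links(original), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Hypothetical usage: the two files this patch touches, relative to the repo root.
    for name in ("profile/README.md", "nemo-fw-product-walkthrough.md"):
        p = Path(name)
        if p.exists():
            print(f"{name}: {migrate_file(p)} replacement(s)")
```

The sketch uses a plain substring replacement rather than a URL parser, on purpose. The patch also rewrites bare link text such as `[docs.nvidia.com/nemo/oss/run]` and the `docs.nvidia.com/nemo/oss/...` row of the Docs Host table, and a URL parser would skip those. Running the script twice is safe. The new prefix does not contain the old one, so a second pass replaces nothing.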