docs: faq monotonic trajectory by cmunley1 · Pull Request #613 · NVIDIA-NeMo/Gym

cmunley1 · 2026-01-28T01:46:48Z

Add an FAQ on how Gym and RL enforce monotonic trajectories and token ID correction for on-policy training.

Signed-off-by: Christian Munley <cmunley@nvidia.com>

- Moved the article to concepts (it's mostly conceptual in its current form)
- Reformatted the content
- Added crosslinks to/from doc

please double check that i did not distort any meaning as part of this change

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>

Signed-off-by: Lawrence Lane <llane@nvidia.com>

cmunley1 · 2026-01-29T04:07:33Z

Draft for disabling the on-policy assertion in on-policy training: NVIDIA-NeMo/RL#1840

docs/about/concepts/on-policy-training.md

Signed-off-by: Christian Munley <cmunley@nvidia.com>

docs/environment-tutorials/multi-step.md

hwolff99 · 2026-02-12T19:00:10Z

lgtm

Signed-off-by: cmunley1 <cmunley@nvidia.com>

copy-pr-bot · 2026-02-16T09:10:21Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 · 2026-02-16T21:13:35Z

docs/reference/faq.md

+
+For models with a chat template that drops previous reasoning traces: modify the chat template to retain all thinking, or use the non-thinking model.
+
+For agents with non-monotonic trajectories, the asserts may need to be disabled. This is not currently supported, but can be experimented with.


@bxyu-nvidia I'm not sure I like saying 'This is not currently supported, but can be experimented with.', but I'm not sure what else to say.
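
For context on what the asserts in the diff above guard against, here is a minimal sketch of one possible reading of "monotonic": each turn's token ID sequence must start with the previous turn's full sequence. This is an assumption for illustration, not the actual Gym or RL check; the function name `first_non_monotonic_turn` and the toy token IDs are hypothetical.

```python
from typing import Sequence


def first_non_monotonic_turn(turn_token_ids: Sequence[Sequence[int]]) -> int:
    """Return the index of the first turn that does not extend the previous
    turn's token IDs as a prefix, or -1 if every turn does.

    Hypothetical helper: assumes "monotonic" means prefix-preserving.
    """
    for i in range(1, len(turn_token_ids)):
        prev, cur = turn_token_ids[i - 1], turn_token_ids[i]
        # The current turn must be at least as long as the previous one
        # and must begin with exactly the same token IDs.
        if len(cur) < len(prev) or list(cur[: len(prev)]) != list(prev):
            return i
    return -1


# Each turn extends the previous one, so the trajectory is monotonic.
monotonic = [[1, 2, 3], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5, 6]]

# The second turn no longer starts with the first turn's tokens, as could
# happen if a chat template dropped earlier reasoning tokens.
dropped = [[1, 2, 3, 4], [1, 4, 5, 6]]

print(first_non_monotonic_turn(monotonic))
print(first_non_monotonic_turn(dropped))
```

Under this reading, a chat template that removes earlier reasoning traces would break the prefix property on the following turn, which matches the FAQ's advice to retain all thinking or use the non-thinking model.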

cmunley1 · 2026-02-16T21:13:53Z

i trimmed this down to just a FAQ on monotonic trajectory

cmunley1 and others added 5 commits January 27, 2026 17:45

document on policy training

8bfd0cb

Signed-off-by: Christian Munley <cmunley@nvidia.com>

small fix

11bc5ea

Signed-off-by: Christian Munley <cmunley@nvidia.com>

move location

9ad3812

Signed-off-by: Christian Munley <cmunley@nvidia.com>

doc build fix

e029fad

Signed-off-by: Lawrence Lane <llane@nvidia.com>

cmunley1 changed the title ~~document on policy training~~ docs: on policy training Jan 29, 2026

bxyu-nvidia requested changes Jan 29, 2026

docs/about/concepts/on-policy-training.md (outdated)

lbliii and others added 3 commits February 2, 2026 15:03

Merge branch 'main' into cmunley1/on-policy-doc

0847425

remove cfg section

d3d1780

Signed-off-by: Christian Munley <cmunley@nvidia.com>

Merge remote-tracking branch 'origin/main' into cmunley1/on-policy-doc

4c2ccd6

cmunley1 requested a review from bxyu-nvidia February 6, 2026 04:11

ci

bc18a9f

Signed-off-by: Christian Munley <cmunley@nvidia.com>

hwolff99 reviewed Feb 12, 2026

docs/environment-tutorials/multi-step.md (outdated)

merge

c6c497a

Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 added 2 commits February 16, 2026 02:00

split up on policy and monotonic

cfe9aa1

Signed-off-by: cmunley1 <cmunley@nvidia.com>

remove on policy section

f292677

Signed-off-by: cmunley1 <cmunley@nvidia.com>

cmunley1 commented Feb 16, 2026


cmunley1 changed the title ~~docs: on policy training~~ docs: faq monotonic trajectory Feb 20, 2026

cmunley1 wants to merge 12 commits into main from cmunley1/on-policy-doc

4 participants
