
docs: faq monotonic trajectory #613

Open
cmunley1 wants to merge 12 commits into main from cmunley1/on-policy-doc



@cmunley1 (Contributor) commented Jan 28, 2026

Add an FAQ on how Gym and RL enforce monotonic trajectories and correct token IDs for on-policy training.

cmunley1 and others added 5 commits January 27, 2026 17:45
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: Christian Munley <cmunley@nvidia.com>
- Moved the article to concepts (it's mostly conceptual in its current form)
- Reformatted the content
- Added crosslinks to/from doc

Please double-check that I did not distort any meaning as part of this change.

---------

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-authored-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
@cmunley1 changed the title "document on policy training" to "docs: on policy training" Jan 29, 2026
@cmunley1 (Contributor, Author) commented Jan 29, 2026

Draft for disabling the on-policy assertion in on-policy training: NVIDIA-NeMo/RL#1840

@cmunley1 requested a review from bxyu-nvidia on February 6, 2026 04:11
Signed-off-by: Christian Munley <cmunley@nvidia.com>
@hwolff99 (Contributor)

LGTM

Signed-off-by: cmunley1 <cmunley@nvidia.com>
@copy-pr-bot (bot) commented Feb 16, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: cmunley1 <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>

For models whose chat template drops previous reasoning traces, modify the chat template to retain all reasoning, or use the non-thinking model.

For agents with non-monotonic trajectories, the asserts may need to be disabled. This is not currently supported, but can be experimented with.
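To make the chat-template failure mode concrete, here is a minimal Python sketch of a prefix (monotonicity) check, assuming each turn records the prompt token IDs sent to the model and the token IDs it generated. The function names `extends_previous_turn` and `first_non_monotonic_turn` and all token IDs are hypothetical; this is an illustration, not the Gym or RL implementation. A trajectory is treated as monotonic when every turn's prompt begins with the previous turn's prompt followed by its generation, token for token, so a template that strips earlier reasoning breaks the prefix.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation of one turn:
# (prompt token ids sent to the model, token ids the model generated).
Turn = Tuple[Sequence[int], Sequence[int]]


def extends_previous_turn(prev: Turn, next_prompt: Sequence[int]) -> bool:
    """True if next_prompt starts with prev prompt + prev generation, token for token."""
    prev_prompt, prev_generated = prev
    expected_prefix = list(prev_prompt) + list(prev_generated)
    return list(next_prompt[: len(expected_prefix)]) == expected_prefix


def first_non_monotonic_turn(turns: List[Turn]) -> Optional[int]:
    """Index of the first turn whose prompt breaks the prefix property, or None."""
    for i in range(1, len(turns)):
        if not extends_previous_turn(turns[i - 1], turns[i][0]):
            return i
    return None


if __name__ == "__main__":
    # Hypothetical ids: 10 and 11 stand for a reasoning trace, 12 for the final answer.
    turn1 = ([1, 2, 3], [10, 11, 12])
    kept = ([1, 2, 3, 10, 11, 12, 20], [30])  # template retains the reasoning
    dropped = ([1, 2, 3, 12, 20], [30])       # template strips the reasoning
    print(first_non_monotonic_turn([turn1, kept]))     # None
    print(first_non_monotonic_turn([turn1, dropped]))  # 1
```

In this example, removing the reasoning tokens (10, 11) from the second prompt makes the check fail at turn 1, which is the case the chat-template advice addresses.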
@cmunley1 (Contributor, Author)

@bxyu-nvidia I'm not sure I like saying 'This is not currently supported, but can be experimented with.', but I'm not sure what else to say.

@cmunley1 (Contributor, Author)

I trimmed this down to just an FAQ on monotonic trajectories.

@cmunley1 changed the title "docs: on policy training" to "docs: faq monotonic trajectory" Feb 20, 2026

4 participants
