fix: handle ref_in_actor flag for LoRA compatibility with verl 0.6.0 #386
Conversation
Fixes microsoft#383: AttributeError when using LoRA with verl 0.6.0.

In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker instead of a separate ref policy worker. This change adds a helper function that checks the `ref_in_actor` flag and uses the correct worker (`actor_rollout_wg` or `ref_policy_wg`).

- Add `_compute_reference_log_prob()` helper function
- Update `_train_step` to use the helper instead of direct `ref_policy_wg` access
- Add comprehensive tests covering all scenarios

Signed-off-by: Vasu <[email protected]>
@microsoft-github-policy-service agree
Pull request overview
This PR fixes an AttributeError that occurs when using LoRA with verl 0.6.0. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker instead of a separate reference policy worker. The fix introduces a helper function that checks the ref_in_actor flag and routes to the appropriate worker, maintaining backward compatibility with older verl versions.
- Added `_compute_reference_log_prob()` helper function to handle both LoRA and standard reference policy computation modes
- Updated `_train_step()` to use the new helper function instead of directly accessing `ref_policy_wg`
- Added comprehensive unit tests covering all scenarios including LoRA mode, standard mode, error handling, and backward compatibility
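The routing described above can be sketched roughly as follows. This is a minimal standalone illustration, not the PR's code: in the actual trainer, `_compute_reference_log_prob` is a method on `AgentLightningTrainer` and the workers operate on verl `DataProto` batches; the `TrainerSketch` class, `_DummyWorker` stand-in, and plain-string batch here are assumptions for demonstration.

```python
class TrainerSketch:
    """Minimal stand-in for the trainer's routing logic (illustrative only)."""

    def __init__(self, ref_in_actor, actor_rollout_wg=None, ref_policy_wg=None):
        self.ref_in_actor = ref_in_actor
        self.actor_rollout_wg = actor_rollout_wg
        self.ref_policy_wg = ref_policy_wg

    def _compute_reference_log_prob(self, batch):
        # verl 0.6.0+ with LoRA: the actor rollout worker also serves the
        # (frozen base) reference policy, so route the call there.
        if self.ref_in_actor:
            if self.actor_rollout_wg is None:
                raise AttributeError("ref_in_actor is set but no actor_rollout_wg exists")
            return self.actor_rollout_wg.compute_ref_log_prob(batch)
        # Standard mode (and older verl versions): dedicated reference worker.
        if self.ref_policy_wg is None:
            raise AttributeError("no ref_policy_wg available")
        return self.ref_policy_wg.compute_ref_log_prob(batch)


class _DummyWorker:
    """Records which worker handled the call (test stand-in)."""

    def __init__(self, name):
        self.name = name

    def compute_ref_log_prob(self, batch):
        return (self.name, batch)


lora_trainer = TrainerSketch(ref_in_actor=True, actor_rollout_wg=_DummyWorker("actor"))
std_trainer = TrainerSketch(ref_in_actor=False, ref_policy_wg=_DummyWorker("ref"))
routed_lora = lora_trainer._compute_reference_log_prob("batch")
routed_std = std_trainer._compute_reference_log_prob("batch")
```

With `ref_in_actor=True` the call lands on the actor rollout worker; with it unset, the dedicated reference worker handles it, which is the backward-compatible path.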
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| agentlightning/verl/trainer.py | Added _compute_reference_log_prob() helper function that checks ref_in_actor flag and routes to the correct worker (actor_rollout_wg for LoRA, ref_policy_wg for standard mode). Updated _train_step() to use this helper. |
| tests/trainer/test_verl_trainer.py | Added comprehensive unit tests for the new helper function, covering LoRA mode preference, standard mode fallback, error handling for missing workers, backward compatibility, and data preservation across multiple calls. |
Co-authored-by: Copilot <[email protected]>
tests/trainer/test_verl_trainer.py
Outdated
verl tests are put elsewhere. Please don't add dummy tests in unit-tests.
Could you add a lora flag in examples/calc_x/train_calc_agent.py so that we can check in tests whether it works?
/ci |
🚀 CI Watcher for correlation id-3631315225-miyegp95 triggered by comment 3631315225
✅ All runs completed.
Sure. I will look into the CI failures as well. Where do you want the test file?
@Vasuk12 Maybe in examples/calc_x/train_calc_agent.py, add an `--lora` option?
sure :). I was thinking of adding --lora with --lora-rank (default 32) and optionally --lora-adapter-path for loading pre-trained adapters. Does that sound good, or do you have other parameters in mind?
Add --lora flag to examples/calc_x/train_calc_agent.py to enable LoRA training. When specified, it sets lora_rank in the verl config, which triggers LoRA mode in verl 0.6.0+.

- Add --lora flag to enable LoRA training
- Add --lora-rank flag (default: 32) for custom LoRA rank
- Add --lora-adapter-path flag (optional) for pre-trained adapters
- Add config verification logging when LoRA is enabled
- Remove test_verl_trainer.py from unit tests (per maintainer request)

This enables testing the fix for issue microsoft#383 with LoRA configurations.
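The flag wiring discussed in this thread could look roughly like the sketch below. The flag names (`--lora`, `--lora-rank`, `--lora-adapter-path`) come from the conversation; the `lora_overrides` helper and the shape of the returned config dict are illustrative assumptions, not the script's actual code.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="calc_x training (LoRA flags sketch)")
    parser.add_argument("--lora", action="store_true",
                        help="enable LoRA training")
    parser.add_argument("--lora-rank", type=int, default=32,
                        help="LoRA rank used when --lora is set")
    parser.add_argument("--lora-adapter-path", type=str, default=None,
                        help="optional path to a pre-trained LoRA adapter")
    return parser


def lora_overrides(args):
    # Setting a non-zero lora_rank in the verl config is what flips
    # verl 0.6.0+ into LoRA mode (and therefore ref_in_actor).
    # The dict keys here are illustrative, not the exact verl schema.
    if not args.lora:
        return {}
    overrides = {"lora_rank": args.lora_rank}
    if args.lora_adapter_path:
        overrides["lora_adapter_path"] = args.lora_adapter_path
    return overrides


args = build_parser().parse_args(["--lora", "--lora-rank", "16"])
overrides = lora_overrides(args)
```

Without `--lora` the helper returns an empty dict, so existing non-LoRA runs are unaffected.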
…t-lightning into fix/verl-ref-policy
# Conflicts:
#   tests/trainer/test_verl_trainer.py
Made _compute_reference_log_prob a member method of AgentLightningTrainer instead of a module-level function. Signed-off-by: Vasu <[email protected]>
/ci |
🚀 CI Watcher for correlation id-3635199449-mizg844g triggered by comment 3635199449
✅ All runs completed.
Fixes #383: AttributeError when using LoRA with verl 0.6.0
When using LoRA with verl 0.6.0, AgentLightningTrainer raises AttributeError: 'AgentLightningTrainer' object has no attribute 'ref_policy_wg'. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker (actor_rollout_wg) instead of a separate ref policy worker (ref_policy_wg).
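How that flag comes to be set can be sketched as a small predicate over the trainer config. The nested key path below (`actor_rollout_ref.model.lora_rank`) is an assumption for illustration; verl's actual config schema and flag derivation may differ.

```python
def infer_ref_in_actor(config: dict) -> bool:
    # In verl 0.6.0+, enabling LoRA (lora_rank > 0) moves reference
    # log-prob computation into the actor rollout worker, so the trainer
    # must route there instead of assuming a ref_policy_wg exists.
    # NOTE: this key path is illustrative, not the exact verl schema.
    model_cfg = config.get("actor_rollout_ref", {}).get("model", {})
    return model_cfg.get("lora_rank", 0) > 0


lora_cfg = {"actor_rollout_ref": {"model": {"lora_rank": 32}}}
plain_cfg = {"actor_rollout_ref": {"model": {}}}
```

A config with a positive `lora_rank` would yield `True` (route to the actor rollout worker); anything else falls back to the separate reference worker.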
Added a helper function _compute_reference_log_prob() that checks the ref_in_actor flag and routes to actor_rollout_wg when LoRA is enabled, falling back to ref_policy_wg otherwise.
Testing
All tests pass, including existing trainer tests. The fix maintains backward compatibility with older verl versions.