
Conversation

Vasuk12 (Contributor) commented Dec 9, 2025

Fixes #383: AttributeError when using LoRA with verl 0.6.0

When using LoRA with verl 0.6.0, AgentLightningTrainer raises AttributeError: 'AgentLightningTrainer' object has no attribute 'ref_policy_wg'. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker (actor_rollout_wg) instead of a separate ref policy worker (ref_policy_wg).

Added a helper function _compute_reference_log_prob() (sketched after this list) that:

  • Checks the ref_in_actor flag (set by verl when LoRA is detected)
  • Uses actor_rollout_wg when ref_in_actor=True (LoRA mode)
  • Falls back to ref_policy_wg when ref_in_actor=False (standard mode)
  • Provides clear error messages if the required worker is missing
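For context, here is the routing logic in rough form. This is an illustrative sketch rather than the PR's exact code: it assumes verl worker groups expose a compute_ref_log_prob(batch) method (as in verl's own PPO trainer), and the error messages and attribute checks are placeholders.

```python
# Illustrative sketch only; the actual helper in agentlightning/verl/trainer.py
# may differ. Assumes verl worker groups expose compute_ref_log_prob(batch).
def _compute_reference_log_prob(trainer, batch):
    """Route reference log-prob computation to the worker that owns the ref policy."""
    if getattr(trainer, "ref_in_actor", False):
        # LoRA mode (verl 0.6.0+): the frozen base model inside the actor
        # rollout worker doubles as the reference policy.
        if getattr(trainer, "actor_rollout_wg", None) is None:
            raise AttributeError(
                "ref_in_actor is set but actor_rollout_wg is not initialized"
            )
        return trainer.actor_rollout_wg.compute_ref_log_prob(batch)

    # Standard mode: a dedicated reference policy worker group is expected.
    if getattr(trainer, "ref_policy_wg", None) is None:
        raise AttributeError(
            "reference policy requested but ref_policy_wg is not initialized"
        )
    return trainer.ref_policy_wg.compute_ref_log_prob(batch)
```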

Testing
All tests pass, including existing trainer tests. The fix maintains backward compatibility with older verl versions.

Fixes microsoft#383: AttributeError when using LoRA with verl 0.6.0

In verl 0.6.0+, when LoRA is enabled, the reference policy is computed
by the actor rollout worker instead of a separate ref policy worker.
This change adds a helper function that checks the ref_in_actor flag
and uses the correct worker (actor_rollout_wg or ref_policy_wg).

- Add _compute_reference_log_prob() helper function
- Update _train_step to use helper instead of direct ref_policy_wg access
- Add comprehensive tests covering all scenarios

Signed-off-by: Vasu <[email protected]>
Copilot AI review requested due to automatic review settings December 9, 2025 09:07
Vasuk12 (Contributor, Author) commented Dec 9, 2025

@Vasuk12 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes an AttributeError that occurs when using LoRA with verl 0.6.0. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker instead of a separate reference policy worker. The fix introduces a helper function that checks the ref_in_actor flag and routes to the appropriate worker, maintaining backward compatibility with older verl versions.

  • Added _compute_reference_log_prob() helper function to handle both LoRA and standard reference policy computation modes
  • Updated _train_step() to use the new helper function instead of directly accessing ref_policy_wg
  • Added comprehensive unit tests covering all scenarios including LoRA mode, standard mode, error handling, and backward compatibility

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • agentlightning/verl/trainer.py: Added _compute_reference_log_prob() helper function that checks the ref_in_actor flag and routes to the correct worker (actor_rollout_wg for LoRA, ref_policy_wg for standard mode). Updated _train_step() to use this helper.
  • tests/trainer/test_verl_trainer.py: Added comprehensive unit tests for the new helper function, covering LoRA mode preference, standard mode fallback, error handling for missing workers, backward compatibility, and data preservation across multiple calls.


@@ -0,0 +1,136 @@
# Copyright (c) Microsoft. All rights reserved.

from types import SimpleNamespace
Inline review comment: verl tests are put elsewhere. Please don't add dummy tests in unit-tests.

ultmaster (Contributor) commented:
Could you add a lora flag in examples/calc_x/train_calc_agent.py so that we can check in tests whether it works?

ultmaster (Contributor) commented:
/ci

github-actions bot commented Dec 9, 2025

🚀 CI Watcher for correlation id-3631315225-miyegp95 triggered by comment 3631315225
🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

Vasuk12 (Contributor, Author) commented Dec 9, 2025

Could you add a lora flag in examples/calc_x/train_calc_agent.py so that we can check in tests whether it works?

Sure. I will look into the CI failures as well. Where do you want the test file?

ultmaster (Contributor) commented:
@Vasuk12 Maybe add a --lora flag in examples/calc_x/train_calc_agent. When it is specified on the command line, enable LoRA configurations and run LoRA training. How does that sound?

Vasuk12 (Contributor, Author) commented Dec 9, 2025

@Vasuk12 Maybe add a --lora flag in examples/calc_x/train_calc_agent. When it is specified on the command line, enable LoRA configurations and run LoRA training. How does that sound?

Sure :). I was thinking of adding --lora along with --lora-rank (default 32) and, optionally, --lora-adapter-path for loading pre-trained adapters. Does that sound good, or do you have other parameters in mind?
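For illustration, here is a minimal sketch of how these proposed flags could be declared with argparse and folded into a nested verl-style config. The flag names and the lora_rank default of 32 come from this thread; the config keys used below (actor_rollout_ref.model.lora_rank, lora_adapter_path) are assumptions, and the actual wiring in examples/calc_x/train_calc_agent.py may differ.

```python
# Sketch only: flag names and defaults follow the discussion above; the config
# keys are assumptions about the verl config layout, not confirmed by this PR.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Calc-X agent training (sketch)")
    parser.add_argument("--lora", action="store_true",
                        help="Enable LoRA training (triggers ref_in_actor in verl 0.6.0+)")
    parser.add_argument("--lora-rank", type=int, default=32,
                        help="LoRA rank used when --lora is set")
    parser.add_argument("--lora-adapter-path", type=str, default=None,
                        help="Optional path to a pre-trained LoRA adapter")
    return parser.parse_args()


def apply_lora_overrides(config: dict, args) -> dict:
    """Fold the CLI flags into a (hypothetical) nested verl config dict."""
    if args.lora:
        model_cfg = config.setdefault("actor_rollout_ref", {}).setdefault("model", {})
        model_cfg["lora_rank"] = args.lora_rank  # a non-zero rank enables LoRA
        if args.lora_adapter_path:
            model_cfg["lora_adapter_path"] = args.lora_adapter_path  # assumed key
    return config
```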

Add --lora flag to examples/calc_x/train_calc_agent.py to enable LoRA
training. When specified, sets lora_rank in the verl config, which
triggers LoRA mode in verl 0.6.0+.

- Add --lora flag to enable LoRA training
- Add --lora-rank flag (default: 32) for custom LoRA rank
- Add --lora-adapter-path flag (optional) for pre-trained adapters
- Add config verification logging when LoRA is enabled
- Remove test_verl_trainer.py from unit tests (per maintainer request)

This enables testing the fix for issue microsoft#383 with LoRA configurations.
…t-lightning into fix/verl-ref-policy

# Conflicts:
#	tests/trainer/test_verl_trainer.py
Vasuk12 (Contributor, Author) commented Dec 9, 2025

  • Removed dummy tests from unit-tests
  • Added a --lora flag with full LoRA support; LoRA configurations are enabled and LoRA training runs when --lora is specified

Made _compute_reference_log_prob a member method of AgentLightningTrainer instead of a module-level function.
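In rough terms the refactor looks like the sketch below; it is simplified, the real class integrates with verl's PPO trainer machinery, and _train_step does considerably more than shown.

```python
# Sketch of the refactor only, not the merged implementation.
class AgentLightningTrainer:
    def _compute_reference_log_prob(self, batch):
        # Same routing as the earlier helper, now reachable via self.
        if getattr(self, "ref_in_actor", False):
            return self.actor_rollout_wg.compute_ref_log_prob(batch)
        return self.ref_policy_wg.compute_ref_log_prob(batch)

    def _train_step(self, batch):
        # ... rollout and reward computation elided ...
        ref_log_prob = self._compute_reference_log_prob(batch)
        batch = batch.union(ref_log_prob)  # DataProto.union, as in verl's PPO loop
        # ... advantage estimation and policy update elided ...
        return batch
```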

Signed-off-by: Vasu <[email protected]>
ultmaster (Contributor) commented:
/ci

github-actions bot commented Dec 10, 2025

🚀 CI Watcher for correlation id-3635199449-mizg844g triggered by comment 3635199449
🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

ultmaster merged commit bbd5c2a into microsoft:main on Dec 10, 2025
16 checks passed
Successfully merging this pull request may close these issues:

  • BUG: Compatibility with verl 0.6.0 and LoRA