
Conversation

Vasuk12 (Contributor) commented Dec 9, 2025

Fixes #383: AttributeError when using LoRA with verl 0.6.0

When using LoRA with verl 0.6.0, AgentLightningTrainer raises AttributeError: 'AgentLightningTrainer' object has no attribute 'ref_policy_wg'. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker (actor_rollout_wg) instead of a separate ref policy worker (ref_policy_wg).

Added a helper function _compute_reference_log_prob() (sketched after this list) that:

  • Checks the ref_in_actor flag (set by verl when LoRA is detected)
  • Uses actor_rollout_wg when ref_in_actor=True (LoRA mode)
  • Falls back to ref_policy_wg when ref_in_actor=False (standard mode)
  • Provides clear error messages if the required worker is missing
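For context, here is the routing logic in rough form. This is an illustrative sketch rather than the PR's exact code: it assumes verl worker groups expose a compute_ref_log_prob(batch) method (as in verl's own PPO trainer), and the error messages and attribute checks are placeholders.

```python
# Illustrative sketch only; the actual helper in agentlightning/verl/trainer.py
# may differ. Assumes verl worker groups expose compute_ref_log_prob(batch).
def _compute_reference_log_prob(trainer, batch):
    """Route reference log-prob computation to the worker that owns the ref policy."""
    if getattr(trainer, "ref_in_actor", False):
        # LoRA mode (verl 0.6.0+): the frozen base model inside the actor
        # rollout worker doubles as the reference policy.
        if getattr(trainer, "actor_rollout_wg", None) is None:
            raise AttributeError(
                "ref_in_actor is set but actor_rollout_wg is not initialized"
            )
        return trainer.actor_rollout_wg.compute_ref_log_prob(batch)

    # Standard mode: a dedicated reference policy worker group is expected.
    if getattr(trainer, "ref_policy_wg", None) is None:
        raise AttributeError(
            "reference policy requested but ref_policy_wg is not initialized"
        )
    return trainer.ref_policy_wg.compute_ref_log_prob(batch)
```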

Testing
All tests pass, including existing trainer tests. The fix maintains backward compatibility with older verl versions.

Fixes microsoft#383: AttributeError when using LoRA with verl 0.6.0

In verl 0.6.0+, when LoRA is enabled, the reference policy is computed
by the actor rollout worker instead of a separate ref policy worker.
This change adds a helper function that checks the ref_in_actor flag
and uses the correct worker (actor_rollout_wg or ref_policy_wg).

- Add _compute_reference_log_prob() helper function
- Update _train_step to use helper instead of direct ref_policy_wg access
- Add comprehensive tests covering all scenarios

Signed-off-by: Vasu <[email protected]>
Copilot AI review requested due to automatic review settings December 9, 2025 09:07
Vasuk12 (Contributor, Author) commented Dec 9, 2025

@Vasuk12 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree

Copilot AI (Contributor) left a comment


Pull request overview

This PR fixes an AttributeError that occurs when using LoRA with verl 0.6.0. In verl 0.6.0+, when LoRA is enabled, the reference policy is computed by the actor rollout worker instead of a separate reference policy worker. The fix introduces a helper function that checks the ref_in_actor flag and routes to the appropriate worker, maintaining backward compatibility with older verl versions.

  • Added _compute_reference_log_prob() helper function to handle both LoRA and standard reference policy computation modes
  • Updated _train_step() to use the new helper function instead of directly accessing ref_policy_wg
  • Added comprehensive unit tests covering all scenarios including LoRA mode, standard mode, error handling, and backward compatibility

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • agentlightning/verl/trainer.py: Added _compute_reference_log_prob() helper function that checks the ref_in_actor flag and routes to the correct worker (actor_rollout_wg for LoRA, ref_policy_wg for standard mode). Updated _train_step() to use this helper.
  • tests/trainer/test_verl_trainer.py: Added comprehensive unit tests for the new helper function, covering LoRA mode preference, standard mode fallback, error handling for missing workers, backward compatibility, and data preservation across multiple calls.


@@ -0,0 +1,136 @@
# Copyright (c) Microsoft. All rights reserved.

from types import SimpleNamespace
Inline review comment: verl tests are put elsewhere. Please don't add dummy tests in unit-tests.

ultmaster (Contributor) commented:
Could you add a lora flag in examples/calc_x/train_calc_agent.py so that we can check in tests whether it works?

ultmaster (Contributor) commented:
/ci

github-actions bot commented Dec 9, 2025

🚀 CI Watcher for correlation id-3631315225-miyegp95 triggered by comment 3631315225
🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

Vasuk12 (Contributor, Author) commented Dec 9, 2025

Could you add a lora flag in examples/calc_x/train_calc_agent.py so that we can check in tests whether it works?

Sure. I will look into the CI failures as well. Where do you want the test file?

ultmaster (Contributor) commented:
@Vasuk12 Maybe add a --lora flag in examples/calc_x/train_calc_agent. When it is specified on the command line, enable LoRA configurations and run LoRA training. How does that sound?

Vasuk12 (Contributor, Author) commented Dec 9, 2025

@Vasuk12 Maybe add a --lora flag in examples/calc_x/train_calc_agent. When it is specified on the command line, enable LoRA configurations and run LoRA training. How does that sound?

Sure :). I was thinking of adding --lora along with --lora-rank (default 32) and, optionally, --lora-adapter-path for loading pre-trained adapters. Does that sound good, or do you have other parameters in mind?
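For illustration, here is a minimal sketch of how these proposed flags could be declared with argparse and folded into a nested verl-style config. The flag names and the lora_rank default of 32 come from this thread; the config keys used below (actor_rollout_ref.model.lora_rank, lora_adapter_path) are assumptions, and the actual wiring in examples/calc_x/train_calc_agent.py may differ.

```python
# Sketch only: flag names and defaults follow the discussion above; the config
# keys are assumptions about the verl config layout, not confirmed by this PR.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Calc-X agent training (sketch)")
    parser.add_argument("--lora", action="store_true",
                        help="Enable LoRA training (triggers ref_in_actor in verl 0.6.0+)")
    parser.add_argument("--lora-rank", type=int, default=32,
                        help="LoRA rank used when --lora is set")
    parser.add_argument("--lora-adapter-path", type=str, default=None,
                        help="Optional path to a pre-trained LoRA adapter")
    return parser.parse_args()


def apply_lora_overrides(config: dict, args) -> dict:
    """Fold the CLI flags into a (hypothetical) nested verl config dict."""
    if args.lora:
        model_cfg = config.setdefault("actor_rollout_ref", {}).setdefault("model", {})
        model_cfg["lora_rank"] = args.lora_rank  # a non-zero rank enables LoRA
        if args.lora_adapter_path:
            model_cfg["lora_adapter_path"] = args.lora_adapter_path  # assumed key
    return config
```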

Add --lora flag to examples/calc_x/train_calc_agent.py to enable LoRA
training. When specified, sets lora_rank in the verl config, which
triggers LoRA mode in verl 0.6.0+.

- Add --lora flag to enable LoRA training
- Add --lora-rank flag (default: 32) for custom LoRA rank
- Add --lora-adapter-path flag (optional) for pre-trained adapters
- Add config verification logging when LoRA is enabled
- Remove test_verl_trainer.py from unit tests (per maintainer request)

This enables testing the fix for issue microsoft#383 with LoRA configurations.
…t-lightning into fix/verl-ref-policy

# Conflicts:
#	tests/trainer/test_verl_trainer.py
Vasuk12 (Contributor, Author) commented Dec 9, 2025

  • Removed dummy tests from unit-tests
  • Added a --lora flag with full LoRA support; LoRA configurations are enabled and LoRA training runs when --lora is specified

Made _compute_reference_log_prob a member method of AgentLightningTrainer instead of a module-level function.
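In rough terms the refactor looks like the sketch below; it is simplified, the real class integrates with verl's PPO trainer machinery, and _train_step does considerably more than shown.

```python
# Sketch of the refactor only, not the merged implementation.
class AgentLightningTrainer:
    def _compute_reference_log_prob(self, batch):
        # Same routing as the earlier helper, now reachable via self.
        if getattr(self, "ref_in_actor", False):
            return self.actor_rollout_wg.compute_ref_log_prob(batch)
        return self.ref_policy_wg.compute_ref_log_prob(batch)

    def _train_step(self, batch):
        # ... rollout and reward computation elided ...
        ref_log_prob = self._compute_reference_log_prob(batch)
        batch = batch.union(ref_log_prob)  # DataProto.union, as in verl's PPO loop
        # ... advantage estimation and policy update elided ...
        return batch
```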

Signed-off-by: Vasu <[email protected]>
ultmaster (Contributor) commented:
/ci

github-actions bot commented Dec 10, 2025

🚀 CI Watcher for correlation id-3635199449-mizg844g triggered by comment 3635199449
🏃‍♀️ Tracking 1 workflow run(s):

✅ All runs completed.

ultmaster merged commit bbd5c2a into microsoft:main on Dec 10, 2025
16 checks passed
Successfully merging this pull request may close these issues:

  • BUG: Compatibility with verl 0.6.0 and LoRA