Adapt the Search R1 Example to AGL v0.2 #412

JiahangXu · 2025-12-15T07:51:23Z

No description provided.

Copilot

Pull request overview

This pull request adapts the Search R1 example to work with AgentLightning v0.2, replacing the old shell-script-based training approach with a Python-based configuration system.

Key changes:

Introduced a new Python training script with multiple configuration presets (fast, qwen, llama)
Updated the agent implementation to use v0.2 APIs (rollout method instead of async methods, new resource access patterns)
Modified evaluation logic and increased timeout for longer-running operations

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

Show a summary per file

File	Description
`train_search_r1_agent.py`	New Python-based training script replacing the old shell script with configurable presets for different models
`search_r1_agent.py`	Refactored agent to use v0.2 APIs with synchronous rollout method, updated logging, and multi-turn conversation handling
`train.sh`	Removed old shell-based training script
`qa_em.py`	Modified extract_solution logic to handle single match case differently
`README.md`	Updated documentation to reference v0.2 and new training command
`daemon.py`	Increased data setup timeout from 60s to 3600s for longer operations

Comments suppressed due to low confidence (4)

examples/search_r1/qa_em.py:82

Logic inconsistency with docstring. The docstring on line 71 states "Returns None if fewer than two such spans are present", but the code now only returns None when there are 0 matches. With exactly 1 match, it will reach line 82 and return matches[-1].group(1).strip(), which contradicts the documented behavior. Either update the condition to if len(matches) < 2: or update the docstring to reflect the new behavior.

    if len(matches) == 0:
        return None

    # If there are 2 or more matches, return the last one
    return matches[-1].group(1).strip()

examples/search_r1/search_r1_agent.py:7

Import of 'shutil' is not used.

import shutil

examples/search_r1/search_r1_agent.py:8

Import of 'tempfile' is not used.

import tempfile

examples/search_r1/search_r1_agent.py:10

Import of 'Literal' is not used.

from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-15T07:56:34Z

examples/search_r1/search_r1_agent.py

    max_tokens: int = 500,
 ) -> str:
+    if not len(messages):
+        messages=[{"role": "user", "content": content}]


Missing space after equals sign. For consistency with PEP 8 style guidelines, there should be a space after the equals sign in the list assignment.

Suggested change

messages=[{"role": "user", "content": content}]

messages = [{"role": "user", "content": content}]

Copilot · 2025-12-15T07:56:34Z

examples/search_r1/search_r1_agent.py

+import shutil
+import tempfile
+import time
+from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast


Unused imports. The imports shutil, tempfile, and Literal are not used anywhere in the file. Consider removing them to keep the code clean.

Suggested change

import shutil

import tempfile

import time

from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast

import time

from typing import Any, Dict, List, Optional, Tuple, TypedDict, cast

Copilot · 2025-12-15T07:56:35Z

agentlightning/verl/daemon.py

            future = asyncio.run_coroutine_threadsafe(coro, self._internal_loop)
        try:
-            future.result(timeout=60)  # Wait for completion with a timeout
+            future.result(timeout=3600)  # Wait for completion with a timeout


Consider adding a comment explaining the timeout increase. The timeout was increased from 60 seconds to 3600 seconds (1 hour), which is a 60x increase. While this may be necessary for the Search R1 example's data setup, it would be helpful to add a comment explaining why such a long timeout is needed, or consider making this configurable rather than hardcoded.

Copilot · 2025-12-15T07:56:35Z

examples/search_r1/train_search_r1_agent.py

+        "experiment_name": "searchr1_minibatch256_runner32",
+        "nnodes": 1,
+        "test_freq": 10,
+        "save_freq":10,


Missing space after colon. According to PEP 8, there should be a space after the colon in dictionary literals.

Suggested change

"save_freq":10,

"save_freq": 10,

Copilot · 2025-12-15T07:56:35Z

examples/search_r1/search_r1_agent.py

+    messages: List[dict] = [],
    temperature: float = 1.0,
    max_tokens: int = 500,
 ) -> str:
+    if not len(messages):
+        messages=[{"role": "user", "content": content}]


Mutable default argument. Using a mutable default argument (empty list) for the messages parameter is a common Python pitfall. The same list object will be reused across multiple function calls, which can lead to unexpected behavior. Use None as the default and create a new list inside the function if needed.

Suggested change

messages: List[dict] = [],

temperature: float = 1.0,

max_tokens: int = 500,

) -> str:

if not len(messages):

messages=[{"role": "user", "content": content}]

messages: Optional[List[dict]] = None,

temperature: float = 1.0,

max_tokens: int = 500,

) -> str:

if messages is None or not len(messages):

messages = [{"role": "user", "content": content}]

Copilot · 2025-12-15T07:56:36Z

contrib/recipes/search_r1/search_r1_agent.py

+    searchr1_dev_data_path = os.path.join(os.environ.get("VERL_SEARCHR1_DATA_DIR", "data"), "test.parquet")
+    if not os.path.exists(searchr1_dev_data_path):
+        raise FileNotFoundError(f"Search_R1 dev data file {searchr1_dev_data_path} does not exist.")
+    df = pd.read_parquet(searchr1_dev_data_path).head(10)  # type: ignore


Missing import for pandas. The function debug_search_r1_agent uses pd.read_parquet() on line 210, but pandas is not imported in this file. Add import pandas as pd to the imports section.

Copilot · 2025-12-15T07:56:36Z

contrib/recipes/search_r1/train_search_r1_agent.py

+def main() -> None:
+    """Main function to parse arguments and run training."""
+    parser = argparse.ArgumentParser(
+        description="Train an Search-R1 agent using different model configurations"


Grammar error: "Train an Search-R1 agent" should be "Train a Search-R1 agent". The article "an" should be "a" because "Search-R1" starts with a consonant sound.

Suggested change

description="Train an Search-R1 agent using different model configurations"

description="Train a Search-R1 agent using different model configurations"

Copilot · 2025-12-15T07:56:36Z

contrib/recipes/search_r1/train_search_r1_agent.py

+def config_train_qwen() -> Dict[str, Any]:
+    """A configuration for training with Qwen-2.5B."""
+
+    config = deepcopy(RL_TRAINING_CONFIG)


Inconsistent docstring description. The docstring says "training with Qwen-2.5B" but the help text in line 157 says "Qwen-2.5-Coder-1.5B", and the model actually used in RL_TRAINING_CONFIG is "meta-llama/Llama-3.2-3B-Instruct". The function doesn't modify the model path, so it will use the LLaMA model, not Qwen. Either update the docstring to accurately describe what model is being used, or modify the config to use a Qwen model.

Suggested change

config = deepcopy(RL_TRAINING_CONFIG)

config = deepcopy(RL_TRAINING_CONFIG)

config["actor_rollout_ref"]["model"]["path"] = "Qwen/Qwen2.5-1.8B" # or another appropriate Qwen-2.5B model

Copilot · 2025-12-15T07:56:37Z

contrib/recipes/search_r1/train_search_r1_agent.py

+
+
+def config_train_llama() -> Dict[str, Any]:
+    """A configuration for training with LLaMA-3.2-1B-Instruct.


Inconsistent docstring with actual implementation. The docstring says "training with LLaMA-3.2-1B-Instruct" but the model path set on line 133 is "meta-llama/Llama-3.2-3B-Instruct", which is a 3B model, not 1B. Update the docstring to say "3B-Instruct" instead of "1B-Instruct".

Suggested change

"""A configuration for training with LLaMA-3.2-1B-Instruct.

"""A configuration for training with LLaMA-3.2-3B-Instruct.

Copilot · 2025-12-15T07:56:37Z

contrib/recipes/search_r1/train_search_r1_agent.py

+import os
+from copy import deepcopy
+from datetime import datetime
+from typing import Any, Dict, Optional


Import of 'Optional' is not used.

Suggested change

from typing import Any, Dict, Optional

from typing import Any, Dict

ultmaster · 2025-12-15T09:04:50Z

Please check whether copilot comments make sense, and fix the lint issues.

SiyunZhao and others added 10 commits November 5, 2025 10:23

Update Search-R1 Example to v0.2.x

c8a8b93

delete redundant script

db9d280

fix path

f60a49b

delete redundant parameter

5e35898

stage debug scripts

5c3b9c6

Merge branch 'main' into dev/search_r1_v02

7b57eed

update logger

11e8589

stage test scripts

161961d

update daemon timeout

283eb89

Merge branch 'main' into dev/search_r1_v02

9ba7377

Copilot AI review requested due to automatic review settings December 15, 2025 07:51

Copilot started reviewing on behalf of JiahangXu December 15, 2025 07:51 View session

Copilot AI reviewed Dec 15, 2025

View reviewed changes

JiahangXu added 2 commits December 15, 2025 08:34

refactor

77dfa41

Merge branch 'main' into dev/search_r1_v02

dd45e40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Adapt the Search R1 Example to AGL v0.2 #412

Adapt the Search R1 Example to AGL v0.2 #412

JiahangXu commented Dec 15, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

Copilot AI Dec 15, 2025

Uh oh!

ultmaster commented Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	messages=[{"role": "user", "content": content}]
	messages = [{"role": "user", "content": content}]

	description="Train an Search-R1 agent using different model configurations"
	description="Train a Search-R1 agent using different model configurations"

	config = deepcopy(RL_TRAINING_CONFIG)
	config = deepcopy(RL_TRAINING_CONFIG)
	config["actor_rollout_ref"]["model"]["path"] = "Qwen/Qwen2.5-1.8B" # or another appropriate Qwen-2.5B model



		def config_train_llama() -> Dict[str, Any]:
		"""A configuration for training with LLaMA-3.2-1B-Instruct.

	from typing import Any, Dict, Optional
	from typing import Any, Dict

Adapt the Search R1 Example to AGL v0.2 #412

Are you sure you want to change the base?

Adapt the Search R1 Example to AGL v0.2 #412

Conversation

JiahangXu commented Dec 15, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

Uh oh!

ultmaster commented Dec 15, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants