-
Notifications
You must be signed in to change notification settings - Fork 780
Adapt the Search R1 Example to AGL v0.2 #412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adapts the Search R1 example to work with AgentLightning v0.2, replacing the old shell-script-based training approach with a Python-based configuration system.
Key changes:
- Introduced a new Python training script with multiple configuration presets (fast, qwen, llama)
- Updated the agent implementation to use v0.2 APIs (rollout method instead of async methods, new resource access patterns)
- Modified evaluation logic and increased timeout for longer-running operations
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
train_search_r1_agent.py |
New Python-based training script replacing the old shell script with configurable presets for different models |
search_r1_agent.py |
Refactored agent to use v0.2 APIs with synchronous rollout method, updated logging, and multi-turn conversation handling |
train.sh |
Removed old shell-based training script |
qa_em.py |
Modified extract_solution logic to handle single match case differently |
README.md |
Updated documentation to reference v0.2 and new training command |
daemon.py |
Increased data setup timeout from 60s to 3600s for longer operations |
Comments suppressed due to low confidence (4)
examples/search_r1/qa_em.py:82
- Logic inconsistency with docstring. The docstring on line 71 states "Returns None if fewer than two such spans are present", but the code now only returns None when there are 0 matches. With exactly 1 match, it will reach line 82 and return
matches[-1].group(1).strip(), which contradicts the documented behavior. Either update the condition toif len(matches) < 2:or update the docstring to reflect the new behavior.
if len(matches) == 0:
return None
# If there are 2 or more matches, return the last one
return matches[-1].group(1).strip()
examples/search_r1/search_r1_agent.py:7
- Import of 'shutil' is not used.
import shutil
examples/search_r1/search_r1_agent.py:8
- Import of 'tempfile' is not used.
import tempfile
examples/search_r1/search_r1_agent.py:10
- Import of 'Literal' is not used.
from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| max_tokens: int = 500, | ||
| ) -> str: | ||
| if not len(messages): | ||
| messages=[{"role": "user", "content": content}] |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after equals sign. For consistency with PEP 8 style guidelines, there should be a space after the equals sign in the list assignment.
| messages=[{"role": "user", "content": content}] | |
| messages = [{"role": "user", "content": content}] |
| import shutil | ||
| import tempfile | ||
| import time | ||
| from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused imports. The imports shutil, tempfile, and Literal are not used anywhere in the file. Consider removing them to keep the code clean.
| import shutil | |
| import tempfile | |
| import time | |
| from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast | |
| import time | |
| from typing import Any, Dict, List, Optional, Tuple, TypedDict, cast |
| future = asyncio.run_coroutine_threadsafe(coro, self._internal_loop) | ||
| try: | ||
| future.result(timeout=60) # Wait for completion with a timeout | ||
| future.result(timeout=3600) # Wait for completion with a timeout |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a comment explaining the timeout increase. The timeout was increased from 60 seconds to 3600 seconds (1 hour), which is a 60x increase. While this may be necessary for the Search R1 example's data setup, it would be helpful to add a comment explaining why such a long timeout is needed, or consider making this configurable rather than hardcoded.
| "experiment_name": "searchr1_minibatch256_runner32", | ||
| "nnodes": 1, | ||
| "test_freq": 10, | ||
| "save_freq":10, |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing space after colon. According to PEP 8, there should be a space after the colon in dictionary literals.
| "save_freq":10, | |
| "save_freq": 10, |
| messages: List[dict] = [], | ||
| temperature: float = 1.0, | ||
| max_tokens: int = 500, | ||
| ) -> str: | ||
| if not len(messages): | ||
| messages=[{"role": "user", "content": content}] |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mutable default argument. Using a mutable default argument (empty list) for the messages parameter is a common Python pitfall. The same list object will be reused across multiple function calls, which can lead to unexpected behavior. Use None as the default and create a new list inside the function if needed.
| messages: List[dict] = [], | |
| temperature: float = 1.0, | |
| max_tokens: int = 500, | |
| ) -> str: | |
| if not len(messages): | |
| messages=[{"role": "user", "content": content}] | |
| messages: Optional[List[dict]] = None, | |
| temperature: float = 1.0, | |
| max_tokens: int = 500, | |
| ) -> str: | |
| if messages is None or not len(messages): | |
| messages = [{"role": "user", "content": content}] |
| searchr1_dev_data_path = os.path.join(os.environ.get("VERL_SEARCHR1_DATA_DIR", "data"), "test.parquet") | ||
| if not os.path.exists(searchr1_dev_data_path): | ||
| raise FileNotFoundError(f"Search_R1 dev data file {searchr1_dev_data_path} does not exist.") | ||
| df = pd.read_parquet(searchr1_dev_data_path).head(10) # type: ignore |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing import for pandas. The function debug_search_r1_agent uses pd.read_parquet() on line 210, but pandas is not imported in this file. Add import pandas as pd to the imports section.
| def main() -> None: | ||
| """Main function to parse arguments and run training.""" | ||
| parser = argparse.ArgumentParser( | ||
| description="Train an Search-R1 agent using different model configurations" |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar error: "Train an Search-R1 agent" should be "Train a Search-R1 agent". The article "an" should be "a" because "Search-R1" starts with a consonant sound.
| description="Train an Search-R1 agent using different model configurations" | |
| description="Train a Search-R1 agent using different model configurations" |
| def config_train_qwen() -> Dict[str, Any]: | ||
| """A configuration for training with Qwen-2.5B.""" | ||
|
|
||
| config = deepcopy(RL_TRAINING_CONFIG) |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent docstring description. The docstring says "training with Qwen-2.5B" but the help text in line 157 says "Qwen-2.5-Coder-1.5B", and the model actually used in RL_TRAINING_CONFIG is "meta-llama/Llama-3.2-3B-Instruct". The function doesn't modify the model path, so it will use the LLaMA model, not Qwen. Either update the docstring to accurately describe what model is being used, or modify the config to use a Qwen model.
| config = deepcopy(RL_TRAINING_CONFIG) | |
| config = deepcopy(RL_TRAINING_CONFIG) | |
| config["actor_rollout_ref"]["model"]["path"] = "Qwen/Qwen2.5-1.8B" # or another appropriate Qwen-2.5B model |
|
|
||
|
|
||
| def config_train_llama() -> Dict[str, Any]: | ||
| """A configuration for training with LLaMA-3.2-1B-Instruct. |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent docstring with actual implementation. The docstring says "training with LLaMA-3.2-1B-Instruct" but the model path set on line 133 is "meta-llama/Llama-3.2-3B-Instruct", which is a 3B model, not 1B. Update the docstring to say "3B-Instruct" instead of "1B-Instruct".
| """A configuration for training with LLaMA-3.2-1B-Instruct. | |
| """A configuration for training with LLaMA-3.2-3B-Instruct. |
| import os | ||
| from copy import deepcopy | ||
| from datetime import datetime | ||
| from typing import Any, Dict, Optional |
Copilot
AI
Dec 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'Optional' is not used.
| from typing import Any, Dict, Optional | |
| from typing import Any, Dict |
|
Please check whether copilot comments make sense, and fix the lint issues. |
No description provided.