Skip to content

Conversation

@JiahangXu
Copy link

No description provided.

Copilot AI review requested due to automatic review settings December 15, 2025 07:51
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adapts the Search R1 example to work with AgentLightning v0.2, replacing the old shell-script-based training approach with a Python-based configuration system.

Key changes:

  • Introduced a new Python training script with multiple configuration presets (fast, qwen, llama)
  • Updated the agent implementation to use v0.2 APIs (rollout method instead of async methods, new resource access patterns)
  • Modified evaluation logic and increased timeout for longer-running operations

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
train_search_r1_agent.py New Python-based training script replacing the old shell script with configurable presets for different models
search_r1_agent.py Refactored agent to use v0.2 APIs with synchronous rollout method, updated logging, and multi-turn conversation handling
train.sh Removed old shell-based training script
qa_em.py Modified extract_solution logic to handle single match case differently
README.md Updated documentation to reference v0.2 and new training command
daemon.py Increased data setup timeout from 60s to 3600s for longer operations
Comments suppressed due to low confidence (4)

examples/search_r1/qa_em.py:82

  • Logic inconsistency with docstring. The docstring on line 71 states "Returns None if fewer than two such spans are present", but the code now only returns None when there are 0 matches. With exactly 1 match, it will reach line 82 and return matches[-1].group(1).strip(), which contradicts the documented behavior. Either update the condition to if len(matches) < 2: or update the docstring to reflect the new behavior.
    if len(matches) == 0:
        return None

    # If there are 2 or more matches, return the last one
    return matches[-1].group(1).strip()

examples/search_r1/search_r1_agent.py:7

  • Import of 'shutil' is not used.
import shutil

examples/search_r1/search_r1_agent.py:8

  • Import of 'tempfile' is not used.
import tempfile

examples/search_r1/search_r1_agent.py:10

  • Import of 'Literal' is not used.
from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

max_tokens: int = 500,
) -> str:
if not len(messages):
messages=[{"role": "user", "content": content}]
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after equals sign. For consistency with PEP 8 style guidelines, there should be a space after the equals sign in the list assignment.

Suggested change
messages=[{"role": "user", "content": content}]
messages = [{"role": "user", "content": content}]

Copilot uses AI. Check for mistakes.
Comment on lines 7 to 10
import shutil
import tempfile
import time
from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused imports. The imports shutil, tempfile, and Literal are not used anywhere in the file. Consider removing them to keep the code clean.

Suggested change
import shutil
import tempfile
import time
from typing import Any, Dict, List, Literal, Optional, Tuple, TypedDict, cast
import time
from typing import Any, Dict, List, Optional, Tuple, TypedDict, cast

Copilot uses AI. Check for mistakes.
future = asyncio.run_coroutine_threadsafe(coro, self._internal_loop)
try:
future.result(timeout=60) # Wait for completion with a timeout
future.result(timeout=3600) # Wait for completion with a timeout
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding a comment explaining the timeout increase. The timeout was increased from 60 seconds to 3600 seconds (1 hour), which is a 60x increase. While this may be necessary for the Search R1 example's data setup, it would be helpful to add a comment explaining why such a long timeout is needed, or consider making this configurable rather than hardcoded.

Copilot uses AI. Check for mistakes.
"experiment_name": "searchr1_minibatch256_runner32",
"nnodes": 1,
"test_freq": 10,
"save_freq":10,
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing space after colon. According to PEP 8, there should be a space after the colon in dictionary literals.

Suggested change
"save_freq":10,
"save_freq": 10,

Copilot uses AI. Check for mistakes.
Comment on lines 102 to 107
messages: List[dict] = [],
temperature: float = 1.0,
max_tokens: int = 500,
) -> str:
if not len(messages):
messages=[{"role": "user", "content": content}]
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutable default argument. Using a mutable default argument (empty list) for the messages parameter is a common Python pitfall. The same list object will be reused across multiple function calls, which can lead to unexpected behavior. Use None as the default and create a new list inside the function if needed.

Suggested change
messages: List[dict] = [],
temperature: float = 1.0,
max_tokens: int = 500,
) -> str:
if not len(messages):
messages=[{"role": "user", "content": content}]
messages: Optional[List[dict]] = None,
temperature: float = 1.0,
max_tokens: int = 500,
) -> str:
if messages is None or not len(messages):
messages = [{"role": "user", "content": content}]

Copilot uses AI. Check for mistakes.
searchr1_dev_data_path = os.path.join(os.environ.get("VERL_SEARCHR1_DATA_DIR", "data"), "test.parquet")
if not os.path.exists(searchr1_dev_data_path):
raise FileNotFoundError(f"Search_R1 dev data file {searchr1_dev_data_path} does not exist.")
df = pd.read_parquet(searchr1_dev_data_path).head(10) # type: ignore
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing import for pandas. The function debug_search_r1_agent uses pd.read_parquet() on line 210, but pandas is not imported in this file. Add import pandas as pd to the imports section.

Copilot uses AI. Check for mistakes.
def main() -> None:
"""Main function to parse arguments and run training."""
parser = argparse.ArgumentParser(
description="Train an Search-R1 agent using different model configurations"
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar error: "Train an Search-R1 agent" should be "Train a Search-R1 agent". The article "an" should be "a" because "Search-R1" starts with a consonant sound.

Suggested change
description="Train an Search-R1 agent using different model configurations"
description="Train a Search-R1 agent using different model configurations"

Copilot uses AI. Check for mistakes.
def config_train_qwen() -> Dict[str, Any]:
"""A configuration for training with Qwen-2.5B."""

config = deepcopy(RL_TRAINING_CONFIG)
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent docstring description. The docstring says "training with Qwen-2.5B" but the help text in line 157 says "Qwen-2.5-Coder-1.5B", and the model actually used in RL_TRAINING_CONFIG is "meta-llama/Llama-3.2-3B-Instruct". The function doesn't modify the model path, so it will use the LLaMA model, not Qwen. Either update the docstring to accurately describe what model is being used, or modify the config to use a Qwen model.

Suggested change
config = deepcopy(RL_TRAINING_CONFIG)
config = deepcopy(RL_TRAINING_CONFIG)
config["actor_rollout_ref"]["model"]["path"] = "Qwen/Qwen2.5-1.8B" # or another appropriate Qwen-2.5B model

Copilot uses AI. Check for mistakes.


def config_train_llama() -> Dict[str, Any]:
"""A configuration for training with LLaMA-3.2-1B-Instruct.
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent docstring with actual implementation. The docstring says "training with LLaMA-3.2-1B-Instruct" but the model path set on line 133 is "meta-llama/Llama-3.2-3B-Instruct", which is a 3B model, not 1B. Update the docstring to say "3B-Instruct" instead of "1B-Instruct".

Suggested change
"""A configuration for training with LLaMA-3.2-1B-Instruct.
"""A configuration for training with LLaMA-3.2-3B-Instruct.

Copilot uses AI. Check for mistakes.
import os
from copy import deepcopy
from datetime import datetime
from typing import Any, Dict, Optional
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import of 'Optional' is not used.

Suggested change
from typing import Any, Dict, Optional
from typing import Any, Dict

Copilot uses AI. Check for mistakes.
@ultmaster
Copy link
Contributor

Please check whether copilot comments make sense, and fix the lint issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants