Skip to content

Conversation

@Sh3lock
Copy link

@Sh3lock Sh3lock commented Dec 15, 2025

No description provided.

添加多个运行脚本以支持不同环境下的 SQL Agent 执行

新增 `run.sh`、`run_local.sh`、`run_qwen2.5_coder_local.sh` 和 `run_train_local.sh` 脚本,
用于配置并启动 Spider 数据集上的 SQL Agent 任务。
支持本地部署的 vLLM 兼容 API、训练及评估流程的一键执行。
Copilot AI review requested due to automatic review settings December 15, 2025 14:31
@Sh3lock Sh3lock closed this Dec 15, 2025
@Sh3lock
Copy link
Author

Sh3lock commented Dec 15, 2025

Opened by mistake,sry!!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds comprehensive logging infrastructure and evaluation mode support to the Spider SQL agent training system. The changes enable better monitoring of training progress, hardware utilization, and provide flexible configuration management through external JSON/YAML files.

Key Changes

  • Added infrastructure for detailed logging including hardware snapshots, GPU/CPU monitoring, progress tracking, and sequence length analysis
  • Introduced support for external configuration files (JSON/YAML) with CLI argument --config-file
  • Created shell scripts for launching training and evaluation workflows with different model configurations

Reviewed changes

Copilot reviewed 12 out of 15 changed files in this pull request and generated 22 comments.

Show a summary per file
File Description
run_train_local.sh New training launcher script with nohup support and environment logging
run_qwen2.5_coder_local.sh Evaluation script for Qwen2.5-Coder-0.5B against local vLLM endpoint
run_local.sh Evaluation script for Qwen3-4B against local vLLM endpoint
run.sh Evaluation script using Aliyun DashScope API
examples/spider/train_sql_agent.py Enhanced training script with config file loading, hardware monitoring, and logging infrastructure
examples/spider/sql_agent_eval.py New comprehensive evaluation script with concurrency support and token tracking
examples/spider/doc/README.md Chinese documentation describing the logging system changes
examples/spider/configs/singleGPU_qwen05b.json Single GPU configuration for Qwen2.5-Coder-0.5B
examples/spider/configs/doubleGPU_qwen05b.json Dual GPU configuration for Qwen2.5-Coder-1.5B
agentlightning/verl/trainer.py Enhanced trainer with progress logging, sequence metrics, and validation summaries
.gitignore Added specific log directory entries
result/qwen2.5-coder-1.5b-insruct-dashscopeApi.txt Evaluation results for 1.5B model
result/qwen2.5-coder-0.5b.txt Evaluation results for 0.5B model

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

# export WANDB_DISABLED=true

# 配置路径(直接在此处硬编码,修改更方便)
CONFIG_PATH="/home/lthpc/student/LiTengfei/project/myfork/agent-lightning/examples/spider/configs/singleGPU_qwen05b.json"
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded absolute path with user-specific directory makes this configuration non-portable. Consider using a relative path or an environment variable that can be set per user/environment.

Suggested change
CONFIG_PATH="/home/lthpc/student/LiTengfei/project/myfork/agent-lightning/examples/spider/configs/singleGPU_qwen05b.json"
CONFIG_PATH="$ROOT/examples/spider/configs/singleGPU_qwen05b.json"

Copilot uses AI. Check for mistakes.
}
},
"model": {
"path": "/home/lthpc/student/LiTengfei/LLaMA-Factory/models/Qwen2.5-Coder-1.5B-Instruct",
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded absolute path containing user-specific directory makes this configuration non-portable. Consider using a relative path or environment variable substitution to work across different development environments.

Suggested change
"path": "/home/lthpc/student/LiTengfei/LLaMA-Factory/models/Qwen2.5-Coder-1.5B-Instruct",
"path": "${MODEL_DIR}/Qwen2.5-Coder-1.5B-Instruct",

Copilot uses AI. Check for mistakes.
}
},
"model": {
"path": "/home/lthpc/student/LiTengfei/LLaMA-Factory/models/Qwen2.5-Coder-1.5B-Instruct",
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename is 'doubleGPU_qwen05b.json' suggesting a 0.5B model, but the model path points to 'Qwen2.5-Coder-1.5B-Instruct' which is a 1.5B model. This inconsistency between the filename and the actual model configuration could lead to confusion.

Copilot uses AI. Check for mistakes.
}
},
"model": {
"path": "/home/lthpc/student/LiTengfei/LLaMA-Factory/models/Qwen2.5-Coder-0.5B-Instruct",
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded absolute path containing user-specific directory makes this configuration non-portable. Consider using a relative path or environment variable substitution to work across different development environments.

Suggested change
"path": "/home/lthpc/student/LiTengfei/LLaMA-Factory/models/Qwen2.5-Coder-0.5B-Instruct",
"path": "${MODEL_PATH}",

Copilot uses AI. Check for mistakes.
"total_epochs": 2,
"log_interval": 1,
"save_freq": 500,
"default_local_dir": "/home/lthpc/student/LiTengfei/project/myfork/agent-lightning/examples/spider/ckpt_1.5b_grpo_config2"
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded absolute path containing user-specific directory makes this configuration non-portable. Consider using a relative path or environment variable substitution to work across different development environments.

Copilot uses AI. Check for mistakes.
props = torch.cuda.get_device_properties(idx)
lines.append(
f"gpu_{idx}: name={props.name}, total_mem_gb={round(props.total_memory/1e9,2)}, "
f"capability={props.major}.{props.minor}"
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the word 'unreasoable' - it should be 'unreasonable'.

Copilot uses AI. Check for mistakes.
lock = threading.Lock()
throttle_lock = threading.Lock()
last_call_ts = [0.0]
min_interval = 0.5 # seconds between starting any two requests to respect rate limits
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic number 0.5 for minimum interval between requests lacks explanation. Consider adding a comment explaining this rate limit value or making it configurable, especially since different API providers may have different rate limits.

Copilot uses AI. Check for mistakes.
Comment on lines +578 to +580
self._progress_log(
f"[{ts_val}] [val] step {self.global_steps}/{self.total_training_steps} | {key}={val}"
)
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _progress_log method is called but is never defined. This will result in an AttributeError at runtime. The method needs to be defined in the class, likely writing to a progress log file configured via config.trainer.progress_log_file.

Copilot uses AI. Check for mistakes.

cpu_percent = str(psutil.cpu_percent(interval=None))
mem_percent = str(psutil.virtual_memory().percent)
except Exception:
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
except Exception:
except Exception:
# Ignore all exceptions: missing CPU/memory stats are non-fatal and will be recorded as 'na'

Copilot uses AI. Check for mistakes.
try:
load_avg_vals = os.getloadavg()
load_avg = ",".join(str(round(v, 2)) for v in load_avg_vals)
except Exception:
Copy link

Copilot AI Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'except' clause does nothing but pass and there is no explanatory comment.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants