/Trinity-RFT/examples/dapo_math
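Hello developers, while testing the dapo example I keep getting errors when evaluating on the AIME2024 dataset (the "path to AIME2024" entry). Either the path is not recognized, or the dataset itself is not detected:
ERROR 11-05 10:20:36 [launcher.py:133] FileNotFoundError: Couldn't find any data file at /data/Trinity-RFT/aime_2024_problems.parquet.
In other cases I get a string-related error instead. Does this example run end to end on your side, and how should the AIME2024 dataset be provided? If I delete the eval_tasksets: section, everything runs normally. My config: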
project: Trinity-RFT-example
name: dapo
checkpoint_root_dir: ${oc.env:TRINITY_CHECKPOINT_ROOT_DIR,./checkpoints}
model:
  model_path: ${oc.env:TRINITY_MODEL_PATH,/data/LLM/llm_rl/Qwen2.5-1.5B}
  max_response_tokens: 20480
  max_model_len: 21504
algorithm:
  algorithm_type: grpo
  repeat_times: 16
  policy_loss_fn_args:
    clip_range_low: 0.2
    clip_range_high: 0.28
cluster:
  node_num: 1
  gpu_per_node: 8
buffer:
  total_epochs: 1
  batch_size: 32
  explorer_input:
    taskset:
      name: dapo-math
      storage_type: file
      path: /data/Trinity-RFT/open-r1/DAPO-Math-17k-Processed
      subset_name: all
      format:
        prompt_key: 'prompt'
        response_key: 'solution'
        system_prompt: 'Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.'
      rollout_args:
        temperature: 1.0
        logprobs: 0
      workflow_args:
        use_base: true
      reward_fn_args:
        enable_overlong_penalty: true
        penalty_factor: 1.0
        max_response_length: 20480
        cache_length: 4096
    eval_tasksets:
      - name: AIME2024
        storage_type: file
        path: ${oc.env:TRINITY_TASKSET_PATH, /data/Trinity-RFT/aime_2024} # e.g. path to AIME2024
        repeat_times: 32
        format:
          prompt_key: 'question'
          response_key: 'answer'
          system_prompt: 'Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.'
        rollout_args:
          temperature: 1.0
          top_p: 0.7
    default_workflow_type: 'math_boxed_workflow'
    default_reward_fn_type: 'math_dapo_reward'
  trainer_input:
    experience_buffer:
      name: math_buffer
      storage_type: queue
explorer:
  eval_interval: 10
  runner_per_model: 8
  rollout_model:
    engine_num: 4
    tensor_parallel_size: 1
    enable_prefix_caching: false
    enforce_eager: true
    dtype: bfloat16
    seed: 42
synchronizer:
  sync_method: 'nccl'
  sync_interval: 16
  sync_timeout: 1200
trainer:
  save_interval: 100
  trainer_config:
    actor_rollout_ref:
      model:
        use_remove_padding: true
      actor:
        use_dynamic_bsz: true
        ppo_max_token_len_per_gpu: 22000
        ulysses_sequence_parallel_size: 1
        optim:
          lr: 1e-6
          lr_warmup_steps: 20
      ref:
        log_prob_use_dynamic_bsz: ${trainer.trainer_config.actor_rollout_ref.actor.use_dynamic_bsz}
        log_prob_max_token_len_per_gpu: ${trainer.trainer_config.actor_rollout_ref.actor.ppo_max_token_len_per_gpu}
        ulysses_sequence_parallel_size: ${trainer.trainer_config.actor_rollout_ref.actor.ulysses_sequence_parallel_size} # sp size
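
For anyone debugging the same FileNotFoundError, here is a minimal diagnostic sketch. It rests on two assumptions that the thread does not confirm: that the file loader wants `path` to be a directory containing data files (or a hub dataset name) rather than a single `.parquet` file, and that `TRINITY_TASKSET_PATH` in the environment overrides the YAML default. The second would explain why the path in the error message differs from the default written in the config. The helper names `resolve_taskset_path` and `describe_dataset_path` and the suffix list are hypothetical, made up for this illustration.

```python
import os
from pathlib import Path

# File extensions counted as data files here. This list is an assumption
# for illustration, not the loader's actual rule.
DATA_SUFFIXES = {".parquet", ".json", ".jsonl", ".csv"}


def resolve_taskset_path(default, env_var="TRINITY_TASKSET_PATH"):
    """Return the environment variable if set, otherwise the default.

    Mirrors the idea of `${oc.env:VAR,default}`. Stripping whitespace is a
    choice made in this sketch so a stray space after the comma is harmless.
    """
    return os.environ.get(env_var, default).strip()


def describe_dataset_path(path):
    """Classify what `path` points at: missing, file, or directory."""
    p = Path(path)
    if not p.exists():
        return "missing"
    if p.is_file():
        return "file"
    # Directory: look for at least one data file anywhere below it.
    has_data = any(
        f.is_file() and f.suffix in DATA_SUFFIXES for f in p.rglob("*")
    )
    return "dir_with_data" if has_data else "dir_without_data"


if __name__ == "__main__":
    path = resolve_taskset_path("/data/Trinity-RFT/aime_2024")
    print(path, "->", describe_dataset_path(path))
```

If this prints `file` or `missing`, one thing to try is pointing `path` at the directory that holds the parquet file, and checking with `echo $TRINITY_TASKSET_PATH` whether the environment variable is set to something unexpected. It may also be worth confirming that the dataset columns match the configured `prompt_key: 'question'` and `response_key: 'answer'`.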