6 changes: 3 additions & 3 deletions docs/en/notes/guide/mixer/tutorial.md
@@ -36,7 +36,7 @@ update_times: 2
- `init_mixture_proportions`: Initial sampling proportions, required when `mixture_sample_rule='mixture'`.
- `warmup_step`: Before the first dynamic proportion update, the model needs to perform `warmup_step` steps of regular training. This helps the model establish an initial understanding of the data distribution.
- `update_step`: Frequency of domain proportion updates. After every `update_step` training steps, the Mixer will be triggered to update domain proportions for the next training phase.
- `update_times`: Total number of dynamic data proportion calculations during the entire training process. Therefore, total training steps = `(update_times * update_step + warmup_step) * global_batch_size`
- `update_times`: Number of dynamic data proportion updates per Flex epoch. Total steps are derived from `num_train_epochs` unless `train_step > 0`.
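
A minimal Python sketch of the update schedule these parameters imply. It assumes, without confirmation from the docs, two things: the first update fires as soon as warmup ends, and each later update fires `update_step` steps after the previous one. Under those assumptions one Flex epoch spans `warmup_step + update_step * update_times` steps, which is the per-epoch formula used in the selector guides. `mixer_update_steps` is a hypothetical helper, not a DataFlex API.

```python
def mixer_update_steps(warmup_step: int, update_step: int, update_times: int) -> tuple[list[int], int]:
    """Return (steps at which the Mixer is assumed to fire, steps in one Flex epoch)."""
    if warmup_step < 0 or update_step <= 0 or update_times < 0:
        raise ValueError("expected warmup_step >= 0, update_step > 0, update_times >= 0")
    # Assumption: the first update fires once warmup ends, then every update_step steps.
    triggers = [warmup_step + k * update_step for k in range(update_times)]
    # Each update is followed by update_step steps trained with the new proportions.
    steps_per_flex_epoch = warmup_step + update_step * update_times
    return triggers, steps_per_flex_epoch


triggers, per_epoch = mixer_update_steps(warmup_step=4, update_step=3, update_times=2)
print(triggers, per_epoch)  # [4, 7] 10
```

Multiplying a step count by `global_batch_size` gives a number of samples, not a number of steps.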

### Static Mixing Configuration

@@ -47,7 +47,7 @@ train_type: dynamic_mix
static_mix: true # Whether to fix initial static mixing proportions (only effective in dynamic_mix trainer)
mixture_sample_rule: mixture # Initial sampling rule
init_mixture_proportions: [0.7, 0.3] # Initial proportions, can be adjusted by additional algorithms
train_step: 3 # Total training steps (only effective in dynamic_mix trainer), excluding warmup and update steps
train_step: 3 # fixed total steps; set to 0 to use num_train_epochs
```

When static mixing is enabled, the training process will use fixed `init_mixture_proportions` without dynamic adjustment.
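
The sketch below shows what fixed proportions such as `[0.7, 0.3]` could mean for a single batch. The hypothetical helper `split_batch` divides a global batch across domains using largest-remainder rounding, so the per-domain counts always add up to the batch size. DataFlex's actual sampler may allocate samples differently, for example by drawing randomly per sample, so treat this as an illustration only.

```python
import math


def split_batch(batch_size: int, proportions: list[float]) -> list[int]:
    """Allocate batch_size samples across domains by largest-remainder rounding."""
    total = sum(proportions)
    if batch_size < 0 or total <= 0 or any(p < 0 for p in proportions):
        raise ValueError("need batch_size >= 0 and non-negative proportions with a positive sum")
    exact = [batch_size * p / total for p in proportions]
    counts = [math.floor(x) for x in exact]
    # Hand the leftover samples to the domains with the largest fractional parts.
    leftover = batch_size - sum(counts)
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


print(split_batch(16, [0.7, 0.3]))  # [11, 5]
```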
@@ -129,4 +129,4 @@ mixers:

#### Key Points

- `params`: All parameters defined under this block will be passed as keyword arguments to the `__init__` constructor of the `RandomMixer` class. For example, the `seed` value here will be passed to the `seed` parameter of the `__init__` method.
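
In Python terms, this mechanism amounts to unpacking the `params` mapping into the constructor. The minimal sketch below assumes the config entry has already been parsed into a dict, for example by a YAML loader. The `RandomMixer` body is a hypothetical stand-in, not DataFlex's class, and `build_component` is an illustrative helper name.

```python
import random


class RandomMixer:
    """Hypothetical stand-in for the RandomMixer referenced in the config."""

    def __init__(self, seed: int = 0):
        self.seed = seed
        self.rng = random.Random(seed)


def build_component(cls, entry: dict):
    # Every key under `params` becomes a keyword argument of cls.__init__.
    params = entry.get("params") or {}
    return cls(**params)


# Roughly what a `params` block containing `seed: 42` looks like once parsed.
mixer = build_component(RandomMixer, {"params": {"seed": 42}})
print(mixer.seed)  # 42
```

With keyword unpacking, a key that `__init__` does not accept raises `TypeError` when the component is constructed, so typos in `params` surface early.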
6 changes: 5 additions & 1 deletion docs/en/notes/guide/selector/quickstart.md
@@ -23,6 +23,10 @@ component_name: less
warmup_step: 4
update_step: 3
update_times: 2
num_train_epochs: 1.0
train_step: 0

eval_dataset: alpaca_zh_demo
```

`update_times` is the number of dynamic selections per Flex epoch. Use `num_train_epochs: 1.0` for one Flex epoch; use `num_train_epochs: N` with `train_step: 0` for multi-epoch runs. If `train_step > 0`, it fixes the total number of steps and overrides `num_train_epochs`.
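
The precedence rule above can be sketched as follows. The text does not spell out how the total is computed when `train_step: 0`. This sketch assumes it is `num_train_epochs` times the steps in one Flex epoch, with fractional epochs truncated to whole steps. `resolve_total_steps` is a hypothetical helper, not a DataFlex API.

```python
def resolve_total_steps(num_train_epochs: float, train_step: int, steps_per_flex_epoch: int) -> int:
    """Return total training steps under the documented precedence rule."""
    if train_step > 0:
        # A positive train_step fixes the total and overrides num_train_epochs.
        return train_step
    if num_train_epochs <= 0:
        raise ValueError("num_train_epochs must be positive when train_step is 0")
    # Assumption: fractional epochs are truncated to whole steps.
    return int(num_train_epochs * steps_per_flex_epoch)


per_epoch = 4 + 3 * 2  # warmup_step + update_step * update_times from the config above
print(resolve_total_steps(1.0, 0, per_epoch))   # 10
print(resolve_total_steps(3.0, 0, per_epoch))   # 30
print(resolve_total_steps(3.0, 25, per_epoch))  # 25
```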
3 changes: 2 additions & 1 deletion docs/en/notes/guide/selector/selector_delta_loss.md
@@ -126,6 +126,8 @@ update_times: 2
eval_dataset: alpaca_zh_demo
```

`update_times` is the number of dynamic selections per Flex epoch. Delta Loss requires `update_times >= 2`: the first selection builds initial losses, and later selections use delta loss. For multi-epoch runs, set `num_train_epochs: N` and keep `train_step: 0`.
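
The sketch below mirrors the constraint above as a config check. The first selection only records initial losses, so a delta can be computed only from the second selection onward. Both `check_delta_loss_config` and `loss_deltas` are hypothetical helpers. The per-sample difference shown is an assumed definition; the Delta Loss selector may compute its delta differently, for example with the opposite sign or with normalization.

```python
def check_delta_loss_config(update_times: int, train_step: int = 0) -> None:
    if update_times < 2:
        raise ValueError("Delta Loss needs update_times >= 2: the first selection only builds initial losses")
    if train_step < 0:
        raise ValueError("train_step must be >= 0 (0 lets num_train_epochs decide)")


def loss_deltas(initial: list[float], current: list[float]) -> list[float]:
    # Assumed definition: per-sample change relative to the initial loss.
    if len(initial) != len(current):
        raise ValueError("loss lists must align sample by sample")
    return [c - i for i, c in zip(initial, current)]


check_delta_loss_config(update_times=2)
print(loss_deltas([2.0, 1.5], [1.0, 1.5]))  # [-1.0, 0.0]
```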

---

### Step 4: Run Training
@@ -180,4 +182,3 @@ The merged model will be saved in:
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

5 changes: 2 additions & 3 deletions docs/en/notes/guide/selector/selector_less.md
@@ -151,10 +151,10 @@ eval_dataset: alpaca_zh_demo
* `output_dir`: Output directory of dynamic fine-tuning (LoRA adapter).
* `warmup_step`: Number of warmup steps before the first sample selection.
* `update_step`: Number of steps between each dynamic data selection.
* `update_times`: Total number of dynamic data selection iterations.
* `update_times`: Number of dynamic data selection iterations per Flex epoch.
* `eval_dataset`: Validation dataset.

Both `dataset` and `eval_dataset` can be selected from `DataFlex/data/dataset_info.json` or local JSON files in ShareGPT/Alpaca format. Note: The training set size significantly affects computation cost. Total steps = `warmup_step + update_step × update_times`.
Both `dataset` and `eval_dataset` can be selected from `DataFlex/data/dataset_info.json` or local JSON files in ShareGPT/Alpaca format. Note: the training set size significantly affects computation cost. Steps per Flex epoch = `warmup_step + update_step × update_times`. Total steps are derived from `num_train_epochs` unless `train_step > 0`.

---

@@ -212,4 +212,3 @@ The merged model will be saved in:
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

3 changes: 2 additions & 1 deletion docs/en/notes/guide/selector/selector_loss.md
@@ -132,6 +132,8 @@ update_times: 2
eval_dataset: alpaca_zh_demo
```

`update_times` is the number of dynamic selections per Flex epoch. For multi-epoch runs, set `num_train_epochs: N` and keep `train_step: 0`; if `train_step > 0`, it fixes total steps and overrides `num_train_epochs`.

---

### Step 4: Run Training
@@ -185,4 +187,3 @@ The merged model will be saved in:
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

3 changes: 1 addition & 2 deletions docs/en/notes/guide/selector/selector_nice.md
@@ -168,7 +168,7 @@ early_stopping_min_delta: 0.01
**Parameter description:**

* `component_name`: Must match the `nice` component in `components.yaml`, determining reward backend and projection dimensions.
* `warmup_step` / `update_step` / `update_times`: Control the dynamic selection schedule; total steps = `warmup_step + update_step × update_times`.
* `warmup_step` / `update_step` / `update_times`: Control the dynamic selection schedule per Flex epoch. Total steps are derived from `num_train_epochs` unless `train_step > 0`.
* `eval_dataset`: Validation set (Alpaca/ShareGPT style); the reward model is used for scoring during generation.
* `output_dir`: Path to save LoRA adapters and caches.

@@ -226,4 +226,3 @@ The merged model will be saved to:

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow)
[Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/en/guide/2k5wjgls/) to systematically evaluate the generated model, and to inspect the scoring logs in `cache_dir` to analyze the reward model’s sensitivity to different samples.

3 changes: 1 addition & 2 deletions docs/en/notes/guide/selector/selector_offline_near.md
@@ -153,7 +153,7 @@ update_times: 2
**Notes:**

* `component_name: near` enables the NEAR component.
* `warmup_step / update_step / update_times` decide **when** and **how often** to reselect the training subset; total steps ≈ `warmup_step + update_step × update_times`.
* `warmup_step / update_step / update_times` decide **when** and **how often** to re-select the training subset in each Flex epoch. Total steps are derived from `num_train_epochs` unless `train_step > 0`.
* Total batch size = `device_number × per_device_train_batch_size × gradient_accumulation_steps`
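
Here is the batch-size formula above written as a Python helper. The example values (2 devices, a per-device batch of 4, 8 accumulation steps) are hypothetical, and `total_batch_size` is an illustrative name.

```python
def total_batch_size(device_number: int, per_device_train_batch_size: int,
                     gradient_accumulation_steps: int) -> int:
    """Samples consumed per optimizer step across all devices."""
    return device_number * per_device_train_batch_size * gradient_accumulation_steps


print(total_batch_size(2, 4, 8))  # 64
```

Gradient accumulation raises the effective batch without raising per-device memory. Each optimizer step still counts as one training step toward `warmup_step` and `update_step`.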

---
@@ -201,4 +201,3 @@ llamafactory-cli export llama3_lora_sft.yaml

We recommend using the [DataFlow](https://github.com/OpenDCAI/DataFlow) QA evaluation pipeline to compare **NEAR** against **Less** and **random sampling**.


3 changes: 1 addition & 2 deletions docs/en/notes/guide/selector/selector_offline_tsds.md
@@ -187,7 +187,7 @@ update_times: 2
**Notes:**

* `component_name: tsds` enables the TSDS component.
* `warmup_step / update_step / update_times` decide **when** and **how often** to reselect the training subset; total steps ≈ `warmup_step + update_step × update_times`.
* `warmup_step / update_step / update_times` decide **when** and **how often** to re-select the training subset in each Flex epoch. Total steps are derived from `num_train_epochs` unless `train_step > 0`.
* Total batch size = `device_number × per_device_train_batch_size × gradient_accumulation_steps`

---
@@ -235,4 +235,3 @@ llamafactory-cli export llama3_lora_sft.yaml

We recommend using the [DataFlow](https://github.com/OpenDCAI/DataFlow) QA evaluation pipeline to compare **TSDS** against **Less** and **random sampling**.


4 changes: 2 additions & 2 deletions docs/en/notes/guide/selector/selector_zeroth.md
@@ -123,10 +123,10 @@ eval_dataset: alpaca_zh_demo
* `output_dir`: Output directory of dynamic fine-tuning (LoRA adapter).
* `warmup_step`: Number of warmup steps before the first sample selection.
* `update_step`: Number of steps between each dynamic data selection.
* `update_times`: Total number of dynamic data selection iterations.
* `update_times`: Number of dynamic data selection iterations per Flex epoch.
* `eval_dataset`: Validation dataset.

Both `dataset` and `eval_dataset` can be selected from `DataFlex/data/dataset_info.json` or local JSON files in ShareGPT/Alpaca format. Note: The training set size significantly affects computation cost. Total steps = `warmup_step + update_step × update_times`.
Both `dataset` and `eval_dataset` can be selected from `DataFlex/data/dataset_info.json` or local JSON files in ShareGPT/Alpaca format. Note: the training set size significantly affects computation cost. Steps per Flex epoch = `warmup_step + update_step × update_times`. Total steps are derived from `num_train_epochs` unless `train_step > 0`.

---

7 changes: 5 additions & 2 deletions docs/en/notes/guide/weighter/quickstart.md
@@ -21,5 +21,8 @@ train_type: dynamic_weight
components_cfg_file: src/dataflex/configs/components.yaml
component_name: loss
warmup_step: 1
train_step: 3 # total train steps (including warmup)
```
num_train_epochs: 1.0
train_step: 0 # set positive to fix total steps and override num_train_epochs
```

For multi-epoch runs, set `num_train_epochs: N` and keep `train_step: 0`. `warmup_step` is a global step threshold and does not reset each epoch.
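
The sketch below illustrates the "does not reset each epoch" rule. It walks two epochs of a hypothetical 3-step epoch with `warmup_step: 1` and lists the global steps at which dynamic weighting would apply. It assumes two things: weighting happens at every step once the global counter reaches `warmup_step`, as the weighter tutorial describes for the Weight Trainer, and steps are counted from 0. The trainer's actual step indexing may differ, and `weighted_steps` is a hypothetical name.

```python
def weighted_steps(warmup_step: int, steps_per_epoch: int, num_epochs: int) -> list[tuple[int, int]]:
    """Return (epoch, global_step) pairs at which dynamic weighting is applied."""
    result = []
    global_step = 0
    for epoch in range(num_epochs):
        for _ in range(steps_per_epoch):
            # The threshold is compared against the global counter,
            # which keeps growing across epochs instead of resetting.
            if global_step >= warmup_step:
                result.append((epoch, global_step))
            global_step += 1
    return result


print(weighted_steps(warmup_step=1, steps_per_epoch=3, num_epochs=2))
# [(0, 1), (0, 2), (1, 3), (1, 4), (1, 5)]
```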
8 changes: 5 additions & 3 deletions docs/en/notes/guide/weighter/tutorial.md
@@ -26,7 +26,8 @@ train_type: dynamic_weight # Select trainer type. Available options:
components_cfg_file: src/dataflex/configs/components.yaml
component_name: loss # Select component name, corresponding to components defined in components_cfg_file
warmup_step: 1
train_step: 3 # Total training steps (including warm_up)
num_train_epochs: 1.0
train_step: 0 # set positive to fix total steps and override num_train_epochs
```

### Parameter Details
@@ -35,7 +36,8 @@ train_step: 3 # Total training steps (including warm_up)
- `component_name`: Defines the specific strategy for data weighting. For example, `loss` uses a loss-based weighter.
- `components_cfg_file`: Defines the parameter file containing specific parameters for the corresponding strategy.
- `warmup_step`: Before the first dynamic weighting, the model needs to perform `warmup_step` steps of regular training. This helps the model establish an initial understanding of the data distribution.
- `train_step`: Total training steps (including warmup). Weight Trainer will dynamically weight samples at each training step after warmup completion.
- `train_step`: Optional fixed total steps. If `train_step > 0`, it overrides `num_train_epochs`; for multi-epoch runs, keep `train_step: 0`.
- `num_train_epochs`: Controls the number of epochs when `train_step: 0`. `warmup_step` is a global step threshold and does not reset each epoch.

## How to Add Custom Weighter in DataFlex

@@ -140,4 +142,4 @@ weighters:

#### Key Points

- `params`: All parameters defined under this block will be passed as keyword arguments to the `__init__` constructor of the `CustomWeighter` class. For example, the `strategy` value here will be passed to the `strategy` parameter of the `__init__` method.
6 changes: 3 additions & 3 deletions docs/zh/notes/guide/mixer/tutorial.md
@@ -36,7 +36,7 @@ update_times: 2
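- `init_mixture_proportions`: Proportions for the initial sampling; required when `mixture_sample_rule='mixture'`.
- `warmup_step`: Before the first dynamic proportion update, the model first performs `warmup_step` steps of regular training. This helps the model form an initial understanding of the data distribution.
- `update_step`: Frequency of domain proportion updates. Every time training advances `update_step` steps, the Mixer is triggered to update the domain proportions for the next training phase.
- `update_times`: Total number of dynamic data proportion computations over the entire training process. Therefore, the total number of training steps is `(update_times * update_step + warmup_step) * global_batch_size`
- `update_times`: Number of dynamic data proportion computations per Flex epoch. Total steps are derived from `num_train_epochs`; if `train_step > 0`, `train_step` takes precedence.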

### Static Mixing Configuration

@@ -47,7 +47,7 @@ train_type: dynamic_mix
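static_mix: true # Whether to fix the initial static mixing proportions (only effective in the dynamic_mix trainer)
mixture_sample_rule: mixture # Initial sampling rule
init_mixture_proportions: [0.7, 0.3] # Initial proportions; can be adjusted with additional algorithms
train_step: 3 # Total training steps (only effective in the dynamic_mix trainer), excluding warmup and update steps
train_step: 3 # Fixed total steps; when set to 0, num_train_epochs controls the total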
```

When static mixing is enabled, training uses the fixed `init_mixture_proportions` and no longer adjusts them dynamically.
@@ -129,4 +129,4 @@ mixers:

#### Key Points

- `params`: All parameters defined under this block are passed as keyword arguments to the `__init__` constructor of the `RandomMixer` class. For example, the `seed` value here is passed to the `seed` parameter of the `__init__` method.
4 changes: 4 additions & 0 deletions docs/zh/notes/guide/selector/quickstart.md
@@ -23,6 +23,10 @@ component_name: less
warmup_step: 4
update_step: 3
update_times: 2
num_train_epochs: 1.0
train_step: 0

eval_dataset: alpaca_zh_demo
```

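`update_times` is the number of dynamic selections per Flex epoch. For a single Flex epoch, use `num_train_epochs: 1.0`; for multiple epochs, use `num_train_epochs: N` and keep `train_step: 0`. If `train_step > 0`, it fixes the total number of steps and overrides `num_train_epochs`.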
3 changes: 2 additions & 1 deletion docs/zh/notes/guide/selector/selector_delta_loss.md
@@ -127,6 +127,8 @@ update_times: 2
eval_dataset: alpaca_zh_demo
```

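`update_times` is the number of dynamic selections per Flex epoch. Delta Loss requires `update_times >= 2`: the first selection establishes the initial losses, and later selections update samples based on the delta loss. For multi-epoch training, set `num_train_epochs: N` and keep `train_step: 0`.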

---

### Step 4: Run Training
@@ -178,4 +180,3 @@ llamafactory-cli export llama3_lora_sft.yaml
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

5 changes: 2 additions & 3 deletions docs/zh/notes/guide/selector/selector_less.md
@@ -146,12 +146,12 @@ eval_dataset: alpaca_zh_demo
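* `output_dir`: Output path for the dynamic fine-tuning results (LoRA adapter).
* `warmup_step`: Number of warmup steps at the start of training, before the first training-data selection.
* `update_step`: Number of training steps between consecutive dynamic data selections.
* `update_times`: Total number of dynamic data selections.
* `update_times`: Number of dynamic data selections per Flex epoch.
* `eval_dataset`: Validation dataset.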

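`dataset` and `eval_dataset` can be chosen from `DataFlex/data/dataset_info.json`, or can be local JSON data in ShareGPT or Alpaca format. Note that with this method, the training set size strongly affects computation cost.

Total steps = `warmup_step + update_step × update_times`.
Steps per Flex epoch = `warmup_step + update_step × update_times`. Total steps are derived from `num_train_epochs`; if `train_step > 0`, `train_step` takes precedence.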

---

@@ -207,4 +207,3 @@ llamafactory-cli export llama3_lora_sft.yaml
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

3 changes: 2 additions & 1 deletion docs/zh/notes/guide/selector/selector_loss.md
@@ -132,6 +132,8 @@ update_times: 2
eval_dataset: alpaca_zh_demo
```

`update_times` is the number of dynamic selections per Flex epoch. For multi-epoch training, set `num_train_epochs: N` and keep `train_step: 0`; if `train_step > 0`, it fixes the total number of steps and overrides `num_train_epochs`.

---

### Step 4: Run Training
@@ -184,4 +186,3 @@ llamafactory-cli export llama3_lora_sft.yaml
## 3. Model Evaluation

It is recommended to use the [DataFlow](https://github.com/OpenDCAI/DataFlow) [Model QA Evaluation Pipeline](https://opendcai.github.io/DataFlow-Doc/zh/guide/2k5wjgls/) for systematic evaluation of the generated model.

2 changes: 1 addition & 1 deletion docs/zh/notes/guide/selector/selector_nice.md
@@ -162,7 +162,7 @@ early_stopping_min_delta: 0.01

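**Parameter description:**
* `component_name`: Must match the `nice` component in `components.yaml`; determines settings such as the reward backend and projection dimensions.
* `warmup_step` / `update_step` / `update_times`: Determine when dynamic selection is triggered; total steps = `warmup_step + update_step × update_times`
* `warmup_step` / `update_step` / `update_times`: Determine the dynamic selection cadence within each Flex epoch; total steps are derived from `num_train_epochs`, and if `train_step > 0`, `train_step` takes precedence.
* `eval_dataset`: Validation set in Alpaca or ShareGPT style; the reward model is called to score outputs during generation.
* `output_dir`: Path for saving LoRA adapters and caches.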

4 changes: 2 additions & 2 deletions docs/zh/notes/guide/selector/selector_offline_near.md
@@ -173,7 +173,7 @@ update_times: 2
**Parameter description:**

* `component_name: near`: Enables the NEAR component.
* `warmup_step / update_step / update_times`: Decide **when** and **how often** dynamic selection is performed; total steps ≈ `warmup_step + update_step × update_times`
* `warmup_step / update_step / update_times`: Decide **when** and **how often** dynamic selection is performed within each Flex epoch; total steps are derived from `num_train_epochs`, and if `train_step > 0`, `train_step` takes precedence.
* Total batch size = `device_number × per_device_train_batch_size × gradient_accumulation_steps`


@@ -218,4 +218,4 @@ llamafactory-cli export llama3_lora_sft.yaml

## 9. Evaluation and Comparison

We recommend using the [DataFlow](https://github.com/OpenDCAI/DataFlow) model QA evaluation pipeline to evaluate **NEAR** side by side with strategies such as **Less** and **random sampling**.
4 changes: 2 additions & 2 deletions docs/zh/notes/guide/selector/selector_offline_tsds.md
@@ -209,7 +209,7 @@ update_times: 2
**Parameter description:**

* `component_name: tsds`: Enables the TSDS component.
* `warmup_step / update_step / update_times`: Decide **when** and **how often** dynamic selection is performed; total steps ≈ `warmup_step + update_step × update_times`
* `warmup_step / update_step / update_times`: Decide **when** and **how often** dynamic selection is performed within each Flex epoch; total steps are derived from `num_train_epochs`, and if `train_step > 0`, `train_step` takes precedence.
* Total batch size = `device_number × per_device_train_batch_size × gradient_accumulation_steps`

---
@@ -253,4 +253,4 @@ llamafactory-cli export llama3_lora_sft.yaml

## 9. Evaluation and Comparison

We recommend using the [DataFlow](https://github.com/OpenDCAI/DataFlow) model QA evaluation pipeline to evaluate **TSDS** side by side with strategies such as **Less** and **random sampling**.