11 changes: 11 additions & 0 deletions README.md
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob

## 🚀 News

* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor

> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
>
> **No GPU? No problem!** You can still try it out:
> 1. Follow the installation steps (feel free to skip GPU-specific packages such as `flash-attn`).
> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed for CPU-only systems; a condensed sketch follows this note.
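
A condensed sketch of the CPU-only path, assuming Trinity-RFT has already been cloned; it only combines commands from the installation steps below, and the launch step is left to the example's own instructions:

```bash
# CPU-only setup sketch (run from the repository root).
python3.10 -m venv .venv
source .venv/bin/activate

# Install the Tinker backend; GPU-oriented extras such as verl and flash_attn are skipped.
pip install -e ".[tinker]"

# Runnable configs and instructions live in examples/tinker;
# follow that directory's README to launch training.
```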


### Step 1: installation
@@ -186,6 +191,9 @@ Choose one of the following options:
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
@@ -198,6 +206,9 @@ pip install -e ".[flash_attn]"
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
11 changes: 11 additions & 0 deletions README_zh.md
@@ -41,6 +41,7 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and goals:

## 🚀 News

* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers Taobao Shangou's medical and health business, enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -79,6 +80,10 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and goals:

> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
>
> No GPU? No problem! You can still try it out:
> 1. Follow the installation steps (you may skip GPU-specific packages such as `flash-attn`).
> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed for CPU-only systems.



@@ -185,6 +190,9 @@ cd Trinity-RFT
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
@@ -197,6 +205,9 @@ pip install -e ".[flash_attn]"
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
Binary file added docs/sphinx_doc/assets/tinker-gsm8k.png
33 changes: 32 additions & 1 deletion docs/sphinx_doc/source/tutorial/trinity_configs.md
@@ -164,9 +164,21 @@ model:
max_response_tokens: 16384
min_response_tokens: 1
enable_prompt_truncation: true
repetition_penalty: 1.0
lora_configs: null
rope_scaling: null
rope_theta: null
tinker:
enable: false
base_model: null
rank: 32
seed: null
train_mlp: true
train_attn: true
train_unembed: true
```

- `model_path`: Path to the model being trained.
- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
- `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
- `custom_chat_template`: Optional custom chat template in string format. If not specified, the system will use the default chat template from tokenizer.
- `chat_template_path`: Optional path to the chat template file in jinja2 type; overrides `custom_chat_template` if set. If not specified, the system will use the default chat template from tokenizer.
@@ -175,6 +187,25 @@ model:
- `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
- `lora_configs`: Optional LoRA configuration. If not specified, defaults to `null`. Currently, only one LoRA configuration is supported, and this configuration will not be applied if `tinker` is enabled.
- `name`: Name of the LoRA. Default is `None`.
- `path`: Path to the LoRA. Default is `None`.
- `base_model_name`: Name of the base model for LoRA. If not specified, defaults to `None`.
- `lora_rank`: Rank of the LoRA. Default is `32`.
- `lora_alpha`: Alpha value of the LoRA. Default is `32`.
- `lora_dtype`: Data type of the LoRA. Default is `auto`.
- `target_modules`: List of target modules for LoRA. Default is `all-linear`.
- `rope_scaling`: Optional RoPE scaling configuration in JSON format. If not specified, defaults to `null`.
- `rope_theta`: Optional RoPE theta value. If not specified, defaults to `null`.
- `tinker`: Optional Tinker configuration; see the sketch after this list. Note: the LoRA configuration is ignored if Tinker is enabled.
- `enable`: Whether to enable Tinker. Default is `false`.
- `base_model`: Path to the base model for Tinker. If not specified, defaults to `model_path`.
- `rank`: LoRA rank controlling the size of adaptation matrices. Default is `32`.
- `seed`: Random seed for Tinker. If not specified, defaults to `null`.
- `train_mlp`: Whether to train the MLP layer. Default is `true`.
- `train_attn`: Whether to train the attention layer. Default is `true`.
- `train_unembed`: Whether to train the unembedding layer. Default is `true`.
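
Putting the pieces together, a minimal sketch of a `model` section with Tinker enabled, using the defaults listed above (the tokenizer path is a placeholder):

```yaml
model:
  model_path: /path/to/local/tokenizer  # with Tinker enabled, this points to the local tokenizer
  max_response_tokens: 16384
  min_response_tokens: 1
  lora_configs: null  # ignored when Tinker is enabled
  tinker:
    enable: true
    base_model: null  # null falls back to model_path
    rank: 32
    seed: null
    train_mlp: true
    train_attn: true
    train_unembed: true
```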

```{tip}
If you are using the OpenAI API provided by the Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.
33 changes: 32 additions & 1 deletion docs/sphinx_doc/source_zh/tutorial/trinity_configs.md
@@ -164,9 +164,21 @@ model:
max_response_tokens: 16384
min_response_tokens: 1
enable_prompt_truncation: true
repetition_penalty: 1.0
lora_configs: null
rope_scaling: null
rope_theta: null
tinker:
enable: false
base_model: null
rank: 32
seed: null
train_mlp: true
train_attn: true
train_unembed: true
```

- `model_path`: Path to the model being trained.
- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
- `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
- `custom_chat_template`: Optional custom chat template in string format. If not specified, the system uses the tokenizer's default chat template.
- `chat_template_path`: Optional path to a chat template file, typically of jinja2 type; if set, it overrides `custom_chat_template`. If not specified, the system uses the tokenizer's default chat template.
@@ -175,6 +187,25 @@ model:
- `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
- `lora_configs`: Optional LoRA configuration. If not specified, defaults to `null`. Currently only one LoRA configuration is supported, and it is not applied if `tinker` is enabled; see the sketch after this list.
- `name`: Name of the LoRA. Default is `None`.
- `path`: Path to the LoRA. Default is `None`.
- `base_model_name`: Name of the base model the LoRA is built on. If not specified, defaults to `None`.
- `lora_rank`: Rank of the LoRA. Default is `32`.
- `lora_alpha`: Alpha value of the LoRA. Default is `32`.
- `lora_dtype`: Data type of the LoRA. Default is `auto`.
- `target_modules`: List of target modules for the LoRA. Default is `all-linear`.
- `rope_scaling`: Optional RoPE scaling configuration in JSON format. If not specified, defaults to `null`.
- `rope_theta`: Optional RoPE theta value. If not specified, defaults to `null`.
- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
- `enable`: Whether to enable Tinker. Default is `false`.
- `base_model`: Path to the base model used by Tinker. If not specified, defaults to `model_path`.
- `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
- `seed`: Random seed used by Tinker. If not specified, defaults to `null`.
- `train_mlp`: Whether to train the MLP layers. Default is `true`.
- `train_attn`: Whether to train the attention layers. Default is `true`.
- `train_unembed`: Whether to train the unembedding layer. Default is `true`.
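
For the non-Tinker path, a minimal sketch of `lora_configs`, assuming the plural field takes a single-element list as suggested by the field names above (the adapter name and paths are placeholders):

```yaml
model:
  model_path: /path/to/base/model
  lora_configs:
    - name: my_adapter        # placeholder adapter name
      path: /path/to/adapter  # placeholder adapter path
      lora_rank: 32
      lora_alpha: 32
      lora_dtype: auto
      target_modules: all-linear
```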

```{tip}
If you are using the OpenAI API provided by the Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.