11 changes: 11 additions & 0 deletions README.md
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob

## 🚀 News

* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor

> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
>
> **No GPU? No problem!** You can still try it out:
> 1. Follow the installation steps (feel free to skip GPU-specific packages such as `flash-attn`).
> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed for CPU-only systems; a condensed sketch follows this note.
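
A condensed sketch of the CPU-only path, assuming Trinity-RFT has already been cloned; it only combines commands from the installation steps below, and the launch step is left to the example's own instructions:

```bash
# CPU-only setup sketch (run from the repository root).
python3.10 -m venv .venv
source .venv/bin/activate

# Install the Tinker backend; GPU-oriented extras such as verl and flash_attn are skipped.
pip install -e ".[tinker]"

# Runnable configs and instructions live in examples/tinker;
# follow that directory's README to launch training.
```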


### Step 1: installation
@@ -186,6 +191,9 @@ Choose one of the following options:
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
@@ -198,6 +206,9 @@ pip install -e ".[flash_attn]"
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
11 changes: 11 additions & 0 deletions README_zh.md
@@ -41,6 +41,7 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and goals:

## 🚀 News

* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers Taobao Shangou's medical and health business, enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -79,6 +80,10 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and goals:

> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
>
> No GPU? No problem! You can still try it out:
> 1. Follow the installation steps (you may skip GPU-specific packages such as `flash-attn`).
> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed for CPU-only systems.



@@ -185,6 +190,9 @@ cd Trinity-RFT
conda create -n trinity python=3.12
conda activate trinity

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
@@ -197,6 +205,9 @@ pip install -e ".[flash_attn]"
python3.10 -m venv .venv
source .venv/bin/activate

pip install -e ".[verl]"
# If you have no GPU, use Tinker instead.
# pip install -e ".[tinker]"
pip install -e ".[dev]"
pip install -e ".[flash_attn]"
# if you encounter issues when installing flash-attn, try:
Binary file added docs/sphinx_doc/assets/tinker-gsm8k.png
33 changes: 32 additions & 1 deletion docs/sphinx_doc/source/tutorial/trinity_configs.md
@@ -164,9 +164,21 @@ model:
max_response_tokens: 16384
min_response_tokens: 1
enable_prompt_truncation: true
repetition_penalty: 1.0
lora_configs: null
rope_scaling: null
rope_theta: null
tinker:
enable: false
base_model: null
rank: 32
seed: null
train_mlp: true
train_attn: true
train_unembed: true
```

- `model_path`: Path to the model being trained.
- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
- `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
- `custom_chat_template`: Optional custom chat template in string format. If not specified, the system will use the default chat template from tokenizer.
- `chat_template_path`: Optional path to the chat template file in jinja2 type; overrides `custom_chat_template` if set. If not specified, the system will use the default chat template from tokenizer.
@@ -175,6 +187,25 @@ model:
- `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
- `lora_configs`: Optional LoRA configuration. If not specified, defaults to `null`. Currently, only one LoRA configuration is supported, and this configuration will not be applied if `tinker` is enabled.
- `name`: Name of the LoRA. Default is `None`.
- `path`: Path to the LoRA. Default is `None`.
- `base_model_name`: Name of the base model for LoRA. If not specified, defaults to `None`.
- `lora_rank`: Rank of the LoRA. Default is `32`.
- `lora_alpha`: Alpha value of the LoRA. Default is `32`.
- `lora_dtype`: Data type of the LoRA. Default is `auto`.
- `target_modules`: List of target modules for LoRA. Default is `all-linear`.
- `rope_scaling`: Optional RoPE scaling configuration in JSON format. If not specified, defaults to `null`.
- `rope_theta`: Optional RoPE theta value. If not specified, defaults to `null`.
- `tinker`: Optional Tinker configuration; see the sketch after this list. Note: the LoRA configuration is ignored if Tinker is enabled.
- `enable`: Whether to enable Tinker. Default is `false`.
- `base_model`: Path to the base model for Tinker. If not specified, defaults to `model_path`.
- `rank`: LoRA rank controlling the size of adaptation matrices. Default is `32`.
- `seed`: Random seed for Tinker. If not specified, defaults to `null`.
- `train_mlp`: Whether to train the MLP layer. Default is `true`.
- `train_attn`: Whether to train the attention layer. Default is `true`.
- `train_unembed`: Whether to train the unembedding layer. Default is `true`.
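
Putting the pieces together, a minimal sketch of a `model` section with Tinker enabled, using the defaults listed above (the tokenizer path is a placeholder):

```yaml
model:
  model_path: /path/to/local/tokenizer  # with Tinker enabled, this points to the local tokenizer
  max_response_tokens: 16384
  min_response_tokens: 1
  lora_configs: null  # ignored when Tinker is enabled
  tinker:
    enable: true
    base_model: null  # null falls back to model_path
    rank: 32
    seed: null
    train_mlp: true
    train_attn: true
    train_unembed: true
```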

```{tip}
If you are using the OpenAI API provided by the Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.
33 changes: 32 additions & 1 deletion docs/sphinx_doc/source_zh/tutorial/trinity_configs.md
@@ -164,9 +164,21 @@ model:
max_response_tokens: 16384
min_response_tokens: 1
enable_prompt_truncation: true
repetition_penalty: 1.0
lora_configs: null
rope_scaling: null
rope_theta: null
tinker:
enable: false
base_model: null
rank: 32
seed: null
train_mlp: true
train_attn: true
train_unembed: true
```

- `model_path`: Path to the model being trained.
- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
- `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
- `custom_chat_template`: Optional custom chat template in string format. If not specified, the system uses the tokenizer's default chat template.
- `chat_template_path`: Optional path to a chat template file, typically of jinja2 type; if set, it overrides `custom_chat_template`. If not specified, the system uses the tokenizer's default chat template.
@@ -175,6 +187,25 @@ model:
- `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
- `lora_configs`: Optional LoRA configuration. If not specified, defaults to `null`. Currently only one LoRA configuration is supported, and it is not applied if `tinker` is enabled; see the sketch after this list.
- `name`: Name of the LoRA. Default is `None`.
- `path`: Path to the LoRA. Default is `None`.
- `base_model_name`: Name of the base model the LoRA is built on. If not specified, defaults to `None`.
- `lora_rank`: Rank of the LoRA. Default is `32`.
- `lora_alpha`: Alpha value of the LoRA. Default is `32`.
- `lora_dtype`: Data type of the LoRA. Default is `auto`.
- `target_modules`: List of target modules for the LoRA. Default is `all-linear`.
- `rope_scaling`: Optional RoPE scaling configuration in JSON format. If not specified, defaults to `null`.
- `rope_theta`: Optional RoPE theta value. If not specified, defaults to `null`.
- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
- `enable`: Whether to enable Tinker. Default is `false`.
- `base_model`: Path to the base model used by Tinker. If not specified, defaults to `model_path`.
- `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
- `seed`: Random seed used by Tinker. If not specified, defaults to `null`.
- `train_mlp`: Whether to train the MLP layers. Default is `true`.
- `train_attn`: Whether to train the attention layers. Default is `true`.
- `train_unembed`: Whether to train the unembedding layer. Default is `true`.
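
For the non-Tinker path, a minimal sketch of `lora_configs`, assuming the plural field takes a single-element list as suggested by the field names above (the adapter name and paths are placeholders):

```yaml
model:
  model_path: /path/to/base/model
  lora_configs:
    - name: my_adapter        # placeholder adapter name
      path: /path/to/adapter  # placeholder adapter path
      lora_rank: 32
      lora_alpha: 32
      lora_dtype: auto
      target_modules: all-linear
```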

```{tip}
If you are using the OpenAI API provided by the Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.