llama : (mrope) allow using normal 1D position for text token by ngxson · Pull Request #13138 · ggml-org/llama.cpp

ngxson · 2025-04-27T16:25:27Z

For M-RoPE, we want to use normal 1D positions for text tokens.

This simplifies calling llama_decode() with text tokens, which is needed for adding Qwen2VL to libmtmd and to server.cpp.

This should also align with #11875, because in the future we want text positions to be tracked internally by libllama.
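
To illustrate the idea, here is a minimal sketch of how a caller-supplied 1D position per text token could be expanded into the multi-section layout that M-RoPE consumes. Everything in it is an assumption for illustration: the helper name expand_text_positions, the section-by-section layout, and the n_shared parameter (how many sections repeat the 1D position) are hypothetical and not taken from this PR. The follow-up "llama-graph : fix text position for mrope" suggests the exact mapping was revised later, so the actual code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of libllama.
// Expands one 1D position per text token into n_pos_per_embd position
// sections, stored section by section:
//   [sec0: tok0..tokN-1][sec1: tok0..tokN-1]...
// Assumption: the first n_shared sections repeat the 1D position and the
// remaining sections stay 0.
std::vector<int32_t> expand_text_positions(const std::vector<int32_t> & pos_1d,
                                           int n_pos_per_embd,
                                           int n_shared) {
    assert(n_pos_per_embd >= 1);
    assert(n_shared >= 0 && n_shared <= n_pos_per_embd);

    const std::size_t n_tokens = pos_1d.size();
    std::vector<int32_t> out(n_tokens * n_pos_per_embd, 0);

    for (int s = 0; s < n_shared; ++s) {
        for (std::size_t i = 0; i < n_tokens; ++i) {
            out[s * n_tokens + i] = pos_1d[i];
        }
    }
    return out;
}
```

With n_pos_per_embd == 1 and n_shared == 1 the helper returns the positions unchanged, which is the plain RoPE case; the caller passes the same 1D positions either way.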

ngxson · 2025-04-27T16:26:49Z


 ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
-    auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_token(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
+    auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_embd(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);


@ggerganov Because build_inp_attn_scale is currently used exclusively by Llama 4, do you think we should get rid of n_pos_per_embd and replace it with a GGML_ASSERT(n_pos_per_embd() == 1)?

The main motivation is to make this code look less complicated, as there is ~0% chance a Qwen model is going to use this.

Yes, we can do that.

On second thought, build_inp_attn_scale should work well even in the case of N pos per token.

That's because the scale is applied per embedding, and the number of embeddings is independent of N pos per token.

In any case, I removed the n_pos_per_embd in 9cd16a3, and will merge this PR once the CI is green.
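
The point about the scale being per embedding can be sketched as follows. This is an illustrative assumption, not the code from this PR: the helper name attn_temp_scales is hypothetical, and the formula (a log-based temperature scale driven by n_attn_temp_floor_scale and f_attn_temp_scale) is my reading of how such a scale is typically computed. What matters here is only the shape: one float per embedding, regardless of how many position values each embedding carries.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of libllama.
// Computes one attention temperature scale per embedding from its position.
// Assumed formula: log(floor((pos + 1) / floor_scale) + 1) * scale + 1.
// The output length equals the number of embeddings; it does not depend on
// the number of position values stored per embedding.
std::vector<float> attn_temp_scales(const std::vector<int32_t> & pos,
                                    uint32_t n_attn_temp_floor_scale,
                                    float f_attn_temp_scale) {
    std::vector<float> out(pos.size());
    for (std::size_t i = 0; i < pos.size(); ++i) {
        const float p = static_cast<float>(pos[i]);
        const float bucket = std::floor((p + 1.0f) / n_attn_temp_floor_scale);
        out[i] = std::log(bucket + 1.0f) * f_attn_temp_scale + 1.0f;
    }
    return out;
}
```

Under this assumption, positions below the floor scale get a scale of exactly 1, so the scaling only starts to matter for long contexts.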

ggerganov · 2025-04-28T05:27:25Z




Yes, we can do that.

llama : (mrope) use normal position for text token

bd310ff

ngxson requested a review from ggerganov April 27, 2025 16:25

github-actions Bot added the examples label Apr 27, 2025

ngxson commented Apr 27, 2025

ngxson mentioned this pull request Apr 27, 2025

mtmd : add qwen2vl and qwen2.5vl #13141

Merged

ggerganov approved these changes Apr 28, 2025

rm n_pos_per_embd from llm_graph_input_attn_temp

9cd16a3

ngxson merged commit d2b2031 into ggml-org:master Apr 28, 2025

ngxson mentioned this pull request Apr 28, 2025

llama-graph : fix text position for mrope #13159

Merged

timwu pushed a commit to timwu/llama.cpp that referenced this pull request Dec 20, 2025

llama : (mrope) allow using normal 1D position for text token (ggml-o…

b2bacac

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

llama : (mrope) allow using normal 1D position for text token (ggml-o…

c015d55

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

llama : (mrope) allow using normal 1D position for text token (ggml-o…

fde3b56

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

phibya pushed a commit to ziee-ai/llama.cpp that referenced this pull request May 29, 2026

llama : (mrope) allow using normal 1D position for text token (ggml-o…

98cbeeb

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : (mrope) allow using normal 1D position for text token (ggml-o…

e31579b

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : (mrope) allow using normal 1D position for text token (ggml-o…

4fad44a

…rg#13138) * llama : (mrope) use normal position for text token * rm n_pos_per_embd from llm_graph_input_attn_temp

ngxson merged 2 commits into ggml-org:master from ngxson:xsn/mrope_normal_pos_text
