llama : (mrope) allow using normal 1D position for text token #13138

Merged
ngxson merged 2 commits into ggml-org:master from ngxson:xsn/mrope_normal_pos_text
Apr 28, 2025

ngxson (Collaborator) commented Apr 27, 2025

For M-RoPE, text tokens should use a normal 1D position.

This simplifies calling llama_decode() with text tokens, which is needed to add Qwen2VL to libmtmd and to server.cpp.

This should also align with #11875, because in the future we want text positions to be tracked internally by libllama.
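
To make the idea concrete, here is a minimal sketch of how one 1D position per text token could be expanded into an M-RoPE position buffer. It is not the code from this PR: the helper name `expand_text_pos_mrope` is hypothetical, and the layout (four sections stored back to back, the first three repeating the text position, the last one zero) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of libllama: expand one 1D position per text
// token into a buffer holding n_pos_per_embd position sections.
// ASSUMPTION: sections are stored back to back (section-major); the first
// three sections repeat the text position and any remaining section is zero.
std::vector<int32_t> expand_text_pos_mrope(const std::vector<int32_t> & pos_1d,
                                           std::size_t n_pos_per_embd = 4) {
    assert(n_pos_per_embd >= 3);

    const std::size_t n_tokens = pos_1d.size();

    // zero-initialized, so sections past the third stay at 0
    std::vector<int32_t> out(n_tokens * n_pos_per_embd, 0);

    for (std::size_t i = 0; i < n_tokens; ++i) {
        out[0 * n_tokens + i] = pos_1d[i];
        out[1 * n_tokens + i] = pos_1d[i];
        out[2 * n_tokens + i] = pos_1d[i];
    }

    return out;
}
```

Under these assumptions, a caller that only has text tokens keeps supplying a plain position per token (for example 5, 6, 7), and the expansion to the multi-section layout happens in one place instead of in every caller.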

@ngxson ngxson requested a review from ggerganov April 27, 2025 16:25
Comment thread src/llama-graph.cpp Outdated

  ggml_tensor * llm_graph_context::build_inp_attn_scale() const {
-     auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_token(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);
+     auto inp = std::make_unique<llm_graph_input_attn_temp>(n_pos_per_embd(), hparams.n_attn_temp_floor_scale, hparams.f_attn_temp_scale);

ngxson (Collaborator, Author) commented Apr 27, 2025

@ggerganov Because build_inp_attn_scale is currently used exclusively by Llama 4, do you think we should get rid of n_pos_per_embd here and replace it with a GGML_ASSERT(n_pos_per_embd() == 1)?

The main motivation is to make this code look less complicated, as there is ~0% chance a Qwen model is going to use this.

Member

Yes, we can do that.

Collaborator Author

On second thought, build_inp_attn_scale should work well even with N positions per token.

That's because the scale is applied per embedding, and the number of embeddings is independent of the number of positions per token.

In any case, I removed n_pos_per_embd in 9cd16a3. Merging this PR once the CI is green.
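
The point about the scale being per embedding can be sketched as follows. This is an illustration only: the helper name `attn_temp_scales` is hypothetical, and the formula is an assumption about how a position-dependent attention temperature could be computed from the two hyperparameters named in the diff (`n_attn_temp_floor_scale`, `f_attn_temp_scale`). What matters is the shape: the output has exactly one value per embedding, so it does not need to know how many position components each embedding carries.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: compute one attention scale per embedding from its
// 1D position.
// ASSUMPTION: scale = log(floor((pos + 1) / floor_scale) + 1) * f_scale + 1
std::vector<float> attn_temp_scales(const std::vector<int32_t> & pos,
                                    uint32_t n_attn_temp_floor_scale,
                                    float    f_attn_temp_scale) {
    assert(n_attn_temp_floor_scale > 0);

    // one entry per embedding, regardless of positions per embedding
    std::vector<float> scales(pos.size());

    for (std::size_t i = 0; i < pos.size(); ++i) {
        const float p = static_cast<float>(pos[i]);
        scales[i] = std::log(std::floor((p + 1.0f) / n_attn_temp_floor_scale) + 1.0f)
                        * f_attn_temp_scale + 1.0f;
    }

    return scales;
}
```

With this shape, the input needs only the list of per-embedding positions and the two hyperparameters, which is consistent with dropping n_pos_per_embd from llm_graph_input_attn_temp.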

@ngxson ngxson merged commit d2b2031 into ggml-org:master Apr 28, 2025
timwu pushed a commit to timwu/llama.cpp that referenced this pull request Dec 20, 2025
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
phibya pushed a commit to ziee-ai/llama.cpp that referenced this pull request May 29, 2026
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp
AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026
…rg#13138)

* llama : (mrope) use normal position for text token

* rm n_pos_per_embd from llm_graph_input_attn_temp