

@huanghua1994

The original implementation does not support qk_head_dim != v_head_dim, which is needed in Multi-head Latent Attention. Also fix some test code logic.

Description

The original implementation does not support qk_head_dim != v_head_dim, which is needed for Multi-head Latent Attention. Problem sizes in samples/AttentionFMHA.py are updated so that qk_head_dim != v_head_dim and q_num_head != kv_num_head, exercising a generic GQA case. The parameters and the way PyTorch's scaled_dot_product_attention is called are also updated so that the reference call does not fail to find a working backend.
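For illustration, here is a minimal sketch of how such a reference check can be set up with PyTorch's scaled_dot_product_attention under these shape constraints. It is not the PR's actual sample code: the shapes, tensor names, and the KV-head expansion via repeat_interleave are illustrative assumptions, not taken from samples/AttentionFMHA.py.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Illustrative problem sizes (not the ones used in the PR).
batch, seq_len = 2, 128
q_num_head, kv_num_head = 8, 2        # generic GQA: q_num_head != kv_num_head
qk_head_dim, v_head_dim = 192, 128    # MLA-style: qk_head_dim != v_head_dim

q = torch.randn(batch, q_num_head, seq_len, qk_head_dim, device=device, dtype=dtype)
k = torch.randn(batch, kv_num_head, seq_len, qk_head_dim, device=device, dtype=dtype)
v = torch.randn(batch, kv_num_head, seq_len, v_head_dim, device=device, dtype=dtype)

# Expand the KV heads so the reference call sees matching head counts.
# SDPA accepts value tensors whose head dim differs from query/key; backends
# that cannot handle it are skipped and PyTorch falls back to one that can.
group = q_num_head // kv_num_head
k_ref = k.repeat_interleave(group, dim=1)
v_ref = v.repeat_interleave(group, dim=1)

ref = F.scaled_dot_product_attention(q, k_ref, v_ref)
print(ref.shape)  # (batch, q_num_head, seq_len, v_head_dim) -- last dim follows v
```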

All tests have passed locally on a B200.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
