skip scaled grouped mm test on unsupported arches #5816

Merged
github-actions[bot] merged 3 commits into main from llu/skip_scaled_grouped_mm
Jan 15, 2026

Conversation

@liqiangxl
Collaborator

Same as #5810
Error message:
Exception raised from run_nvfp4_scaled_group_mm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_group_mm.cu:518

@github-actions

github-actions Bot commented Jan 13, 2026

Review updated until commit 0f6e0be

Auto-merge Status

✅ Internal CI is finished
✅ No failed checks
✅ PR is mergeable
ℹ️ PR mergeable_state: unstable

Description

  • Updated the test skip condition so the test only runs on compute capability 10.0 devices

  • Removed the Blackwell-specific skip conditions for the nvfp4 scaled grouped mm test

  • Simplified architecture checking by replacing multiple conditions with a single microarchitecture check

  • Fixed test compatibility by targeting the specific compute capability version (see the sketch below)
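
A sketch of the change described above; the helper names is_pre_blackwell, microarchitecture_is_pre, and microarchitecture_is come from this PR, but the exact shape of the removed decorators is reconstructed for illustration rather than copied from the diff:

    import pytest

    # Before (reconstructed sketch): two separate guards around the test.
    @pytest.mark.skipif(is_pre_blackwell(), reason="Requires Blackwell")
    @pytest.mark.skipif(
        not microarchitecture_is_pre(12), reason="Not supported on compute 12.0"
    )
    def test_layout_op_and_cutlass_nvfp4_grouped_mm():
        ...

    # After: a single guard that admits only compute capability 10.0.
    @pytest.mark.skipif(
        not microarchitecture_is(10, 0), reason="Only supported on compute 10.0"
    )
    def test_layout_op_and_cutlass_nvfp4_grouped_mm():
        ...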

Changes walkthrough

Relevant files
Bug fix
test_with_id_model_indexer.py
Update test skip conditions for nvfp4 scaled grouped mm   

tests/python/direct/test_with_id_model_indexer.py

  • Updated import statements to use microarchitecture_is instead of
    is_pre_blackwell and microarchitecture_is_pre
  • Removed two separate skip conditions, one for Blackwell devices and one
    for compute 12.0
  • Added a single skip condition so the test only runs on compute 10.0
    devices
  • Updated the skip reason to reflect the compute 10.0 requirement
  • +2/-6

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Architecture Check Logic

    The PR changes the architecture checking from two separate conditions (is_pre_blackwell and microarchitecture_is_pre(12)) to a single condition (microarchitecture_is(10, 0)). This significantly narrows the supported architectures. Verify that this change aligns with the actual hardware requirements for the nvfp4_scaled_group_mm functionality and that compute capability 10.0 is the correct minimum requirement.

    @pytest.mark.skipif(
        not microarchitecture_is(10, 0), reason="Only supported on compute 10.0"
    )
    Import Compatibility

    The PR removes imports for 'is_pre_blackwell' and 'microarchitecture_is_pre' while adding 'microarchitecture_is'. Ensure that 'microarchitecture_is' is available in the python.direct_utils module and that removing the other functions won't break other parts of the codebase that might still depend on them.

    from nvfuser_direct import (
        FusionDefinition,
        DataType,
    )
    from python.utils import set_env
    from python.direct_utils import (
        FLOAT4_E2M1_MAX,
        FLOAT8_E4M3_EPS,
        FLOAT8_E4M3_MAX,
        pytorch_nvfp4_quantize,
        microarchitecture_is,
        linear_to_swizzled_128_4,
        round_up,
        activation_scale_to_nvfp4,

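    On the availability question, the sketch below is only an assumption about what such a helper typically looks like (a thin wrapper over torch.cuda.get_device_capability), not the actual python.direct_utils implementation:

    import torch

    def microarchitecture_is(major: int, minor: int) -> bool:
        # Hypothetical sketch, not the real helper: compare the current
        # device's compute capability against the requested pair exactly.
        device_major, device_minor = torch.cuda.get_device_capability()
        return device_major == major and device_minor == minor
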
    Test failures

    • (Medium, 1) Thunder nvFuser NanoGPT autograd mismatch (CUDA A100)

      Test name (A100): thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32

    @greptile-apps
    Contributor

    greptile-apps Bot commented Jan 13, 2026

    Greptile Summary

    Simplified architecture check for the scaled grouped MM test to only run on compute capability 10.0 (Blackwell). This aligns with the underlying CUTLASS kernel implementation which is guarded by CUTLASS_ARCH_MMA_SM100_SUPPORTED and prevents runtime exceptions on unsupported architectures.

    Confidence Score: 5/5

    Important Files Changed

    tests/python/direct/test_with_id_model_indexer.py: Simplified architecture check to only run on compute 10.0 (Blackwell), matching the underlying CUTLASS kernel support

    Sequence Diagram

    sequenceDiagram
        participant pytest as Pytest Runner
        participant test as test_layout_op_and_cutlass_nvfp4_grouped_mm
        participant utils as microarchitecture_is
        participant cuda as CUDA Device
        participant cutlass as CUTLASS Kernel (SM100)
        
        pytest->>utils: Check microarchitecture_is(10, 0)
        utils->>cuda: get_device_properties()
        cuda-->>utils: return device properties
        utils-->>pytest: return (major == 10 and minor == 0)
        
        alt compute capability != 10.0
            pytest->>pytest: Skip test with reason
            Note over pytest: Test skipped on unsupported arch
        else compute capability == 10.0
            pytest->>test: Execute test
            test->>cutlass: Call cutlass_nvfp4_grouped_mm
            Note over cutlass: CUTLASS_ARCH_MMA_SM100_SUPPORTED enabled
            cutlass-->>test: Return result
            test-->>pytest: Test passes
        end
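
    To see which branch of this diagram applies on a given machine, a quick standalone check (assuming only that PyTorch is installed and a CUDA driver is visible) is:

    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        print(f"compute capability: {major}.{minor}")
        # Only exactly 10.0 takes the "execute test" branch of the diagram.
        print("test runs" if (major, minor) == (10, 0) else "test is skipped")
    else:
        print("no CUDA device visible; test is skipped")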

    @liqiangxl liqiangxl added the enable-auto-merge label (auto-merge when the PR is mergeable, internal CI is complete, and no checks have failed) on Jan 14, 2026
    @liqiangxl
    Collaborator Author

    !test

    @liqiangxl liqiangxl requested a review from jacobhinkle January 14, 2026 15:48
    Collaborator

    @jacobhinkle jacobhinkle left a comment

    LGTM

    The github-actions bot merged commit dea65c6 into main on Jan 15, 2026
    63 checks passed
    The github-actions bot removed the enable-auto-merge label on Jan 15, 2026
    The github-actions bot deleted the llu/skip_scaled_grouped_mm branch on January 15, 2026 at 14:43