skip scaled grouped mm test on unsupported arches #5816

Merged
github-actions[bot] merged 3 commits into main from llu/skip_scaled_grouped_mm
Jan 15, 2026

Conversation

@liqiangxl
Collaborator

Same as #5810
Error message:
Exception raised from run_nvfp4_scaled_group_mm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_group_mm.cu:518

@github-actions

github-actions Bot commented Jan 13, 2026

Review updated until commit 0f6e0be

Auto-merge Status

✅ Internal CI is finished
✅ No failed checks
✅ PR is mergeable
ℹ️ PR mergeable_state: unstable

Description

  • Updated the test skip condition so the test only runs on compute capability 10.0 devices

  • Removed the Blackwell-specific skip conditions for the nvfp4 scaled grouped mm test

  • Simplified architecture checking by replacing multiple conditions with a single microarchitecture check

  • Fixed test compatibility by targeting the specific compute capability version (see the sketch below)
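
A sketch of the change described above; the helper names is_pre_blackwell, microarchitecture_is_pre, and microarchitecture_is come from this PR, but the exact shape of the removed decorators is reconstructed for illustration rather than copied from the diff:

    import pytest

    # Before (reconstructed sketch): two separate guards around the test.
    @pytest.mark.skipif(is_pre_blackwell(), reason="Requires Blackwell")
    @pytest.mark.skipif(
        not microarchitecture_is_pre(12), reason="Not supported on compute 12.0"
    )
    def test_layout_op_and_cutlass_nvfp4_grouped_mm():
        ...

    # After: a single guard that admits only compute capability 10.0.
    @pytest.mark.skipif(
        not microarchitecture_is(10, 0), reason="Only supported on compute 10.0"
    )
    def test_layout_op_and_cutlass_nvfp4_grouped_mm():
        ...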

Changes walkthrough

Relevant files
Bug fix
test_with_id_model_indexer.py
Update test skip conditions for nvfp4 scaled grouped mm   

tests/python/direct/test_with_id_model_indexer.py

  • Updated import statements to use microarchitecture_is instead of
    is_pre_blackwell and microarchitecture_is_pre
  • Removed two separate skip conditions, one for Blackwell devices and one
    for compute 12.0
  • Added a single skip condition so the test only runs on compute 10.0
    devices
  • Updated the skip reason to reflect the compute 10.0 requirement
  • +2/-6

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 PR contains tests
    ⚡ Recommended focus areas for review
    Architecture Check Logic

    The PR changes the architecture checking from two separate conditions (is_pre_blackwell and microarchitecture_is_pre(12)) to a single condition (microarchitecture_is(10, 0)). This significantly narrows the supported architectures. Verify that this change aligns with the actual hardware requirements for the nvfp4_scaled_group_mm functionality and that compute capability 10.0 is the correct minimum requirement.

    @pytest.mark.skipif(
        not microarchitecture_is(10, 0), reason="Only supported on compute 10.0"
    )
    Import Compatibility

    The PR removes imports for 'is_pre_blackwell' and 'microarchitecture_is_pre' while adding 'microarchitecture_is'. Ensure that 'microarchitecture_is' is available in the python.direct_utils module and that removing the other functions won't break other parts of the codebase that might still depend on them.

    from nvfuser_direct import (
        FusionDefinition,
        DataType,
    )
    from python.utils import set_env
    from python.direct_utils import (
        FLOAT4_E2M1_MAX,
        FLOAT8_E4M3_EPS,
        FLOAT8_E4M3_MAX,
        pytorch_nvfp4_quantize,
        microarchitecture_is,
        linear_to_swizzled_128_4,
        round_up,
        activation_scale_to_nvfp4,

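    On the availability question, the sketch below is only an assumption about what such a helper typically looks like (a thin wrapper over torch.cuda.get_device_capability), not the actual python.direct_utils implementation:

    import torch

    def microarchitecture_is(major: int, minor: int) -> bool:
        # Hypothetical sketch, not the real helper: compare the current
        # device's compute capability against the requested pair exactly.
        device_major, device_minor = torch.cuda.get_device_capability()
        return device_major == major and device_minor == minor
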
    Test failures

    • (Medium, 1) Thunder nvFuser NanoGPT autograd mismatch (CUDA A100)

      Test name (A100): thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32

    @greptile-apps
    Contributor

    greptile-apps Bot commented Jan 13, 2026

    Greptile Summary

    Simplified architecture check for the scaled grouped MM test to only run on compute capability 10.0 (Blackwell). This aligns with the underlying CUTLASS kernel implementation which is guarded by CUTLASS_ARCH_MMA_SM100_SUPPORTED and prevents runtime exceptions on unsupported architectures.

    Confidence Score: 5/5

    Important Files Changed

    tests/python/direct/test_with_id_model_indexer.py: Simplified architecture check to only run on compute 10.0 (Blackwell), matching the underlying CUTLASS kernel support

    Sequence Diagram

    sequenceDiagram
        participant pytest as Pytest Runner
        participant test as test_layout_op_and_cutlass_nvfp4_grouped_mm
        participant utils as microarchitecture_is
        participant cuda as CUDA Device
        participant cutlass as CUTLASS Kernel (SM100)
        
        pytest->>utils: Check microarchitecture_is(10, 0)
        utils->>cuda: get_device_properties()
        cuda-->>utils: return device properties
        utils-->>pytest: return (major == 10 and minor == 0)
        
        alt compute capability != 10.0
            pytest->>pytest: Skip test with reason
            Note over pytest: Test skipped on unsupported arch
        else compute capability == 10.0
            pytest->>test: Execute test
            test->>cutlass: Call cutlass_nvfp4_grouped_mm
            Note over cutlass: CUTLASS_ARCH_MMA_SM100_SUPPORTED enabled
            cutlass-->>test: Return result
            test-->>pytest: Test passes
        end
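
    To see which branch of this diagram applies on a given machine, a quick standalone check (assuming only that PyTorch is installed and a CUDA driver is visible) is:

    import torch

    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        print(f"compute capability: {major}.{minor}")
        # Only exactly 10.0 takes the "execute test" branch of the diagram.
        print("test runs" if (major, minor) == (10, 0) else "test is skipped")
    else:
        print("no CUDA device visible; test is skipped")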

    @liqiangxl liqiangxl added the enable-auto-merge label (auto-merge when the PR is mergeable, internal CI is complete, and no checks have failed) on Jan 14, 2026
    @liqiangxl
    Collaborator Author

    !test

    @liqiangxl liqiangxl requested a review from jacobhinkle January 14, 2026 15:48
    Collaborator

    @jacobhinkle jacobhinkle left a comment

    LGTM

    The github-actions bot merged commit dea65c6 into main on Jan 15, 2026
    63 checks passed
    The github-actions bot removed the enable-auto-merge label on Jan 15, 2026
    The github-actions bot deleted the llu/skip_scaled_grouped_mm branch on January 15, 2026 at 14:43