The PR changes the architecture check from two separate conditions (is_pre_blackwell and microarchitecture_is_pre(12)) to a single condition (microarchitecture_is(10, 0)). This significantly narrows the supported architectures: the new check is an exact match on compute capability 10.0, not a lower bound. Verify that this change aligns with the actual hardware requirements for the nvfp4_scaled_group_mm functionality and that compute capability 10.0 is the only architecture the test should run on.
@pytest.mark.skipif(not microarchitecture_is(10, 0), reason="Only supported on compute 10.0")
The PR removes imports for 'is_pre_blackwell' and 'microarchitecture_is_pre' while adding 'microarchitecture_is'. Ensure that 'microarchitecture_is' is available in the python.direct_utils module and that removing the other functions won't break other parts of the codebase that might still depend on them.
Simplified architecture check for the scaled grouped MM test to only run on compute capability 10.0 (Blackwell). This aligns with the underlying CUTLASS kernel implementation which is guarded by CUTLASS_ARCH_MMA_SM100_SUPPORTED and prevents runtime exceptions on unsupported architectures.
Replaced two skip conditions (is_pre_blackwell() and not microarchitecture_is_pre(12)) with a single precise check (microarchitecture_is(10, 0))
Removed unused imports is_pre_blackwell and microarchitecture_is_pre
Simplified architecture check to only run on compute 10.0 (Blackwell), matching the underlying CUTLASS kernel support
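To make the narrowing concrete, here is a minimal sketch comparing the old and new skip conditions on a few capability tuples. The semantics of the helpers are assumptions, not confirmed by this PR: is_pre_blackwell() is taken to mean major < 10, microarchitecture_is_pre(n) to mean major < n, and microarchitecture_is(a, b) to mean an exact (major, minor) match. The functions `old_skip` and `new_skip` are hypothetical names used only for illustration.

```python
from typing import Tuple

Capability = Tuple[int, int]  # (major, minor) compute capability


def old_skip(cap: Capability) -> bool:
    """Old condition (assumed semantics): skip if pre-Blackwell OR not pre-12."""
    major, _minor = cap
    is_pre_blackwell = major < 10  # assumption: Blackwell starts at major 10
    is_pre_12 = major < 12         # assumption: compares the major version only
    return is_pre_blackwell or not is_pre_12


def new_skip(cap: Capability) -> bool:
    """New condition: skip unless the capability is exactly 10.0."""
    return cap != (10, 0)


if __name__ == "__main__":
    # Illustrative capability tuples; not a statement about real devices.
    for cap in [(9, 0), (10, 0), (10, 3), (12, 0)]:
        print(cap, "old skip:", old_skip(cap), "new skip:", new_skip(cap))
```

Under these assumptions, a capability such as (10, 3) would have run the test before this PR and is skipped afterwards, which is the narrowing the review comment asks to verify.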
Sequence Diagram
sequenceDiagram
participant pytest as Pytest Runner
participant test as test_layout_op_and_cutlass_nvfp4_grouped_mm
participant utils as microarchitecture_is
participant cuda as CUDA Device
participant cutlass as CUTLASS Kernel (SM100)
pytest->>utils: Check microarchitecture_is(10, 0)
utils->>cuda: get_device_properties()
cuda-->>utils: return device properties
utils-->>pytest: return (major == 10 and minor == 0)
alt compute capability != 10.0
pytest->>pytest: Skip test with reason
Note over pytest: Test skipped on unsupported arch
else compute capability == 10.0
pytest->>test: Execute test
test->>cutlass: Call cutlass_nvfp4_grouped_mm
Note over cutlass: CUTLASS_ARCH_MMA_SM100_SUPPORTED enabled
cutlass-->>test: Return result
test-->>pytest: Test passes
end
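The flow in the diagram can be sketched in plain Python as follows. This is not the project's implementation of microarchitecture_is: `query_capability`, `microarchitecture_is_sketch`, and `decide` are hypothetical names, and the sketch assumes the device query goes through `torch.cuda.get_device_properties`, whose result exposes `major` and `minor`. The optional `capability` argument exists only so the logic can be exercised without a GPU.

```python
from typing import Optional, Tuple

Capability = Tuple[int, int]  # (major, minor) compute capability


def query_capability() -> Optional[Capability]:
    """Return the current CUDA device's capability, or None if unavailable."""
    try:
        import torch  # imported lazily so the sketch loads without PyTorch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(torch.cuda.current_device())
    return (props.major, props.minor)


def microarchitecture_is_sketch(
    major: int, minor: int, capability: Optional[Capability] = None
) -> bool:
    """Exact-match check: True only when (major, minor) equals the device's."""
    cap = capability if capability is not None else query_capability()
    return cap == (major, minor)


def decide(capability: Optional[Capability]) -> str:
    """Mirror the diagram's alt/else branches for a given capability."""
    if capability is not None and microarchitecture_is_sketch(10, 0, capability):
        return "run"
    return "skip: Only supported on compute 10.0"


if __name__ == "__main__":
    print(decide((10, 0)))
    print(decide((9, 0)))
    print(decide(None))  # no CUDA device available
```

Because the check is an equality test, a machine with no CUDA device and a machine with any other capability take the same skip branch.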
Same as #5810
err msg
Exception raised from run_nvfp4_scaled_group_mm at /opt/pytorch/nvfuser/cutlass/nvfp4_scaled_group_mm.cu:518