Workaround clang-cl HIP multi-arch /Fo bug on Windows#4920
Workaround clang-cl HIP multi-arch /Fo bug on Windows#4920ivarusic-amd wants to merge 4 commits into
Conversation
|
Is this bug something that we need to report to the compiler team? |
Yes, we should report it to the compiler team for HIP SDK 7.2 and open an issue in llvm/llvm-project if the problem exists in the latest clang-cl version. On Windows, HIP SDK uses llvm v21. The latest released llvm is v22.1.6. The latest development is llvm v23. TheRock uses llvm v23. |
| # the same temp filename for all GPU target intermediate link outputs, causing | ||
| # all code objects to contain only the last --offload-arch target. Using -o | ||
| # instead of /Fo avoids this by letting clang generate unique temp filenames. | ||
| if(WIN32) |
There was a problem hiding this comment.
Shouldnt you check that the compiler is clang-cl and not clang++?
There was a problem hiding this comment.
Did that as well
| # the same temp filename for all GPU target intermediate link outputs, causing | ||
| # all code objects to contain only the last --offload-arch target. Using -o | ||
| # instead of /Fo avoids this by letting clang generate unique temp filenames. | ||
| if(WIN32) |
There was a problem hiding this comment.
Can we add a cmake flag to enable this workaround? Like MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG=On?
| if(NOT MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG) | ||
| if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND CMAKE_CXX_COMPILER_FRONTEND_VARIANT STREQUAL "MSVC") | ||
| set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG ON CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE) | ||
| else() | ||
| set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG OFF CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE) | ||
| endif() | ||
| endif() |
There was a problem hiding this comment.
Remove this.
Its already a cached variable, and its really confusing what it is doing. It should be set explicitly by the user when invoking cmake: cmake .. -DMIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG=On.
| if(NOT MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG) | |
| if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND CMAKE_CXX_COMPILER_FRONTEND_VARIANT STREQUAL "MSVC") | |
| set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG ON CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE) | |
| else() | |
| set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG OFF CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE) | |
| endif() | |
| endif() |
There was a problem hiding this comment.
We need the workaround to be set automatically, not by a user in the command line, whenever the user attempts to compile MIGraphX with clang-cl. It must be automatic. The issue concerns only clang-cl. It does not concern the MIGraphX build with clang++. The problem does not manifest at compile time. The compilation is just fine. You will not even see it when running MIGraphX on the gfx12. The problem starts to manifest when you run MIGraphX on a GPU other than gfx12. It is a very subtle issue that was hard to find, and it took us a few days to find.
We won't ask each customer or user to remember to add that option to the command line if they wish to compile MIGraphX themselves and run on something other than gfx12. That's a horrible user experience. I believe an out-of-the-box experience is much better. I believe you would agree with me here.
There was a problem hiding this comment.
But it should be fixed in the compiler. Its not a bug in migraphx.
The issue concerns only clang-cl.
Then we should drop support for clang-cl since its broken.
The problem starts to manifest when you run MIGraphX on a GPU other than gfx12.
Is there a way to detect this bug at cmake time by compiling(or running) a test program?
We won't ask each customer or user to remember to add that option to the command line if they wish to compile MIGraphX themselves and run on something other than gfx12. That's a horrible user experience.
Yes, thats why it should be fixed in the compiler. This is not a bug in migraphx.
There was a problem hiding this comment.
I asked claude how we could detect this at compile-time and it wrote this function:
# verify_offload_archs(<result_var> [arch1 arch2 ...])
# Sets <result_var> to TRUE if clang emitted code objects for every arch,
# FALSE otherwise. Defaults to gfx906;gfx1201 if no archs are given.
# Also sets <result_var>_MISSING to the list of archs that were skipped.
function(verify_offload_archs _outvar)
set(_archs ${ARGN})
if(NOT _archs)
set(_archs gfx906 gfx1201)
endif()
string(REPLACE ";" "," _arch_csv "${_archs}")
set(_canary "${CMAKE_BINARY_DIR}/canary.hip")
file(WRITE "${_canary}" "__attribute__((global)) void k() {}\n")
execute_process(
COMMAND ${CMAKE_CXX_COMPILER}
-x hip --offload-arch=${_arch_csv}
-nogpuinc -nogpulib
-c "${_canary}" -o "${CMAKE_BINARY_DIR}/canary.o"
RESULT_VARIABLE _rc
OUTPUT_VARIABLE _out ERROR_VARIABLE _out)
if(NOT _rc EQUAL 0)
message(STATUS "verify_offload_archs: canary compile failed:\n${_out}")
set(${_outvar} FALSE PARENT_SCOPE)
set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
return()
endif()
get_filename_component(_bindir "${CMAKE_CXX_COMPILER}" DIRECTORY)
find_program(LLVM_OBJDUMP NAMES llvm-objdump HINTS "${_bindir}")
if(NOT LLVM_OBJDUMP)
message(STATUS "verify_offload_archs: llvm-objdump not found")
set(${_outvar} FALSE PARENT_SCOPE)
set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
return()
endif()
execute_process(
COMMAND "${LLVM_OBJDUMP}" --offloading "${CMAKE_BINARY_DIR}/canary.o"
OUTPUT_VARIABLE _dump RESULT_VARIABLE _drc ERROR_VARIABLE _dump)
if(NOT _drc EQUAL 0)
message(STATUS "verify_offload_archs: llvm-objdump failed:\n${_dump}")
set(${_outvar} FALSE PARENT_SCOPE)
set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
return()
endif()
set(_missing "")
foreach(_a IN LISTS _archs)
string(FIND "${_dump}" "${_a}" _pos)
if(_pos EQUAL -1)
list(APPEND _missing "${_a}")
endif()
endforeach()
if(_missing)
message(STATUS "verify_offload_archs: skipped arch(s): ${_missing}")
set(${_outvar} FALSE PARENT_SCOPE)
set(${_outvar}_MISSING "${_missing}" PARENT_SCOPE)
else()
set(${_outvar} TRUE PARENT_SCOPE)
set(${_outvar}_MISSING "" PARENT_SCOPE)
endif()
endfunction()We can use that to detect the error and then print a message for the user to set MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG. It probably needs some tweaks to work on windows though.
There was a problem hiding this comment.
We can also add this as aPOST_BUILD step to migraphx_device to verify it built all the targets:
function(post_verify_targets TARGET)
add_custom_command(TARGET ${TARGET} POST_BUILD
COMMAND ${CMAKE_COMMAND}
-DOBJDUMP=${LLVM_OBJDUMP}
-DTARGET_FILE=$<TARGET_FILE:${TARGET}>
-DARCHS=${GPU_TARGETS}
-P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/verify_offload_archs.cmake
VERBATIM
COMMENT "Verifying offload archs in $<TARGET_FILE:${TARGET}>")
endfunction()With a cmake/verify_offload_archs.cmake cmake script to do the verification:
cmake/verify_offload_archs.cmake
# Invoked as:
# cmake -DOBJDUMP=<path> -DTARGET_FILE=<lib.so> -DARCHS=gfx906,gfx1201 \
# -P verify_offload_archs.cmake
# Note: ARCHS is COMMA-separated, not ';'-separated, to survive the
# custom-command argument boundary (CMake would split a ';' list otherwise).
if(NOT OBJDUMP OR NOT TARGET_FILE OR NOT ARCHS)
message(FATAL_ERROR "verify_offload_archs: OBJDUMP, TARGET_FILE, ARCHS required")
endif()
string(REPLACE "," ";" _archs "${ARCHS}")
execute_process(
COMMAND "${OBJDUMP}" --offloading "${TARGET_FILE}"
OUTPUT_VARIABLE _dump RESULT_VARIABLE _rc ERROR_VARIABLE _dump)
if(NOT _rc EQUAL 0)
message(FATAL_ERROR "llvm-objdump --offloading failed on ${TARGET_FILE}:\n${_dump}")
endif()
set(_missing "")
foreach(_a IN LISTS _archs)
string(FIND "${_dump}" "${_a}" _pos)
if(_pos EQUAL -1)
list(APPEND _missing "${_a}")
endif()
endforeach()
if(_missing)
message(FATAL_ERROR
"Offload arch(s) missing from ${TARGET_FILE}: ${_missing}\n"
"--offloading output:\n${_dump}")
endif()
message(STATUS "Offload archs verified in ${TARGET_FILE}: ${_archs}")I think this POST_BUILD check will be quite robust and can even check if the workaround is even working.
| endif() | ||
| message(STATUS "MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG = ${MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG}") | ||
|
|
||
| if(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG) |
There was a problem hiding this comment.
You should check that the compiler is clang-cl here as well.
Motivation
On Windows, migraphx_device.dll was missing device code for all GPU architectures except the last one specified via --offload-arch, silently breaking execution on all other targets.
Technical Details
The root cause is a clang-cl bug with HIP multi-arch builds: when the MSVC-style /Fo flag is used to specify the output object, clang-cl reuses the same temporary filename for the intermediate GPU code objects across all --offload-arch targets. Each target overwrites the previous one, so only the last arch's code object survives and gets linked into the final DLL.
The fix overrides CMAKE_CXX_COMPILE_OBJECT on WIN32 for the migraphx_device target to use -o instead of /Fo. This lets clang generate unique temporary filenames per offload-arch, preserving all code objects and producing a correctly fat-binary device library.
Changelog Category
Add a
CHANGELOG.mdentry for any option other thanNot Applicable