Skip to content

Workaround clang-cl HIP multi-arch /Fo bug on Windows#4920

Open
ivarusic-amd wants to merge 4 commits into
developfrom
fix_hip_multiarch_fo_bug
Open

Workaround clang-cl HIP multi-arch /Fo bug on Windows#4920
ivarusic-amd wants to merge 4 commits into
developfrom
fix_hip_multiarch_fo_bug

Conversation

@ivarusic-amd
Copy link
Copy Markdown
Contributor

Motivation

On Windows, migraphx_device.dll was missing device code for all GPU architectures except the last one specified via --offload-arch, silently breaking execution on all other targets.

Technical Details

The root cause is a clang-cl bug with HIP multi-arch builds: when the MSVC-style /Fo flag is used to specify the output object, clang-cl reuses the same temporary filename for the intermediate GPU code objects across all --offload-arch targets. Each target overwrites the previous one, so only the last arch's code object survives and gets linked into the final DLL.

The fix overrides CMAKE_CXX_COMPILE_OBJECT on WIN32 for the migraphx_device target to use -o instead of /Fo. This lets clang generate unique temporary filenames per offload-arch, preserving all code objects and producing a correctly fat-binary device library.

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • [] Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

@ivarusic-amd ivarusic-amd requested a review from causten as a code owner May 29, 2026 08:59
@ivarusic-amd ivarusic-amd added bug Something isn't working high priority A PR with high priority for review and merging. Windows Related changes for Windows Environments labels May 29, 2026
@fjankovi
Copy link
Copy Markdown
Collaborator

Is this bug something that we need to report to the compiler team?

@apwojcik
Copy link
Copy Markdown
Collaborator

Is this bug something that we need to report to the compiler team?

Yes, we should report it to the compiler team for HIP SDK 7.2 and open an issue in llvm/llvm-project if the problem exists in the latest clang-cl version. On Windows, HIP SDK uses llvm v21. The latest released llvm is v22.1.6. The latest development is llvm v23. TheRock uses llvm v23.

Comment thread src/targets/gpu/CMakeLists.txt Outdated
# the same temp filename for all GPU target intermediate link outputs, causing
# all code objects to contain only the last --offload-arch target. Using -o
# instead of /Fo avoids this by letting clang generate unique temp filenames.
if(WIN32)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt you check that the compiler is clang-cl and not clang++?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that as well

Comment thread src/targets/gpu/CMakeLists.txt Outdated
# the same temp filename for all GPU target intermediate link outputs, causing
# all code objects to contain only the last --offload-arch target. Using -o
# instead of /Fo avoids this by letting clang generate unique temp filenames.
if(WIN32)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a cmake flag to enable this workaround? Like MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG=On?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@ivarusic-amd ivarusic-amd requested a review from pfultz2 May 29, 2026 15:26
Comment on lines +91 to +97
if(NOT MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG)
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND CMAKE_CXX_COMPILER_FRONTEND_VARIANT STREQUAL "MSVC")
set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG ON CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE)
else()
set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG OFF CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE)
endif()
endif()
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this.

Its already a cached variable, and its really confusing what it is doing. It should be set explicitly by the user when invoking cmake: cmake .. -DMIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG=On.

Suggested change
if(NOT MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG)
if(CMAKE_CXX_COMPILER_ID STREQUAL "Clang" AND CMAKE_CXX_COMPILER_FRONTEND_VARIANT STREQUAL "MSVC")
set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG ON CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE)
else()
set(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG OFF CACHE BOOL "Workaround clang-cl HIP multi-arch /Fo bug on Windows" FORCE)
endif()
endif()

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the workaround to be set automatically, not by a user in the command line, whenever the user attempts to compile MIGraphX with clang-cl. It must be automatic. The issue concerns only clang-cl. It does not concern the MIGraphX build with clang++. The problem does not manifest at compile time. The compilation is just fine. You will not even see it when running MIGraphX on the gfx12. The problem starts to manifest when you run MIGraphX on a GPU other than gfx12. It is a very subtle issue that was hard to find, and it took us a few days to find.
We won't ask each customer or user to remember to add that option to the command line if they wish to compile MIGraphX themselves and run on something other than gfx12. That's a horrible user experience. I believe an out-of-the-box experience is much better. I believe you would agree with me here.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it should be fixed in the compiler. Its not a bug in migraphx.

The issue concerns only clang-cl.

Then we should drop support for clang-cl since its broken.

The problem starts to manifest when you run MIGraphX on a GPU other than gfx12.

Is there a way to detect this bug at cmake time by compiling(or running) a test program?

We won't ask each customer or user to remember to add that option to the command line if they wish to compile MIGraphX themselves and run on something other than gfx12. That's a horrible user experience.

Yes, thats why it should be fixed in the compiler. This is not a bug in migraphx.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I asked claude how we could detect this at compile-time and it wrote this function:

# verify_offload_archs(<result_var> [arch1 arch2 ...])
#   Sets <result_var> to TRUE if clang emitted code objects for every arch,
#   FALSE otherwise. Defaults to gfx906;gfx1201 if no archs are given.
#   Also sets <result_var>_MISSING to the list of archs that were skipped.
function(verify_offload_archs _outvar)
  set(_archs ${ARGN})
  if(NOT _archs)
    set(_archs gfx906 gfx1201)
  endif()
  string(REPLACE ";" "," _arch_csv "${_archs}")

  set(_canary "${CMAKE_BINARY_DIR}/canary.hip")
  file(WRITE "${_canary}" "__attribute__((global)) void k() {}\n")

  execute_process(
    COMMAND ${CMAKE_CXX_COMPILER}
            -x hip --offload-arch=${_arch_csv}
            -nogpuinc -nogpulib
            -c "${_canary}" -o "${CMAKE_BINARY_DIR}/canary.o"
    RESULT_VARIABLE _rc
    OUTPUT_VARIABLE _out ERROR_VARIABLE _out)
  if(NOT _rc EQUAL 0)
    message(STATUS "verify_offload_archs: canary compile failed:\n${_out}")
    set(${_outvar} FALSE PARENT_SCOPE)
    set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
    return()
  endif()

  get_filename_component(_bindir "${CMAKE_CXX_COMPILER}" DIRECTORY)
  find_program(LLVM_OBJDUMP NAMES llvm-objdump HINTS "${_bindir}")
  if(NOT LLVM_OBJDUMP)
    message(STATUS "verify_offload_archs: llvm-objdump not found")
    set(${_outvar} FALSE PARENT_SCOPE)
    set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
    return()
  endif()

  execute_process(
    COMMAND "${LLVM_OBJDUMP}" --offloading "${CMAKE_BINARY_DIR}/canary.o"
    OUTPUT_VARIABLE _dump RESULT_VARIABLE _drc ERROR_VARIABLE _dump)
  if(NOT _drc EQUAL 0)
    message(STATUS "verify_offload_archs: llvm-objdump failed:\n${_dump}")
    set(${_outvar} FALSE PARENT_SCOPE)
    set(${_outvar}_MISSING "${_archs}" PARENT_SCOPE)
    return()
  endif()

  set(_missing "")
  foreach(_a IN LISTS _archs)
    string(FIND "${_dump}" "${_a}" _pos)
    if(_pos EQUAL -1)
      list(APPEND _missing "${_a}")
    endif()
  endforeach()

  if(_missing)
    message(STATUS "verify_offload_archs: skipped arch(s): ${_missing}")
    set(${_outvar} FALSE PARENT_SCOPE)
    set(${_outvar}_MISSING "${_missing}" PARENT_SCOPE)
  else()
    set(${_outvar} TRUE PARENT_SCOPE)
    set(${_outvar}_MISSING "" PARENT_SCOPE)
  endif()
endfunction()

We can use that to detect the error and then print a message for the user to set MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG. It probably needs some tweaks to work on windows though.

Copy link
Copy Markdown
Collaborator

@pfultz2 pfultz2 May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also add this as aPOST_BUILD step to migraphx_device to verify it built all the targets:

function(post_verify_targets TARGET)
    add_custom_command(TARGET ${TARGET} POST_BUILD
      COMMAND ${CMAKE_COMMAND}
              -DOBJDUMP=${LLVM_OBJDUMP}
              -DTARGET_FILE=$<TARGET_FILE:${TARGET}>
              -DARCHS=${GPU_TARGETS}
              -P ${CMAKE_CURRENT_SOURCE_DIR}/cmake/verify_offload_archs.cmake
      VERBATIM
      COMMENT "Verifying offload archs in $<TARGET_FILE:${TARGET}>")
endfunction()

With a cmake/verify_offload_archs.cmake cmake script to do the verification:

cmake/verify_offload_archs.cmake
# Invoked as:
#   cmake -DOBJDUMP=<path> -DTARGET_FILE=<lib.so> -DARCHS=gfx906,gfx1201 \
#         -P verify_offload_archs.cmake
# Note: ARCHS is COMMA-separated, not ';'-separated, to survive the
# custom-command argument boundary (CMake would split a ';' list otherwise).

if(NOT OBJDUMP OR NOT TARGET_FILE OR NOT ARCHS)
  message(FATAL_ERROR "verify_offload_archs: OBJDUMP, TARGET_FILE, ARCHS required")
endif()
string(REPLACE "," ";" _archs "${ARCHS}")

execute_process(
  COMMAND "${OBJDUMP}" --offloading "${TARGET_FILE}"
  OUTPUT_VARIABLE _dump RESULT_VARIABLE _rc ERROR_VARIABLE _dump)
if(NOT _rc EQUAL 0)
  message(FATAL_ERROR "llvm-objdump --offloading failed on ${TARGET_FILE}:\n${_dump}")
endif()

set(_missing "")
foreach(_a IN LISTS _archs)
  string(FIND "${_dump}" "${_a}" _pos)
  if(_pos EQUAL -1)
    list(APPEND _missing "${_a}")
  endif()
endforeach()

if(_missing)
  message(FATAL_ERROR
    "Offload arch(s) missing from ${TARGET_FILE}: ${_missing}\n"
    "--offloading output:\n${_dump}")
endif()
message(STATUS "Offload archs verified in ${TARGET_FILE}: ${_archs}")

I think this POST_BUILD check will be quite robust and can even check if the workaround is even working.

endif()
message(STATUS "MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG = ${MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG}")

if(MIGRAPHX_WORKAROUND_HIP_MULTI_ARCH_BUG)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check that the compiler is clang-cl here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working high priority A PR with high priority for review and merging. UAI Windows Related changes for Windows Environments

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants