Skip to content

[Comgr][hotswap] Add code-object metadata parsing#2437

Open
tgymnich wants to merge 2 commits into
amd-stagingfrom
users/tgymnich/hotswap-pr-02-code-object-metadata
Open

[Comgr][hotswap] Add code-object metadata parsing#2437
tgymnich wants to merge 2 commits into
amd-stagingfrom
users/tgymnich/hotswap-pr-02-code-object-metadata

Conversation

@tgymnich
Copy link
Copy Markdown

@tgymnich tgymnich commented May 7, 2026

Adds the ELF/MsgPack metadata parser that the rest of the raiser pipeline
depends on. Public surface is extractTextSection, listKernelNames,
extractKernelMeta, findKernelSymbolOffset, and detectIsaFromElf.

No MC stack, no disassembler, no opcode canonicalisation yet — those land
in subsequent commits. The kernel-descriptor fields on KernelMeta (kernel
code properties, kernarg preload, compute_pgm_rsrc1/2) are populated here
so later layers (UserSgprLayout, kernarg layout) have the metadata they
expect; the raiser itself does not consume them yet.

Includes a small focused gtest covering the empty-input and malformed-ELF
guards on the public API.

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

@tgymnich tgymnich requested review from chinmaydd and lamb-j as code owners May 7, 2026 16:12
@tgymnich tgymnich changed the title [Comgr][hotswap] Add bare-bones hotswap transpiler scaffolding [Comgr][hotswap] Add code-object metadata parsing May 7, 2026
@z1-cciauto
Copy link
Copy Markdown
Collaborator

@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 81e6053 to d1a05c9 Compare May 7, 2026 16:23
@z1-cciauto
Copy link
Copy Markdown
Collaborator

@tgymnich tgymnich changed the base branch from amd-staging to users/tgymnich/hotswap-pr-01-scaffolding May 7, 2026 16:29
@chinmaydd chinmaydd added comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature labels May 7, 2026
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-01-scaffolding branch from 2675e92 to 3831ee0 Compare May 8, 2026 10:17
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from d1a05c9 to 8be9478 Compare May 8, 2026 10:17
@lamb-j
Copy link
Copy Markdown
Collaborator

lamb-j commented May 8, 2026

Thanks for putting this up. Two main pieces of feedback, both around the AGENT_CONVENTIONS "Comgr first, LLVM second, custom never" rule. The below is obv Claude's thoughts. But in general, we can beef up the existing Comgr metadata API implementations if we need to support Hotswap's needs. I think that could benefit both "sub-projects".


Reuse existing Comgr metadata APIs

amd/comgr/src/comgr-metadata.{h,cpp} already does most of what this PR adds, and reusing it would also fix a few correctness issues:

This PR Existing Comgr API
extractKernelMeta / listKernelNames (manual note walk + MsgPack parse) getMetadataRootgetElfMetadataRoot<ELFT> (comgr-metadata.cpp:190, 254)
detectIsaFromElf (e_machine → "gfx1250") getElfIsaNamegetElfIsaNameFromElfHeader<ELFT> (comgr-metadata.cpp:350, 410)
findKernelSymbolOffset symbol walk amd_comgr_iterate_symbols + amd_comgr_symbol_get_info, or the internal helpers in comgr-symbol.cpp

Switching to getElfMetadataRoot also fixes three latent bugs in the hand-rolled walker:

  1. Misses program-header notes. getElfMetadataRoot walks PT_NOTE program headers first, then SHT_NOTE sections. This PR only walks sections — fully-linked AMDGPU code objects often carry notes via program headers only, so they'd come back with zero kernels.
  2. Misses name=="AMD" / NT_AMD_HSA_METADATA (older YAML) and PAL metadata. The hand-rolled check is hardcoded to name=="AMDGPU" && type==32. processNote + mergeNoteRecords handle all three vendor-name/type combinations.
  3. Bails on anything that isn't ELF64LE. getElfObjectFileBase dispatches across ELF32LE / ELF64LE / ELF32BE / ELF64BE.

detectIsaFromElf similarly returns a bare "gfx1250" and drops OS ABI, ABI version, and the sramecc/xnack feature bits. getElfIsaNameFromElfHeader already produces the canonical amdgcn-amd-amdhsa--gfx1250:sramecc+:xnack- triple — depending on what the raiser does downstream with the ISA string, dropping the feature bits is likely a problem.

The Comgr internal helpers take DataObject* / DataMeta*, which a standalone transpiler library doesn't have. The minor update I'd propose is lifting getElfMetadataRoot and getElfIsaNameFromElfHeader to a MemoryBufferRef/ArrayRef<uint8_t> overload (the templates already take ELFObjectFile<ELFT>*, so the wrapping is thin), with the existing DataObject* entry points becoming one-line forwarders. Happy to do that update separately if it helps unblock this.

The one piece that's genuinely new is readKernelDescriptorBytes — the 64-byte KD blob isn't surfaced via MsgPack and the raiser legitimately needs it. Even there, the symbol-lookup half should go through amd_comgr_iterate_symbols / a comgr-symbol.cpp helper; only the "reinterpret the bytes as kernel_descriptor_t" part is net-new.

Reuse existing LLVM facilities

A few hand-rolled bits that already exist in llvm/:

  • readFile()llvm::MemoryBuffer::getFile (handles error, empty file, mmap)
  • readU32 / readU16 (host-endian memcpy) → llvm::support::endian::read32le / read16le (llvm/Support/Endian.h)
  • Manual ELF note iteration → llvm::object::ELFFile::notes() (Elf_Note_Iterator) — this also removes the ~50-line duplication between listKernelNames and extractKernelMeta
  • if (shdr.sh_type != 7) // SHT_NOTEllvm::ELF::SHT_NOTE
  • if (type == 32 ...)llvm::ELF::NT_AMDGPU_METADATA
  • 64 (KD size) → sizeof(llvm::amdhsa::kernel_descriptor_t), and ideally read fields via reinterpret_cast<kernel_descriptor_t*> rather than readU32(... + COMPUTE_PGM_RSRC1_OFFSET)

Most of these collapse into the Comgr reuse above (the note iteration disappears entirely if you go through getElfMetadataRoot).

@chinmaydd
Copy link
Copy Markdown

We really really really need to pull out common functionality that is shared between the transpiler and the translator into a singular module. We will have to suffer from a lot of code duplication later otherwise.

@tgymnich tgymnich marked this pull request as draft May 8, 2026 21:26
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-01-scaffolding branch from 3831ee0 to 9f6c2ed Compare May 11, 2026 11:40
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 8be9478 to 6ab4011 Compare May 11, 2026 11:41
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-01-scaffolding branch from 9f6c2ed to ec69509 Compare May 11, 2026 12:32
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 6ab4011 to dd6bb68 Compare May 11, 2026 12:32
@tgymnich tgymnich marked this pull request as ready for review May 11, 2026 12:40
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-01-scaffolding branch from ec69509 to 3b59a99 Compare May 11, 2026 12:42
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch 2 times, most recently from 94e24d0 to d6f97c5 Compare May 11, 2026 15:21
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch 2 times, most recently from 7fde156 to 2c6f5a2 Compare May 11, 2026 22:34
@tgymnich tgymnich marked this pull request as draft May 12, 2026 15:03
Base automatically changed from users/tgymnich/hotswap-pr-01-scaffolding to amd-staging May 12, 2026 17:03
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 2c6f5a2 to 73d12ca Compare May 13, 2026 13:31
@tgymnich tgymnich marked this pull request as ready for review May 13, 2026 14:18
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch 2 times, most recently from cb7ba8d to ccdfadc Compare May 15, 2026 11:16
@tgymnich tgymnich requested a review from ftynse May 15, 2026 11:44
@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from ccdfadc to 9bc71b6 Compare May 15, 2026 13:06
Comment thread amd/comgr/src/hotswap/CMakeLists.txt Outdated
Comment thread amd/comgr/src/hotswap/CMakeLists.txt Outdated
@@ -0,0 +1,344 @@
//===- code-object-utils.cpp - Hotswap transpiler -------------------------===//
//
// Part of Comgr, under the Apache License v2.0 with LLVM Exceptions. See
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely needs to be redirected beyond the PR author, but why this is not just "part of LLVM project". If there is intention to contribute this to LLVM, are we we expecting to have to update all files?

Comment thread amd/comgr/src/hotswap/code-object-utils.cpp Outdated
Comment thread amd/comgr/src/hotswap/code-object-utils.cpp Outdated
Comment thread amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp Outdated
Comment thread amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp Outdated
Comment thread amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp Outdated
Comment thread amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp Outdated
Comment thread amd/comgr/test-unit/MCContextInlineSrcmgrTest.cpp Outdated
Copy link
Copy Markdown

@ftynse ftynse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall concern is the error flow. This code mostly just dumps stuff to errs without telling errors and warnings apart, sometimes suppressing things later. There is also no consistent error propagation mechanism, sometimes it's random booleans (LLVM had a huge discussion on whether true because error or success), sometimes it's empty containers (which could be valid!), sometimes it's some intermediate field.

@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 9bc71b6 to 9880187 Compare May 15, 2026 15:28
@ftynse
Copy link
Copy Markdown

ftynse commented May 15, 2026

Added error handling here 960a69e @tgymnich feel free to pull in

@ftynse
Copy link
Copy Markdown

ftynse commented May 15, 2026

more in f68dd49, but irrelevant for the current PR

@tgymnich tgymnich force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch 5 times, most recently from e20dd25 to 877df15 Compare May 18, 2026 14:38
martin-luecke and others added 2 commits May 21, 2026 11:31
… in raiser

Adds the AMDGPU code-object metadata extraction surface that the rest of
the raiser pipeline depends on:

* `extractTextSection`, `listKernelNames`, `extractKernelMeta`,
  `findKernelSymbolOffset`, and `detectIsaFromElf` in
  `hotswap/code_object_utils.{h,cpp}`.
* `KernelMeta` populated with both MsgPack-derived fields (kernarg/group/
  private segment sizes, args) and the raw kernel-descriptor register
  bytes (`compute_pgm_rsrc{1,2}`, `kernel_code_properties`,
  `kernarg_preload`) read from `<name>.kd` in `.rodata`.

Reuses comgr's existing parsing infrastructure rather than duplicating
it: each helper wraps the input ELF bytes in an
`AMD_COMGR_DATA_KIND_EXECUTABLE` data object via a small
`ScopedDataObject` RAII helper and routes through
`metadata::getMetadataRoot` / `metadata::getElfIsaName` for the metadata
walk and ISA-name extraction.

Also adopts llvm::Error / llvm::Expected for the raiser scaffolding
entry point (mirrors upstream rocm commit f68dd49): raiseToIR
returns Expected<RaiseResult>, validation failures surface as
HotswapError values via the existing hotswap-error.h, and the
handrolled raise-failure.h is removed. An empty-input scaffolding
mode bypass is added so test scaffolding can stub the raiser without
standing up a real ISA + metadata pair. The KD-bytes lookup comments
reflect partial-success (MsgPack-derived fields are returned even if
.rodata KD parse fails).

No MC stack, no disassembler, no opcode canonicalisation yet -- those
land in subsequent commits.

Co-Authored-By: Tim Gymnich <tim@gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Alex Zinenko <git@ozinenko.com>
@ftynse ftynse force-pushed the users/tgymnich/hotswap-pr-02-code-object-metadata branch from 877df15 to 9428a8b Compare May 21, 2026 11:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comgr Related to Code Object Manager hotswap Related to the Comgr Hotswap feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants