[Comgr][hotswap] Add AMDGPU MC state setup#2439
Conversation
d1a05c9 to
8be9478
Compare
e8a1fa9 to
9183737
Compare
8be9478 to
6ab4011
Compare
9183737 to
6122d14
Compare
6ab4011 to
dd6bb68
Compare
6122d14 to
e1822d4
Compare
dd6bb68 to
94e24d0
Compare
e1822d4 to
b9c85a5
Compare
2b8bfde to
70e5099
Compare
b9c85a5 to
dade881
Compare
70e5099 to
73d63ce
Compare
dade881 to
c1170b2
Compare
73d63ce to
4314144
Compare
73d12ca to
cb7ba8d
Compare
963fd34 to
5f3d5aa
Compare
cb7ba8d to
ccdfadc
Compare
5f3d5aa to
5a5c860
Compare
ccdfadc to
9bc71b6
Compare
5a5c860 to
a098906
Compare
9bc71b6 to
9880187
Compare
a098906 to
4eda47a
Compare
9880187 to
afbdc65
Compare
4eda47a to
4b9f8b4
Compare
afbdc65 to
9978b9f
Compare
4b9f8b4 to
97a7c25
Compare
9978b9f to
a1bb2b1
Compare
97a7c25 to
4dbeb97
Compare
a1bb2b1 to
e20dd25
Compare
4dbeb97 to
8cbf5bd
Compare
ftynse
left a comment
There was a problem hiding this comment.
Two overall design points:
- We should seriously consider having the opcode->canonical-op table be produced from tablegen inputs defining the instruction sets. This will ensure a single source of truth with the lowering path and provide compile-time completeness checking for us instead of runtime checking. It may also somewhat avoid the huge lists of boilerplate macros.
- There should be some way to test these. The documentation refers to some corpus of tests, but those are integration tests at best.
More minor design point is to avoid parsing the string from of mnemonics. There should be a better way of obtaining the same information from LLVM.
| // type used by the disassembler's TSFlags / OperandType fields. | ||
| #include "SIDefines.h" | ||
|
|
||
| #include "Utils/AMDGPUBaseInfo.h" // AMDGPU::isVOPD |
| // Alias `COMGR::hotswap::SIInstrFlags` to the LLVM namespace so existing call | ||
| // sites (`SIInstrFlags::SOPP`, `SIInstrFlags::FLAT`, etc.) keep compiling. |
There was a problem hiding this comment.
What does "existing call sites" mean here? Don't allow temporal references in comments, they are useless if not harmful over time.
|
|
||
| // AMDGPU target-specific operand type for VOP3 source modifiers (abs, neg). | ||
| // Defined in llvm::AMDGPU::OperandType from SIDefines.h. | ||
| constexpr unsigned OPERAND_INPUT_MODS = llvm::AMDGPU::OPERAND_INPUT_MODS; |
There was a problem hiding this comment.
Why not use the field from llvm::AMDGPU directly?
| // * `VOP3P` coexists with `VOP3` on some subtargets; check VOP3P first. | ||
| // * VOPD has no dedicated TSFlags bit (LLVM's VOPD3 bit varies across | ||
| // versions); use `AMDGPU::isVOPD(opc)` instead. | ||
| inline const char *formatName(uint64_t Flags, unsigned Opc) { |
| // the handler refusal contract is keyed on. | ||
| if (Flags & SIInstrFlags::VIMAGE) return "VIMAGE"; | ||
| if (Flags & SIInstrFlags::TENSOR_CNT) return "VIMAGE"; | ||
| return "Unknown"; |
There was a problem hiding this comment.
I would rather use Expected<StringRef> here or another proper error reporting mechanism that a magic string that is then string-matched to detect error.
| // Rationale: LLVM exposes `AMDGPU::getVCMPXOpFromVCMP` as a V_CMP → V_CMPX | ||
| // mapping, but no public helper that hands back a CmpInst::Predicate or | ||
| // element width. Rather than hand-list 100 opcode→metadata pairs we parse | ||
| // the pseudo name once at init time; the same token grammar is already | ||
| // hard-coded in LLVM's TableGen for these instructions. |
There was a problem hiding this comment.
Dubious. There's already a huge table above with a lot of boilerplate, and this function + doc is longer than 100 LoC so we don't even save space at a cost of extra complexity and potentially fragile string parsing.
| CanonicalOp OpcodeMap::lookup(unsigned Opcode) const { | ||
| auto It = Map.find(Opcode); | ||
| return It != Map.end() ? It->second : CanonicalOp::Unknown; | ||
| } |
There was a problem hiding this comment.
If only there was DenseMap::lookup/lookup_or.
| // `_nosdst_` collapse: starting with GFX11, VOPC CMPX instructions no longer | ||
| // write a scalar destination register (EXEC receives the mask directly) and | ||
| // LLVM represents this as a `_nosdst_` variant. The non-`_nosdst_` target | ||
| // form keeps the scalar dst (for older subtargets). Both forms share | ||
| // dispatch-relevant TSFlags; the raiser's CMPX handler only writes EXEC and | ||
| // ignores the optional sdst, so collapsing the variant onto the base is | ||
| // safe. The source has one fewer def when the base includes sdst (e64 | ||
| // forms) and the same number of defs otherwise (e32, where both lack sdst). |
There was a problem hiding this comment.
I don't see what this comment relates to, perhaps nosdstDropsScalarDef way below.
| ByName.try_emplace(MCII.getName(P), P); | ||
|
|
||
| struct Rule { | ||
| llvm::StringRef Needle; |
| auto StripOnce = [&](llvm::StringRef Name, std::string &Out) -> int { | ||
| for (size_t I = 0; I < std::size(Rules); ++I) { | ||
| const Rule &R = Rules[I]; | ||
| if (R.IsSuffix) { | ||
| if (!Name.ends_with(R.Needle)) | ||
| continue; | ||
| Out = Name.drop_back(R.Needle.size()).str(); | ||
| return static_cast<int>(I); | ||
| } | ||
| size_t Pos = Name.find(R.Needle); | ||
| if (Pos == llvm::StringRef::npos) | ||
| continue; | ||
| Out = (Name.substr(0, Pos).str() + std::string("_") + | ||
| Name.substr(Pos + R.Needle.size()).str()); | ||
| return static_cast<int>(I); | ||
| } | ||
| return -1; | ||
| }; |
There was a problem hiding this comment.
SmallString + the corresponding ostream would avoid reallocations here.
… in raiser
Adds the AMDGPU code-object metadata extraction surface that the rest of
the raiser pipeline depends on:
* `extractTextSection`, `listKernelNames`, `extractKernelMeta`,
`findKernelSymbolOffset`, and `detectIsaFromElf` in
`hotswap/code_object_utils.{h,cpp}`.
* `KernelMeta` populated with both MsgPack-derived fields (kernarg/group/
private segment sizes, args) and the raw kernel-descriptor register
bytes (`compute_pgm_rsrc{1,2}`, `kernel_code_properties`,
`kernarg_preload`) read from `<name>.kd` in `.rodata`.
Reuses comgr's existing parsing infrastructure rather than duplicating
it: each helper wraps the input ELF bytes in an
`AMD_COMGR_DATA_KIND_EXECUTABLE` data object via a small
`ScopedDataObject` RAII helper and routes through
`metadata::getMetadataRoot` / `metadata::getElfIsaName` for the metadata
walk and ISA-name extraction.
Also adopts llvm::Error / llvm::Expected for the raiser scaffolding
entry point (mirrors upstream rocm commit f68dd49): raiseToIR
returns Expected<RaiseResult>, validation failures surface as
HotswapError values via the existing hotswap-error.h, and the
handrolled raise-failure.h is removed. An empty-input scaffolding
mode bypass is added so test scaffolding can stub the raiser without
standing up a real ISA + metadata pair. The KD-bytes lookup comments
reflect partial-success (MsgPack-derived fields are returned even if
.rodata KD parse fails).
No MC stack, no disassembler, no opcode canonicalisation yet -- those
land in subsequent commits.
Co-Authored-By: Tim Gymnich <tim@gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Alex Zinenko <git@ozinenko.com>
Adds the three foundation files the per-format handlers will branch on:
* canonical_op.{h,cpp} — `CanonicalOp` enum (one entry per
target-independent semantic op the raiser knows how to lift) and a
name table for diagnostics.
* amdgpu_formats.h — convenience aliases over `SIInstrFlags` so
handler dispatches read as `Flags & FORMAT_VALU` / `FORMAT_SOPK`
instead of hand-typing the bit layout.
* opcode_map.{h,cpp} — `OpcodeMap` builds an MC-opcode -> CanonicalOp
table once at first use by walking the MCInstrInfo registered for
a given AMDGPU subtarget. Per-handler entries are folded into the
table here; the actual `lookup` consumers come in subsequent
handler commits.
`hotswap/CMakeLists.txt` grows to compile canonical_op.cpp +
opcode_map.cpp into the OBJECT lib. No new external dependencies.
Co-Authored-By: Tim Gymnich <tim@gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8cbf5bd to
d3b2b77
Compare
e20dd25 to
877df15
Compare
877df15 to
9428a8b
Compare
Adds
mc_state.{hpp,cpp}providinginitMCState— a one-call setup of the LLVM MC stack (Target, Triple, MCInstrInfo, MCRegisterInfo, MCAsmInfo, MCSubtargetInfo, MCContext, MCDisassembler, MCInstPrinter) keyed off an AMDGPU ISA name.The MC stack is consumed by the disassembler and per-handler raiser code that lands in subsequent commits. Includes the
InlineSrcMgr-on-MCContext defensive init that prevents the SIG6Either SourceMgr should be available UNREACHABLEabort, with a gtest regression fence.CMakeLists gains the LLVM build-tree requirement, the AMDGPU target-private include paths (SIDefines.h, AMDGPUBaseInfo.h, the TableGen-generated AMDGPUGen*.inc files), and the LLVM MC + AMDGPU library link surface.