[Comgr][hotswap] Add DecodedInst + basic kernel decoder#2517
Conversation
963fd34 to
5f3d5aa
Compare
6e6f193 to
4d23bed
Compare
5f3d5aa to
5a5c860
Compare
4d23bed to
f434f7f
Compare
5a5c860 to
a098906
Compare
f434f7f to
5b815e6
Compare
a098906 to
4eda47a
Compare
5b815e6 to
634765b
Compare
4eda47a to
4b9f8b4
Compare
634765b to
a32b336
Compare
4b9f8b4 to
97a7c25
Compare
a32b336 to
5249422
Compare
97a7c25 to
4dbeb97
Compare
5249422 to
eac9fc5
Compare
4dbeb97 to
8cbf5bd
Compare
eac9fc5 to
c1bbdbf
Compare
ftynse
left a comment
There was a problem hiding this comment.
Overall, the DecodedInst data structure looks needlessly fat and redundant. A lot of its fields are duplicating information from MCInst, which it also stores, and should be converted into accessor functions. Some flags are relevant for any instruction with a specific name/class, not a particular instance, so there is no value in storing them per-instance except micro-optimization on the hot path (that would require data prior to implementing them). Even when it's a hot path, a plausible solution would be to have a look-aside table with that information and store pointers into that table instead.
There are also comments about string parsing, though I don't really see any code performing, and would complain about it unless the approach is unavoidable. The only place with string parsing seems to be vcmp that I already complained about in the previous PR.
| } | ||
|
|
||
| namespace COMGR::hotswap { |
There was a problem hiding this comment.
| } | |
| namespace COMGR::hotswap { |
|
|
||
| namespace COMGR::hotswap { | ||
|
|
||
| struct DecodedInst { |
There was a problem hiding this comment.
This needs a doc. All of the fields need doc.
|
|
||
| } // namespace COMGR::hotswap | ||
|
|
||
| #endif |
There was a problem hiding this comment.
Nit: add a comment saying which #if this closes.
| std::string Mnemonic; | ||
| std::string RawMnemonic; |
There was a problem hiding this comment.
Do we really need to store the textual mnemonic, twice? For every instance of an instruction. Can't we rather have an enum entry and/or the opcode. It also looks like we have the MCInst below, so this may even be redundant.
| struct DecodedInst { | ||
| std::string Mnemonic; | ||
| std::string RawMnemonic; | ||
| std::string FullText; |
There was a problem hiding this comment.
If we really need to store textual representation of the instruction for some reason, as opposed to generating it on-the-fly from MCInt, maybe we can generate full textual assembly once into a memory mapped file or something like that and only store pointers/StringRefs to that instead of piecemeal dynamic allocations of strings for every instance.
| DecodedInst Di; | ||
| Di.RawMnemonic = getMnemonic(Mc, Inst); | ||
| { | ||
| std::string S; |
There was a problem hiding this comment.
SmallString to avoid dynamic allocations.
| Mc.Printer->printInst(&Inst, 0, "", *Mc.SubtargetInfo, Os); | ||
| Di.FullText = StringRef(S).ltrim().str(); |
There was a problem hiding this comment.
See comment above about printing the whole function and keeping references instead, assuming this is needed at all.
| Mc.Printer->printInst(&Inst, 0, "", *Mc.SubtargetInfo, Os); | ||
| Di.FullText = StringRef(S).ltrim().str(); | ||
| } | ||
| Di.Mnemonic = stripEncoding(StringRef(Di.RawMnemonic)).str(); |
| break; | ||
| } | ||
| Off += InstSize; | ||
| } |
There was a problem hiding this comment.
Perhaps we want to signal how many bytes we processed if it's not the whole input.
| static constexpr unsigned KMaxSrcs = 24; | ||
| unsigned SrcMap[KMaxSrcs] = {}; | ||
| unsigned ModMap[KMaxSrcs] = {}; |
There was a problem hiding this comment.
So we are storing 48*4=192 bytes of information per instruction here where the overwhelming majority of instructions won't ever use 24 sources. Sounds ridiculously wasteful. Just use a SmallVector.
|
A separate question is testing. We need a way to test this as well. |
Wraps the MC-layer disassembler in a hotswap-shaped API:
* decoded_inst.h — `DecodedInst` value type bundling MCInst, raw
bytes, kernel offset, TSFlags, and an explicit `IsBranch` /
`BranchTarget` decoration so Phase-3 BB layout can run without
re-querying MCInstrDesc.
* parsed_reg.h — `ParsedReg{Kind, BaseIdx, NReg}` value type so
handlers can match on register class without re-parsing
`MCOperand::getReg()` at every consumer.
* decode.{h,cpp} — `decodeKernel` walks the .text section once,
populates a `DecodeResult{Insts, BlockStarts, Offsets}` view used
by every subsequent raise phase. Block boundaries derive from the
branch target set discovered during the linear sweep so Phase 3
can pre-create LLVM BasicBlocks before any IR builder lands.
* mc_state.cpp — minor reshuffle to expose the `MCDisassembler` /
`MCSubtargetInfo` pair to the new decode entry point.
Co-Authored-By: Tim Gymnich <tim@gymni.ch>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
8cbf5bd to
d3b2b77
Compare
c1bbdbf to
36b17dd
Compare
Wraps the MC-layer disassembler in a hotswap-shaped API:
DecodedInstvalue type bundling MCInst, raw bytes, kernel offset, TSFlags, and an explicitIsBranch/BranchTargetdecoration so Phase-3 BB layout can run without re-querying MCInstrDesc.ParsedReg{Kind, BaseIdx, NReg}value type so handlers can match on register class without re-parsingMCOperand::getReg()at every consumer.decodeKernelwalks the .text section once, populates aDecodeResult{Insts, BlockStarts, Offsets}view used by every subsequent raise phase. Block boundaries derive from the branch target set discovered during the linear sweep so Phase 3 can pre-create LLVM BasicBlocks before any IR builder lands.MCDisassembler/MCSubtargetInfopair to the new decode entry point.