feat(convert): automated Triton→Gluon convert session + consumable co…#15
Merged
…nverter sheets

Convert session (orchestrator/optimize.py + prompts/convert.md + agents/gpu-kernel-convert.md):
- After --convert-after (default 5) stalled Triton iterations, spend ONE convert-only session that lowers kernel.py Triton→Gluon (no optimization), arch-selected.
- Commit gate = correctness AND performance parity: gluon geomean must be within +5% of the Triton HEAD. Gated in the agent AND mechanically in optimize.py (reverts a >5%-slower commit).
- Cyclic + failure-aware: a rejected conversion is recorded to memory (survives the safety-net revert), Triton optimization resumes, and conversion re-fires after another --convert-after stalled rounds; each attempt reads prior failures and takes a different lowering.
- tools/extract_ttgir.py for real-layout extraction.

Converter KB reformat (gpu-wiki/docs/converter): one consumable sheet per framework transition (API map + pitfalls + pointers to local Triton 3.7 source) instead of floating per-topic files.
- New: pytorch-to-triton.md, nvidia/blackwell.md (tcgen05/TMEM/TMA), nvidia/hopper.md, amd/cdna3.md, amd/cdna4.md; README is now a (transition, arch) router.
- Collapsed the old 7-file-per-arch trees + amd/common + the misfiled Hopper dup; fixed RELATIONS.md, dir READMEs, top README, and 3 ref-docs cross-links. Sheets kept product-neutral (self-contained check passes).
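A minimal sketch of the two mechanical rules described above: the +5% geomean parity gate and the cyclic stall trigger. The names `geomean`, `passes_parity`, and `ConvertScheduler` are hypothetical and do not come from optimize.py. The sketch also assumes that the geomean is taken over per-shape latencies (lower is better) and that an improving iteration resets the stall counter; neither is stated in the commit message.

```python
import math


def geomean(values):
    """Geometric mean of positive timings (e.g. per-shape latency in ms)."""
    if not values:
        raise ValueError("need at least one timing")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def passes_parity(triton_ms, gluon_ms, tolerance=0.05):
    """True if the Gluon geomean is at most `tolerance` slower than Triton HEAD.

    Assumption: timings are latencies, so "within +5%" means
    geomean(gluon) <= 1.05 * geomean(triton).
    """
    return geomean(gluon_ms) <= geomean(triton_ms) * (1.0 + tolerance)


class ConvertScheduler:
    """Hypothetical stall counter that fires a convert session cyclically."""

    def __init__(self, convert_after=5):
        self.convert_after = convert_after
        self.stalled = 0

    def record_iteration(self, improved):
        """Record one Triton iteration; return True if a convert session is due."""
        if improved:
            # Assumption: any improvement ends the current stall streak.
            self.stalled = 0
            return False
        self.stalled += 1
        if self.stalled >= self.convert_after:
            # Reset so the next attempt needs another full run of stalled rounds.
            self.stalled = 0
            return True
        return False
```

With `convert_after=2`, four consecutive stalled iterations would trigger a convert session on the second and fourth, which mirrors the "re-fires after another --convert-after stalled rounds" behaviour.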
…er, restore diagnostics

- optimize.py: detect a committed conversion via git HEAD (head_kernel_is_gluon) + quality_gate PASS, not memory git_commit_hash (a convert session may commit without recording the hash); the ±5% parity revert now fires reliably.
- convert.md / gpu-kernel-convert.md: mandate TTGIR-first (dump + read THIS kernel's real layouts before drafting Gluon; never copy the reference example's layouts); sm_103 (B300) routes to blackwell.md.
- converter sheets: sm_103 coverage; restored condensed "Common failures (symptom -> cause -> fix)" diagnostics (exact error strings, measured regressions) per arch, the least source-recoverable, highest-value content for the failure/retry path.
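One way the HEAD-based detection could look is sketched below. Only the name `head_kernel_is_gluon` and the "quality_gate PASS" condition come from the commit message; the signature, the `source_is_gluon` helper, the `conversion_committed` wrapper, and the import-based heuristic are all assumptions, and the real check in optimize.py may work differently.

```python
import subprocess


def source_is_gluon(source):
    """Heuristic (an assumption): the kernel imports from triton.experimental gluon."""
    for line in source.splitlines():
        stripped = line.strip()
        is_import = stripped.startswith(
            ("from triton.experimental", "import triton.experimental")
        )
        if is_import and "gluon" in stripped:
            return True
    return False


def head_kernel_is_gluon(repo_dir, path="kernel.py"):
    """Inspect the committed kernel at git HEAD, not the working tree."""
    result = subprocess.run(
        ["git", "show", f"HEAD:{path}"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # File missing at HEAD, or repo_dir is not a git repository.
        return False
    return source_is_gluon(result.stdout)


def conversion_committed(repo_dir, quality_gate):
    """Treat a conversion as committed only if HEAD is Gluon and the gate passed."""
    return quality_gate == "PASS" and head_kernel_is_gluon(repo_dir)
```

Reading the file from HEAD rather than from a hash stored in memory means the check still works when a convert session commits without recording its hash, which is the failure the commit message describes.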
…letion before exit