feat(convert): automated Triton→Gluon convert session + consumable co…#15

Merged
smallmou merged 3 commits into alibaba:sol_execbench from yanglf1121:sol_execbench_beiyuan
Jul 5, 2026
Conversation

@yanglf1121

Contributor

…nverter sheets

Convert session (orchestrator/optimize.py + prompts/convert.md + agents/gpu-kernel-convert.md):

  • After --convert-after (default 5) stalled Triton iterations, spend ONE convert-only session that lowers kernel.py Triton→Gluon (no optimization), arch-selected.
  • Commit gate = correctness AND performance parity: gluon geomean must be within +5% of the Triton HEAD. Gated in the agent AND mechanically in optimize.py (reverts a >5%-slower commit).
  • Cyclic + failure-aware: a rejected conversion is recorded to memory (survives the safety-net revert), Triton optimization resumes, and conversion re-fires after another --convert-after stalled rounds; each attempt reads prior failures and takes a different lowering.
  • tools/extract_ttgir.py for real-layout extraction.
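
A minimal sketch of the commit gate and the cyclic trigger described above, for readers who want the logic in one place. This is not the code in orchestrator/optimize.py: the names `passes_commit_gate` and `ConvertSchedule` are hypothetical, and the sketch assumes the geomean is taken over per-workload latencies (lower is better), which the description does not state.

```python
import math
from dataclasses import dataclass, field


def geomean(values):
    """Geometric mean of strictly positive numbers."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def passes_commit_gate(correct, gluon_ms, triton_ms, tolerance=0.05):
    """Correctness AND parity: Gluon may be at most `tolerance` slower.

    Assumption: inputs are per-workload latencies in milliseconds.
    """
    if not correct:
        return False
    return geomean(gluon_ms) <= geomean(triton_ms) * (1.0 + tolerance)


@dataclass
class ConvertSchedule:
    """Hypothetical stall counter mirroring --convert-after (default 5)."""
    convert_after: int = 5
    stalled: int = 0
    failures: list = field(default_factory=list)

    def record_iteration(self, improved):
        """Return True when a convert-only session should fire."""
        self.stalled = 0 if improved else self.stalled + 1
        return self.stalled >= self.convert_after

    def record_rejection(self, reason):
        """Keep the failure for the next attempt and restart the stall count."""
        self.failures.append(reason)
        self.stalled = 0
```

In this sketch a rejected conversion resets the counter, so the next convert session fires only after another `convert_after` stalled rounds, and `failures` stands in for the memory record that each new attempt reads.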

Converter KB reformat (gpu-wiki/docs/converter): one consumable sheet per framework transition (API map + pitfalls + pointers to local Triton 3.7 source) instead of floating per-topic files.

  • New: pytorch-to-triton.md, nvidia/blackwell.md (tcgen05/TMEM/TMA), nvidia/hopper.md, amd/cdna3.md, amd/cdna4.md; README is now a (transition, arch) router.
  • Collapsed the old 7-file-per-arch trees + amd/common + the misfiled Hopper dup; fixed RELATIONS.md, dir READMEs, top README, and 3 ref-docs cross-links. Sheets kept product-neutral (self-contained check passes).
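
To illustrate what a (transition, arch) router amounts to, here is a small lookup sketch. The sheet paths are the ones listed above and the sm_103 routing comes from the follow-up commit; the transition keys, the other arch identifiers (sm_90, sm_100, gfx942, gfx950), the choice to treat pytorch-to-triton as arch-independent, and the `route` function itself are assumptions made for illustration, not the README's actual contents.

```python
# Hypothetical routing table: (transition, arch) -> sheet path under
# gpu-wiki/docs/converter. Keys are illustrative assumptions.
GLUON_SHEETS = {
    "sm_90": "nvidia/hopper.md",
    "sm_100": "nvidia/blackwell.md",
    "sm_103": "nvidia/blackwell.md",  # B300 routes to the Blackwell sheet
    "gfx942": "amd/cdna3.md",
    "gfx950": "amd/cdna4.md",
}


def route(transition, arch=None):
    """Return the converter sheet for a framework transition and target arch."""
    if transition == "pytorch-to-triton":
        # Assumed arch-independent: a single sheet is listed for it.
        return "pytorch-to-triton.md"
    if transition == "triton-to-gluon":
        try:
            return GLUON_SHEETS[arch]
        except KeyError:
            raise KeyError(f"no converter sheet for arch {arch!r}") from None
    raise KeyError(f"unknown transition {transition!r}")
```

For example, `route("triton-to-gluon", "sm_103")` resolves to `nvidia/blackwell.md` in this sketch.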

…er, restore diagnostics

- optimize.py: detect a committed conversion from git HEAD (head_kernel_is_gluon) plus a quality_gate PASS,
  rather than from the memory git_commit_hash (a convert session may commit without recording the hash),
  so the +5% parity revert now fires reliably.
- convert.md / gpu-kernel-convert.md: mandate TTGIR-first (dump + read THIS kernel's real layouts before
  drafting Gluon; never copy the reference example's layouts); sm_103 (B300) routes to blackwell.md.
- converter sheets: sm_103 coverage; restored condensed "Common failures (symptom -> cause -> fix)"
  diagnostics (exact error strings, measured regressions) per arch — the least source-recoverable,
  highest-value content for the failure/retry path.
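
A rough sketch of the HEAD-based detection idea from the optimize.py bullet. Only the name head_kernel_is_gluon appears in the commit message; the heuristic below (looking for a `triton.experimental.gluon` import or a `@gluon.jit` decorator in the committed kernel.py) and the helper names are assumptions, not the merged implementation.

```python
import re
import subprocess

# Assumed markers of a Gluon kernel: an import of triton.experimental.gluon
# (either spelling) or a @gluon.jit decorator.
_GLUON_MARKERS = re.compile(
    r"^\s*(?:from|import)\s+triton\.experimental"
    r"(?:\.gluon\b|\s+import\s+gluon\b)"
    r"|^\s*@gluon\.jit\b",
    re.MULTILINE,
)


def source_is_gluon(source):
    """Heuristic: does this kernel source look like Gluon rather than Triton?"""
    return _GLUON_MARKERS.search(source) is not None


def read_head_file(repo_dir, path="kernel.py"):
    """Read a file as committed at HEAD, ignoring the working tree."""
    result = subprocess.run(
        ["git", "show", f"HEAD:{path}"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return result.stdout


def conversion_committed(repo_dir, quality_gate_passed):
    """A conversion counts only if the gate passed AND HEAD holds Gluon code."""
    return quality_gate_passed and source_is_gluon(read_head_file(repo_dir))
```

Reading the file from HEAD rather than trusting a recorded hash means the check still works when a convert session commits without writing git_commit_hash to memory.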
@smallmou smallmou merged commit eab1536 into alibaba:sol_execbench Jul 5, 2026
1 check passed