[CI] Add parity auto-trigger workflow by ethanwee1 · Pull Request #3231 · ROCm/pytorch

ethanwee1 · 2026-05-18T14:26:03Z

Summary

Add a scheduled parity auto-trigger that scans completed pytorch/pytorch main trunk.yml pushes and dispatches parity.yml once per ready upstream SHA.
Gate dispatch on the ROCm arch workflows that actually ran for a SHA, plus the CUDA jobs consumed by parity, so partial reports are avoided.
Add a pull_request dry-run path with a smaller scan window to validate the scanner without creating parity reports from PR CI.

How it works

The workflow runs every 10 minutes and queries recent completed pytorch/pytorch trunk.yml push runs on main. Those trunk runs provide the candidate upstream SHAs to evaluate.
For each candidate SHA, it first checks recent ROCm/pytorch parity.yml run titles. If any existing parity run already contains that SHA, the SHA is skipped so we keep one report per upstream commit.
Each run dispatches parity.yml at most 50 times (max_dispatches=50), which is comfortably above the number of commits that land on the main branch of pytorch/pytorch in any 10-minute interval.
It then lists all upstream workflow runs for that SHA and determines which ROCm arches actually ran. Missing periodic arch workflows are not treated as pending work; only arches with matching workflow files are expected in that report.
For the arches that did run, it lists upstream check-runs and waits for the matching ROCm test shards to reach status=completed. It also waits for the CUDA default, distributed, and inductor check-runs consumed by parity.
Auxiliary shards such as mem_leak_check and rerun_disabled_tests are ignored because the parity report does not consume them.
Once all relevant ROCm and CUDA check-runs are complete, it dispatches parity.yml with the ready arch list and a CSV prefix containing the upstream SHA, for example autoparity-YYYYMMDD-<sha>.
Pull request runs are forced to dry_run=true, so they exercise the scanner and log would-be dispatches without creating reports. Scheduled and manually dispatched runs can create real parity reports.
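
The readiness gate described above can be sketched in plain shell. This is a minimal illustration rather than the workflow's actual code: it assumes the check-runs for a SHA have already been flattened to one `status<TAB>name` line each, and the function name `all_relevant_complete` and the sample check-run names are hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a SHA is ready for a parity dispatch.
# Input (stdin): one check-run per line as "<status><TAB><name>".

all_relevant_complete() (
  # Drop auxiliary shards that the parity report does not consume.
  relevant=$(grep -Ev 'mem_leak_check|rerun_disabled_tests' || true)
  # No relevant check-runs at all means there is nothing to report on yet.
  [ -n "$relevant" ] || exit 1
  # Ready only if no relevant check-run has a status other than "completed".
  pending=$(printf '%s\n' "$relevant" | awk -F'\t' '$1 != "completed"')
  [ -z "$pending" ]
)

TAB=$(printf '\t')
sample="completed${TAB}rocm-mi355 / test (default, 1, 6)
in_progress${TAB}rocm-mi355 / test (default, 1, 6, mem_leak_check)
completed${TAB}cuda / test (distributed, 1, 3)"

if printf '%s\n' "$sample" | all_relevant_complete; then
  echo "ready"
else
  echo "pending"
fi
```

The in-progress mem_leak_check shard does not block the SHA here because it is filtered out before the status check.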

Test plan

Validated workflow YAML and embedded shell locally with yaml.BaseLoader and bash -n.
PR dry-run workflow succeeded: https://github.com/ROCm/pytorch/actions/runs/26039732579
Full non-dry-run workflow_dispatch succeeded: https://github.com/ROCm/pytorch/actions/runs/26041358738
The full run used dry_run=false, scanned 20 recent upstream trunk runs, skipped SHAs with pending parity check-runs, dispatched 5 ready SHAs, and stopped at max_dispatches=5.
Dispatched parity reports all completed successfully:
- d76e83ef / mi355: https://github.com/ROCm/pytorch/actions/runs/26041518406
- 457e1890 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041528996
- 60f38508 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041541647
- d1d96569 / mi355: https://github.com/ROCm/pytorch/actions/runs/26041551854
- 6e3cf2e4 / mi355, mi300, mi200: https://github.com/ROCm/pytorch/actions/runs/26041618237

Dispatch cadence note

The full validation run used max_dispatches=5 only to avoid flooding ROCm/pytorch during manual testing.
The production scheduled workflow runs every 10 minutes and defaults to max_dispatches=50, max_commits=200, and max_age_hours=72 unless manually overridden.
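
One way the defaults and the forced pull request dry-run could be resolved inside a shell step is sketched below. The `INPUT_*` variable names are hypothetical stand-ins for however the workflow passes its inputs to the script, and the `false` default for dry_run on non-PR events is an assumption; `GITHUB_EVENT_NAME` is the standard variable set by GitHub Actions.

```shell
#!/bin/sh
# Sketch: resolve scanner settings, falling back to the scheduled defaults.
# INPUT_* names are hypothetical; GITHUB_EVENT_NAME is set by GitHub Actions.

resolve_settings() {
  MAX_DISPATCHES=${INPUT_MAX_DISPATCHES:-50}
  MAX_COMMITS=${INPUT_MAX_COMMITS:-200}
  MAX_AGE_HOURS=${INPUT_MAX_AGE_HOURS:-72}
  DRY_RUN=${INPUT_DRY_RUN:-false}
  # Pull request runs must never create real parity reports.
  if [ "${GITHUB_EVENT_NAME:-}" = "pull_request" ]; then
    DRY_RUN=true
  fi
}

GITHUB_EVENT_NAME=pull_request
INPUT_DRY_RUN=false
resolve_settings
printf '%s\n' "dry_run=$DRY_RUN" "max_age_hours=$MAX_AGE_HOURS"
```

Applying the pull_request override after reading the inputs means a PR cannot opt out of dry-run mode by setting an input.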

rocm-repo-management-api · 2026-05-18T14:37:05Z

Jenkins build for f4dfbd8845f2d05dd28225ca78af48c1926d9e31 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

Copilot

Pull request overview

Adds a new GitHub Actions workflow to automatically scan recent completed trunk.yml push runs on main, determine when the relevant ROCm + CUDA check-runs for a given upstream SHA are fully complete, and then dispatch parity.yml (with a PR-only dry-run mode).

Changes:

Introduces a scheduled (every 10 minutes) + manual + PR dry-run “parity auto-trigger” workflow.
Implements SHA deduplication by checking existing parity.yml run titles in the current repo.
Gates dispatch on completion of ROCm arch shard check-runs (only for arch workflows detected as having run) plus specific CUDA check-runs used by parity.

jithunnair-amd · 2026-05-21T06:12:40Z

+          # Pull recent parity runs. Run titles look like:
+          #   "<csv_name or SHA> · mi355, mi300, mi200"
+          # Once any parity run exists for a SHA, we do not dispatch another
+          # report for that SHA. This keeps the dashboard to one report per
+          # upstream commit.
+          EXISTING=$(gh run list \
+            --repo "$GITHUB_REPOSITORY" \
+            --workflow parity.yml \
+            --limit 1000 \
+            --json displayTitle 2>/dev/null || echo '[]')
+
+          sha_already_dispatched() {
+            local sha="$1"
+            echo "$EXISTING" | jq -e --arg sha "$sha" \
+              'any(.[]; .displayTitle | contains($sha))' >/dev/null


Since the same parity.yml can be triggered outside of the auto-parity, we should narrow this down to consider only the auto-parity-triggered runs of parity.yml. In addition to SHA, one more "variable" is the arch list. For simplicity, let us get rid of the arch list workflow input entirely in auto-parity, so that we don't have to worry about whether a particular run covered all the archs we wanted.

That being said, I do prefer the suggested version by Copilot review above, since it gets rid of the magic number of 1000 and seems to make the window logic more robust by using the right API parameters.

Reworked. The dedup set is no longer gh run list --limit 1000; it is a REST query of parity.yml workflow_dispatch runs via --paginate with a created>=<max_age window> filter, so the dedup window always covers the scan window (no fixed 1000 cap).

Dedup now considers only auto-parity-created runs: matches require the autoparity- run-name prefix or the github-actions[bot] actor, so manual parity.yml runs no longer suppress automation. For the arch concern, this PR is scoped to trunk/mi355 (archs defaults to mi355) and the dedup key is SHA + auto-parity signal rather than the arch list, so partial-arch coverage cannot cause a missed/duplicate dispatch. The archs input is retained (defaulted) but no longer factors into dedup; happy to drop it entirely if you prefer for the trunk-only scope.
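
A minimal sketch of the narrowed dedup check, assuming the run titles have already been extracted to one title per line. It covers only the `autoparity-` prefix signal; the github-actions[bot] actor check is left out, and the sample titles are illustrative.

```shell
#!/bin/sh
# Sketch: treat a SHA as already handled only if an auto-parity run mentions it.
# Input (stdin): one parity.yml run title per line.

sha_already_dispatched() (
  sha=$1
  while IFS= read -r title; do
    case $title in
      # Only titles carrying the auto-parity marker count toward dedup.
      autoparity-*"$sha"*) exit 0 ;;
    esac
  done
  exit 1
)

titles='autoparity-20260518-d76e83ef · mi355
457e1890 · mi355, mi300'

printf '%s\n' "$titles" | sha_already_dispatched d76e83ef && echo "skip d76e83ef"
printf '%s\n' "$titles" | sha_already_dispatched 457e1890 || echo "dispatch 457e1890"
```

The second title contains its SHA but lacks the prefix, so a manual parity.yml run for 457e1890 does not suppress the automated one.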

Done: implemented the Copilot suggestion, using the REST API with --paginate plus a created>= time filter and dropping the magic 1000. The dedup window now derives from max_age_hours.
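
For illustration, the window derivation might look like the sketch below. It assumes GNU `date` (the `-d` flag behaves differently on BSD/macOS), and the helper name `window_start` is made up. The commented-out `gh api` call shows how the timestamp could feed the workflow-runs endpoint; treat its exact flags as an approximation of what the PR does, not a copy.

```shell
#!/bin/sh
# Sketch: derive the dedup window from max_age_hours.
# Relies on GNU date (-d); BSD/macOS date needs different flags.

window_start() {
  # ISO 8601 UTC timestamp for "now minus N hours".
  date -u -d "-$1 hours" +%Y-%m-%dT%H:%M:%SZ
}

SINCE=$(window_start "${MAX_AGE_HOURS:-72}")
echo "created>=$SINCE"

# With the window computed, the dedup set could be fetched like this
# (left commented out because it needs network access and a token):
# gh api --paginate -X GET \
#   "repos/$GITHUB_REPOSITORY/actions/workflows/parity.yml/runs" \
#   -f event=workflow_dispatch -f "created=>=$SINCE" -f per_page=100 \
#   --jq '.workflow_runs[].display_title'
```

Because the filter is a timestamp rather than a row count, the dedup set grows and shrinks with the scan window instead of being capped at a fixed number of runs.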

jithunnair-amd

@ethanwee1 I may not be able to help get this merged once you address the review comments, so please take Pruthvi's help for that. However, please do fix some of the issues pointed out in the review comments and make the inputs interface a bit simpler for the average user (please work with @pablo-garay to iterate on this).
I'm also okay with the cron-scheduled runs covering only trunk in this PR, since that's our top requirement, and with handling the complexity of other archs/workflows in a follow-up PR.

Thanks!

jithunnair-amd · 2026-05-26T11:35:34Z

That being said, I do prefer the suggested version by Copilot review above, since it gets rid of the magic number of 1000 and seems to make the window logic more robust by using the right API parameters.

rocm-repo-management-api · 2026-05-27T14:37:15Z

Jenkins build for 1375a6adeea743262fe99f276cb8afb5c511af86 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

ethanwee1 · 2026-05-27T14:43:08Z

Addressed the latest review feedback from Jithun:

Simplified the workflow_dispatch interface by removing the user-facing arch and regex-map inputs.
Scoped this PR to completed upstream trunk.yml pushes and mi355 default parity only; broader arch/workflow coverage can follow separately.
Removed the check-run/workflow regex duplication with download_testlogs. Readiness is now based on completed upstream trunk workflow runs for the trunk-only report.
Narrowed dedupe to auto-parity-created parity runs by using the autoparity-* run title prefix, so manual parity.yml runs do not suppress automation.
Reworked trunk run scanning to page until max_commits unique SHAs are collected, so values above GitHub's per_page=100 cap work without scanning the entire workflow history.
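
The page-until-enough-SHAs idea from the last item can be sketched as follows. `fetch_page` is a stand-in for a single paginated API request and returns canned data here; the SHAs are placeholders and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: page through trunk runs until enough unique SHAs are collected.
# fetch_page is a stand-in for one paginated API request (per_page <= 100);
# it prints one head SHA per line and prints nothing once pages run out.

fetch_page() {
  case $1 in
    1) printf '%s\n' aaa111 bbb222 aaa111 ;;
    2) printf '%s\n' ccc333 ddd444 ;;
    3) printf '%s\n' eee555 ;;
  esac
}

collect_shas() (
  max_commits=$1
  page=1
  seen=""
  count=0
  while [ "$count" -lt "$max_commits" ]; do
    batch=$(fetch_page "$page")
    [ -n "$batch" ] || break            # no more history to scan
    for sha in $batch; do
      case " $seen " in
        *" $sha "*) continue ;;         # same SHA seen on an earlier run
      esac
      seen="$seen $sha"
      count=$((count + 1))
      echo "$sha"
      [ "$count" -lt "$max_commits" ] || break
    done
    page=$((page + 1))
  done
)

collect_shas 3
# prints aaa111, bbb222, ccc333 (one per line); page 3 is never requested
```

Stopping as soon as the limit is reached is what keeps a large max_commits value from turning into a scan of the full workflow history.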

Validation:

YAML parse + extracted shell bash -n passed locally.
Local dry-run scan passed.
Remote workflow_dispatch dry-run passed: https://github.com/ROCm/pytorch/actions/runs/26518294404

rocm-repo-management-api · 2026-05-27T14:52:11Z

Jenkins build for 1375a6adeea743262fe99f276cb8afb5c511af86 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api · 2026-05-28T18:07:09Z

Jenkins build for d7ff112ed2257d50c0031bdd24d25675b3f89e52 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api · 2026-05-28T18:21:53Z

Jenkins build for d7ff112ed2257d50c0031bdd24d25675b3f89e52 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

…efix

Re-introduce the "autoparity-" marker on parity runs dispatched by parity-auto.yml so they are visually distinguishable in the Actions list and the title-based dedup keeps working, without reusing csv_name, which also drives the output/CSV filenames.

Adds a dedicated boolean parity.yml input `auto_triggered` that only prefixes the run name with "autoparity-" (leaving csv_name and all output filenames untouched), and has parity-auto.yml pass `-f auto_triggered=true` on dispatch.
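
A rough sketch of the dispatch side, with dry-run handling. Only `-f auto_triggered=true` comes from the commit message; the `sha` input name and the function name `dispatch_parity` are guesses made for illustration, and the real call may pass different inputs.

```shell
#!/bin/sh
# Sketch: dispatch parity.yml with the auto_triggered marker, or just log it.
# The "sha" input name is a guess for illustration; auto_triggered comes from
# the commit message above.

dispatch_parity() {
  sha=$1
  dry_run=$2
  if [ "$dry_run" = "true" ]; then
    echo "[dry-run] would dispatch parity.yml for $sha with auto_triggered=true"
    return 0
  fi
  gh workflow run parity.yml \
    --repo "$GITHUB_REPOSITORY" \
    -f sha="$sha" \
    -f auto_triggered=true
}

dispatch_parity d76e83ef true
```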

rocm-repo-management-api · 2026-06-04T15:22:15Z

Jenkins build for 9363a9ff67078a129173686393c3c75d05a8105d commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

The every-commit auto-trigger should only cover what trunk.yml runs on every main push: the mi355 (gfx950) ROCm test shards that ride along in trunk.yml, compared against trunk's CUDA test shards. The other ROCm arches (mi300, mi200, navi31, nightly) come from separate periodic workflows that don't run on every commit, so including them produced partial/late reports.

Reduce the defaults to mi355 only and restrict its workflow regex to trunk.yml:

  archs                   -> "mi355"
  arch_jobname_regex_map  -> {"mi355": "rocm.*mi355.*/ test (default|distributed|inductor),"}
  arch_workflow_regex_map -> {"mi355": "(^|/)trunk.yml$"}

The CUDA gate is unchanged. The maps remain overridable inputs for manual multi-arch dispatches.
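
As a small illustration of the workflow-path restriction, the sketch below applies the mi355 workflow regex with `grep -E`. The sample workflow paths and the helper name `matches` are made up, and the job-name regex is deliberately not exercised because realistic check-run names would be guesses.

```shell
#!/bin/sh
# Sketch: test workflow paths against the mi355 workflow regex.
# The paths below are illustrative, not taken from the upstream repository.

MI355_WORKFLOW_RE='(^|/)trunk.yml$'

matches() {
  # $1 = extended regex, $2 = string to test
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

matches "$MI355_WORKFLOW_RE" ".github/workflows/trunk.yml" && echo "trunk.yml counts"
matches "$MI355_WORKFLOW_RE" ".github/workflows/periodic.yml" || echo "periodic.yml ignored"
matches "$MI355_WORKFLOW_RE" ".github/workflows/inductor-trunk.yml" || echo "inductor-trunk.yml ignored"
```

The `(^|/)` anchor is what keeps a file whose name merely ends in trunk.yml from being counted as the trunk workflow.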

rocm-repo-management-api · 2026-06-04T15:36:53Z

Jenkins build for 8fa2be633504300696fbb45ac85f5bc53e212b7e commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

Add parity_job_config.json (shared with the download_testlogs PR #3278) and have parity-auto.yml read the ROCm check-run/workflow regexes and the CUDA check-run regex from it over the contents API at GITHUB_SHA, instead of hardcoding them. The arch_*_regex_map inputs become optional overrides (blank = use the config). One source of truth for job-name matching.

parity.yml: readability only - a top-of-file pipeline overview, an explanation of the dense run-name precedence, and comments on the artifact-prefix logic and the download_testlogs flag builder. No behaviour change.

Note: parity_job_config.json is also added in #3278; merge that first (or rebase) to avoid an add/add conflict.

rocm-repo-management-api · 2026-06-04T17:37:20Z

Jenkins build for 608493bf99fbfdecabc97e03bc58f68aae8707a1 commit finished as NOT_BUILT
Links: Pipeline Overview / Build artifacts / Test Results

rocm-repo-management-api · 2026-06-04T21:37:01Z

Jenkins build for 608493bf99fbfdecabc97e03bc58f68aae8707a1 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

jithunnair-amd requested a review from Copilot May 20, 2026 00:23

Copilot started reviewing on behalf of jithunnair-amd May 20, 2026 00:25

Copilot AI reviewed May 20, 2026

jithunnair-amd requested changes May 26, 2026

ethanwee1 requested a review from pablo-garay May 26, 2026 18:47

ethanwee1 added 3 commits May 28, 2026 18:02

[CI] Add parity auto-trigger workflow

061999a

Add a scheduled scanner that dispatches one parity report per ready upstream PyTorch main commit, with PR dry-runs to validate readiness without creating reports.

[CI] Simplify parity auto trigger scope

1945a34

Scope auto parity to completed trunk mi355 default reports, remove user-facing regex inputs, and use paginated workflow APIs for candidate and dedupe windows.

[CI] Limit parity auto trunk pagination

a2b268a

Fetch completed trunk runs page by page only until the configured unique SHA limit is reached, avoiding full workflow-history scans.

ethanwee1 force-pushed the ethanwee/parity-auto-every-commit branch from 1375a6a to a2b268a Compare May 28, 2026 18:03

ethanwee1 added 2 commits May 28, 2026 18:09

[CI] Preserve parity default CSV naming

1495013

Stop passing csv_name from parity-auto so auto-dispatched parity reports use the same output names as direct parity.yml runs.

[CI] Keep multi-arch auto parity dispatch

d7ff112

Restore the auto parity ready-arch dispatch behavior while preserving parity.yml's default CSV naming by omitting csv_name.

pablo-garay reviewed Jun 3, 2026 (three reviews, each with a now-outdated comment thread on .github/workflows/parity-auto.yml)

parity-auto: collapse echo blocks into printf for readability

608493b

Addresses review feedback: replace the per-line echo runs in the config-summary and step-summary blocks with single printf '%s\n' calls so the printing logic is a few lines instead of many.
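
The echo-to-printf change can be pictured with the sketch below. The variable values and summary wording are placeholders rather than the workflow's real output; `GITHUB_STEP_SUMMARY`, mentioned afterwards, is the standard GitHub Actions file for step summaries.

```shell
#!/bin/sh
# Sketch: one printf call replaces a run of per-line echo statements.
# The variable values and summary wording are placeholders.

MAX_DISPATCHES=50
MAX_COMMITS=200
DRY_RUN=true

# printf reuses the '%s\n' format for every remaining argument,
# so each argument becomes its own output line.
print_config_summary() {
  printf '%s\n' \
    "Parity auto-trigger configuration" \
    "  max_dispatches: $MAX_DISPATCHES" \
    "  max_commits:    $MAX_COMMITS" \
    "  dry_run:        $DRY_RUN"
}

print_config_summary
```

Inside a workflow step the same function could be redirected with `print_config_summary >> "$GITHUB_STEP_SUMMARY"` to populate the step summary.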

jithunnair-amd mentioned this pull request Jun 5, 2026

[CI] Add parity auto-trigger workflow #3247

Closed

1 task