[ci] Guard downstream consumers on empty upstream output URLs #5227
FIM43-Redeye wants to merge 1 commit
Conversation
Pull request overview
This PR updates several GitHub Actions workflows to avoid unauthenticated S3 upload attempts on fork PRs (where OIDC role assumption is skipped), and to prevent downstream consumer jobs from running when the upstream upload-produced URLs are empty.
Changes:
- Add fork/repo guards to S3 upload steps so fork PRs no-op instead of failing with `NoCredentialsError`.
- Add downstream job `if:` guards to skip install/wheel/PyTorch jobs when `package_repository_url`/`package_find_links_url` outputs are empty.
- Document the "empty output on forks" behavior inline in the workflows to clarify why skips occur.
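As a rough sketch of the consumer-side guard (job, step, and output names here are illustrative placeholders, not the exact ones in the workflows):

```yaml
jobs:
  build_packages:
    runs-on: ubuntu-latest
    outputs:
      # On fork PRs the upload step is skipped, so this output stays empty.
      package_find_links_url: ${{ steps.upload.outputs.package_find_links_url }}
    steps:
      - id: upload
        # Hypothetical upload step: when it runs, it emits the URL output.
        run: echo "package_find_links_url=https://example.com/wheels" >> "$GITHUB_OUTPUT"

  test_wheels:
    needs: build_packages
    # Skip cleanly when the upstream upload produced no URL,
    # instead of starting and failing at the install step.
    if: needs.build_packages.outputs.package_find_links_url != ''
    runs-on: ubuntu-latest
    steps:
      - run: pip install --find-links "${{ needs.build_packages.outputs.package_find_links_url }}" some-package
```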
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| .github/workflows/multi_arch_build_native_linux_packages.yml | Mirrors the existing AWS-credential fork guard on the native package repo upload step. |
| .github/workflows/build_portable_linux_python_packages.yml | Adds an upload-step guard to prevent unauthenticated S3 uploads on forks (but currently too strict for rockrel release callers). |
| .github/workflows/build_windows_python_packages.yml | Adds an upload-step guard to prevent unauthenticated S3 uploads on forks (but currently too strict for rockrel release callers). |
| .github/workflows/multi_arch_ci_linux.yml | Skips install tests / wheel tests / PyTorch builds when the upstream URL outputs are empty. |
| .github/workflows/multi_arch_ci_windows.yml | Skips wheel tests / PyTorch builds when the upstream find-links URL output is empty. |
| .github/workflows/ci_linux.yml | Skips wheel tests / PyTorch builds when the upstream find-links URL output is empty. |
| .github/workflows/ci_windows.yml | Skips wheel tests / PyTorch builds when the upstream find-links URL output is empty. |
Address Copilot review on ROCm#5227: ROCm/rockrel calls the Python package build workflows via workflow_call for scheduled releases (documented in multi_arch_release.yml). When that happens, github.repository inside the called workflow is "ROCm/rockrel", not "ROCm/TheRock". The original guard `github.repository == 'ROCm/TheRock'` therefore force-skipped the upload step on rockrel runs even though those runs have working AWS credentials via the runner-mounted AWS_SHARED_CREDENTIALS_FILE (the OIDC composite's repo allowlist is narrower than the actual credential reachability). Widen the allowlist to {ROCm/TheRock, ROCm/rockrel}, matching the existing condition used by the Configure AWS Credentials step in multi_arch_build_native_linux_packages.yml. Generated using Claude Code.
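The widened allowlist described above might look roughly like the following; the step name and run command are hypothetical stand-ins, not the actual workflow content:

```yaml
# Sketch of the widened repository allowlist on the upload step.
# Both ROCm/TheRock runs and ROCm/rockrel workflow_call runs pass the guard.
- name: Upload package repository to S3
  if: github.repository == 'ROCm/TheRock' || github.repository == 'ROCm/rockrel'
  run: ./build_tools/upload_package_repo.sh  # hypothetical script name
```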
marbre
left a comment
Thanks for raising this. We had a mechanism to authenticate differently when a request comes from a fork. Something seems broken here; pushing builds on approved runs should indeed be supported when the PR comes from a fork.
@geomin12 @ScottTodd, I saw you two have been partly involved in related fixes, can you comment too?
ScottTodd
left a comment
All workflows that run on pull requests must work from PRs sent from forks, without losing any functionality.
Not uploading essential files is not an option.
Geo has #5066 trying to fix the deb/rpm upload, and I've seen a few instances recently where the runner base credentials don't work. We need to fix it in those ways, not by skipping uploads.
There are some scenarios where skipping uploads could make sense, like running workflows in forks (as I do occasionally for testing, see https://github.com/ROCm/TheRock/blob/main/docs/development/github_actions_debugging.md#working-effectively-from-forks), but in those cases I'm happy enough seeing that the workflows generally work (inputs get propagated correctly, scripts handle them, no syntax errors, builds start running, etc.). GitHub-hosted runners are slow and can run out of disk space on many of our workflows, so I don't really see much value in trying to make the workflows run to successful completion outside of the ROCm org.
Adds `&& outputs.X != ''` to install tests, wheel tests, and PyTorch builds in ci_linux.yml, ci_windows.yml, multi_arch_ci_linux.yml, and multi_arch_ci_windows.yml so these jobs cleanly skip when the upstream upload produces no URL, rather than running and failing during the install step. Defensive on the consumer side only; the upstream auth gap that produces empty URLs on fork PRs is being addressed separately in ROCm#5066.
Force-pushed 13deea7 to ff1a6e6
@ScottTodd @marbre That's a lot more sensible for sure. Old mindsets from old times, I'm used to treating anything involving credentials as set in stone. Scrapped the fork guards on upload steps. Are the downstream checks helpful or should I close this one?
ScottTodd
left a comment
Ehhh this doesn't look right either.
Consider the case of running in a personal fork without self-hosted runners or credentials to upload:
- The builds on CPU machines are likely to fail due to low disk space or slow build times
- The uploads are attempted and will fail
- The tests on GPU machines would fail due to not having files to download and not having self-hosted runners registered to process the jobs
I don't want to sprinkle so many conditions in the workflows - they are designed for this repository where they have everything they need.
A few of the lighter weight workflows like https://github.com/ROCm/TheRock/blob/main/.github/workflows/build_portable_linux_python_packages.yml that consume already built artifacts from your choice of repository + run id and package them are at least possible to test from repository forks without self-hosted runners or credentials - they just fail on the final upload step.
Here's a recent run from my fork for example: https://github.com/ScottTodd/TheRock/actions/runs/24217435275/job/70701188683. That fetched artifacts and built tarballs from them in ~10 minutes:
```
python build_tools/build_tarballs.py \
  --run-id="24187929660" \
  --run-github-repo="ROCm/TheRock" \
  --dist-amdgpu-families="gfx1151;gfx1100" \
  --platform="linux" \
  --package-version="7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3" \
  --output-dir="/home/runner/work/TheRock/TheRock/tarballs"
```

then it failed to upload (as expected) at the end:

```
[INFO] Uploading to s3://therock-ci-artifacts-external/ScottTodd-TheRock/24217435275-linux/tarballs
S3 upload attempt 1/3 failed for s3://therock-ci-artifacts-external/ScottTodd-TheRock/24217435275-linux/tarballs/therock-dist-linux-gfx1100-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz: Unable to locate credentials. Retrying in 1.0s...
S3 upload attempt 1/3 failed for s3://therock-ci-artifacts-external/ScottTodd-TheRock/24217435275-linux/tarballs/therock-dist-linux-gfx1151-7.13.0.dev0+83ae8235312791cd7302e3f50c9935887d62b5a3.tar.gz: Unable to locate credentials. Retrying in 1.0s...
```
(there were three 3GB tarballs, so too large for github artifacts and would need to go to some cloud storage like our S3 buckets)
That workflow was functional enough for me to test there before I moved over to a PR in this repository to contribute it.
Got it, the credential layer is where the fixes live. Thanks for walking me through the fork-test model, that was the bit I was missing. Closing as inapplicable.
Summary
When upstream upload steps (S3 wheel/package uploads) produce no URL
output — e.g. fork PRs without credentials, or transient upload
failures — downstream consumer jobs (install tests, wheel tests,
PyTorch builds) still run and then fail at the install step with
"no such package" errors. This PR adds
&& steps.X.outputs.Y != ''to each consumer's
if:condition so they skip cleanly when theupstream URL is empty, rather than starting and erroring.
The guard is robust to any cause of a missing upload URL, not only
forks. It is purely a consumer-side defensive check; the upstream
auth gap that produces empty URLs on fork PRs is being addressed
separately at the credential layer in #5066.
Changes
- `ci_linux.yml`
- `ci_windows.yml`
- `multi_arch_ci_linux.yml`
- `multi_arch_ci_windows.yml`

Out of scope
- The upstream auth gap (`NoCredentialsError` on fork PRs against AWS uploads) is being fixed at the credential layer in [ci] Fixing dep rpm packages for forked upload #5066. This PR does not skip any upload step.
- `build_portable_linux_pytorch_wheels.yml`/`build_windows_pytorch_wheels.yml` use direct `aws-actions/configure-aws-credentials` without a fork guard, so on forks they fail at Configure, not Upload: a different signature, separate work.
- Windows runner prejob cleanup is being addressed separately in [ci] Skip already-exited processes in Windows runner prejob cleanup #5228.
Test plan
- `pre-commit run` clean on all 4 files (including actionlint)
- Downstream jobs skip cleanly when upstream outputs are empty rather than erroring during install
Generated using Claude Code.