
Fix Python PostCommit Dependency #37725

Open
aIbrahiim wants to merge 6 commits into apache:master from aIbrahiim:fix-python-postcommit-dependency-1

Conversation


@aIbrahiim aIbrahiim commented Feb 27, 2026

Fixes: #30799
Successful Run: https://github.com/aIbrahiim/beam/actions/runs/22507665882
Fix the Python PostCommit Dependency suite by excluding Vertex AI tests from the embeddings suites and updating the py310 pyarrow and transformers tox configs.

  • Exclude vertex_ai_postcommit tests from the py310-embeddings and TFHub embeddings suites (they run in PostCommit Python instead).
  • Add @pytest.mark.vertex_ai_postcommit to the Vertex AI embedding tests (see the sketch after this list).
  • Adjust the pyarrow and transformers tox envs for the py310 dependency suite.
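
For illustration, the marker pattern from the second bullet looks roughly like this (class and test names here are hypothetical, not the actual tests touched in this PR; the custom marker is assumed to be registered in pytest.ini):

# Hypothetical example: tagging a Vertex AI embedding test so the embeddings
# tox suites can deselect it with -m "not vertex_ai_postcommit".
import unittest

import pytest


@pytest.mark.vertex_ai_postcommit
class VertexAITextEmbeddingsIT(unittest.TestCase):
  def test_text_embedding_pipeline(self):
    pass  # The real test would run an MLTransform pipeline against Vertex AI.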


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses issues with Python post-commit dependency checks by updating the pyarrow versions tested in the py310 environment and refining the tox.ini configuration for transformers test environments. These changes ensure that the CI system accurately reflects current dependency requirements and improves the reliability of Python SDK tests.

Highlights

  • Python PyArrow Dependency Updates: Updated the py310 post-commit dependency tests to include pyarrow-6 and pyarrow-19 through pyarrow-23, while removing older versions pyarrow-9 through pyarrow-14.
  • Transformers Test Environment Refinement: Modified the tox.ini configuration for the py{310,311}-transformers test environments by setting pip_pre = False, simplifying extras to test, and replacing the 455 DEPS entry with a 448 (transformers 4.48.x) entry.


Changelog
  • sdks/python/test-suites/tox/py310/build.gradle
    • Removed testPy310pyarrow-9 through testPy310pyarrow-14 tasks and their dependencies from postCommitPyDep.
    • Added testPy310pyarrow-6 task and its dependencies to postCommitPyDep.
    • Added testPy310pyarrow-19 through testPy310pyarrow-23 tasks and their dependencies to postCommitPyDep.
  • sdks/python/tox.ini
    • Introduced pip_pre = False to the py{310,311}-transformers test environment.
    • Changed extras from test,gcp,ml_test to test in the py{310,311}-transformers test environment.
    • Corrected the DEPS definition for 448 in the py{310,311}-transformers test environment, replacing a 455 entry.

@aIbrahiim aIbrahiim force-pushed the fix-python-postcommit-dependency-1 branch from bcb737e to 590684f on March 2, 2026 at 20:59
@aIbrahiim aIbrahiim marked this pull request as ready for review March 2, 2026 21:03
@aIbrahiim

cc @kennknowles @tvalentyn

@github-actions

github-actions bot commented Mar 2, 2026

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

/bin/sh -c "pip freeze | grep -E transformers"
# Allow exit code 5 (no tests run) so that we can run this command safely on arbitrary subdirectories.
/bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
/bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 -m "not vertex_ai_postcommit" {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
Contributor

I don't like that we might have to remember to exclude some tests here; can we think of an approach where tests are not run (for example, a test is skipped because it detects that a dep is not installed, or a test is placed in a folder that is not picked up by this suite), rather than having to exclude certain tests?

@github-actions

github-actions bot commented Mar 3, 2026

Assigning reviewers:

R: @claudevdm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.


exec {
executable 'sh'
args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
args '-c', ". ${envdir}/bin/activate && RUN_VERTEX_AI_TESTS=1 ${runScriptsDir}/run_integration_test.sh $cmdArgs"
Contributor

what does this do?

Contributor Author

Previously, RUN_VERTEX_AI_TESTS was read by a pytest_collection_modifyitems hook in sdks/python/conftest.py, which used it to skip or run tests marked vertex_ai_postcommit. This PR removes that hook; Vertex AI tests now decide whether to run based only on whether their dependencies can be imported. As a result, nothing reads RUN_VERTEX_AI_TESTS anymore, so setting RUN_VERTEX_AI_TESTS=1 in the Gradle task no longer changes behavior and is effectively a no-op. It is harmless but a bit confusing, so I will follow up with a small cleanup to remove that env var from the Gradle config if possible.
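
For context, a hook of that kind typically looks like the following sketch (an illustrative reconstruction, not the exact code that was removed from sdks/python/conftest.py):

# Sketch of an env-var-driven collection hook.
import os

import pytest


def pytest_collection_modifyitems(config, items):
  # Unless RUN_VERTEX_AI_TESTS=1 is set, skip tests marked vertex_ai_postcommit.
  if os.environ.get('RUN_VERTEX_AI_TESTS') == '1':
    return
  skip = pytest.mark.skip(reason='RUN_VERTEX_AI_TESTS is not set')
  for item in items:
    if 'vertex_ai_postcommit' in item.keywords:
      item.add_marker(skip)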

Contributor

I don't quite follow this - I've searched for RUN_VERTEX_AI_TESTS and it doesn't come up anywhere in the codebase. Do you have any pointers?

Contributor Author

Ah, correct. It's already dead code from the refactor; I will remove it.

428: DEPS = sentence-transformers==2.2.2 'transformers>=4.28.0,<4.29.0' 'torch>=1.9.0,<1.14.0'
447: DEPS = 'transformers>=4.47.0,<4.48.0' 'torch>=1.9.0,<1.14.0'
455: DEPS = 'transformers>=4.55.0,<4.56.0' 'torch>=2.0.0,<2.1.0'
448: DEPS = 'transformers>=4.48.0,<4.49.0' 'torch>=1.9.0,<1.14.0'
Contributor

Are we no longer interested in testing newer versions?

Contributor Author

Those bounds describe the supported window for this tox environment today, rather than the absolute latest possible versions.

Contributor

The point of this dependency workflow is to run test scenarios using multiple versions of a dependency. We only do this for dependencies we care about and for versions we care about.

For example, for pyarrow, we have an extensive test suite that exercises every version. It is an outlier. In the vast majority of tests, we only test the latest supported version of a dependency; in some cases we test the earliest version we support and the latest version we support.

Looking at this change, it is not obvious to me why we are making a change to test transformers 4.48.x instead of transformers 4.55.x. Looking at the existing configuration, we do use transformers 4.55.x.

Hence I asked the question about the motivation of your change, since it isn't obvious.

@tvalentyn tvalentyn Mar 4, 2026

I looked some more; it looks like there is a 'latest' version spec on line 526. I am comfortable with testing only 4.28, 4.47, and latest. I doubt adding 4.48 would be valuable.

Contributor

I suggest we verify that we have a suite that runs with the 'latest' version spec for the transformers suite, and remove the 4.48 and 4.55 suites.

Contributor Author

Sounds good. I will keep only 428, 447, and latest for the transformers suite, remove the 448 (and 4.55-only) env from tox, drop the corresponding Gradle tasks, and double-check that the latest suite runs with the current config.

deps =
# Environment dependencies are defined in the `setenv` section and installed in the `commands` section.
extras = test,gcp,ml_test
pip_pre = False
Contributor

what happens if we don't do this?

Contributor Author

It keeps tox from installing pre-release versions of dependencies, which tends to make the test environments more stable.

Contributor

that is true; but the point of using --pre was to test upcoming releases to have a heads-up when things might start breaking before things actually break.

Are test environments unstable because pip cannot resolve dependencies, or for some other reason (pre-release deps have bugs, etc.)?

Contributor

If using --pre becomes too much of a hassle for us, it's OK to disable; I just wanted to understand the reasons.

Contributor Author

Yes, mostly it was pip having trouble resolving dependencies when pre-releases were involved; I saw installs fail or get flaky, so I turned it off to make CI more stable.

all(isinstance(x, float) for x in actual.embedding.dense_embedding))


@skip_if_vertex_ai_disabled
Contributor

we already try to import vertexai on line 41 -- what does adding this decorator change?

Contributor Author

Ah, right. I already gate on VERTEX_AI_AVAILABLE via the local unittest.skipIf, so the decorator doesn't add new behavior there. It was meant to align all Vertex AI tests on a shared helper, but in this specific file it is redundant, so I will drop @skip_if_vertex_ai_disabled here.

from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.transforms import base
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.testing.vertex_ai_skip import skip_if_vertex_ai_disabled
Contributor

If this decorator is only about a one-line import, why not add that import around line 40?

Contributor Author

I created skip_if_vertex_ai_disabled to check in one place (vertex_ai_skip.py) whether the Vertex AI dependencies are available, so I don't have to repeat similar import and try/except checks in every Vertex test file. The local imports around line 40 are still needed for the test logic itself, but the decorator lets us keep the skip condition shared.
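
A minimal sketch of such a shared helper, assuming it lives in apache_beam/testing/vertex_ai_skip.py as the import above suggests (the actual implementation may differ; an excerpt of its availability check is quoted later in this thread):

# Sketch of a shared skip helper for Vertex AI tests.
import unittest


def _is_vertex_ai_available() -> bool:
  """Return True if Vertex AI client dependencies are importable."""
  try:
    import vertexai  # pylint: disable=unused-import
    return True
  except ImportError:
    return False


# Applied to test classes or methods; evaluated once at import time.
skip_if_vertex_ai_disabled = unittest.skipIf(
    not _is_vertex_ai_available(), 'Vertex AI dependencies not available')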

@tvalentyn tvalentyn Mar 4, 2026

It reads very strangely to see both

@skip_if_vertex_ai_disabled
@unittest.skipIf(
    VertexAITextEmbeddings is None, 'Vertex AI Python SDK is not installed.')

A reader of this code would likely wonder why both checks are necessary. We should find a way to check this once.

@@ -58,6 +61,8 @@
model_name: str = "text-embedding-005"


@skip_if_vertex_ai_disabled
@pytest.mark.vertex_ai_postcommit
Contributor

Note that adding vertex_ai_postcommit would make these tests run on Dataflow, instead of the Beam direct runner:

"collect": "vertex_ai_postcommit",

Was that your intent?

Contributor Author

Yes, that is intentional. These Vertex AI tests are integration tests, so I kept them running in the Dataflow vertexAIInferenceTest suite rather than in the standard DirectRunner unit-test jobs. The vertex_ai_postcommit marker makes sure the Dataflow Gradle task collects them, and local runs can still deselect them by marker if needed.

@tvalentyn tvalentyn Mar 4, 2026

OK. We typically name Dataflow tests with _it_test (integration test) in the file name; in this case I'd suggest we also rename this file to vertex_ai_it_test.py.

Contributor

For example, in this particular case, we already import:

try:
  from vertexai.vision_models import Image
  from vertexai.vision_models import Video
  from vertexai.vision_models import VideoSegmentConfig

I would then ask: why do we also need to try to import vertexai? Doesn't a successful import of from vertexai.vision_models import Image already imply that vertexai is importable?

Contributor Author

Okay, sure, I will rename it.

Contributor Author

Ah, the decorator runs at collection time and only checks whether the SDK is there at all; the test file's imports run when the test runs and cover the specific things that test needs. So the skip is one generic check, and the test file does its own imports for the APIs it uses.

@aIbrahiim aIbrahiim requested a review from tvalentyn March 4, 2026 20:39

@pytest.mark.vertex_ai_postcommit
@unittest.skipIf(
not VERTEX_AI_AVAILABLE, "Vertex AI dependencies not available")
Contributor

As mentioned elsewhere, let's standardize on these checks and not test twice. I would probably stick with trying to import vertexai, as this is a common pattern in many other tests, and for many other dependencies.
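
The common pattern referred to here is a module-level guarded import plus a single unittest.skipIf, roughly like this sketch (the import path is an assumption for illustration):

# Guarded import: the test module stays importable without the dependency.
import unittest

try:
  # Hypothetical import path, for illustration only.
  from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAITextEmbeddings
except ImportError:
  VertexAITextEmbeddings = None


@unittest.skipIf(
    VertexAITextEmbeddings is None, 'Vertex AI Python SDK is not installed.')
class VertexAITextEmbeddingsTest(unittest.TestCase):
  def test_embedding(self):
    pass  # Real assertions would go here.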

def _is_vertex_ai_available() -> bool:
  """Return True if Vertex AI client dependencies are importable."""
  try:
    import vertexai  # pylint: disable=unused-import
    return True
  except ImportError:
    return False
Contributor

I would support this if this check were more involved (like, we need the dependency, check credentials, check something else, or exclude a scenario where the dependency was inadvertently installed by some other package but we really don't want to run this test, etc.).
For a single import it feels like overhead to add this decorator.


Contributor

Looks like we already use that in other places in Beam.

@tvalentyn

Changes look good to me. Let's confirm that all affected unit tests pass, and also trigger the vertex_ai_postcommit test suite and check that it passes.

@aIbrahiim aIbrahiim requested a review from tvalentyn March 5, 2026 01:52
@aIbrahiim

Changes look good to me. Let's confirm that all affected unit tests pass, and also trigger the vertex_ai_postcommit test suite and check that it passes.

Okay, sure. The PostCommit Dependency run passed (https://github.com/aIbrahiim/beam/actions/runs/22696450438/job/65803811462), and I will trigger PostCommit Python.


@Amar3tto Amar3tto self-requested a review March 5, 2026 13:56
@tvalentyn

Filed #37779 for the anomaly detection test that failed; will rerun.

@tvalentyn

The coverage suite is picking up signal from Python 3.13 and is actually passing.

@tvalentyn

Python PostCommit: apache/beam/actions/runs/22715484309

SKIPPED [1] apache_beam/ml/inference/vertex_ai_inference_it_test.py:38: unittest.case.SkipTest: Vertex AI model handler dependencies are not installed

@tvalentyn

tvalentyn commented Mar 5, 2026

Python PostCommit: apache/beam/actions/runs/22715484309

SKIPPED [1] apache_beam/ml/inference/vertex_ai_inference_it_test.py:38: unittest.case.SkipTest: Vertex AI model handler dependencies are not installed

Admittedly I haven't looked closely at the logs, but we should verify that the Vertex tests we moved to run on Dataflow are actually running, and passing.

@aIbrahiim

I just checked the logs and found that the Vertex AI tests do run in the vertexAIInferenceTest step: 16 passed, 1 failed. The failure is test_image_embedding_pipeline_from_path (a local file path is not available on Dataflow workers). I think the skip above ('Vertex AI model handler dependencies are not installed') is from a different step, such as postCommitIT, that doesn't install the Vertex deps, so that skip is expected. The job that is supposed to run the Vertex tests is vertexAIInferenceTest, and there they run and mostly pass.
@tvalentyn @Amar3tto


@tvalentyn

Looks like the py310 suite is still stuck in dependency resolution.

@aIbrahiim

Looks like the py310 suite is still stuck in dependency resolution.

Where did you find it? @tvalentyn


Development

Successfully merging this pull request may close these issues: The PostCommit Python Dependency job is flaky.