
Fix Python PostCommit Dependency #37725

Open
aIbrahiim wants to merge 6 commits into apache:master from aIbrahiim:fix-python-postcommit-dependency-1

Conversation


@aIbrahiim aIbrahiim commented Feb 27, 2026

Fixes: #30799
Successful Run: https://github.com/aIbrahiim/beam/actions/runs/22507665882
Fix the Python PostCommit Dependency suite by excluding Vertex AI tests from the embeddings suites and updating the py310 pyarrow and transformers tox configs.

  • Exclude vertex_ai_postcommit tests from the py310-embeddings and TFHub embeddings suites (they run in PostCommit Python instead).
  • Add @pytest.mark.vertex_ai_postcommit to the Vertex AI embedding tests (see the sketch after this list).
  • Adjust the pyarrow and transformers tox envs for the py310 dependency suite.
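
For illustration, the marker pattern from the second bullet looks roughly like this (class and test names here are hypothetical, not the actual tests touched in this PR; the custom marker is assumed to be registered in pytest.ini):

# Hypothetical example: tagging a Vertex AI embedding test so the embeddings
# tox suites can deselect it with -m "not vertex_ai_postcommit".
import unittest

import pytest


@pytest.mark.vertex_ai_postcommit
class VertexAITextEmbeddingsIT(unittest.TestCase):
  def test_text_embedding_pipeline(self):
    pass  # The real test would run an MLTransform pipeline against Vertex AI.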


@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses issues with Python post-commit dependency checks by updating the pyarrow versions tested in the py310 environment and refining the tox.ini configuration for transformers test environments. These changes ensure that the CI system accurately reflects current dependency requirements and improves the reliability of Python SDK tests.

Highlights

  • Python PyArrow Dependency Updates: Updated the py310 post-commit dependency tests to include pyarrow-6 and pyarrow-19 through pyarrow-23, while removing older versions pyarrow-9 through pyarrow-14.
  • Transformers Test Environment Refinement: Modified the tox.ini configuration for the py{310,311}-transformers test environments by setting pip_pre = False, simplifying extras to test, and replacing the 455 DEPS entry with a 448 (transformers 4.48.x) entry.


Changelog
  • sdks/python/test-suites/tox/py310/build.gradle
    • Removed testPy310pyarrow-9 through testPy310pyarrow-14 tasks and their dependencies from postCommitPyDep.
    • Added testPy310pyarrow-6 task and its dependencies to postCommitPyDep.
    • Added testPy310pyarrow-19 through testPy310pyarrow-23 tasks and their dependencies to postCommitPyDep.
  • sdks/python/tox.ini
    • Introduced pip_pre = False to the py{310,311}-transformers test environment.
    • Changed extras from test,gcp,ml_test to test in the py{310,311}-transformers test environment.
    • Corrected the DEPS definition for 448 in the py{310,311}-transformers test environment, replacing a 455 entry.

@aIbrahiim aIbrahiim force-pushed the fix-python-postcommit-dependency-1 branch from bcb737e to 590684f on March 2, 2026 at 20:59
@aIbrahiim aIbrahiim marked this pull request as ready for review March 2, 2026 21:03
@aIbrahiim

cc @kennknowles @tvalentyn

@github-actions

github-actions bot commented Mar 2, 2026

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

/bin/sh -c "pip freeze | grep -E transformers"
# Allow exit code 5 (no tests run) so that we can run this command safely on arbitrary subdirectories.
/bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
/bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 -m "not vertex_ai_postcommit" {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
Contributor

I don't like that we might have to remember to exclude some tests here; can we think of an approach where tests are not run (for example, a test is skipped because it detects that a dep is not installed, or a test is placed in a folder that is not picked up by this suite), rather than having to exclude certain tests?

@github-actions

github-actions bot commented Mar 3, 2026

Assigning reviewers:

R: @claudevdm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.


exec {
executable 'sh'
args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
args '-c', ". ${envdir}/bin/activate && RUN_VERTEX_AI_TESTS=1 ${runScriptsDir}/run_integration_test.sh $cmdArgs"
Contributor

what does this do?

Contributor Author

Previously, RUN_VERTEX_AI_TESTS was read by a pytest_collection_modifyitems hook in sdks/python/conftest.py, which used it to skip or run tests marked vertex_ai_postcommit. This PR removes that hook; Vertex AI tests now decide whether to run based only on whether their dependencies can be imported. As a result, nothing reads RUN_VERTEX_AI_TESTS anymore, so setting RUN_VERTEX_AI_TESTS=1 in the Gradle task no longer changes behavior and is effectively a no-op. It is harmless but a bit confusing, so I will follow up with a small cleanup to remove that env var from the Gradle config if possible.
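
For context, a hook of that kind typically looks like the following sketch (an illustrative reconstruction, not the exact code that was removed from sdks/python/conftest.py):

# Sketch of an env-var-driven collection hook.
import os

import pytest


def pytest_collection_modifyitems(config, items):
  # Unless RUN_VERTEX_AI_TESTS=1 is set, skip tests marked vertex_ai_postcommit.
  if os.environ.get('RUN_VERTEX_AI_TESTS') == '1':
    return
  skip = pytest.mark.skip(reason='RUN_VERTEX_AI_TESTS is not set')
  for item in items:
    if 'vertex_ai_postcommit' in item.keywords:
      item.add_marker(skip)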

Contributor

I don't quite follow this - I've searched for RUN_VERTEX_AI_TESTS and it doesn't come up anywhere in the codebase. Do you have any pointers?

Contributor Author

Ah, correct. It's already dead code from the refactor; I will remove it.

428: DEPS = sentence-transformers==2.2.2 'transformers>=4.28.0,<4.29.0' 'torch>=1.9.0,<1.14.0'
447: DEPS = 'transformers>=4.47.0,<4.48.0' 'torch>=1.9.0,<1.14.0'
455: DEPS = 'transformers>=4.55.0,<4.56.0' 'torch>=2.0.0,<2.1.0'
448: DEPS = 'transformers>=4.48.0,<4.49.0' 'torch>=1.9.0,<1.14.0'
Contributor

Are we no longer interested in testing newer versions?

Contributor Author

Those bounds describe the supported window for this tox environment today, rather than the absolute latest possible versions.

Contributor

The point of this dependency workflow is to run test scenarios using multiple versions of a dependency. We only do this for dependencies we care about and for versions we care about.

For example, for pyarrow, we have an extensive test suite that exercises every version. It is an outlier. In the vast majority of tests, we only test the latest supported version of a dependency; in some cases we test the earliest version we support and the latest version we support.

Looking at this change, it is not obvious to me why we are making a change to test transformers 4.48.x instead of transformers 4.55.x. Looking at the existing configuration, we do use transformers 4.55.x.

Hence I asked the question about the motivation of your change, since it isn't obvious.

@tvalentyn tvalentyn Mar 4, 2026

I looked some more; it looks like there is a 'latest' version spec on line 526. I am comfortable with testing only 4.28, 4.47, and latest. I doubt adding 4.48 would be valuable.

Contributor

I suggest we verify that we have a suite that runs with the 'latest' version spec for the transformers suite, and remove the 4.48 and 4.55 suites.

Contributor Author

Sounds good. I will keep only 428, 447, and latest for the transformers suite, remove the 448 (and 4.55-only) env from tox, drop the corresponding Gradle tasks, and double-check that the latest suite runs with the current config.

deps =
# Environment dependencies are defined in the `setenv` section and installed in the `commands` section.
extras = test,gcp,ml_test
pip_pre = False
Contributor

what happens if we don't do this?

Contributor Author

It keeps tox from installing pre-release versions of dependencies, which tends to make the test environments more stable.

Contributor

that is true; but the point of using --pre was to test upcoming releases to have a heads-up when things might start breaking before things actually break.

Are test environments unstable because pip cannot resolve dependencies, or for some other reason (pre-release deps have bugs, etc.)?

Contributor

If using --pre becomes too much of a hassle for us, it's OK to disable; I just wanted to understand the reasons.

Contributor Author

Yes, mostly it was pip having trouble resolving dependencies when pre-releases were involved; I saw installs fail or get flaky, so I turned it off to make CI more stable.

all(isinstance(x, float) for x in actual.embedding.dense_embedding))


@skip_if_vertex_ai_disabled
Contributor

we already try to import vertexai on line 41 -- what does adding this decorator change?

Contributor Author

Ah, right. I already gate on VERTEX_AI_AVAILABLE via the local unittest.skipIf, so the decorator doesn't add new behavior there. It was meant to align all Vertex AI tests on a shared helper, but in this specific file it is redundant, so I will drop @skip_if_vertex_ai_disabled here.

from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.transforms import base
from apache_beam.ml.transforms.base import MLTransform
from apache_beam.testing.vertex_ai_skip import skip_if_vertex_ai_disabled
Contributor

If this decorator is only about a one-line import, why not add that import around line 40?

Contributor Author

I created skip_if_vertex_ai_disabled to check in one place (vertex_ai_skip.py) whether the Vertex AI dependencies are available, so I don't have to repeat similar import and try/except checks in every Vertex test file. The local imports around line 40 are still needed for the test logic itself, but the decorator lets us keep the skip condition shared.
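
A minimal sketch of such a shared helper, assuming it lives in apache_beam/testing/vertex_ai_skip.py as the import above suggests (the actual implementation may differ; an excerpt of its availability check is quoted later in this thread):

# Sketch of a shared skip helper for Vertex AI tests.
import unittest


def _is_vertex_ai_available() -> bool:
  """Return True if Vertex AI client dependencies are importable."""
  try:
    import vertexai  # pylint: disable=unused-import
    return True
  except ImportError:
    return False


# Applied to test classes or methods; evaluated once at import time.
skip_if_vertex_ai_disabled = unittest.skipIf(
    not _is_vertex_ai_available(), 'Vertex AI dependencies not available')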

@tvalentyn tvalentyn Mar 4, 2026

It reads very strangely to see both

@skip_if_vertex_ai_disabled
@unittest.skipIf(
    VertexAITextEmbeddings is None, 'Vertex AI Python SDK is not installed.')

A reader of this code would likely wonder why both checks are necessary. We should find a way to check this once.

@@ -58,6 +61,8 @@
model_name: str = "text-embedding-005"


@skip_if_vertex_ai_disabled
@pytest.mark.vertex_ai_postcommit
Contributor

Note that adding vertex_ai_postcommit would make these tests run on Dataflow, instead of the Beam direct runner:

"collect": "vertex_ai_postcommit",

Was that your intent?

Contributor Author

Yes, that is intentional. These Vertex AI tests are integration tests, so I kept them running in the Dataflow vertexAIInferenceTest suite rather than in the standard DirectRunner unit-test jobs. The vertex_ai_postcommit marker makes sure the Dataflow Gradle task collects them, and local runs can still deselect them by marker if needed.

@tvalentyn tvalentyn Mar 4, 2026

OK. We typically name Dataflow tests with _it_test (integration test) in the file name; in this case I'd suggest we also rename this file to vertex_ai_it_test.py.

Contributor

For example, in this particular case, we already import:

try:
  from vertexai.vision_models import Image
  from vertexai.vision_models import Video
  from vertexai.vision_models import VideoSegmentConfig

I would then ask: why do we also need to try to import vertexai? Doesn't a successful import of from vertexai.vision_models import Image already imply that vertexai is importable?

Contributor Author

Okay, sure, I will rename it.

Contributor Author

Ah, the decorator runs at collection time and only checks whether the SDK is there at all; the test file's imports run when the test runs and cover the specific things that test needs. So the skip is one generic check, and the test file does its own imports for the APIs it uses.

@aIbrahiim aIbrahiim requested a review from tvalentyn March 4, 2026 20:39

@pytest.mark.vertex_ai_postcommit
@unittest.skipIf(
not VERTEX_AI_AVAILABLE, "Vertex AI dependencies not available")
Contributor

As mentioned elsewhere, let's standardize on these checks and not test twice. I would probably stick with trying to import vertexai, as this is a common pattern in many other tests, and for many other dependencies.
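
The common pattern referred to here is a module-level guarded import plus a single unittest.skipIf, roughly like this sketch (the import path is an assumption for illustration):

# Guarded import: the test module stays importable without the dependency.
import unittest

try:
  # Hypothetical import path, for illustration only.
  from apache_beam.ml.transforms.embeddings.vertex_ai import VertexAITextEmbeddings
except ImportError:
  VertexAITextEmbeddings = None


@unittest.skipIf(
    VertexAITextEmbeddings is None, 'Vertex AI Python SDK is not installed.')
class VertexAITextEmbeddingsTest(unittest.TestCase):
  def test_embedding(self):
    pass  # Real assertions would go here.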

def _is_vertex_ai_available() -> bool:
  """Return True if Vertex AI client dependencies are importable."""
  try:
    import vertexai  # pylint: disable=unused-import
    return True
  except ImportError:
    return False
Contributor

I would support this if this check were more involved (like, we need the dependency, check credentials, check something else, or exclude a scenario where the dependency was inadvertently installed by some other package but we really don't want to run this test, etc.).
For a single import it feels like overhead to add this decorator.


Contributor

Looks like we already use that in other places in Beam.

@tvalentyn

Changes look good to me. Let's confirm that all affected unit tests pass, and also trigger the vertex_ai_postcommit test suite and check that it passes.

@aIbrahiim aIbrahiim requested a review from tvalentyn March 5, 2026 01:52
@aIbrahiim

Changes look good to me. Let's confirm that all affected unit tests pass, and also trigger the vertex_ai_postcommit test suite and check that it passes.

Okay, sure. The PostCommit Dependency run passed (https://github.com/aIbrahiim/beam/actions/runs/22696450438/job/65803811462), and I will trigger PostCommit Python.


@Amar3tto Amar3tto self-requested a review March 5, 2026 13:56
@tvalentyn

Filed #37779 for the anomaly detection test that failed; will rerun.

@tvalentyn

The coverage suite is picking up signal from Python 3.13 and is actually passing.

@tvalentyn

Python PostCommit: apache/beam/actions/runs/22715484309

SKIPPED [1] apache_beam/ml/inference/vertex_ai_inference_it_test.py:38: unittest.case.SkipTest: Vertex AI model handler dependencies are not installed

@tvalentyn

tvalentyn commented Mar 5, 2026

Python PostCommit: apache/beam/actions/runs/22715484309

SKIPPED [1] apache_beam/ml/inference/vertex_ai_inference_it_test.py:38: unittest.case.SkipTest: Vertex AI model handler dependencies are not installed

Admittedly I haven't looked closely at the logs, but we should verify that the Vertex tests we moved to run on Dataflow are actually running, and passing.

@aIbrahiim

I just checked the logs and found that the Vertex AI tests do run in the vertexAIInferenceTest step: 16 passed, 1 failed. The failure is test_image_embedding_pipeline_from_path (a local file path is not available on Dataflow workers). I think the skip above ('Vertex AI model handler dependencies are not installed') is from a different step, such as postCommitIT, that doesn't install the Vertex deps, so that skip is expected. The job that is supposed to run the Vertex tests is vertexAIInferenceTest, and there they run and mostly pass.
@tvalentyn @Amar3tto


@tvalentyn

Looks like the py310 suite is still stuck in dependency resolution.

@aIbrahiim

Looks like the py310 suite is still stuck in dependency resolution.

Where did you find it? @tvalentyn


Development

Successfully merging this pull request may close these issues: The PostCommit Python Dependency job is flaky.