Conversation
Force-pushed from bcb737e to 590684f.

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment.
sdks/python/tox.ini (Outdated):

```diff
  /bin/sh -c "pip freeze | grep -E transformers"
  # Allow exit code 5 (no tests run) so that we can run this command safely on arbitrary subdirectories.
- /bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
+ /bin/sh -c 'pytest apache_beam/ml/transforms/embeddings -o junit_suite_name={envname} --junitxml=pytest_{envname}.xml -n 6 -m "not vertex_ai_postcommit" {posargs}; ret=$?; [ $ret = 5 ] && exit 0 || exit $ret'
```
I don't like that we might have to remember to exclude some tests here; can we think of an approach where the tests are simply not run (for example, a test skips itself because it detects that a dependency is not installed, or the test is placed in a folder that is not picked up by this suite), rather than having to exclude certain tests?
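For example, the self-skipping variant could look something like this (a rough sketch; the class and test names are illustrative, not code from this PR):

```python
import unittest

# Guarded import: the module stays importable even when the optional
# dependency is absent, and the whole class skips instead of erroring.
try:
  import vertexai  # noqa: F401
  VERTEX_AI_AVAILABLE = True
except ImportError:
  VERTEX_AI_AVAILABLE = False


@unittest.skipIf(not VERTEX_AI_AVAILABLE, 'Vertex AI SDK is not installed.')
class VertexAIEmbeddingsTest(unittest.TestCase):
  def test_runs_only_with_deps(self):
    self.assertTrue(VERTEX_AI_AVAILABLE)
```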
Assigning reviewers: R: @claudevdm for label python.
Note: the PR bot will only process comments in the main thread (not review comments).
```diff
  exec {
    executable 'sh'
-   args '-c', ". ${envdir}/bin/activate && ${runScriptsDir}/run_integration_test.sh $cmdArgs"
+   args '-c', ". ${envdir}/bin/activate && RUN_VERTEX_AI_TESTS=1 ${runScriptsDir}/run_integration_test.sh $cmdArgs"
```
Previously, RUN_VERTEX_AI_TESTS was read by a pytest_collection_modifyitems hook in sdks/python/conftest.py, which used it to skip or run tests marked vertex_ai_postcommit. In this PR I removed that hook, and Vertex AI tests now decide whether to run based only on whether their dependencies can be imported. As a result, nothing reads RUN_VERTEX_AI_TESTS anymore, so setting RUN_VERTEX_AI_TESTS=1 in the Gradle task no longer changes behavior and is effectively a no-op. It is harmless but a bit confusing, so I will follow up with a small cleanup to remove that env var from the Gradle config if possible.
I don't quite follow this - I've searched for RUN_VERTEX_AI_TESTS and it doesn't come up anywhere in the codebase. Do you have any pointers?
Ahh, correct - it's already dead code from the refactor; I will remove it.
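For reference, the removed hook looked roughly like this (reconstructed from the description above, not the exact deleted code):

```python
# sdks/python/conftest.py (removed in this PR) - reconstructed sketch.
import os

import pytest


def pytest_collection_modifyitems(config, items):
  if os.environ.get('RUN_VERTEX_AI_TESTS') == '1':
    return  # opt-in env var set: leave vertex_ai_postcommit tests runnable
  skip = pytest.mark.skip(reason='RUN_VERTEX_AI_TESTS is not set')
  for item in items:
    if 'vertex_ai_postcommit' in item.keywords:
      item.add_marker(skip)
```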
sdks/python/tox.ini (Outdated):

```diff
  428: DEPS = sentence-transformers==2.2.2 'transformers>=4.28.0,<4.29.0' 'torch>=1.9.0,<1.14.0'
  447: DEPS = 'transformers>=4.47.0,<4.48.0' 'torch>=1.9.0,<1.14.0'
- 455: DEPS = 'transformers>=4.55.0,<4.56.0' 'torch>=2.0.0,<2.1.0'
+ 448: DEPS = 'transformers>=4.48.0,<4.49.0' 'torch>=1.9.0,<1.14.0'
```
Are we no longer interested in testing newer versions?
Those bounds describe the supported window for this tox environment today, rather than the absolute latest possible versions.
The point of this dependency workflow is to run test scenarios against multiple versions of a dependency. We only do this for dependencies we care about and for versions we care about.

For example, for pyarrow we have an extensive test suite that exercises every version; it is an outlier. For the vast majority of dependencies, we only test the latest supported version, and in some cases both the earliest and the latest versions we support.

Looking at this change, it is not obvious to me why we are making a change to test transformers 4.48.x instead of transformers 4.55.x. Looking at the current config, we do use transformers 4.55.x. Hence I asked the question about the motivation for your change, since it isn't obvious.
I looked some more; it looks like there is a 'latest' version spec on line 526. I am comfortable with testing only 4.28, 4.47, and latest. I doubt adding 4.48 would be valuable.
I suggest we verify that we have a suite that runs with the 'latest' version spec for the transformers suite, and remove the 4.48 and 4.55 suites.
Ahh, sounds good. I will keep only 428, 447, and latest for the transformers suite, remove the 448 (and 4.55-only) env from tox, drop the corresponding Gradle tasks, and double-check that the latest suite runs with the current config.
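Concretely, the end state would look roughly like this in tox factor syntax (a sketch; the exact section names in sdks/python/tox.ini may differ):

```ini
[testenv:py{39,310}-transformers-{428,447,latest}]
setenv =
  428: DEPS = sentence-transformers==2.2.2 'transformers>=4.28.0,<4.29.0' 'torch>=1.9.0,<1.14.0'
  447: DEPS = 'transformers>=4.47.0,<4.48.0' 'torch>=1.9.0,<1.14.0'
  # The `latest` factor leaves the pins off so pip resolves the newest release.
  latest: DEPS = transformers torch
```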
```ini
deps =
  # Environment dependencies are defined in the `setenv` section and installed in the `commands` section.
extras = test,gcp,ml_test
pip_pre = False
```
What happens if we don't do this?
It keeps tox from installing pre-release versions of dependencies, which tends to make the test environments more stable.
That is true, but the point of using --pre was to test upcoming releases, so we get a heads-up when things might start breaking before they actually break.

Are the test environments unstable because pip cannot resolve dependencies, or for some other reason (pre-release deps have bugs, etc.)?
If using --pre becomes too much of a hassle for us, it's OK to disable it; I just wanted to understand the reasons.
Yes, mostly it was pip having trouble resolving things when pre-releases appear; I saw installs fail or get flaky, so I turned it off to make CI more stable.
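For reference, pip_pre is a standard tox option: when it is true, tox passes --pre to pip install, so pre-release versions can satisfy the requirement specifiers (illustrative snippet, not the exact Beam config):

```ini
[testenv:py310-ml]
# pip_pre = True would run `pip install --pre ...`, allowing release candidates
# and dev builds; False restricts resolution to stable releases only.
pip_pre = False
```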
```diff
      all(isinstance(x, float) for x in actual.embedding.dense_embedding))

+ @skip_if_vertex_ai_disabled
```
We already try to import Vertex AI on line 41 -- what does adding this decorator change?
Ahh, right - I already gate on VERTEX_AI_AVAILABLE via the local unittest.skipIf, so the decorator doesn't add new behavior there. It was meant to align all Vertex AI tests on a shared helper, but in this specific file it is redundant, so I will drop @skip_if_vertex_ai_disabled here.
```diff
  from apache_beam.ml.inference.base import RunInference
  from apache_beam.ml.transforms import base
  from apache_beam.ml.transforms.base import MLTransform
+ from apache_beam.testing.vertex_ai_skip import skip_if_vertex_ai_disabled
```
If this decorator is only about a one-line import, why not add that import around line 40?
I created skip_if_vertex_ai_disabled as a single place (vertex_ai_skip.py) that checks whether the Vertex AI dependencies are available, so I don't have to repeat similar import and try/except checks in every Vertex test file. The local imports around line 40 are still needed for the test logic itself, but the decorator lets us keep the skip condition shared.
It reads very strange seeing both:

```python
@skip_if_vertex_ai_disabled
@unittest.skipIf(
    VertexAITextEmbeddings is None, 'Vertex AI Python SDK is not installed.')
```

A reader of this code would likely wonder why both checks are necessary. We should find a way to check this once.
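For example, with the shared helper the local skipIf could be dropped entirely (a sketch; the class name and lazy import are illustrative, not code from this PR):

```python
import unittest

from apache_beam.testing.vertex_ai_skip import skip_if_vertex_ai_disabled


@skip_if_vertex_ai_disabled  # the single, shared dependency gate
class VertexAITextEmbeddingsTest(unittest.TestCase):
  def test_embeddings(self):
    # Safe to import lazily: the decorator already verified availability.
    from apache_beam.ml.transforms.embeddings.vertex_ai import (
        VertexAITextEmbeddings)
    self.assertIsNotNone(VertexAITextEmbeddings)
```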
```diff
@@ -58,6 +61,8 @@
  model_name: str = "text-embedding-005"

+ @skip_if_vertex_ai_disabled
+ @pytest.mark.vertex_ai_postcommit
```
Note that adding vertex_ai_postcommit would make these tests run on Dataflow instead of the Beam direct runner. Was that your intent?
Yes, that is intentional: these Vertex AI tests are integration tests, so I kept them running in the Dataflow vertexAIInferenceTest suite rather than the standard DirectRunner unit-test jobs. The vertex_ai_postcommit marker makes sure the Dataflow Gradle task collects them, and at the same time local runs can skip them by marker if needed.
OK. We typically put _it_test (integration test) in the file name for Dataflow tests, so in this case I'd suggest we also rename this file to vertex_ai_it_test.py.
For example, in this particular case we already import:

```python
try:
  from vertexai.vision_models import Image
  from vertexai.vision_models import Video
  from vertexai.vision_models import VideoSegmentConfig
```

I would then ask: why do we also need to try to import vertexai? Doesn't a successful `from vertexai.vision_models import Image` already imply that vertexai is importable?
Okay, sure, I will rename it.
Ahh, the decorator runs at collection time and only checks that the SDK is present at all; the test file's own imports run when the test runs, and they pull in the specific APIs that test needs. So the skip is one generic check, and each test file still does its own imports for the APIs it uses.
```diff
+ @pytest.mark.vertex_ai_postcommit
+ @unittest.skipIf(
+     not VERTEX_AI_AVAILABLE, "Vertex AI dependencies not available")
```
As mentioned elsewhere, let's standardize on these checks and not test twice. I would probably stick with trying to import Vertex AI, as this is a common pattern in many other tests, and for many other dependencies.
```python
def _is_vertex_ai_available() -> bool:
  """Return True if Vertex AI client dependencies are importable."""
  try:
    import vertexai  # pylint: disable=unused-import
    return True
  except ImportError:
    return False
```
I would support this if the check were more involved (e.g., we need the dependency, plus a credentials check, plus something else, or we need to exclude a scenario where the dependency was inadvertently installed by some other package but we really don't want to run the test). For a single import, adding this decorator feels like overhead.
I wonder if we can use https://docs.pytest.org/en/7.1.x/reference/reference.html#pytest-importorskip ?
Looks like we already use that in other places in Beam.
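For comparison, importorskip collapses the guarded import and the skip decorator into a single line (sketch):

```python
import pytest

# At collection time, skips every test in this module when vertexai is
# missing - no separate try/except import or skipIf decorator needed.
vertexai = pytest.importorskip('vertexai')
```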
Changes look good to me. Let's confirm that all affected unit tests pass, and also trigger the vertex_ai_postcommit test suite and check that it passes.

Okay, sure. The postcommit dependency suite passed: https://github.com/aIbrahiim/beam/actions/runs/22696450438/job/65803811462. I will trigger the Python PostCommit next.
Python PostCommit: https://github.com/apache/beam/actions/runs/22715484309

Filed #37779 for the anomaly detection test that failed; will rerun.
The coverage suite is picking up signal from Python 3.13 and is actually passing.

```
SKIPPED [1] apache_beam/ml/inference/vertex_ai_inference_it_test.py:38: unittest.case.SkipTest: Vertex AI model handler dependencies are not installed
```

Admittedly I haven't looked closely at the logs, but we should verify that the Vertex tests we moved to run on Dataflow are actually running, and passing.
I just checked the logs: the Vertex AI tests do run in the vertexAIInferenceTest step, with 16 passed and 1 failed. The failure is test_image_embedding_pipeline_from_path (the local file path is not available on Dataflow workers). I think the skip you found ('Vertex AI model handler dependencies are not installed') is from a different step, such as postCommitIT, which doesn't install the Vertex deps, so that skip is expected. The job that is supposed to run the Vertex tests is vertexAIInferenceTest, and there they run and mostly pass.
Looks like the py310 suite is still stuck in dependency resolution.
Where did you find it? @tvalentyn |

Fixes: #30799
Successful Run: https://github.com/aIbrahiim/beam/actions/runs/22507665882
Fix the Python PostCommit dependency suite by excluding Vertex AI tests from the embeddings suite and updating the pyarrow/py310 transformers config.