refactor: Make WANDB optional by ko3n1g · Pull Request #2144 · NVIDIA-NeMo/Megatron-Bridge

ko3n1g · 2026-01-30T09:08:33Z

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

Related to # (issue)

Summary by CodeRabbit

Refactor
- Updated performance evaluation system to support optional WandB integration, allowing performance testing to proceed independently.

Signed-off-by: oliver könig <okoenig@nvidia.com>

copy-pr-bot · 2026-01-30T09:08:36Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

coderabbitai · 2026-02-04T13:38:42Z

📝 Walkthrough

Walkthrough

The changes decouple WandB integration from performance validation by making wandb_run an optional parameter across multiple functions. The setup script now calls convergence/performance calculation unconditionally, while WandB-specific operations are guarded by None checks throughout the evaluation utilities.

Changes

Cohort / File(s)	Summary
Setup script refactoring `scripts/performance/setup_experiment.py`	Made `calc_convergence_and_performance` call unconditional and moved WandB cleanup (finish/teardown) inside a `wandb_run` null check. Removed the else branch that previously set `is_testing_passed = True` when WandB was inactive.
Optional WandB integration `scripts/performance/utils/evaluate.py`	Added `wandb_run: Optional["wandb.Run"] = None` parameter to `validate_convergence`, `validate_performance`, `write_golden_values_to_disk`, and `calc_convergence_and_performance`. All WandB-specific operations (summary updates, artifact logging, metric logging) are now guarded with `wandb_run is not None` checks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Test Results For Major Changes	⚠️ Warning	PR description lacks test results for major refactoring of performance evaluation functionality affecting convergence and metrics calculations.	Update PR description with test results demonstrating backward compatibility, correct functionality without WANDB, and numerical validation showing no regression in convergence metrics.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'refactor: Make WANDB optional' accurately summarizes the main change in the pull request, which is making WANDB dependencies optional across the performance evaluation modules.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ko3n1g/refactor/remove-need-for-wandb-key

Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

✨ 136.113.208.247/32 (new)
34.170.211.100/32
35.222.179.152/32

Reviews will stop working after February 8, 2026 if the new IP is not added to your allowlist.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

scripts/performance/utils/evaluate.py (1)

119-140: ⚠️ Potential issue | 🟠 Major

Update type hints to PEP 604 unions and built-in generics across multiple functions; document wandb_run in docstrings.

The new wandb_run parameter and existing type hints need updates across four functions to align with Python 3.10+ guidelines:

validate_convergence (line 119): Use "wandb.Run" | None and dict[str, Any] instead of Optional["wandb.Run"] and Dict[str, Any]. Also use list[str] for the steps parameter. Add wandb_run to the Args docstring.

validate_performance (line 305): Apply the same typing updates and expand the minimal docstring to include parameter documentation for wandb_run.

write_golden_values_to_disk (line 386): Update Dict[str, Any] to dict[str, Any] and Optional["wandb.Run"] to "wandb.Run" | None. Expand the docstring to document all parameters.

calc_convergence_and_performance (line 404): Update List[str] to list[str], Dict[str, Any] to dict[str, Any], and Optional["wandb.Run"] to "wandb.Run" | None.

Per coding guidelines, use built-in generics and PEP 604 unions for all type hints, and maintain comprehensive Google-style docstrings for all public functions.

🤖 Fix all issues with AI agents

In `@scripts/performance/setup_experiment.py`:
- Around line 462-473: The ImportError guard in
scripts/performance/utils/evaluate.py is too strict: change the logic so
HAVE_WANDB is only required when a wandb_run is actually provided to
calc_convergence_and_performance; specifically, in the top-level check that
currently raises ImportError when HAVE_WANDB is False, relax it to only raise if
wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function
guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None
succeed; update any error message to mention wandb_run being required when
provided.

🧹 Nitpick comments (1)

scripts/performance/utils/evaluate.py (1)
385-390: Add return type hint, update nullable type syntax, and document the wandb_run parameter.

The function should use the T | None syntax for Python 3.10+ and include a return type hint. The docstring should also document the wandb_run parameter per Google style guide conventions.
♻️ Proposed update
-def write_golden_values_to_disk(
-    current_values: Dict[str, Any], golden_values_path: str, wandb_run: Optional["wandb.Run"] = None
-):
+def write_golden_values_to_disk(
+    current_values: Dict[str, Any], golden_values_path: str, wandb_run: "wandb.Run" | None = None
+) -> None:
     """
     Write golden values to a file.
+
+    Args:
+        current_values: Dictionary of current metric values.
+        golden_values_path: Path to write the golden values file.
+        wandb_run: Optional Weights & Biases run object for logging.
     """

coderabbitai · 2026-02-04T13:38:45Z

+            is_testing_passed, error_msg = calc_convergence_and_performance(
+                model_family_name=model_family_name,
+                model_recipe_name=model_recipe_name,
+                assets_dir=os.path.join(job_dir, exp_name),
+                log_paths=log_paths,
+                loss_metric="lm loss",
+                timing_metric="elapsed time per iteration (ms)",
+                golden_values_path=golden_values_path,
+                convergence_config=convergence_params,
+                performance_config=performance_params,
+                wandb_run=wandb_run,
+            )


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Check HAVE_WANDB guard in evaluate.py rg -n "HAVE_WANDB" scripts/performance/utils/evaluate.py -A 2 -B 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 347

🏁 Script executed:

#!/bin/bash # Find and examine calc_convergence_and_performance function rg -n "def calc_convergence_and_performance" scripts/performance/utils/ -A 5

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 485

🏁 Script executed:

#!/bin/bash # Check the current setup_experiment.py code around lines 462-473 cat -n scripts/performance/setup_experiment.py | sed -n '460,475p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 837

🏁 Script executed:

#!/bin/bash # Check broader context around line 462 to see if call is guarded cat -n scripts/performance/setup_experiment.py | sed -n '440,480p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2012

🏁 Script executed:

#!/bin/bash # Check full function signature of calc_convergence_and_performance cat -n scripts/performance/utils/evaluate.py | sed -n '404,425p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1066

🏁 Script executed:

#!/bin/bash # Check what happens at the HAVE_WANDB guard location cat -n scripts/performance/utils/evaluate.py | sed -n '430,450p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1320

🏁 Script executed:

#!/bin/bash # Check how wandb_run is used in the function rg -n "wandb_run" scripts/performance/utils/evaluate.py | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1196

🏁 Script executed:

#!/bin/bash # Check if there are any other wandb references in the function besides the guard cat -n scripts/performance/utils/evaluate.py | sed -n '436,550p' | rg "wandb"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 428

Unconditional W&B check prevents non-W&B runs despite optional parameter.

The guard at line 436 in scripts/performance/utils/evaluate.py raises an ImportError unconditionally if HAVE_WANDB is False. However, wandb_run is declared as optional with a default of None, and all W&B operations in the function are guarded by if wandb_run is not None. Since setup_experiment.py passes wandb_run=None when W&B is not available, the function can execute without W&B. Relax the guard to only require W&B when wandb_run is actually provided:

Fix in scripts/performance/utils/evaluate.py

- if not HAVE_WANDB: - raise ImportError("wandb is required for calculating perf and convergence metrics") + if wandb_run is not None and not HAVE_WANDB: + raise ImportError("wandb is required for calculating perf and convergence metrics")

🤖 Prompt for AI Agents

In `@scripts/performance/setup_experiment.py` around lines 462 - 473, The ImportError guard in scripts/performance/utils/evaluate.py is too strict: change the logic so HAVE_WANDB is only required when a wandb_run is actually provided to calc_convergence_and_performance; specifically, in the top-level check that currently raises ImportError when HAVE_WANDB is False, relax it to only raise if wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None succeed; update any error message to mention wandb_run being required when provided.

Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: sowmen <sowmendipta@gmail.com>

refactor: Make WANDB optional

435d286

Signed-off-by: oliver könig <okoenig@nvidia.com>

ko3n1g marked this pull request as ready for review February 4, 2026 13:31

ko3n1g requested review from a team, erhoo82 and malay-nagda as code owners February 4, 2026 13:31

ko3n1g enabled auto-merge (squash) February 4, 2026 13:32

copy-pr-bot Bot temporarily deployed to nemo-ci February 4, 2026 13:32 Inactive

copy-pr-bot Bot temporarily deployed to test February 4, 2026 13:32 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci February 4, 2026 13:37 Inactive

coderabbitai Bot reviewed Feb 4, 2026

View reviewed changes

malay-nagda approved these changes Feb 4, 2026

View reviewed changes

copy-pr-bot Bot temporarily deployed to nemo-ci February 4, 2026 13:44 Inactive

copy-pr-bot Bot temporarily deployed to nemo-ci February 4, 2026 13:55 Inactive

copy-pr-bot Bot had a problem deploying to nemo-ci February 4, 2026 13:55 Failure

copy-pr-bot Bot requested a deployment to nemo-ci February 4, 2026 14:40 Abandoned

copy-pr-bot Bot temporarily deployed to nemo-ci February 4, 2026 14:40 Inactive

ko3n1g disabled auto-merge February 4, 2026 14:42

ko3n1g enabled auto-merge (squash) February 4, 2026 14:42

ko3n1g merged commit 02364d5 into main Feb 4, 2026
82 of 85 checks passed

ko3n1g deleted the ko3n1g/refactor/remove-need-for-wandb-key branch February 4, 2026 14:58

sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026

refactor: Make WANDB optional (NVIDIA-NeMo#2144)

9d30be2

Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: sowmen <sowmendipta@gmail.com>

coderabbitai Bot mentioned this pull request Feb 14, 2026

ci: track memory #2386

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: Make WANDB optional#2144

refactor: Make WANDB optional#2144
ko3n1g merged 1 commit intomainfrom
ko3n1g/refactor/remove-need-for-wandb-key

ko3n1g commented Jan 30, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

copy-pr-bot Bot commented Jan 30, 2026

Uh oh!

coderabbitai Bot commented Feb 4, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Feb 4, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ko3n1g commented Jan 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does this PR do ?

Changelog

GitHub Actions CI

Before your PR is "Ready for review"

Additional Information

Summary by CodeRabbit

Uh oh!

copy-pr-bot Bot commented Jan 30, 2026

Uh oh!

coderabbitai Bot commented Feb 4, 2026

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Feb 4, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ko3n1g commented Jan 30, 2026 •

edited by coderabbitai Bot

Loading