Skip to content

refactor: Make WANDB optional#2144

Merged
ko3n1g merged 1 commit intomainfrom
ko3n1g/refactor/remove-need-for-wandb-key
Feb 4, 2026
Merged

refactor: Make WANDB optional#2144
ko3n1g merged 1 commit intomainfrom
ko3n1g/refactor/remove-need-for-wandb-key

Conversation

@ko3n1g
Copy link
Copy Markdown
Contributor

@ko3n1g ko3n1g commented Jan 30, 2026

What does this PR do ?

Add a one line overview of what this PR aims to accomplish.

Changelog

  • Add specific line by line info of high level changes in this PR.

GitHub Actions CI

See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

If you haven't finished some of the above items you can still open "Draft" PR.

Additional Information

  • Related to # (issue)

Summary by CodeRabbit

  • Refactor
    • Updated performance evaluation system to support optional WandB integration, allowing performance testing to proceed independently.

Signed-off-by: oliver könig <okoenig@nvidia.com>
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented Jan 30, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@ko3n1g ko3n1g marked this pull request as ready for review February 4, 2026 13:31
@ko3n1g ko3n1g requested review from a team, erhoo82 and malay-nagda as code owners February 4, 2026 13:31
@ko3n1g ko3n1g enabled auto-merge (squash) February 4, 2026 13:32
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Feb 4, 2026

📝 Walkthrough

Walkthrough

The changes decouple WandB integration from performance validation by making wandb_run an optional parameter across multiple functions. The setup script now calls convergence/performance calculation unconditionally, while WandB-specific operations are guarded by None checks throughout the evaluation utilities.

Changes

Cohort / File(s) Summary
Setup script refactoring
scripts/performance/setup_experiment.py
Made calc_convergence_and_performance call unconditional and moved WandB cleanup (finish/teardown) inside a wandb_run null check. Removed the else branch that previously set is_testing_passed = True when WandB was inactive.
Optional WandB integration
scripts/performance/utils/evaluate.py
Added wandb_run: Optional["wandb.Run"] = None parameter to validate_convergence, validate_performance, write_golden_values_to_disk, and calc_convergence_and_performance. All WandB-specific operations (summary updates, artifact logging, metric logging) are now guarded with wandb_run is not None checks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Test Results For Major Changes ⚠️ Warning PR description lacks test results for major refactoring of performance evaluation functionality affecting convergence and metrics calculations. Update PR description with test results demonstrating backward compatibility, correct functionality without WANDB, and numerical validation showing no regression in convergence metrics.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor: Make WANDB optional' accurately summarizes the main change in the pull request, which is making WANDB dependencies optional across the performance evaluation modules.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ko3n1g/refactor/remove-need-for-wandb-key

Important

Action Needed: IP Allowlist Update

If your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:

  • 136.113.208.247/32 (new)
  • 34.170.211.100/32
  • 35.222.179.152/32

Reviews will stop working after February 8, 2026 if the new IP is not added to your allowlist.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
scripts/performance/utils/evaluate.py (1)

119-140: ⚠️ Potential issue | 🟠 Major

Update type hints to PEP 604 unions and built-in generics across multiple functions; document wandb_run in docstrings.

The new wandb_run parameter and existing type hints need updates across four functions to align with Python 3.10+ guidelines:

  1. validate_convergence (line 119): Use "wandb.Run" | None and dict[str, Any] instead of Optional["wandb.Run"] and Dict[str, Any]. Also use list[str] for the steps parameter. Add wandb_run to the Args docstring.

  2. validate_performance (line 305): Apply the same typing updates and expand the minimal docstring to include parameter documentation for wandb_run.

  3. write_golden_values_to_disk (line 386): Update Dict[str, Any] to dict[str, Any] and Optional["wandb.Run"] to "wandb.Run" | None. Expand the docstring to document all parameters.

  4. calc_convergence_and_performance (line 404): Update List[str] to list[str], Dict[str, Any] to dict[str, Any], and Optional["wandb.Run"] to "wandb.Run" | None.

Per coding guidelines, use built-in generics and PEP 604 unions for all type hints, and maintain comprehensive Google-style docstrings for all public functions.

🤖 Fix all issues with AI agents
In `@scripts/performance/setup_experiment.py`:
- Around line 462-473: The ImportError guard in
scripts/performance/utils/evaluate.py is too strict: change the logic so
HAVE_WANDB is only required when a wandb_run is actually provided to
calc_convergence_and_performance; specifically, in the top-level check that
currently raises ImportError when HAVE_WANDB is False, relax it to only raise if
wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function
guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None
succeed; update any error message to mention wandb_run being required when
provided.
🧹 Nitpick comments (1)
scripts/performance/utils/evaluate.py (1)

385-390: Add return type hint, update nullable type syntax, and document the wandb_run parameter.

The function should use the T | None syntax for Python 3.10+ and include a return type hint. The docstring should also document the wandb_run parameter per Google style guide conventions.

♻️ Proposed update
-def write_golden_values_to_disk(
-    current_values: Dict[str, Any], golden_values_path: str, wandb_run: Optional["wandb.Run"] = None
-):
+def write_golden_values_to_disk(
+    current_values: Dict[str, Any], golden_values_path: str, wandb_run: "wandb.Run" | None = None
+) -> None:
     """
     Write golden values to a file.
+
+    Args:
+        current_values: Dictionary of current metric values.
+        golden_values_path: Path to write the golden values file.
+        wandb_run: Optional Weights & Biases run object for logging.
     """

Comment on lines +462 to +473
is_testing_passed, error_msg = calc_convergence_and_performance(
model_family_name=model_family_name,
model_recipe_name=model_recipe_name,
assets_dir=os.path.join(job_dir, exp_name),
log_paths=log_paths,
loss_metric="lm loss",
timing_metric="elapsed time per iteration (ms)",
golden_values_path=golden_values_path,
convergence_config=convergence_params,
performance_config=performance_params,
wandb_run=wandb_run,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check HAVE_WANDB guard in evaluate.py
rg -n "HAVE_WANDB" scripts/performance/utils/evaluate.py -A 2 -B 2

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 347


🏁 Script executed:

#!/bin/bash
# Find and examine calc_convergence_and_performance function
rg -n "def calc_convergence_and_performance" scripts/performance/utils/ -A 5

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 485


🏁 Script executed:

#!/bin/bash
# Check the current setup_experiment.py code around lines 462-473
cat -n scripts/performance/setup_experiment.py | sed -n '460,475p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 837


🏁 Script executed:

#!/bin/bash
# Check broader context around line 462 to see if call is guarded
cat -n scripts/performance/setup_experiment.py | sed -n '440,480p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 2012


🏁 Script executed:

#!/bin/bash
# Check full function signature of calc_convergence_and_performance
cat -n scripts/performance/utils/evaluate.py | sed -n '404,425p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1066


🏁 Script executed:

#!/bin/bash
# Check what happens at the HAVE_WANDB guard location
cat -n scripts/performance/utils/evaluate.py | sed -n '430,450p'

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1320


🏁 Script executed:

#!/bin/bash
# Check how wandb_run is used in the function
rg -n "wandb_run" scripts/performance/utils/evaluate.py | head -20

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 1196


🏁 Script executed:

#!/bin/bash
# Check if there are any other wandb references in the function besides the guard
cat -n scripts/performance/utils/evaluate.py | sed -n '436,550p' | rg "wandb"

Repository: NVIDIA-NeMo/Megatron-Bridge

Length of output: 428


Unconditional W&B check prevents non-W&B runs despite optional parameter.

The guard at line 436 in scripts/performance/utils/evaluate.py raises an ImportError unconditionally if HAVE_WANDB is False. However, wandb_run is declared as optional with a default of None, and all W&B operations in the function are guarded by if wandb_run is not None. Since setup_experiment.py passes wandb_run=None when W&B is not available, the function can execute without W&B. Relax the guard to only require W&B when wandb_run is actually provided:

Fix in scripts/performance/utils/evaluate.py
-    if not HAVE_WANDB:
-        raise ImportError("wandb is required for calculating perf and convergence metrics")
+    if wandb_run is not None and not HAVE_WANDB:
+        raise ImportError("wandb is required for calculating perf and convergence metrics")
🤖 Prompt for AI Agents
In `@scripts/performance/setup_experiment.py` around lines 462 - 473, The
ImportError guard in scripts/performance/utils/evaluate.py is too strict: change
the logic so HAVE_WANDB is only required when a wandb_run is actually provided
to calc_convergence_and_performance; specifically, in the top-level check that
currently raises ImportError when HAVE_WANDB is False, relax it to only raise if
wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function
guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None
succeed; update any error message to mention wandb_run being required when
provided.

@copy-pr-bot copy-pr-bot Bot requested a deployment to nemo-ci February 4, 2026 14:40 Abandoned
@copy-pr-bot copy-pr-bot Bot requested a deployment to nemo-ci February 4, 2026 14:40 Abandoned
@ko3n1g ko3n1g disabled auto-merge February 4, 2026 14:42
@ko3n1g ko3n1g enabled auto-merge (squash) February 4, 2026 14:42
@ko3n1g ko3n1g merged commit 02364d5 into main Feb 4, 2026
82 of 85 checks passed
@ko3n1g ko3n1g deleted the ko3n1g/refactor/remove-need-for-wandb-key branch February 4, 2026 14:58
sowmen pushed a commit to sowmen/Megatron-Bridge that referenced this pull request Feb 11, 2026
Signed-off-by: oliver könig <okoenig@nvidia.com>
Signed-off-by: sowmen <sowmendipta@gmail.com>
@coderabbitai coderabbitai Bot mentioned this pull request Feb 14, 2026
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants