Conversation
Signed-off-by: oliver könig <okoenig@nvidia.com>
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
📝 WalkthroughWalkthroughThe changes decouple WandB integration from performance validation by making Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Important Action Needed: IP Allowlist UpdateIf your organization protects your Git platform with IP whitelisting, please add the new CodeRabbit IP address to your allowlist:
Reviews will stop working after February 8, 2026 if the new IP is not added to your allowlist. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/performance/utils/evaluate.py (1)
119-140:⚠️ Potential issue | 🟠 MajorUpdate type hints to PEP 604 unions and built-in generics across multiple functions; document
wandb_runin docstrings.The new
wandb_runparameter and existing type hints need updates across four functions to align with Python 3.10+ guidelines:
validate_convergence (line 119): Use
"wandb.Run" | Noneanddict[str, Any]instead ofOptional["wandb.Run"]andDict[str, Any]. Also uselist[str]for thestepsparameter. Addwandb_runto the Args docstring.validate_performance (line 305): Apply the same typing updates and expand the minimal docstring to include parameter documentation for
wandb_run.write_golden_values_to_disk (line 386): Update
Dict[str, Any]todict[str, Any]andOptional["wandb.Run"]to"wandb.Run" | None. Expand the docstring to document all parameters.calc_convergence_and_performance (line 404): Update
List[str]tolist[str],Dict[str, Any]todict[str, Any], andOptional["wandb.Run"]to"wandb.Run" | None.Per coding guidelines, use built-in generics and PEP 604 unions for all type hints, and maintain comprehensive Google-style docstrings for all public functions.
🤖 Fix all issues with AI agents
In `@scripts/performance/setup_experiment.py`:
- Around line 462-473: The ImportError guard in
scripts/performance/utils/evaluate.py is too strict: change the logic so
HAVE_WANDB is only required when a wandb_run is actually provided to
calc_convergence_and_performance; specifically, in the top-level check that
currently raises ImportError when HAVE_WANDB is False, relax it to only raise if
wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function
guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None
succeed; update any error message to mention wandb_run being required when
provided.
🧹 Nitpick comments (1)
scripts/performance/utils/evaluate.py (1)
385-390: Add return type hint, update nullable type syntax, and document thewandb_runparameter.The function should use the
T | Nonesyntax for Python 3.10+ and include a return type hint. The docstring should also document thewandb_runparameter per Google style guide conventions.♻️ Proposed update
-def write_golden_values_to_disk( - current_values: Dict[str, Any], golden_values_path: str, wandb_run: Optional["wandb.Run"] = None -): +def write_golden_values_to_disk( + current_values: Dict[str, Any], golden_values_path: str, wandb_run: "wandb.Run" | None = None +) -> None: """ Write golden values to a file. + + Args: + current_values: Dictionary of current metric values. + golden_values_path: Path to write the golden values file. + wandb_run: Optional Weights & Biases run object for logging. """
| is_testing_passed, error_msg = calc_convergence_and_performance( | ||
| model_family_name=model_family_name, | ||
| model_recipe_name=model_recipe_name, | ||
| assets_dir=os.path.join(job_dir, exp_name), | ||
| log_paths=log_paths, | ||
| loss_metric="lm loss", | ||
| timing_metric="elapsed time per iteration (ms)", | ||
| golden_values_path=golden_values_path, | ||
| convergence_config=convergence_params, | ||
| performance_config=performance_params, | ||
| wandb_run=wandb_run, | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check HAVE_WANDB guard in evaluate.py
rg -n "HAVE_WANDB" scripts/performance/utils/evaluate.py -A 2 -B 2Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 347
🏁 Script executed:
#!/bin/bash
# Find and examine calc_convergence_and_performance function
rg -n "def calc_convergence_and_performance" scripts/performance/utils/ -A 5Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 485
🏁 Script executed:
#!/bin/bash
# Check the current setup_experiment.py code around lines 462-473
cat -n scripts/performance/setup_experiment.py | sed -n '460,475p'Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 837
🏁 Script executed:
#!/bin/bash
# Check broader context around line 462 to see if call is guarded
cat -n scripts/performance/setup_experiment.py | sed -n '440,480p'Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 2012
🏁 Script executed:
#!/bin/bash
# Check full function signature of calc_convergence_and_performance
cat -n scripts/performance/utils/evaluate.py | sed -n '404,425p'Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1066
🏁 Script executed:
#!/bin/bash
# Check what happens at the HAVE_WANDB guard location
cat -n scripts/performance/utils/evaluate.py | sed -n '430,450p'Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1320
🏁 Script executed:
#!/bin/bash
# Check how wandb_run is used in the function
rg -n "wandb_run" scripts/performance/utils/evaluate.py | head -20Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 1196
🏁 Script executed:
#!/bin/bash
# Check if there are any other wandb references in the function besides the guard
cat -n scripts/performance/utils/evaluate.py | sed -n '436,550p' | rg "wandb"Repository: NVIDIA-NeMo/Megatron-Bridge
Length of output: 428
Unconditional W&B check prevents non-W&B runs despite optional parameter.
The guard at line 436 in scripts/performance/utils/evaluate.py raises an ImportError unconditionally if HAVE_WANDB is False. However, wandb_run is declared as optional with a default of None, and all W&B operations in the function are guarded by if wandb_run is not None. Since setup_experiment.py passes wandb_run=None when W&B is not available, the function can execute without W&B. Relax the guard to only require W&B when wandb_run is actually provided:
Fix in scripts/performance/utils/evaluate.py
- if not HAVE_WANDB:
- raise ImportError("wandb is required for calculating perf and convergence metrics")
+ if wandb_run is not None and not HAVE_WANDB:
+ raise ImportError("wandb is required for calculating perf and convergence metrics")🤖 Prompt for AI Agents
In `@scripts/performance/setup_experiment.py` around lines 462 - 473, The
ImportError guard in scripts/performance/utils/evaluate.py is too strict: change
the logic so HAVE_WANDB is only required when a wandb_run is actually provided
to calc_convergence_and_performance; specifically, in the top-level check that
currently raises ImportError when HAVE_WANDB is False, relax it to only raise if
wandb_run is not None and HAVE_WANDB is False, and keep all existing in-function
guards (if wandb_run is not None) unchanged so non-W&B calls with wandb_run=None
succeed; update any error message to mention wandb_run being required when
provided.
Signed-off-by: oliver könig <okoenig@nvidia.com> Signed-off-by: sowmen <sowmendipta@gmail.com>
What does this PR do ?
Add a one line overview of what this PR aims to accomplish.
Changelog
GitHub Actions CI
See the CI sectionin the Contributing doc for how to trigger the CI. A Nvidia developer will need to approve and trigger the CI for external contributors.
Before your PR is "Ready for review"
Pre checks:
If you haven't finished some of the above items you can still open "Draft" PR.
Additional Information
Summary by CodeRabbit