Releases: NVIDIA-NeMo/Export-Deploy
NVIDIA NeMo-Export-Deploy 0.5.0
Changelog Details
- Version bump to 0.5.0rc0.dev0 by @github-actions[bot] :: PR: #580
- ci: Add secrets detector by @chtruong814 :: PR: #578
- Add apply_chat_template to HF vllm Ray deployment by @athitten :: PR: #581
- Onur/remove nemo2 trtllm support by @oyilmaz-nvidia :: PR: #576
- Remove MM trt-llm files for nemo2 by @oyilmaz-nvidia :: PR: #583
- ci: Adding to codeowners by @chtruong814 :: PR: #585
- Remove more nemo2 and unused code. by @oyilmaz-nvidia :: PR: #584
- docs: Remove uv sync with uv_args by @thomasdhc :: PR: #586
- Update to use latest MBridge by @chtruong814 :: PR: #589
- Add inference_max_seq_len to ray mbridge deployment path by @athitten :: PR: #588
- Remove nemo imports by @oyilmaz-nvidia :: PR: #594
- ci: Fix wheel build test and publish by @chtruong814 :: PR: #595
- ci: Re-enable onnx test by @chtruong814 :: PR: #597
- ci: Update release-docs workflow to use FW-CI-templates v0.72.0 by @chtruong814 :: PR: #599
- feat: Pass ETP and Sequence Parallel to inframework Ray deployment by @ko3n1g :: PR: #600
- ci: Update release workflows to include changelog and docs by @chtruong814 :: PR: #604
- build: Remove torchao by @chtruong814 :: PR: #606
- build: Upgrade vllm to 0.14.1 by @chtruong814 :: PR: #609
- Add support for stop_words in Ray MBridge deployment by @athitten :: PR: #605
- Add vllm docs for mbridge ckpt by @oyilmaz-nvidia :: PR: #573
- Docs update: remove nemo2 and fix import by @oyilmaz-nvidia :: PR: #608
- Update CI docker image and set vllm eager enforce_eager to False by @chtruong814 :: PR: #614
- Fix building doc and remove all nemo 2.0 docs by @oyilmaz-nvidia :: PR: #615
- Fix multimodal deployment sampling params by @meatybobby :: PR: #602
- docs: Enable nightly docs build on main branch by @chtruong814 :: PR: #619
- Set materialize_only_last_token_logits=False when log_probs = True by @athitten :: PR: #613
- ci: Add-credentials-for-docs by @ko3n1g :: PR: #623
- Fix release workflow reference by @chtruong814 :: PR: #625
- Fix mbridge inference for latest mbridge by @oyilmaz-nvidia :: PR: #627
- feat: Add support for batching of Ray Serve requests by @pthombre :: PR: #629
- Remove all nemo2 imports from old repo by @oyilmaz-nvidia :: PR: #628
- build: Bump export-deploy dependencies for 26.04 by @chtruong814 :: PR: #633
- Docs: remove vLLM install step from mbridge vllm quickstart by @oyilmaz-nvidia :: PR: #618
- Announce Python 3.12 migration by @ko3n1g :: PR: #630
- ci: Enable claude review by @thomasdhc :: PR: #635
- ci: Fix sso user check by @chtruong814 :: PR: #637
- chore: test FW-CI-templates ko3n1g/fix/linkcheck-retry-backoff by @ko3n1g :: PR: #638
- ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g :: PR: #639
- Add legacy_model_format param by @oyilmaz-nvidia :: PR: #641
- chore: Move to Py3.12 by @ko3n1g :: PR: #631
- cp: build: Bump vLLM to address CVE (644) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #645
- cp: Fix MLA model issues (647) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #649
- cp: build: Set trt-llm and vllm for 26.04 (650) into r0.5.0 by @svcnvidia-nemo-ci :: PR: #651
NVIDIA NeMo-Export-Deploy 0.4.0
Highlights
- vLLM support for Megatron-Bridge LLM checkpoints.
- Remove NeMo 2.0 support.
- Deployment of Megatron-Bridge VLM checkpoints.
Changelog Details
- Eval logprob benchmarks support for HF via vLLM with Ray by @athitten :: PR: #479
- feat: add labeler by @pablo-garay :: PR: #483
- Support apply_chat_template in NeMo MM in-framework deployment by @meatybobby :: PR: #440
- NeMo-Export-Deploy 0.2.1 changelog by @pablo-garay :: PR: #489
- Add torch_dtype and default values by @oyilmaz-nvidia :: PR: #466
- Fix max token input by @oyilmaz-nvidia :: PR: #478
- Remove scheduled cron job from release workflow by @pablo-garay :: PR: #494
- feat: Add anchor by @pablo-garay :: PR: #495
- [Eval] Fixes for compatibility between Pytriton, Ray deployments with nemo-run by @athitten :: PR: #501
- New script path by @oyilmaz-nvidia :: PR: #487
- Update trt-llm doc for nemo 2 by @oyilmaz-nvidia :: PR: #506
- Change type for --runtime_env in ray in-fw deployment script by @athitten :: PR: #505
- fix : New peft release adjust fix by @pablo-garay :: PR: #514
- fix: ensure vLLM receives valid params regardless of env changes by @pablo-garay :: PR: #516
- Fix minor doc issue by @oyilmaz-nvidia :: PR: #521
- Update changelog for release 0.3.0 by @oyilmaz-nvidia :: PR: #522
- Update nvidia-sphinx-theme by @chtruong814 :: PR: #528
- Update changelog for version 0.3.1 by @pablo-garay :: PR: #537
- Minor fixes for MBridge nemotron deployment by @athitten :: PR: #518
- docs: Update docs version to latest by @chtruong814 :: PR: #553
- docs: Fixing version1.json by @aschilling-nv :: PR: #554
- Properly Handle DynamicInferenceRequestRecord with latest Mcore by @chtruong814 :: PR: #559
- Add vllm support for mbridge by @oyilmaz-nvidia :: PR: #555
- Temp fix for k8s issue by @ko3n1g :: PR: #565
- ci: Enable AWS runners by @chtruong814 :: PR: #557
- docs: Release docs by @ko3n1g :: PR: #566
- Remove nemo from in-framework deployment by @oyilmaz-nvidia :: PR: #568
- Fix chat endpoint support for Ray in-framework MBridge deployment by @athitten :: PR: #572
- build: Update dependencies for 26.02 by @chtruong814 :: PR: #567
- Remove nemo2 vllm support by @oyilmaz-nvidia :: PR: #571
- Update multimodal in-framework FastAPI from NeMo to Megatron Bridge by @meatybobby :: PR: #511
- Fix chat endpoint support for HF deployment with Ray by @athitten :: PR: #575
- Add Ray Serve Deployment Support for Multimodal Models by @meatybobby :: PR: #574
- cp: Add apply_chat_template to HF vllm Ray deployment (581) into r0.4.0 by @ko3n1g :: PR: #582
- cp: Remove more nemo2 and unused code. (584) into r0.4.0 by @ko3n1g :: PR: #587
- cp: docs: Remove uv sync with uv_args (586) into r0.4.0 by @ko3n1g :: PR: #591
- cp: Add inference_max_seq_len to ray mbridge deployment path (588) into r0.4.0 by @ko3n1g :: PR: #593
- cp: Fix wheel build test and publish (#595) in r0.4.0 by @chtruong814 :: PR: #596
- cp: Re-enable onnx test (#597) in r0.4.0 by @chtruong814 :: PR: #598
- cp: ci: Update release-docs workflow to use FW-CI-templates v0.72.0 (599) into r0.4.0 by @ko3n1g :: PR: #601
- cp: ci: Update release workflows to include changelog and docs (604) into r0.4.0 by @ko3n1g :: PR: #607
- cp: build: Remove torchao (606) into r0.4.0 by @ko3n1g :: PR: #610
- cp: build: Upgrade vllm to 0.14.1 (#609) into r0.4.0 by @chtruong814 :: PR: #611
- docs: Update docs for 0.4.0 by @chtruong814 :: PR: #612
- cp: Update CI docker image and set vllm eager enforce_eager to False (614) into r0.4.0 by @svcnvidia-nemo-ci :: PR: #617
- docs: Update docs version for 0.4.0 release by @chtruong814 :: PR: #620
NVIDIA NeMo-Export-Deploy 0.3.1
NVIDIA NeMo-Export-Deploy 0.3.0
- Update TensorRT-LLM export to use NeMo->HF->TensorRT-LLM export path
- Add chat template support for VLM deployment.
- Bug fixes and folder name updates such as updating nlp to llm.
NVIDIA NeMo-Export-Deploy 0.2.1
NVIDIA NeMo-Export-Deploy 0.2.0
- MegatronLM and Megatron-Bridge model deployment support with Triton Inference Server and Ray Serve
- Multi-node multi-instance Ray Serve based deployment for NeMo 2, Megatron-Bridge, and Megatron-LM models.
- Update vLLM export to use NeMo->HF->vLLM export path
- Multi-Modal deployment for NeMo 2 models with Triton Inference Server
- NeMo Retriever Text Reranking ONNX and TensorRT export support
NVIDIA NeMo-Export-Deploy 0.2.0rc2
Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc2 (2025-08-18)
NVIDIA NeMo-Export-Deploy 0.1.1
ci: Mock DCO check Signed-off-by: oliver könig <okoenig@nvidia.com>
NVIDIA NeMo-Export-Deploy 0.2.0rc1
Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc1 (2025-08-14)
NVIDIA NeMo-Export-Deploy 0.2.0rc0
Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc0 (2025-08-03)