fix: preserve tool_calls in _stream_llm for native tool calling #1864
getglad wants to merge 1 commit into NVIDIA:develop
Conversation
`_stream_llm` reconstructed `AIMessage` from content parts only, dropping `tool_calls` from streamed chunks. This broke `use_native_tool_calling` on the ReAct agent: the agent treated every response as a final text answer because `output_message.tool_calls` was always empty. Use LangChain chunk accumulation (`chunk + chunk`), which preserves `tool_call_chunks`, then convert via `_chunk_to_message`. Move `_chunk_to_message` from `tool_calling_agent` to `base` so both agent types share the implementation.
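To illustrate why chunk accumulation matters here, the following is a minimal, self-contained sketch. The `Chunk` class is a toy stand-in for LangChain's `AIMessageChunk` (it is not the real class); it only mimics the relevant behavior of the `+` operator: content concatenates, and `tool_call_chunks` with the same index have their argument fragments merged. Joining content alone, as the old code did, silently loses the tool call.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Toy stand-in for AIMessageChunk (illustration only, not the LangChain class)."""
    content: str = ""
    tool_call_chunks: list = field(default_factory=list)

    def __add__(self, other):
        # Mimics LangChain chunk addition: content concatenates, and
        # tool_call_chunks sharing an index have their args fragments joined.
        merged = {c["index"]: dict(c) for c in self.tool_call_chunks}
        for c in other.tool_call_chunks:
            if c["index"] in merged:
                merged[c["index"]]["args"] += c["args"]
            else:
                merged[c["index"]] = dict(c)
        return Chunk(self.content + other.content, list(merged.values()))


chunks = [
    Chunk(content="I'll check the time."),
    Chunk(tool_call_chunks=[{"name": "get_time", "args": '{"tz": ', "index": 0}]),
    Chunk(tool_call_chunks=[{"name": None, "args": '"UTC"}', "index": 0}]),
]

# Content-only reconstruction (the old bug): the tool call vanishes.
content_only = "".join(c.content for c in chunks)

# Chunk accumulation (the fix): tool_call_chunks survive intact.
accumulated = chunks[0]
for c in chunks[1:]:
    accumulated = accumulated + c
```

With this toy model, `content_only` carries the text but no tool call, while `accumulated` retains both the text and the reassembled `{"tz": "UTC"}` arguments under index 0.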
Walkthrough
The changes refactor streaming chunk handling in the LangChain agent codebase by extracting the duplicated `_chunk_to_message` conversion into the shared base module.
Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py (1)
**134-145:** Consider one-pass accumulation to reduce peak memory on long streams.

You can preserve behavior while avoiding the intermediate `chunks` list and second accumulation loop.

Refactor sketch:

```diff
-        chunks: list[AIMessageChunk] = []
-        async for chunk in runnable.astream(inputs, config=self._runnable_config):
-            chunks.append(chunk)
-
-        if not chunks:
-            return AIMessage(content="")
-
-        # Accumulate using LangChain's + operator (preserves tool_call_chunks)
-        accumulated = chunks[0]
-        for c in chunks[1:]:
-            accumulated = accumulated + c
+        accumulated: AIMessageChunk | None = None
+        async for chunk in runnable.astream(inputs, config=self._runnable_config):
+            accumulated = chunk if accumulated is None else (accumulated + chunk)
+
+        if accumulated is None:
+            return AIMessage(content="")

         return _chunk_to_message(accumulated)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py` around lines 134 - 145, The current code collects all chunks into a list (chunks) and then performs a second loop to fold them with the LangChain + operator, which increases peak memory for long streams; change to one-pass accumulation while iterating runnable.astream: initialize a sentinel (e.g., accumulated = None) before the async for, on first chunk set accumulated = chunk, on subsequent chunks do accumulated = accumulated + chunk, and after the loop return AIMessage(content="") if accumulated is still None otherwise use accumulated; update references to chunks, runnable.astream, accumulated, and AIMessage accordingly so behavior (including preservation of tool_call_chunks) is unchanged.
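The one-pass sentinel pattern the reviewer describes can be demonstrated without LangChain at all. In this sketch, plain strings stand in for `AIMessageChunk` instances, since `+` is all the pattern relies on; `fake_astream` and `stream_accumulate` are hypothetical names, not functions from the codebase.

```python
import asyncio


async def fake_astream(parts):
    """Stand-in for runnable.astream: yields chunks one at a time."""
    for part in parts:
        yield part


async def stream_accumulate(parts):
    # One-pass accumulation: no intermediate list, constant extra memory.
    accumulated = None  # sentinel meaning "no chunk seen yet"
    async for chunk in fake_astream(parts):
        accumulated = chunk if accumulated is None else accumulated + chunk
    if accumulated is None:
        return ""  # stand-in for AIMessage(content="") on an empty stream
    return accumulated


print(asyncio.run(stream_accumulate(["I'll check ", "the time."])))
```

The empty-stream branch is what replaces the original `if not chunks:` check: the sentinel stays `None` only when the async generator yields nothing.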
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/nvidia_nat_langchain/tests/agent/test_base.py`:
- Around line 114-139: Update the test_streaming_preserves_tool_calls regression
test to also assert that the reconstructed OpenAI wire-format is present in the
returned message: after calling base_agent._stream_llm(mock_runnable, inputs)
and validating result.tool_calls, add an assertion that
result.additional_kwargs["tool_calls"] exists, is a list, and contains an entry
whose "name" == "get_time" (or otherwise matches the expected wire-format
payload produced by mock_astream). This change ensures _stream_llm preserves
both the parsed tool_calls and the serialized wire-format under
additional_kwargs["tool_calls"] for downstream compatibility.
---
Nitpick comments:
In `@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py`:
- Around line 134-145: The current code collects all chunks into a list (chunks)
and then performs a second loop to fold them with the LangChain + operator,
which increases peak memory for long streams; change to one-pass accumulation
while iterating runnable.astream: initialize a sentinel (e.g., accumulated =
None) before the async for, on first chunk set accumulated = chunk, on
subsequent chunks do accumulated = accumulated + chunk, and after the loop
return AIMessage(content="") if accumulated is still None otherwise use
accumulated; update references to chunks, runnable.astream, accumulated, and
AIMessage accordingly so behavior (including preservation of tool_call_chunks)
is unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 58f3c097-1fa9-4892-a3ae-f0af1782d00d
📒 Files selected for processing (3)
- packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py
- packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py
- packages/nvidia_nat_langchain/tests/agent/test_base.py
```python
async def test_streaming_preserves_tool_calls(self, base_agent):
    """Test that tool_calls from native tool calling are preserved."""
    mock_runnable = Mock()

    async def mock_astream(inputs, **kwargs):
        yield AIMessageChunk(
            content="I'll check the time.",
            tool_call_chunks=[{
                "name": "get_time",
                "args": '{"tz": "UTC"}',
                "id": "call_123",
                "index": 0,
                "type": "tool_call_chunk",
            }],
        )

    mock_runnable.astream = mock_astream

    inputs = {"messages": [HumanMessage(content="test")]}
    result = await base_agent._stream_llm(mock_runnable, inputs)

    assert isinstance(result, AIMessage)
    assert result.content == "I'll check the time."
    assert len(result.tool_calls) == 1
    assert result.tool_calls[0]["name"] == "get_time"
```
Assert OpenAI wire-format tool_calls in this regression test.
This test confirms result.tool_calls, but it does not verify result.additional_kwargs["tool_calls"], which is the critical reconstructed wire format for downstream provider compatibility.
Suggested test addition:

```diff
 assert isinstance(result, AIMessage)
 assert result.content == "I'll check the time."
 assert len(result.tool_calls) == 1
 assert result.tool_calls[0]["name"] == "get_time"
+assert result.additional_kwargs.get("tool_calls"), "Expected OpenAI wire-format tool_calls to be preserved"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/nvidia_nat_langchain/tests/agent/test_base.py` around lines 114 -
139, Update the test_streaming_preserves_tool_calls regression test to also
assert that the reconstructed OpenAI wire-format is present in the returned
message: after calling base_agent._stream_llm(mock_runnable, inputs) and
validating result.tool_calls, add an assertion that
result.additional_kwargs["tool_calls"] exists, is a list, and contains an entry
whose "name" == "get_time" (or otherwise matches the expected wire-format
payload produced by mock_astream). This change ensures _stream_llm preserves
both the parsed tool_calls and the serialized wire-format under
additional_kwargs["tool_calls"] for downstream compatibility.
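The distinction the reviewer is drawing can be made concrete: parsed `tool_calls` on a LangChain `AIMessage` are flat dicts (`name`, `args`, `id`), while the OpenAI wire format under `additional_kwargs["tool_calls"]` nests the name and arguments under a `function` key. The sketch below shows that wire-format shape and a hypothetical helper (`has_wire_format_tool_call` is not part of the test suite) implementing the check the review asks for.

```python
def has_wire_format_tool_call(additional_kwargs, expected_name):
    """Hypothetical helper: check that additional_kwargs carries a non-empty
    OpenAI wire-format tool_calls list containing the expected function name."""
    wire = additional_kwargs.get("tool_calls")
    if not isinstance(wire, list) or not wire:
        return False
    return any(t.get("function", {}).get("name") == expected_name for t in wire)


# The OpenAI wire format: name/arguments nested under "function",
# arguments carried as a JSON string rather than a parsed dict.
kwargs = {
    "tool_calls": [{
        "id": "call_123",
        "type": "function",
        "function": {"name": "get_time", "arguments": '{"tz": "UTC"}'},
    }]
}
```

In the regression test, the added assertion would pass `result.additional_kwargs` and `"get_time"` through a check of this shape.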
Description
Closes #1865
By submitting this PR I confirm:
Summary by CodeRabbit
Release Notes