fix: preserve tool_calls in _stream_llm for native tool calling#1864

Open
getglad wants to merge 1 commit into NVIDIA:develop from getglad:fix/react-agent-stream-tool-calls

Conversation

@getglad

@getglad getglad commented Apr 12, 2026

_stream_llm reconstructed the AIMessage from content parts only, dropping tool_calls from streamed chunks. This broke use_native_tool_calling on the ReAct agent: the agent treated every response as a final text answer because output_message.tool_calls was always empty.

Use LangChain chunk accumulation (chunk + chunk), which preserves tool_call_chunks, then convert via _chunk_to_message. Move _chunk_to_message from tool_calling_agent to base so both agent types share the implementation.
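The fix hinges on the fact that adding chunks together merges their tool_call_chunks by index, so fragments of a single tool call reassemble into one complete call. As a plain-Python illustration of those merge semantics (this is not the library code; merge_tool_call_chunks and the fragment dicts below are made up for the sketch, loosely following the tool_call_chunk field names):

```python
# Illustrative sketch: why accumulation must merge streamed tool-call fragments.
# LangChain's AIMessageChunk.__add__ does this internally; here the same idea
# is shown with plain dicts so the mechanics are visible.

def merge_tool_call_chunks(fragments):
    """Fold streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for part in fragments:
        slot = calls.setdefault(part["index"], {"name": "", "args": "", "id": None})
        if part.get("name"):
            slot["name"] = part["name"]
        if part.get("id"):
            slot["id"] = part["id"]
        # args arrive as JSON string fragments and must be concatenated
        slot["args"] += part.get("args") or ""
    return [calls[i] for i in sorted(calls)]

streamed = [
    {"name": "get_time", "args": "", "id": "call_123", "index": 0},
    {"name": None, "args": '{"tz": ', "id": None, "index": 0},
    {"name": None, "args": '"UTC"}', "id": None, "index": 0},
]
merged = merge_tool_call_chunks(streamed)
# merged[0] == {"name": "get_time", "args": '{"tz": "UTC"}', "id": "call_123"}
```

A reconstruction that keeps only the content strings, as the old _stream_llm did, discards these fragments entirely, which is exactly the reported bug.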

Description

Closes #1865

By submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved streaming message handling to reliably preserve tool calls
    • Fixed edge case handling for scenarios with empty streaming responses
    • Enhanced message reconstruction logic for better consistency during streaming operations

@getglad getglad requested a review from a team as a code owner April 12, 2026 17:57
@copy-pr-bot

copy-pr-bot bot commented Apr 12, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai bot commented Apr 12, 2026

Walkthrough

The changes refactor streaming chunk handling in the LangChain agent codebase by extracting duplicate _chunk_to_message logic into a centralized helper function in the base module. This consolidates chunk accumulation and message reconstruction, standardizes tool call handling via convert_to_openai_messages, and adds explicit handling for edge cases such as empty streams.

Changes

  • Core Streaming Refactor (packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py): Added a _chunk_to_message helper function to reconstruct an AIMessage from accumulated AIMessageChunk objects. Updated _stream_llm to accumulate chunks via LangChain's + operator and apply the new helper. Integrated explicit tool_calls handling via convert_to_openai_messages and added a guard for empty chunk streams.
  • Agent Integration (packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py): Removed the local _chunk_to_message implementation and the convert_to_openai_messages import. Updated to import and reuse the centralized _chunk_to_message from the base module, eliminating code duplication.
  • Test Updates (packages/nvidia_nat_langchain/tests/agent/test_base.py): Updated streaming test mocks to yield AIMessageChunk objects instead of custom Mock events. Added new tests for tool_calls preservation during streaming and empty chunk stream scenarios.
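The refactor follows an accumulate-then-convert shape: collect the streamed chunks, fold them with +, guard the empty-stream case, then convert the result to a message. A minimal, self-contained sketch of that shape, using a stand-in Chunk dataclass in place of AIMessageChunk and a plain stream_llm in place of the real _stream_llm (both are assumptions for the sketch, not the PR's code):

```python
# Sketch of the accumulate-then-convert streaming shape. Chunk stands in for
# LangChain's AIMessageChunk, whose `+` operator merges content and
# tool_call_chunks; the real implementation uses the library types and ends
# with _chunk_to_message(accumulated).
import asyncio
from dataclasses import dataclass, field

@dataclass
class Chunk:
    content: str = ""
    tool_call_chunks: list = field(default_factory=list)

    def __add__(self, other):
        # Mimic chunk merging: concatenate text, keep all tool-call fragments
        return Chunk(self.content + other.content,
                     self.tool_call_chunks + other.tool_call_chunks)

async def stream_llm(astream):
    chunks = [c async for c in astream]
    if not chunks:            # guard for empty streams
        return Chunk()
    accumulated = chunks[0]
    for c in chunks[1:]:      # `+` preserves tool_call_chunks
        accumulated = accumulated + c
    return accumulated        # real code would convert via _chunk_to_message

async def fake_astream():
    yield Chunk("I'll check ", [{"name": "get_time", "index": 0}])
    yield Chunk("the time.", [{"args": '{"tz": "UTC"}', "index": 0}])

result = asyncio.run(stream_llm(fake_astream()))
```

The key property the tests below exercise is visible here: after accumulation, both the full content string and every tool-call fragment survive.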

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 63.64%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title 'fix: preserve tool_calls in _stream_llm for native tool calling' is concise, descriptive, uses imperative mood (implicitly through 'fix'), is 63 characters long, and directly describes the main change of preserving tool_calls in streaming.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py (1)

134-145: Consider one-pass accumulation to reduce peak memory on long streams.

You can preserve behavior while avoiding the intermediate chunks list and second accumulation loop.

Refactor sketch
-        chunks: list[AIMessageChunk] = []
-        async for chunk in runnable.astream(inputs, config=self._runnable_config):
-            chunks.append(chunk)
-
-        if not chunks:
-            return AIMessage(content="")
-
-        # Accumulate using LangChain's + operator (preserves tool_call_chunks)
-        accumulated = chunks[0]
-        for c in chunks[1:]:
-            accumulated = accumulated + c
+        accumulated: AIMessageChunk | None = None
+        async for chunk in runnable.astream(inputs, config=self._runnable_config):
+            accumulated = chunk if accumulated is None else (accumulated + chunk)
+
+        if accumulated is None:
+            return AIMessage(content="")
 
         return _chunk_to_message(accumulated)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py` around
lines 134 - 145, The current code collects all chunks into a list (chunks) and
then performs a second loop to fold them with the LangChain + operator, which
increases peak memory for long streams; change to one-pass accumulation while
iterating runnable.astream: initialize a sentinel (e.g., accumulated = None)
before the async for, on first chunk set accumulated = chunk, on subsequent
chunks do accumulated = accumulated + chunk, and after the loop return
AIMessage(content="") if accumulated is still None otherwise use accumulated;
update references to chunks, runnable.astream, accumulated, and AIMessage
accordingly so behavior (including preservation of tool_call_chunks) is
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 58f3c097-1fa9-4892-a3ae-f0af1782d00d

📥 Commits

Reviewing files that changed from the base of the PR and between ee7ab31 and eee4b70.

📒 Files selected for processing (3)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py
  • packages/nvidia_nat_langchain/tests/agent/test_base.py

Comment on lines +114 to +139
async def test_streaming_preserves_tool_calls(self, base_agent):
    """Test that tool_calls from native tool calling are preserved."""
    mock_runnable = Mock()

    async def mock_astream(inputs, **kwargs):
        yield AIMessageChunk(
            content="I'll check the time.",
            tool_call_chunks=[{
                "name": "get_time",
                "args": '{"tz": "UTC"}',
                "id": "call_123",
                "index": 0,
                "type": "tool_call_chunk",
            }],
        )

    mock_runnable.astream = mock_astream

    inputs = {"messages": [HumanMessage(content="test")]}
    result = await base_agent._stream_llm(mock_runnable, inputs)

    assert isinstance(result, AIMessage)
    assert result.content == "I'll check the time."
    assert len(result.tool_calls) == 1
    assert result.tool_calls[0]["name"] == "get_time"



⚠️ Potential issue | 🟡 Minor

Assert OpenAI wire-format tool_calls in this regression test.

This test confirms result.tool_calls, but it does not verify result.additional_kwargs["tool_calls"], which is the critical reconstructed wire format for downstream provider compatibility.

Suggested test addition
         assert isinstance(result, AIMessage)
         assert result.content == "I'll check the time."
         assert len(result.tool_calls) == 1
         assert result.tool_calls[0]["name"] == "get_time"
+        assert result.additional_kwargs.get("tool_calls"), "Expected OpenAI wire-format tool_calls to be preserved"
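For context, the wire-format entries under additional_kwargs["tool_calls"] differ from the parsed tool_calls entries: in the OpenAI chat-completions shape, arguments is a serialized JSON string rather than a dict. A small sketch of that shape (the parsed dict below is illustrative, echoing the test fixture above):

```python
# OpenAI chat-completions wire format for a tool call, as stored under
# additional_kwargs["tool_calls"]. Note "arguments" is a JSON string, whereas
# the parsed tool_calls representation holds args as an already-decoded dict.
import json

parsed = {"name": "get_time", "args": {"tz": "UTC"}, "id": "call_123"}

wire = {
    "id": parsed["id"],
    "type": "function",
    "function": {
        "name": parsed["name"],
        "arguments": json.dumps(parsed["args"]),  # serialized, not a dict
    },
}
```

Asserting on this structure, as the review suggests, catches regressions where the parsed tool_calls survive but the serialized wire format that downstream providers consume does not.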
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/nvidia_nat_langchain/tests/agent/test_base.py` around lines 114 -
139, Update the test_streaming_preserves_tool_calls regression test to also
assert that the reconstructed OpenAI wire-format is present in the returned
message: after calling base_agent._stream_llm(mock_runnable, inputs) and
validating result.tool_calls, add an assertion that
result.additional_kwargs["tool_calls"] exists, is a list, and contains an entry
whose "name" == "get_time" (or otherwise matches the expected wire-format
payload produced by mock_astream). This change ensures _stream_llm preserves
both the parsed tool_calls and the serialized wire-format under
additional_kwargs["tool_calls"] for downstream compatibility.



Development

Successfully merging this pull request may close these issues.

ReAct Agent: _stream_llm drops tool_calls when use_native_tool_calling=True
