Feature: token streaming support for ReAct Agent #1851

Open
thepatrickchin wants to merge 7 commits into NVIDIA:develop from thepatrickchin:fix/react-token-streaming

Member

@thepatrickchin thepatrickchin commented Apr 8, 2026

Description

Previously, ReAct's _stream_llm explicitly passed self._runnable_config to the LLM, overriding LangGraph's injected runtime config and dropping its streaming callbacks. This was fixed by merging both configs so LangGraph can observe tokens.
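
The difference between overriding and merging can be illustrated without LangChain. This is a minimal sketch, assuming configs are plain dicts with a `callbacks` list; `merge_configs_sketch`, `agent_config`, and `injected_config` are hypothetical names, and the real `merge_configs` helper used by the PR has richer semantics than this stand-in.

```python
from __future__ import annotations

from typing import Any

Config = dict[str, Any]


def merge_configs_sketch(*configs: Config | None) -> Config:
    """Merge configs left to right: callback lists are concatenated, other keys are overwritten."""
    merged: Config = {}
    for config in configs:
        if not config:
            continue  # tolerate a missing (None) config
        for key, value in config.items():
            if key == "callbacks":
                merged[key] = [*merged.get(key, []), *value]
            else:
                merged[key] = value
    return merged


# Hypothetical configs: the agent's own config and the one injected by the graph runtime.
agent_config = {"callbacks": ["agent_tracer"], "recursion_limit": 10}
injected_config = {"callbacks": ["graph_stream_handler"]}

# Before the fix: only the agent's own config reaches the LLM, so the
# graph's streaming callback never sees any tokens.
before = agent_config

# After the fix: both callback sets survive the merge.
after = merge_configs_sketch(agent_config, injected_config)
```

In this sketch `after["callbacks"]` holds both handlers, while `before["callbacks"]` holds only the agent's own.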

The ReAct Agent was also missing a _stream_fn, so the framework had no streaming entry point and always fell back to the non-streaming path. This was fixed by propagating the injected node config through agent_node down to _stream_llm, and by registering a _stream_fn via FunctionInfo.create that yields ChatResponseChunk tokens from graph.astream(stream_mode="messages").
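
The shape of such a streaming entry point might look like the sketch below. It assumes the graph stream yields `(chunk, metadata)` pairs and that only chunks from the agent node should reach the client; `FakeChunk`, `fake_graph_astream`, `stream_fn_sketch`, and the `"node"` metadata key are hypothetical stand-ins, not the PR's code.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed message chunk."""
    content: str


async def fake_graph_astream() -> AsyncIterator[tuple[FakeChunk, dict[str, str]]]:
    """Stand-in for a graph stream that emits (chunk, metadata) pairs."""
    events = [
        (FakeChunk("Hel"), {"node": "agent"}),
        (FakeChunk("tool output"), {"node": "tool"}),
        (FakeChunk("lo"), {"node": "agent"}),
    ]
    for event in events:
        await asyncio.sleep(0)  # yield control, as a real stream would
        yield event


async def stream_fn_sketch() -> AsyncIterator[str]:
    """Forward only non-empty chunks produced by the agent node."""
    async for chunk, metadata in fake_graph_astream():
        if metadata.get("node") == "agent" and chunk.content:
            yield chunk.content


async def collect() -> list[str]:
    return [token async for token in stream_fn_sketch()]


tokens = asyncio.run(collect())
```

Here `tokens` contains the two agent chunks in arrival order; the tool output is filtered out.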

Closes #1850

By Submitting this PR I confirm:

  • I am familiar with the Contributing Guidelines.
  • We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
    • Any contribution which contains commits that are not Signed-Off will not be accepted.
  • When the PR is ready for review, new or existing tests cover these changes.
  • When the PR is ready for review, the documentation is up to date with these changes.

Summary by CodeRabbit

  • New Features

    • Agent responses can be streamed in real time as message chunks for improved responsiveness
    • Workflows now register a streaming handler alongside the existing single-response handler
  • Improvements

    • Streaming now merges and respects runtime configuration to ensure callbacks and event handling behave as expected
    • Agent system prompt enforces a consistent "Final Answer:" formatting
  • Tests

    • Added unit tests covering streaming behavior and configuration propagation

ReAct's _stream_llm explicitly passed self._runnable_config to the LLM,
overriding LangGraph's injected runtime config and dropping its streaming
callbacks. Fixed by merging both configs so LangGraph can observe tokens.

Without a _stream_fn, the framework had no streaming entry point and always
fell back to the non-streaming path. Fixed by propagating the injected node
config through agent_node down to _stream_llm, and registering a _stream_fn via
FunctionInfo.create that yields ChatResponseChunk tokens from
graph.astream(stream_mode="messages").

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
Buffer streamed tokens until FINAL_ANSWER_PATTERN is detected, then
yield only the content after the marker. Falls back to yielding the
full buffer for direct answers that omit the ReAct format.

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
@thepatrickchin thepatrickchin requested a review from a team as a code owner April 8, 2026 10:10

copy-pr-bot bot commented Apr 8, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai bot commented Apr 8, 2026

Walkthrough

Adds optional RunnableConfig propagation and merging into agent streaming paths, implements a streaming workflow for the ReAct agent that yields message chunks after detecting a final-answer marker, and adds tests for streaming and config propagation.
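
The final-answer gating described in the walkthrough can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `FINAL_ANSWER_RE` is a guessed stand-in for `FINAL_ANSWER_PATTERN`, and `stream_final_answer` is a hypothetical synchronous helper.

```python
from __future__ import annotations

import re
from collections.abc import Iterable, Iterator

# Assumed marker pattern; the real FINAL_ANSWER_PATTERN may differ.
FINAL_ANSWER_RE = re.compile(r"final\s+answer\s*:\s*", re.IGNORECASE)


def stream_final_answer(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer tokens until the marker appears, then pass the rest through."""
    buffer = ""
    marker_seen = False
    for token in tokens:
        if marker_seen:
            yield token  # past the marker: stream tokens as they arrive
            continue
        buffer += token
        match = FINAL_ANSWER_RE.search(buffer)
        if match:
            marker_seen = True
            tail = buffer[match.end():]
            if tail:
                yield tail  # content that arrived in the same token as the marker
    if not marker_seen and buffer:
        # Fallback: the model answered directly without the ReAct format.
        yield buffer


react_output = list(stream_final_answer(["Thought: done\n", "Final ", "Answer: ", "4", "2"]))
direct_output = list(stream_final_answer(["Hello", " there"]))
```

Note the trade-off: a direct answer is only emitted once the stream ends, because the helper cannot know earlier that no marker is coming.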

Changes

| Cohort / File(s) | Summary |
|---|---|
| Agent base infra: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py | Added merge_configs import and optional `config: RunnableConfig |
| ReAct agent node: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py | agent_node now accepts `config: RunnableConfig |
| ReAct streaming registration: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py | Replaced FunctionInfo.from_fn registration with FunctionInfo.create(single_fn=_response_fn, stream_fn=_stream_fn, ...); added _stream_fn that streams AIMessageChunk events, buffers interim content, strips R1 tags, detects FINAL_ANSWER_PATTERN to emit post-marker content, and handles GraphRecursionError with a warning chunk. |
| ReAct prompt: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/prompt.py | Adjusted system prompt to require final responses to include Final Answer: exactly (no surrounding markdown). |
| Dual / ToolCall agents: packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/dual_node.py, packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py | Updated abstract/implemented agent_node signatures to accept optional `config: RunnableConfig |
| Tests: packages/nvidia_nat_langchain/tests/agent/test_base.py, packages/nvidia_nat_langchain/tests/agent/test_react.py | Added async tests verifying _stream_llm config forwarding/merging and ReAct graph streaming message production and config propagation. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant ReActGraph
    participant AgentNode
    participant BaseAgent
    participant Runnable
    participant Stream

    Client->>ReActGraph: astream(inputs, config)
    ReActGraph->>AgentNode: agent_node(state, config)
    AgentNode->>AgentNode: build inputs dict
    AgentNode->>BaseAgent: _stream_llm(runnable, inputs, config)
    BaseAgent->>BaseAgent: effective_config = merge_configs(self._runnable_config, config)
    BaseAgent->>Runnable: astream(inputs, effective_config)
    Runnable->>Stream: emit AIMessageChunk events
    Stream->>AgentNode: deliver chunks (metadata: node="agent")
    AgentNode->>ReActGraph: forward buffered/filtered chunks
    ReActGraph->>Client: emit post-FINAL_ANSWER_PATTERN chunks

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 58.82% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely describes the main feature being added: token streaming support for the ReAct Agent, which aligns with the changeset's primary objective. |
| Linked Issues check | ✅ Passed | The PR successfully implements all key requirements from issue #1850: merges configs to preserve LangGraph streaming callbacks [base.py, react_agent/agent.py, dual_node.py], registers a _stream_fn entry point [react_agent/register.py], propagates config through agent_node, and strips think tags before yielding. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing token streaming for the ReAct Agent: base config merging, agent_node signature updates, prompt formatting, streaming registration, and comprehensive tests. |

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py (1)

238-251: Consider using logger.exception() when re-raising or clarify intent.

Per coding guidelines: when catching and logging exceptions without re-raising, use logger.exception() to capture the full stack trace. Line 250 uses logger.error() before re-raising, which is correct per the guideline for re-raising. However, verify this is intentional.

The GraphRecursionError handling (lines 238-248) gracefully yields an error message instead of raising, which is appropriate for streaming—users receive a clear message rather than a broken stream.
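
The two error paths discussed in this comment can be sketched like this. It is a rough illustration under assumptions: `RecursionLimitReached` is a hypothetical stand-in for `GraphRecursionError`, and `failing_stream`, `guarded_stream`, and the message text are invented for the example.

```python
from __future__ import annotations

import asyncio
import logging
from collections.abc import AsyncIterator

logger = logging.getLogger(__name__)


class RecursionLimitReached(Exception):
    """Hypothetical stand-in for the graph's recursion-limit error."""


async def failing_stream(error: Exception) -> AsyncIterator[str]:
    """Emit one token, then fail with the given error."""
    yield "partial"
    raise error


async def guarded_stream(source: AsyncIterator[str]) -> AsyncIterator[str]:
    try:
        async for token in source:
            yield token
    except RecursionLimitReached:
        # Expected failure mode: end the stream with a readable message.
        logger.warning("Recursion limit reached; ending stream with a message")
        yield "[agent stopped: recursion limit reached]"
    except Exception as ex:
        # Unexpected failure: log at error level and re-raise to the caller.
        logger.error("Streaming failed with exception: %s", ex)
        raise


def run(error: Exception) -> list[str]:
    async def collect() -> list[str]:
        return [token async for token in guarded_stream(failing_stream(error))]

    return asyncio.run(collect())


recovered = run(RecursionLimitReached())

try:
    run(ValueError("boom"))
    propagated = False
except ValueError:
    propagated = True
```

The recursion-limit case ends the stream cleanly with a final message chunk, while any other exception still reaches the caller.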

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py`
around lines 238 - 251, The current except Exception block logs with
logger.error("%s ReAct Agent streaming failed with exception: %s",
AGENT_LOG_PREFIX, ex) and then re-raises; if you want the full stack trace
captured in logs change this to logger.exception(...) in the except Exception as
ex handler (preserving AGENT_LOG_PREFIX and the raise), otherwise explicitly
document with an inline comment that error-level logging without stack trace is
intentional; refer to the except Exception handler, logger.error call,
GraphRecursionError handling, and ChatResponseChunk.create_streaming_chunk to
locate the area to change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0e3d555c-285d-4252-b495-0bfda33dc121

📥 Commits

Reviewing files that changed from the base of the PR and between 998d535 and dac3b82.

📒 Files selected for processing (6)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/base.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/agent.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/prompt.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py
  • packages/nvidia_nat_langchain/tests/agent/test_base.py
  • packages/nvidia_nat_langchain/tests/agent/test_react.py

@willkill07 willkill07 added the "feature request" (New feature or request) and "non-breaking" (Non-breaking change) labels Apr 8, 2026
@willkill07 willkill07 changed the title from "Fix token streaming with ReAct Agent" to "Feature: token streaming support for ReAct Agent" Apr 8, 2026
Member

@willkill07 willkill07 left a comment

Structurally, this is good.

We just need to ensure consistent interfaces, which are currently being violated, and ensure that behavior is consistent between the streaming and non-streaming paths.

…eter

- Update agent_node of tool calling agent to match

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
Apply remove_r1_think_tags to the streaming buffer before searching for
the Final Answer marker and before yielding the fallback response, so
think tags are not leaked to the client when the LLM answers directly
without ReAct format, matching non-streaming behavior

Signed-off-by: Patrick Chin <8509935+thepatrickchin@users.noreply.github.com>
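
The ordering described in the commit above (strip think tags first, then look for the marker) matters because a think block may itself mention the marker. A minimal sketch, assuming `<think>...</think>` tags and a simplified marker regex; `strip_think_tags` and `extract_answer` are hypothetical helpers, not the project's `remove_r1_think_tags`:

```python
from __future__ import annotations

import re

# Assumed tag and marker formats, for illustration only.
THINK_BLOCK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)
FINAL_ANSWER_RE = re.compile(r"Final Answer:\s*")


def strip_think_tags(text: str) -> str:
    """Remove complete <think>...</think> blocks and trailing whitespace."""
    return THINK_BLOCK_RE.sub("", text)


def extract_answer(buffer: str) -> str:
    """Strip think blocks, then return the text after the marker if present."""
    cleaned = strip_think_tags(buffer)
    match = FINAL_ANSWER_RE.search(cleaned)
    # Fallback: no marker, so the cleaned buffer is the answer.
    return cleaned[match.end():] if match else cleaned


with_marker = extract_answer("<think>maybe Final Answer: 7?</think>Final Answer: 42")
direct = extract_answer("<think>easy</think>\nParis")
```

Searching before stripping would have matched the marker inside the think block in the first example and leaked reasoning text to the client.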

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/dual_node.py`:
- Around line 48-49: Add a Google-style docstring to the public async method
agent_node(self, state: BaseModel, config: RunnableConfig | None = None) ->
BaseModel explaining the expected types and semantics: describe the purpose of
the method, the meaning and required shape/fields of the state parameter
(BaseModel), the optional config parameter (RunnableConfig) and when it may be
None, the contract of the returned BaseModel (what fields/side-effects callers
should expect), any exceptions that implementations may raise, and thread/async
considerations; place the docstring immediately under the async def and ensure
it follows Google-style param/returns/raises sections so implementers have a
clear, consistent contract.
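
A docstring of the kind requested might look like the sketch below. The class names, the dict-based types, and the documented contract (including the `RuntimeError`) are illustrative assumptions, not the project's actual interface.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any


class DualNodeAgentSketch(ABC):
    """Hypothetical stand-in for the abstract dual-node agent."""

    @abstractmethod
    async def agent_node(self, state: dict[str, Any], config: dict[str, Any] | None = None) -> dict[str, Any]:
        """Run one agent step and return the updated graph state.

        Args:
            state: Current graph state, including the message history.
            config: Runtime config injected by the graph. May be None when
                the node is invoked directly, for example in unit tests.

        Returns:
            The updated state with the agent's latest output appended.

        Raises:
            RuntimeError: If the underlying LLM call fails.
        """


class EchoAgent(DualNodeAgentSketch):
    """Trivial implementation that records whether a config was supplied."""

    async def agent_node(self, state: dict[str, Any], config: dict[str, Any] | None = None) -> dict[str, Any]:
        return {**state, "saw_config": config is not None}


result = asyncio.run(EchoAgent().agent_node({"messages": []}, config={"callbacks": []}))
```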

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e133c126-9700-426a-b026-7c303c96084a

📥 Commits

Reviewing files that changed from the base of the PR and between dac3b82 and a5096df.

📒 Files selected for processing (3)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/dual_node.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py
✅ Files skipped from review due to trivial changes (1)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/tool_calling_agent/agent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/nvidia_nat_langchain/src/nat/plugins/langchain/agent/react_agent/register.py

coderabbitai bot commented Apr 10, 2026

Note

Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.

❌ Failed to clone repository into sandbox. Please try again.

@willkill07
Member

/ok to test 2c371ee

Labels

feature request (New feature or request), non-breaking (Non-breaking change)

Development

Successfully merging this pull request may close these issues.

ReAct Agent should support streaming output

2 participants