Make notification queue capacity configurable#50
Open
carlosflorencio wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Why This Is Needed
Some ACP agents replay the full conversation on session resume as a burst of
session/updatenotifications. In large sessions, that replay can produce thousands of frames in well under a second.The SDK read loop can enqueue inbound notifications much faster than a downstream consumer can process them, especially when each update has to be decoded, synchronized with local state, queued again, and fanned out to other workers. With the current fixed 1024-deep queue, a healthy consumer can still temporarily fall behind during a replay burst and cause the SDK to close the connection with
notification queue overflow.Once the connection closes, any in-flight requests such as
session/load,session/new, orsession/promptfail withpeer disconnected before response. The consumer then has no structured way to distinguish this expected transport failure mode from other request errors without string matching.The fix is to keep the queue bounded but make the bound configurable per connection. Consumers that expect large replay bursts can opt into a larger capacity, while the upstream default remains
1024so existing users do not get a behavior or memory change unless they explicitly choose one.Summary
WithMaxQueuedNotifications(n)as a per-connection option for inbound notification queue capacity.NewConnection,NewAgentSideConnection, andNewClientSideConnectionwhile preserving the 1024 default andn <= 0fallback.ErrNotificationQueueOverflowandErrPeerDisconnected; peer-disconnect*RequestErrorvalues now wrap the sentinel forerrors.Is.Release notes
acp.WithMaxQueuedNotifications(n)configures the per-connection inbound notification queue capacity. Outbound requests and notifications are unaffected.acp.ErrNotificationQueueOverflowandacp.ErrPeerDisconnectedforerrors.Ischecks.v0.13.5; the next upstream patch release would bev0.13.6rather thanv0.13.1.Verification
go test ./...make testPATH=/root/go/bin:/root/.cargo/bin:/root/.local/bin:$PATH make checknix flake checkwas not runnable in this container becausenixis not installed.