Skip to content

Make notification queue capacity configurable#50

Open
carlosflorencio wants to merge 1 commit into
coder:mainfrom
carlosflorencio:feature/acp-go-sg4
Open

Make notification queue capacity configurable#50
carlosflorencio wants to merge 1 commit into
coder:mainfrom
carlosflorencio:feature/acp-go-sg4

Conversation

@carlosflorencio

Copy link
Copy Markdown

Why This Is Needed

Some ACP agents replay the full conversation on session resume as a burst of session/update notifications. In large sessions, that replay can produce thousands of frames in well under a second.

The SDK read loop can enqueue inbound notifications much faster than a downstream consumer can process them, especially when each update has to be decoded, synchronized with local state, queued again, and fanned out to other workers. With the current fixed 1024-deep queue, a healthy consumer can still temporarily fall behind during a replay burst and cause the SDK to close the connection with notification queue overflow.

Once the connection closes, any in-flight requests such as session/load, session/new, or session/prompt fail with peer disconnected before response. The consumer then has no structured way to distinguish this expected transport failure mode from other request errors without string matching.

The fix is to keep the queue bounded but make the bound configurable per connection. Consumers that expect large replay bursts can opt into a larger capacity, while the upstream default remains 1024 so existing users do not get a behavior or memory change unless they explicitly choose one.

Summary

  • Add WithMaxQueuedNotifications(n) as a per-connection option for inbound notification queue capacity.
  • Thread connection options through NewConnection, NewAgentSideConnection, and NewClientSideConnection while preserving the 1024 default and n <= 0 fallback.
  • Export ErrNotificationQueueOverflow and ErrPeerDisconnected; peer-disconnect *RequestError values now wrap the sentinel for errors.Is.
  • Add a goroutine-fed burst regression test that overflows at default capacity and delivers all notifications with a larger configured capacity.

Release notes

  • New option: acp.WithMaxQueuedNotifications(n) configures the per-connection inbound notification queue capacity. Outbound requests and notifications are unaffected.
  • New sentinels: acp.ErrNotificationQueueOverflow and acp.ErrPeerDisconnected for errors.Is checks.
  • This checkout is currently at v0.13.5; the next upstream patch release would be v0.13.6 rather than v0.13.1.

Verification

  • go test ./...
  • make test
  • PATH=/root/go/bin:/root/.cargo/bin:/root/.local/bin:$PATH make check

nix flake check was not runnable in this container because nix is not installed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant