1. Overview:
Implement mechanisms to control the frequency of operations performed by agents, particularly calls to external services like LLM providers and tools. This helps prevent exceeding API rate limits imposed by third-party services, ensures fair usage in multi-tenant environments, manages costs, and improves overall system stability by avoiding request throttling or blocking.
2. Goals:
- Provide configurable rate limiting strategies (e.g., requests per second/minute, token usage per minute).
- Allow defining rate limits at different granularities (e.g., per agent, per tool, per LLM provider, global).
- Integrate rate limiting checks seamlessly into the agent execution flow (before LLM calls and tool executions).
- Offer clear feedback or error handling when a rate limit is hit (e.g., delayed execution, specific error messages).
- Ensure the rate limiting mechanism is efficient and doesn't introduce significant overhead.
- Allow enabling/disabling rate limiting easily.
3. Proposed Architecture & Components:
- `RateLimiter` Interface/Base Class: Defines the core methods for rate limiting (e.g., `acquire()`, `check()`). Concrete implementations could include:
  - `TokenBucketLimiter`: Classic token bucket algorithm.
  - `LeakyBucketLimiter`: Leaky bucket algorithm.
  - `FixedWindowCounterLimiter`: Simple counter within fixed time windows.
- `RateLimitManager`: A central service (potentially part of `VoltAgent` or configurable per `Agent`) responsible for:
  - Loading rate limit configurations.
  - Instantiating and managing `RateLimiter` instances based on configuration.
  - Providing methods for agents/tools to check and acquire permits before making calls.
- Configuration: A way to define rate limit rules (e.g., in the agent options or a separate configuration file). This should specify:
  - The scope (agent ID, tool name, provider type, 'global').
  - The limit (e.g., 10 requests per minute).
  - The strategy (e.g., 'token_bucket').
- Integration Points: Modify core agent logic to consult the `RateLimitManager`:
  - LLM Calls: Before calling `llm.generateText`, `llm.streamText`, etc.
  - Tool Calls: Within the `ToolManager` or before executing a tool's `_call` method.
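The interface and the MVP strategy above could be sketched as follows. The exact method signatures (`check()` returning a boolean, `acquire()` awaiting a permit) are assumptions about the proposed API, not a finalized one:

```typescript
// Sketch of the proposed RateLimiter contract plus the simplest strategy.
// Names follow the proposal; signatures are illustrative assumptions.

interface RateLimiter {
  /** Returns true and records the request if a permit is available. */
  check(): boolean;
  /** Waits (if necessary) until a permit is available, then records it. */
  acquire(): Promise<void>;
}

class FixedWindowCounterLimiter implements RateLimiter {
  private count = 0;
  private windowStart = Date.now();

  constructor(
    private readonly maxRequests: number,
    private readonly windowMs: number,
  ) {}

  check(): boolean {
    const now = Date.now();
    // Start a fresh window once the current one has elapsed.
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now;
      this.count = 0;
    }
    if (this.count < this.maxRequests) {
      this.count++;
      return true;
    }
    return false;
  }

  async acquire(): Promise<void> {
    while (!this.check()) {
      // Sleep until the current window expires, then retry.
      const waitMs = this.windowMs - (Date.now() - this.windowStart);
      await new Promise((resolve) => setTimeout(resolve, Math.max(waitMs, 1)));
    }
  }
}
```

Note the usual fixed-window caveat: a burst at the end of one window plus a burst at the start of the next can momentarily double the effective rate, which is why token/leaky bucket variants are also proposed.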
4. Affected Core Modules:
- `Agent`: Core execution logic needs modification to check limits before LLM calls.
- `ToolManager` / `AgentTool`: Tool execution logic needs modification to check limits.
- `LLMProvider`: Might need adjustments or hooks to integrate checks.
- `VoltAgent` / `Agent` Options: Configuration needs to be handled.
- Potentially new utility modules for rate limiter implementations.
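To illustrate the kind of change these modules imply, here is a hedged sketch of the call path: the manager resolves every limiter that applies to a scope, and the agent awaits permits before invoking the LLM. The `RateLimitManager` API and the scope-key format are assumptions, not existing code:

```typescript
// Hypothetical integration sketch. Scope keys and the manager API are
// illustrative; only the "check limits before the LLM call" flow is
// taken from the proposal itself.

type ScopeKey = string; // e.g. "global", "provider:openai", "agent:support-bot"

interface RateLimiter {
  acquire(): Promise<void>;
}

class RateLimitManager {
  private limiters = new Map<ScopeKey, RateLimiter>();

  register(scope: ScopeKey, limiter: RateLimiter): void {
    this.limiters.set(scope, limiter);
  }

  /** Await permits for every configured scope that applies to this call. */
  async acquire(...scopes: ScopeKey[]): Promise<void> {
    for (const scope of scopes) {
      const limiter = this.limiters.get(scope);
      if (limiter) await limiter.acquire();
    }
  }
}

// Inside Agent, before llm.generateText / llm.streamText:
async function generateWithLimits(
  manager: RateLimitManager,
  agentId: string,
  provider: string,
  callLlm: () => Promise<string>,
): Promise<string> {
  await manager.acquire("global", `provider:${provider}`, `agent:${agentId}`);
  return callLlm();
}
```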
5. Acceptance Criteria (Initial MVP):
- Users can configure a simple global rate limit (e.g., max N requests per minute) for all LLM calls across all agents.
- The framework prevents exceeding this limit by introducing delays or throwing specific errors.
- A basic `FixedWindowCounterLimiter` is implemented.
- Configuration is possible via `Agent` options.
- Documentation explains how to enable and configure the global LLM rate limit.
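Under these criteria, configuration via `Agent` options might look like the following. The option name `rateLimit`, its fields, and the `onLimit` behavior switch are all illustrative sketches, not an existing API:

```typescript
// Illustrative MVP configuration: one global fixed-window limit on all LLM
// calls across all agents. Every field name here is a hypothetical shape
// for what Agent options could accept.

interface RateLimitOptions {
  scope: "global";            // MVP supports only a global scope
  strategy: "fixed_window";   // MVP supports only FixedWindowCounterLimiter
  maxRequests: number;        // at most N requests...
  windowMs: number;           // ...per window
  onLimit: "delay" | "error"; // wait for a permit vs. throw a specific error
}

const rateLimit: RateLimitOptions = {
  scope: "global",
  strategy: "fixed_window",
  maxRequests: 60,  // at most 60 LLM calls...
  windowMs: 60_000, // ...per minute
  onLimit: "delay", // queue calls instead of failing them
};

// e.g. new Agent({ name: "support-bot", rateLimit /* , ... */ })
```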
6. Potential Challenges & Considerations:
- Handling distributed rate limiting if `voltagent` is run across multiple instances.
- Accurately measuring token usage for token-based limits, especially with streaming.
- Choosing appropriate default limits and strategies.
- Balancing strictness of limits with agent responsiveness.
- Providing clear and actionable feedback to the developer/user when limits are hit.
- Performance impact of the rate limiting checks.
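For token-based limits, the streaming-measurement challenge above suggests a bucket that consumes a variable cost per call and can be corrected after the fact, once the streamed completion's token count is known. This is an assumption about how a `TokenBucketLimiter` could work, not a committed design:

```typescript
// Sketch of a token bucket for token-usage limits: each call consumes a
// variable "cost" (e.g. the prompt's token count) and capacity refills at a
// steady rate. debit() charges usage learned only after streaming finishes;
// a negative balance then delays future calls. All names are hypothetical.

class TokenBucketLimiter {
  private tokens: number;
  private lastRefill = Date.now();

  constructor(
    private readonly capacity: number, // burst size, in tokens
    private readonly refillPerSecond: number,
  ) {
    this.tokens = capacity;
  }

  private refill(): void {
    const now = Date.now();
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond,
    );
    this.lastRefill = now;
  }

  /** Try to consume `cost` tokens; false if not enough are available. */
  tryConsume(cost: number): boolean {
    this.refill();
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }

  /** Charge usage learned after the fact (e.g. streamed completion tokens). */
  debit(cost: number): void {
    this.refill();
    this.tokens -= cost; // may go negative, throttling subsequent calls
  }
}
```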