Improve backoff #1920
Conversation
Pull request overview
This PR aims to improve client robustness by expanding the retry/backoff handling for transient network failures and API-side capacity limits.
Changes:
- Increased authentication-related retry attempts (max tries from 2 to 5).
- Added new retry policies for connection failures and `ExecutionQueueFullException`.
- Applied the new retry policies to `fetch_events` and `_execute_action_group_direct`.
```diff
 retry_on_auth_error = backoff.on_exception(
     backoff.expo,
     (NotAuthenticatedException, ServerDisconnectedError, ClientConnectorError),
-    max_tries=2,
+    max_tries=5,
     on_backoff=relogin,
 )

+retry_on_connection_failure = backoff.on_exception(
+    backoff.expo,
+    (TimeoutError, ClientConnectorError),
+    max_tries=10,
+)
```
Copilot AI · Jan 25, 2026
ClientConnectorError is included in both retry_on_auth_error and the new retry_on_connection_failure. When both decorators are applied (e.g., fetch_events), retries become nested/multiplicative (up to 5 * 10 attempts) and relogin may run for pure connection failures. Consider making the exception sets disjoint (e.g., remove ClientConnectorError from retry_on_auth_error and handle it only via retry_on_connection_failure) so connection outages don't trigger re-auth and total retry budget is predictable.
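The multiplicative effect the reviewer describes can be reproduced with a minimal stand-in for `backoff.on_exception` (the `retry_on` helper below is an illustrative simplification, not the library's API): when an outer and an inner decorator both catch the same exception, every outer attempt re-runs the inner policy's full retry budget.

```python
import functools

def retry_on(exceptions, max_tries):
    """Minimal stand-in for backoff.on_exception: retry up to max_tries calls."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return fn(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # budget exhausted: re-raise to the caller
        return wrapper
    return decorator

class ConnError(Exception):
    pass

calls = 0

@retry_on(ConnError, max_tries=5)   # outer "auth" policy also catches ConnError
@retry_on(ConnError, max_tries=10)  # inner "connection" policy
def always_fails():
    global calls
    calls += 1
    raise ConnError

try:
    always_fails()
except ConnError:
    pass

print(calls)  # 50: retries multiply when both policies catch the same exception
```

Making the exception tuples disjoint reduces the worst case from the product of the budgets to their sum.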
Indeed
```python
@retry_on_concurrent_requests
@retry_on_auth_error
@retry_on_listener_error
@retry_on_connection_failure
async def fetch_events(self) -> list[Event]:
```
Copilot AI · Jan 25, 2026
fetch_events is a polling-style method, but with the current decorator stack a persistent ClientConnectorError/timeout can result in very long blocking retry sequences (nested retries across retry_on_connection_failure + retry_on_auth_error). After making exception sets disjoint, consider bounding retries for this method (e.g., max_time or lower max_tries) so callers can recover/continue their polling loop promptly when the network is down.
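A time-bounded retry can be sketched without the library (with `backoff` itself, the equivalent knob is the `max_time` argument to `backoff.on_exception`). The hand-rolled helper below is an illustration, not the project's code: it gives up as soon as the next sleep would push the total elapsed time past the deadline, so a polling caller regains control quickly during an outage.

```python
import time

def retry_with_deadline(fn, exceptions, max_time, base_delay=0.01):
    """Retry fn with exponential backoff, giving up after ~max_time seconds total."""
    start = time.monotonic()
    delay = base_delay
    while True:
        try:
            return fn()
        except exceptions:
            # Stop if sleeping again would blow the overall time budget.
            if time.monotonic() - start + delay > max_time:
                raise  # surface the error so the polling loop can recover
            time.sleep(delay)
            delay *= 2

class NetworkDown(Exception):
    pass

attempts = 0

def poll():
    global attempts
    attempts += 1
    raise NetworkDown  # simulate a persistent outage

start = time.monotonic()
try:
    retry_with_deadline(poll, NetworkDown, max_time=0.1)
except NetworkDown:
    pass
elapsed = time.monotonic() - start

print(elapsed < 0.5, attempts >= 2)  # True True: bounded time, multiple attempts
```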
```python
@retry_on_too_many_executions
@retry_on_execution_queue_full
@retry_on_auth_error
```
Copilot AI · Jan 25, 2026
New retry behavior for ExecutionQueueFullException is introduced here, but there doesn't appear to be unit coverage asserting that _execute_action_group_direct retries (and eventually succeeds/fails) when the API returns EXEC_QUEUE_FULL. Adding a test that forces __post to raise ExecutionQueueFullException a few times (and stubbing backoff sleep to keep the test fast) would prevent regressions and validate the retry budget.
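Such a test could look like the sketch below. Everything here is a stand-in, not the project's real API: the `ExecutionQueueFullException` class, the `post` callable, and the `execute_with_retry` wrapper are simplified hypotheticals, with sleep stubbed via `unittest.mock` so the test runs instantly.

```python
import time
from unittest import mock

class ExecutionQueueFullException(Exception):
    """Stand-in for the client's EXEC_QUEUE_FULL error type."""

def execute_with_retry(post, max_tries=5, sleep=time.sleep):
    """Simplified sketch of a retry wrapper around the POST call."""
    for attempt in range(max_tries):
        try:
            return post()
        except ExecutionQueueFullException:
            if attempt == max_tries - 1:
                raise  # retry budget exhausted
            sleep(2 ** attempt)  # exponential backoff between attempts

# Force the POST to fail twice with EXEC_QUEUE_FULL, then succeed.
post = mock.Mock(side_effect=[ExecutionQueueFullException,
                              ExecutionQueueFullException,
                              "exec-id-123"])
fake_sleep = mock.Mock()  # stub out sleeping so the test is fast

result = execute_with_retry(post, sleep=fake_sleep)
print(result, post.call_count, fake_sleep.call_count)  # exec-id-123 3 2
```

The same pattern with one failure more than the budget allows would assert that `ExecutionQueueFullException` finally propagates, validating both sides of the retry budget.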
This pull request introduces improvements to the reliability and robustness of the `pyoverkiz/client.py` client by enhancing retry logic for various network and API error scenarios. The changes mainly focus on increasing retry attempts and handling additional failure cases to reduce transient errors during client operations.

Enhanced retry logic and error handling:

- Increased the maximum retry attempts for the `refresh_listener` function from 2 to 5, and added a new retry policy for connection failures (timeouts and connector errors) with up to 10 retries.
- Added a retry policy for `ExecutionQueueFullException` with up to 5 retries.

Application of new retry policies to client methods:

- Applied the new connection-failure retry policy to the `fetch_events` method to improve resilience against network issues.
- Applied the new queue-full retry policy to the `_execute_action_group_direct` method to better handle API-side execution queue limitations.