Expose copy default transient error list#3903
Expose copy default transient error list#3903MatthiasHuygelen wants to merge 9 commits intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR exposes the previously private default transient error list as a public property to address issue #2342, allowing consumers to extend the error code list without needing to copy/paste the base set from the repository.
Changes:
- Adds a new public property
DefaultTransientErrorstoSqlConfigurableRetryFactorythat returns a read-only copy of the default transient error list - Removes duplicated error list from test helper class
RetryLogicTestHelperand migrates to use the new public API - Adds XML documentation for the new public property and documents additional error codes (64, 20, 0, -2, 207, 18456, 42108, 42109)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs | Adds public DefaultTransientErrors property that returns a ReadOnlyCollection<int> copy of the private transient error list; adds missing error codes to the private list |
| src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/RetryLogic/RetryLogicTestHelper.cs | Removes duplicated private error list and updates code to reference the new public SqlConfigurableRetryFactory.DefaultTransientErrors property |
| doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml | Adds XML documentation for the new property and documents the previously undocumented error codes |
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
@dotnet-policy-service agree |
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
paulmedynski
left a comment
There was a problem hiding this comment.
Thanks for the submission. The team will need to discuss the public API changes, and I have one implementation suggestion.
| }; | ||
| 233, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| 64, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: TCP Provider, error: 0 - The specified network name is no longer available.) | ||
| 20, // The instance of SQL Server you attempted to connect to does not support encryption. |
There was a problem hiding this comment.
I'm not sure how this error (or 207 or 18456) are transient. I'll discuss with the team.
| 997, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress) | ||
| 233 // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| }; | ||
| 233, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) |
There was a problem hiding this comment.
These changes constitute a public API change (altering which errors we consider transient by default), so this will need some extra scrutiny.
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netfx/ref/Microsoft.Data.SqlClient.cs
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netcore/ref/Microsoft.Data.SqlClient.cs
Outdated
Show resolved
Hide resolved
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml
Outdated
Show resolved
Hide resolved
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
|
@David-Engel Since this changes our public API, could you give this a look to see if it's something we should take? |
benrr101
left a comment
There was a problem hiding this comment.
I like it and don't see any issues - just want to get some PM approval on it before we add to our public API!
|
#649 also discusses transient errors. |
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #3903 +/- ##
==========================================
- Coverage 76.66% 69.45% -7.22%
==========================================
Files 269 263 -6
Lines 43246 66178 +22932
==========================================
+ Hits 33156 45962 +12806
- Misses 10090 20216 +10126
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
My main hesitancy is I don't want changes to the list to be considered breaking changes. We already have enough things that require us to major vbump. If we can expose them in a way that changes to the error list is not considered breaking, I'm all for it. |
|
@paulmedynski So how should we proceed? Are these considered breaking? I've only added the ones that were already used in the tests. |
|
Hi @MatthiasHuygelen - The team will be meeting to discuss on Thursday, and I'll post an update after that. |
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
cheenamalhotra
left a comment
There was a problem hiding this comment.
Do not add new error codes as transient without documentation proofs.
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
paulmedynski
left a comment
There was a problem hiding this comment.
We have discussed these changes, and would like to back out the additions to the error code list. All of the other changes are great!
doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml
Outdated
Show resolved
Hide resolved
src/Microsoft.Data.SqlClient/netcore/ref/Microsoft.Data.SqlClient.cs
Outdated
Show resolved
Hide resolved
...osoft.Data.SqlClient/src/Microsoft/Data/SqlClient/Reliability/SqlConfigurableRetryFactory.cs
Outdated
Show resolved
Hide resolved
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
| /// Default known transient error numbers. | ||
| private static readonly HashSet<int> s_defaultTransientErrors | ||
| = new HashSet<int> | ||
| { | ||
| /// <include file='../../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml' path='docs/members[@name="SqlConfigurableRetryFactory"]/IntrinsicTransientErrors/*' /> | ||
| public static ReadOnlyCollection<int> IntrinsicTransientErrors { get; } | ||
| = new( | ||
| [ | ||
| 233, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| 997, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress) | ||
| 1204, // The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions. | ||
| 1205, // Transaction (Process ID) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction | ||
| 1222, // Lock request time out period exceeded. | ||
| 49918, // Cannot process request. Not enough resources to process request. | ||
| 49919, // Cannot process create or update request. Too many create or update operations in progress for subscription "%ld". | ||
| 49920, // Cannot process request. Too many operations in progress for subscription "%ld". | ||
| 4060, // Cannot open database "%.*ls" requested by the login. The login failed. | ||
| 4221, // Login to read-secondary failed due to long wait on 'HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING'. The replica is not available for login because row versions are missing for transactions that were in-flight when the replica was recycled. The issue can be resolved by rolling back or committing the active transactions on the primary replica. Occurrences of this condition can be minimized by avoiding long write transactions on the primary. | ||
| 10060, // An error has occurred while establishing a connection to the server. When connecting to SQL Server, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.) (Microsoft SQL Server, Error: 10060) | ||
| 10928, // Resource ID: %d. The %s limit for the database is %d and has been reached. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. | ||
| 10929, // Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. Otherwise, please try again later. | ||
| 40143, // The service has encountered an error processing your request. Please try again. | ||
| 40613, // Database '%.*ls' on server '%.*ls' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '%.*ls'. | ||
| 40197, // The service has encountered an error processing your request. Please try again. Error code %d. | ||
| 40501, // The service is currently busy. Retry the request after 10 seconds. Incident ID: %ls. Code: %d. | ||
| 40540, // The service has encountered an error processing your request. Please try again. | ||
| 40197, // The service has encountered an error processing your request. Please try again. Error code %d. | ||
| 40613, // Database '%.*ls' on server '%.*ls' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '%.*ls'. | ||
| 42108, // Can not connect to the SQL pool since it is paused. Please resume the SQL pool and try again. | ||
| 42109, // The SQL pool is warming up. Please try again. | ||
| 10929, // Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. Otherwise, please try again later. | ||
| 10928, // Resource ID: %d. The %s limit for the database is %d and has been reached. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. | ||
| 10060, // An error has occurred while establishing a connection to the server. When connecting to SQL Server, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.) (Microsoft SQL Server, Error: 10060) | ||
| 997, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress) | ||
| 233 // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| }; | ||
| 49918, // Cannot process request. Not enough resources to process request. | ||
| 49919, // Cannot process create or update request. Too many create or update operations in progress for subscription "%ld". | ||
| 49920 // Cannot process request. Too many operations in progress for subscription "%ld". | ||
| ]); |
There was a problem hiding this comment.
Changing the default transient error collection from a HashSet<int> to a ReadOnlyCollection<int> means TransientErrorsCondition now does linear Contains checks for the built-in defaults instead of hash-based lookups. To preserve the previous performance characteristics for the intrinsic default while still exposing a read-only API, consider keeping a private HashSet<int> for membership tests and exposing IntrinsicTransientErrors as a read-only view over that set (or otherwise backing the public property with a hash-based collection).
|
|
||
| using System; | ||
| using System.Collections.Generic; | ||
| using System.Collections.ObjectModel; |
There was a problem hiding this comment.
System.Collections.ObjectModel is imported but not used in this file, which will generate an unnecessary using-warning. Please remove this namespace import if it is not needed.
| using System.Collections.ObjectModel; |
| /// Default known transient error numbers. | ||
| private static readonly HashSet<int> s_defaultTransientErrors | ||
| = new HashSet<int> | ||
| { | ||
| /// <include file='../../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlConfigurableRetryFactory.xml' path='docs/members[@name="SqlConfigurableRetryFactory"]/IntrinsicTransientErrors/*' /> | ||
| public static ReadOnlyCollection<int> IntrinsicTransientErrors { get; } | ||
| = new( | ||
| [ | ||
| 233, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| 997, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress) | ||
| 1204, // The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions. | ||
| 1205, // Transaction (Process ID) was deadlocked on resources with another process and has been chosen as the deadlock victim. Rerun the transaction | ||
| 1222, // Lock request time out period exceeded. | ||
| 49918, // Cannot process request. Not enough resources to process request. | ||
| 49919, // Cannot process create or update request. Too many create or update operations in progress for subscription "%ld". | ||
| 49920, // Cannot process request. Too many operations in progress for subscription "%ld". | ||
| 4060, // Cannot open database "%.*ls" requested by the login. The login failed. | ||
| 4221, // Login to read-secondary failed due to long wait on 'HADR_DATABASE_WAIT_FOR_TRANSITION_TO_VERSIONING'. The replica is not available for login because row versions are missing for transactions that were in-flight when the replica was recycled. The issue can be resolved by rolling back or committing the active transactions on the primary replica. Occurrences of this condition can be minimized by avoiding long write transactions on the primary. | ||
| 10060, // An error has occurred while establishing a connection to the server. When connecting to SQL Server, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.) (Microsoft SQL Server, Error: 10060) | ||
| 10928, // Resource ID: %d. The %s limit for the database is %d and has been reached. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. | ||
| 10929, // Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. Otherwise, please try again later. | ||
| 40143, // The service has encountered an error processing your request. Please try again. | ||
| 40613, // Database '%.*ls' on server '%.*ls' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '%.*ls'. | ||
| 40197, // The service has encountered an error processing your request. Please try again. Error code %d. | ||
| 40501, // The service is currently busy. Retry the request after 10 seconds. Incident ID: %ls. Code: %d. | ||
| 40540, // The service has encountered an error processing your request. Please try again. | ||
| 40197, // The service has encountered an error processing your request. Please try again. Error code %d. | ||
| 40613, // Database '%.*ls' on server '%.*ls' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of '%.*ls'. | ||
| 42108, // Can not connect to the SQL pool since it is paused. Please resume the SQL pool and try again. | ||
| 42109, // The SQL pool is warming up. Please try again. | ||
| 10929, // Resource ID: %d. The %s minimum guarantee is %d, maximum limit is %d and the current usage for the database is %d. However, the server is currently too busy to support requests greater than %d for this database. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. Otherwise, please try again later. | ||
| 10928, // Resource ID: %d. The %s limit for the database is %d and has been reached. For more information, see http://go.microsoft.com/fwlink/?LinkId=267637. | ||
| 10060, // An error has occurred while establishing a connection to the server. When connecting to SQL Server, this failure may be caused by the fact that under the default settings SQL Server does not allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.) (Microsoft SQL Server, Error: 10060) | ||
| 997, // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Named Pipes Provider, error: 0 - Overlapped I/O operation is in progress) | ||
| 233 // A connection was successfully established with the server, but then an error occurred during the login process. (provider: Shared Memory Provider, error: 0 - No process is on the other end of the pipe.) (Microsoft SQL Server, Error: 233) | ||
| }; | ||
| 49918, // Cannot process request. Not enough resources to process request. | ||
| 49919, // Cannot process create or update request. Too many create or update operations in progress for subscription "%ld". | ||
| 49920 // Cannot process request. Too many operations in progress for subscription "%ld". | ||
| ]); |
There was a problem hiding this comment.
The new public IntrinsicTransientErrors property does not appear to be directly covered by tests (for example, verifying that it matches the built-in default transient error set and that the returned collection cannot be mutated). Given that other members of SqlConfigurableRetryFactory are exercised by tests, please consider adding a focused test for this property so regressions in the intrinsic error list or its immutability are caught early.
|
@paulmedynski I also considered Open to other suggestions, too. |
Description
The default set of transient error codes is private. This seems to make it difficult to extend error codes without copy/pasting the base set from this repository. So we expose a copy.
Issues
#2342