feat(spanner): provide opt-in auto-tagging feature in client library#13214
feat(spanner): provide opt-in auto-tagging feature in client library#13214sakthivelmanii wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an automatic tagging feature for Spanner transactions and requests, allowing the library to derive tags from the application's stack trace when they are not explicitly provided. Key changes include the addition of AutoTagHelper for stack trace analysis and updates to AbstractReadContext, SessionImpl, and SpannerOptions to support and configure this mechanism. Review feedback focuses on performance optimizations, specifically suggesting that the auto-tag be cached within the request context to avoid redundant stack trace captures and recommending the use of StackWalker for more efficient frame retrieval. Additionally, a recommendation was made to defensively copy the package list in SpannerOptions to maintain immutability.
| RequestOptions buildRequestOptions(Options options) { | ||
| RequestOptions.Builder builder = options.toRequestOptionsProto(false).toBuilder(); | ||
| if (builder.getRequestTag().isEmpty()) { | ||
| String autoTag = AutoTagHelper.getAutoTag(session.getSpanner().getOptions()); |
There was a problem hiding this comment.
Calling AutoTagHelper.getAutoTag on every request within a ReadContext (such as a ReadOnlyTransaction) can be expensive due to the underlying stack trace capture. Since the auto-tag is likely to be identical for all requests within the same context, consider computing it once and caching it in an instance field to avoid redundant performance overhead for multi-request transactions.
| if (!options.isAutoTagTransactionsEnabled()) { | ||
| return null; | ||
| } | ||
| StackTraceElement[] stackTrace = new Throwable().getStackTrace(); |
There was a problem hiding this comment.
new Throwable().getStackTrace() is a relatively expensive operation as it captures the entire execution stack. While this is the standard approach for Java 8, for environments running on Java 9 or later, StackWalker provides a much more efficient way to retrieve a limited number of stack frames. Consider using StackWalker if the library's runtime environment supports it to reduce the latency impact of auto-tagging.
There was a problem hiding this comment.
Yes. but since we use animal-sniffer, it's not possible to do it without using reflection which might be difficult and adds latency given that we execute for every transaction
| public Builder setAutoTagTransactionsPackages(List<String> autoTagTransactionsPackages) { | ||
| this.autoTagTransactionsPackages = Preconditions.checkNotNull(autoTagTransactionsPackages); | ||
| return this; | ||
| } |
There was a problem hiding this comment.
The autoTagTransactionsPackages list should be defensively copied to ensure that the SpannerOptions instance remains immutable and thread-safe, even if the list passed to the builder is modified externally later.
public Builder setAutoTagTransactionsPackages(List<String> autoTagTransactionsPackages) {
this.autoTagTransactionsPackages =
Collections.unmodifiableList(
new ArrayList<>(Preconditions.checkNotNull(autoTagTransactionsPackages)));
return this;
}fadc3ec to
a09b9ce
Compare
| RequestOptions.Builder builder = options.toRequestOptionsProto(false).toBuilder(); | ||
| if (builder.getRequestTag().isEmpty()) { | ||
| if (this.cachedRequestTag == null) { | ||
| this.cachedRequestTag = AutoTagHelper.getAutoTag(session.getSpanner().getOptions()); |
There was a problem hiding this comment.
This seems to add auto-request tags, although the option name seems to indicate that this is only for transactions. So we either need to change the name of the option (and maybe also add this in more places than only here), or remove it from here.
Also, unconditionally calling getAutoTag(..) here without checking whether auto-tagging has been enabled, is not really readable. I had to look into getAutoTag(..) to check whether it checks the option first or not. Instead, we should add a check before calling the method. That would also fix a small caching error here: If this.cachedRequestTag == null, then we call getAutoTag(..) for every invocation. And while it returns quickly, it slightly defies the entire idea of caching it here.
| boolean isInternal = false; | ||
| for (String internalPackage : INTERNAL_PACKAGES) { | ||
| if (className.startsWith(internalPackage)) { | ||
| isInternal = true; | ||
| break; | ||
| } | ||
| } | ||
| if (isInternal) { | ||
| continue; | ||
| } | ||
| return formatTag(className, element.getMethodName()); |
There was a problem hiding this comment.
Can we simplify this a bit.
- Move the check whether it is an internal package to a separate helper method:
private static boolean isInternalPackage(String className) {
for (String internalPackage : INTERNAL_PACKAGES) {
if (className.startsWith(internalPackage)) {
return true;
}
}
return false;
}Then simplify the block to something like this:
if (isInternalPackage(className)) {
continue;
}
return formatTag(className, element.getMethodName());| if (tag.length() > 50) { | ||
| tag = tag.substring(0, 50); | ||
| } |
There was a problem hiding this comment.
This returns the 50 first characters if the generated name is too long. I think that taking the last 50 characters would make more sense, as that will then include the method name. I would expect that it is more useful for a customer to know the method name, than (part of) the package and class name. This is especially true if the reason that the 50-chars limit was exceeded was caused by a very long package name that is reused across the entire application.
| if (requestOptions.hasPriority() || requestOptions.hasTag()) { | ||
| String autoTag = null; | ||
| if (!requestOptions.hasTag()) { | ||
| autoTag = AutoTagHelper.getAutoTag(spanner.getOptions()); |
There was a problem hiding this comment.
I'm not sure if this really is the best place to capture the auto-tag. Do we know all the possible code paths for getting to this point? Could it be that it is also called from a worker thread in the client library? Or from some internal method that is called when a transaction is retried? Etc...
Would it maybe make more sense to move the actual capturing of the auto-tag as close as possible to the entry point in the client library. Best case would basically be the first line of code in the implementation of DatabaseClient#readWriteTransaction(..) (although that literally might not be feasible)
| RequestOptions.Builder optionsBuilder = | ||
| transactionOptions.toRequestOptionsProto(true).toBuilder(); | ||
| if (!transactionOptions.hasTag()) { | ||
| String autoTag = AutoTagHelper.getAutoTag(spanner.getOptions()); |
There was a problem hiding this comment.
Same question as above, especially as this is an async method: Is this really the best place to capture the auto-tag?
| return Boolean.parseBoolean(System.getenv("SPANNER_DISABLE_AUTO_TAGGING")) | ||
| || Boolean.parseBoolean(System.getProperty("spanner.disable_auto_tagging")); |
There was a problem hiding this comment.
I would rather expect this feature to be something that you want to be able to enable with an environment variable or system property. The most common case is:
- Someone writes an application and it works just fine in test.
- They deploy it to prod and stuff goes wrong and everyone panics
- During debugging, the question comes up: Do you have tagging?
- ...
Could we instead add an env var / system property that allows you to flip it both ways? (And should it then really always override what is set in code?)
| return this.options.tag(); | ||
| } | ||
| return null; | ||
| if (this.cachedTransactionTag == null) { |
There was a problem hiding this comment.
Here also the same as for request tag above: I think it would be much more readable if we do the check for whether auto-tagging is enabled here, instead of having getAutoTag(..) implicitly return null in that case. That then also fixes the (small) caching issue that repeatedly assigns null to this variable when auto-tagging is disabled.
| import org.junit.runners.JUnit4; | ||
|
|
||
| @RunWith(JUnit4.class) | ||
| public class MockAutoTaggingTest extends AbstractMockServerTest { |
There was a problem hiding this comment.
- These tests do not cover the async versions (so
runAsyncand if we also want to support requests;executeQueryAsync) - There are no tests for combining auto-tagging with explicit tags
- Can we test for truncation (auto-tag longer than 50 chars)
- Can we test the system property / env var function for disabling (and enabling?) the feature?
f036bcd to
0ea0f9d
Compare
|
Can we update both the pull request title and add a new paragraph at the start of the pull request description? The current title and description are very technical and 'internal', instead of focusing on what's in it for a customer. Suggestion: |
olavloite
left a comment
There was a problem hiding this comment.
LGTM, but please also take a look at #13214 (comment)
| // Verify transaction tag populated on ExecuteSqlRequest | ||
| assertEquals(1, mockSpanner.countRequestsOfType(ExecuteSqlRequest.class)); | ||
| ExecuteSqlRequest sqlRequest = mockSpanner.getRequestsOfType(ExecuteSqlRequest.class).get(0); | ||
| String transactionTag = sqlRequest.getRequestOptions().getTransactionTag(); |
There was a problem hiding this comment.
Can we add a test verification here that asserts that the statement does not contain a request tag? See also my comment above about my confusion as to why request tags are not being added to read/write transactions.
| } | ||
| if (getTransactionTag() != null) { | ||
| builder.setTransactionTag(getTransactionTag()); | ||
| } else if (session.getSpanner().getOptions().isAutoTagTransactionsEnabled() |
There was a problem hiding this comment.
I had to look long and hard to understand why this in in the end leads to the following behavior:
- Multi-use and single-use read-only transactions set a request tag when auto-tagging is enabled (unless an explicit tag has been set)
- Read/write transactions do not set a request tag. The reason for that is that
getTransactionTag()in most cases returns a non-null value when the transaction type isReadWriteTransaction.
The only case where getTransactionTag() returns null for a read/write transaction is when the following happens:
If AutoTagHelper.getAutoTag(...) cannot find a matching application stack frame:
- cachedTransactionTag gets set to "".
- getTransactionTag() returns null.
- buildRequestOptions falls into the else if block.
- cachedRequestTag evaluates AutoTagHelper.getAutoTag(...), which again returns null.
- cachedRequestTag gets set to "".
- Since cachedRequestTag is empty, no request tag is added.
Can we add a test verification to the MockAutoTaggingTest to verify this behavior, and prevent accidental changes in the future? (And/or make the behavior here a bit more explicit)
569c30a to
f829149
Compare
Spanner transaction and lock debugging heavily relies on transaction and request tags. This introduces an opt-in mechanism via `enableAutoTagTransactions()` in `SpannerOptions` to automatically discover and append tags from calling runtime stack frames when explicit tags are absent. Additionally, a dedicated emergency override environment variable `SPANNER_DISABLE_AUTO_TAGGING` is provided to allow disabling auto-tagging globally.
f829149 to
b85fb82
Compare
Problem & Context
Spanner transaction and lock debugging heavily relies on transaction and request tags. However, many customers do not explicitly include tags in their database client calls. During critical investigations or code freezes, modifying application code to add tags can be highly complex.
Solution & Architecture
This pull request provides an opt-in auto-tagging mechanism directly within the Spanner client library layer. When enabled, if an explicit tag is absent, the client library inspects the runtime calling stack to extract the application class and method, automatically formatting it into a descriptive tag (
SimpleClassName.methodName).Key Design Implementations
SpannerOptions.Builder#enableAutoTagTransactions().setAutoTagTransactionsPackages(List<String>).setAutoTagTransactionsTracerLimit(int)(defaults to 50).java.,javax.,io.grpc.,com.google.cloud.spanner., etc.).SPANNER_DISABLE_AUTO_TAGGINGenvironment variable andspanner.disable_auto_taggingsystem property.Performance & JMH Benchmarking Results
Since stack trace capture happens at runtime, performance optimization is highly critical. By leveraging
new Throwable().getStackTrace()to bypass thread security manager checks and VM thread lookups, and short-circuiting ignore-lists when custom packages are defined, overhead is kept well inside the single-digit microsecond budget.Benchmark Execution Details
Benchmark Mode Cnt Score Error Units AutoTagBenchmark.benchmarkGetAutoTag avgt 5 5.456 ± 0.186 us/op AutoTagBenchmark.benchmarkNewThrowableGetStackTrace avgt 5 5.340 ± 0.122 us/op
Microsecond Latency Impact
getAutoTag()):5.456 usper invocation.new Throwable().getStackTrace()):5.340 us.5.456 - 5.340 =0.116 us(116 nanoseconds).Key Insight: Over 97.8% of the execution time is spent entirely within the native JVM memory capture. The internal string inspection, matching, and 50-character truncation logic combined contributes just 116 nanoseconds of overhead, ensuring maximum throughput across high-frequency transactional pipelines.