Skip to content

feat: client side tracing upload path#545

Open
hweawer wants to merge 24 commits intomasterfrom
feat/tracing-origin-client-side
Open

feat: client side tracing upload path#545
hweawer wants to merge 24 commits intomasterfrom
feat/tracing-origin-client-side

Conversation

@hweawer
Copy link
Collaborator

@hweawer hweawer commented Jan 26, 2026

Summary

Implement end-to-end distributed tracing for Docker blob upload paths in Kraken, enabling observability from proxy → origin. This PR adds OpenTelemetry instrumentation to the client upload flow with hierarchical span creation and trace context propagation.

Motivation

Debugging slow or failed Docker pushes currently requires piecing together logs across multiple services with no correlation. This PR enables:

  • End-to-end visibility: Trace a blob upload from Docker client through proxy, origin, and writeback
  • Performance analysis: Identify bottlenecks in chunked uploads, replication, or backend writes
  • Retry debugging: See which origins were attempted and why they failed

@hweawer hweawer force-pushed the feat/tracing-origin-client-side branch from 3ac9e8f to f1c7884 Compare January 26, 2026 14:12
@hweawer hweawer changed the base branch from feat/tracing-writeback-executor-logging to feat/tracing-httputil-infrastructure January 26, 2026 14:12
@hweawer hweawer self-assigned this Jan 26, 2026
if err := t.originCluster.UploadBlob(ctx, namespace, d, blob); err != nil {
t.failureStats.Counter("upload_blob").Inc(1)
span.RecordError(err)
span.SetStatus(codes.Error, "upload failed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering we will be trying to debug which part of the upload failed, does it make sense to enrich this err message?

// UploadBlob uploads blob to origin cluster. See Client.UploadBlob for more details.
func (c *clusterClient) UploadBlob(namespace string, d core.Digest, blob io.ReadSeeker) (err error) {
func (c *clusterClient) UploadBlob(ctx context.Context, namespace string, d core.Digest, blob io.ReadSeeker) (err error) {
ctx, span := otel.Tracer("kraken-origin-cluster").Start(ctx, "cluster.upload_blob",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

considering the normal client's tracer is called kraken-origin-client, wouldnt it make sense to call this one kraken-origin-cluster-client?

@hweawer hweawer changed the base branch from feat/tracing-httputil-infrastructure to master February 12, 2026 09:35
Copilot AI review requested due to automatic review settings February 12, 2026 09:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds OpenTelemetry-based tracing to the Docker blob upload flow so a blob push can be correlated across proxy → origin via propagated trace context and structured spans.

Changes:

  • Introduces httputil.SendTracingContext to wrap HTTP transports with otelhttp for trace propagation (and HTTP client spans).
  • Threads context.Context through blob upload APIs (blobclient.Client.UploadBlob, blobclient.ClusterClient.UploadBlob, server replicate-to-remote) and updates relevant tests/mocks.
  • Adds upload-path spans/attributes in the proxy transferer and origin cluster/origin clients.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
utils/httputil/httputil.go Adds SendTracingContext option and wraps transport with otelhttp.NewTransport.
origin/blobclient/uploader.go Propagates tracing context through chunked upload HTTP calls (POST/PATCH/PUT).
origin/blobclient/client.go Adds tracer + span around upload and changes UploadBlob signature to accept context.Context.
origin/blobclient/cluster_client.go Adds client-side span around multi-origin upload attempts and threads context into per-origin UploadBlob calls.
lib/dockerregistry/transfer/rw_transferer.go Starts a new span for registry upload and passes context into origin cluster upload.
origin/blobserver/server.go Passes request context into replicate-to-remote upload call.
origin/blobserver/server_test.go Updates tests for new context parameter and gomock expectations.
origin/blobserver/cluster_client_test.go Updates tests to pass context and adjust gomock expectations.
mocks/origin/blobclient/client.go Updates mock for Client.UploadBlob(ctx, ...).
mocks/origin/blobclient/clusterclient.go Updates mock for ClusterClient.UploadBlob(ctx, ...).
lib/persistedretry/writeback/task.go Minor formatting change.
Comments suppressed due to low confidence (1)

origin/blobclient/uploader.go:178

  • runChunkedUploadHelper calls patch once per chunk; because each patch uses SendTracingContext, this will generate an otelhttp client span for every chunk request. For large uploads or small chunk sizes this can create very large traces and high telemetry volume. Consider propagating trace headers without creating per-chunk HTTP spans (or gating per-chunk spans behind sampling/config), while keeping higher-level spans for the overall upload.
func (c *uploadClient) patch(
	d core.Digest, uid string, start, stop int64, chunk io.Reader,
) error {
	_, err := httputil.Patch(
		fmt.Sprintf("http://%s/namespace/%s/blobs/%s/uploads/%s",
			c.addr, url.PathEscape(c.namespace), d, uid),
		httputil.SendTracingContext(c.ctx),
		httputil.SendBody(chunk),
		httputil.SendHeaders(map[string]string{
			"Content-Range": fmt.Sprintf("%d-%d", start, stop),
		}),
		httputil.SendTLS(c.tls))

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +287 to +295
// SendTracingContext sets the context and enables OpenTelemetry trace context
// propagation via HTTP headers. The transport will be wrapped with otelhttp
// after all options are applied.
func SendTracingContext(ctx context.Context) SendOption {
return func(o *sendOptions) {
o.ctx = ctx
o.tracingContext = true
}
}
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SendTracingContext wraps the request transport with otelhttp.NewTransport, which does more than header propagation (it also creates an HTTP client span per request). The comment currently says it only enables propagation; please update the doc to reflect span creation, or change the implementation to inject trace headers without generating per-request spans if propagation-only is the intent.

Copilot uses AI. Check for mistakes.

UploadBlob(namespace string, d core.Digest, blob io.Reader) error
UploadBlob(ctx context.Context, namespace string, d core.Digest, blob io.Reader) error
DuplicateUploadBlob(namespace string, d core.Digest, blob io.Reader, delay time.Duration) error
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interface change requires updating all call sites to pass a context. The repo still has callers using the old signature (e.g., origin/blobclient/client_test.go calls client.UploadBlob(namespace, d, ...)), which will fail to compile.

Suggested change
DuplicateUploadBlob(namespace string, d core.Digest, blob io.Reader, delay time.Duration) error
DuplicateUploadBlob(ctx context.Context, namespace string, d core.Digest, blob io.Reader, delay time.Duration) error

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants