Bug description
When VirtualMCPServer.spec.sessionStorage is enabled and a gateway pod restart causes a session-cache miss, factory.RestoreSession reconstructs the identity from storedMetadata[MetadataKeyIdentitySubject] only, leaving identity.Token empty. Token-exchange backends then short-circuit with "identity has no token", every backend init fails, and the restored session comes back with zero tools (count=0). Meanwhile, the bearer token that the OIDC middleware just validated for the triggering request sits unused on the request context.
Root cause
MultiSessionGetter.GetMultiSession(sessionID string) (pkg/vmcp/server/sessionmanager/session_manager.go:633) is the entry point for subsequent requests. Its signature takes only the session ID — no context.Context, no *http.Request, no auth.Identity. There's an in-file TODO at line 631 acknowledging the context-propagation gap.
On a cache miss, loadSession → factory.RestoreSession runs with no access to the live request. RestoreSession (pkg/vmcp/session/factory.go:537, identity reconstruction at 563–567) therefore reconstructs from stored metadata only:
    var identity *auth.Identity
    if subject := storedMetadata[MetadataKeyIdentitySubject]; subject != "" {
        identity = &auth.Identity{}
        identity.Subject = subject
    }
The comment at line 560 makes the storage intent explicit: "The original bearer token is never persisted… so Token is empty." That part is fine. The gap is that the restore path has no fallback to the live identity on the triggering request.
The tokenless identity flows into makeBaseSession → backend connector → pkg/vmcp/auth/strategies/tokenexchange.go, where the guard if identity.Token == "" { return "identity has no token" } fails every backend init.
Steps to reproduce
- Deploy a VirtualMCPServer with incomingAuth.type: oidc, at least one backend MCPServer with externalAuthConfigRef of type tokenExchange, and spec.sessionStorage pointing at Redis.
- Connect a Streamable HTTP client (e.g. Claude Code) and successfully list and call tools.
- Cycle the gateway pod (kubectl rollout restart, operator reconcile, or image bump).
- With the same client still connected, send another tool request.
Expected behavior
The next request restores the session from Redis and uses the live bearer on the request context to mint per-backend exchanged tokens. Tool calls succeed without the client re-authenticating.
Actual behavior
The same session_id is restored from Redis, but audit logs show subjects.user: "anonymous". Pre-call logs show "identity has no token" for every backend, followed by "All backends failed to initialise; session will have no capabilities" and then "prefix strategy created unique tools count=0". The next tool call returns "tool not found". Re-authenticating manually via /mcp fixes it.
Multi-replica symptom (same root cause)
In multi-replica deployments the same bug shows up as cross-pod cache eviction. Pod B's failed RestoreSession returns a session with an empty backend list; loadSession writes that empty-list metadata back to Redis (session_manager.go:738); pod A's checkSession then sees MetadataKeyBackendIDs drift and evicts its working session, so the next request to pod A returns "session is closed". The eviction mechanism itself (drift-based propagation of legitimate backend membership changes) is intentional and should stay; the real problem is that pod B's restore failed in the first place. Fixing the identity restore should therefore resolve this symptom too.
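To make the eviction chain concrete, here is a toy model of the drift check, assuming backend IDs are compared as an unordered set. store, pod, hasFreshSession, and backendIDsDrifted are hypothetical and only loosely mirror checkSession and MetadataKeyBackendIDs; the sketch uses the standard slices package (Go 1.21 or later).

```go
package main

import (
	"fmt"
	"slices"
)

// store is a toy stand-in for the shared Redis metadata: sessionID -> backend IDs.
type store map[string][]string

// pod is a toy replica holding sessions it built locally: sessionID -> backend IDs.
type pod struct {
	cache map[string][]string
}

// backendIDsDrifted treats backend IDs as an unordered set (an assumption).
func backendIDsDrifted(cached, stored []string) bool {
	a := slices.Clone(cached)
	b := slices.Clone(stored)
	slices.Sort(a)
	slices.Sort(b)
	return !slices.Equal(a, b)
}

// hasFreshSession is a toy version of the drift check: if the shared metadata
// no longer matches what this pod built the session with, evict it locally.
func (p *pod) hasFreshSession(s store, sessionID string) bool {
	cached, ok := p.cache[sessionID]
	if !ok {
		return false
	}
	if backendIDsDrifted(cached, s[sessionID]) {
		delete(p.cache, sessionID)
		return false
	}
	return true
}

func main() {
	shared := store{"sess-1": {"github", "jira"}}
	podA := &pod{cache: map[string][]string{"sess-1": {"github", "jira"}}}
	fmt.Println("before:", podA.hasFreshSession(shared, "sess-1")) // before: true

	// Pod B's restore failed for lack of a token, and its empty backend
	// list is written back to shared storage.
	shared["sess-1"] = []string{}
	fmt.Println("after:", podA.hasFreshSession(shared, "sess-1")) // after: false
}
```

In this model, once pod B's restore produces the full backend list, its write-back matches pod A's view and no eviction occurs, which is why the identity fix alone would be expected to clear this symptom.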
Out of scope
- Persisting exchanged backend tokens or refresh tokens. The per-backend tokens minted at restore still have their own expirations; session-scoped token bookkeeping with refresh is a separate piece of work and not required for this fix.
Acceptance criteria
Environment
- ToolHive operator v0.27.0
- Redis (Bitnami 25.4.1) for sessionStorage
- Streamable HTTP client (Claude Code), OIDC (Keycloak), per-backend tokenExchange
Additional context
Originally reported by Gaston in the Discord forum post with full logs and timeline.