How we found this
We have a Y.js collaboration server that extends Server with RPC methods for document operations (read, patch, etc.). Chat agents in separate Durable Objects call these RPC methods to modify shared documents on behalf of users.
Initially we used idFromName() + .get() to obtain stubs for the Y.Doc DOs. This worked — until a DO hibernated. After hibernation, RPC calls would execute but changes wouldn't broadcast to connected WebSocket clients, because onStart() (which rehydrates connection tracking) never ran.
We switched to getServerByName() which fixed the broadcasting issue by triggering setName() via the x-partykit-room header fetch. But this introduced a new problem: importing from partyserver caused Vite's SSR dependency optimizer to pre-bundle the agents package (which depends on partyserver). In that pre-bundle context, cloudflare:workers is unavailable, so DurableObject is undefined and the Agent class fails with "Class extends value undefined is not a constructor or null".
We dug into the partyserver source to understand why idFromName doesn't work after hibernation and found the root cause: #_name is only restored via HTTP paths, never for RPC or alarm entry points. We're now working around this in our own subclass by persisting the room name to ctx.storage (details below), which lets us go back to plain idFromName + .get().
Description
After a Durable Object hibernates and wakes via an RPC call or alarm, Server.#_name is undefined and onStart() never runs. This means:
- RPC methods execute on an uninitialized server instance
- alarm() calls #initialize() without restoring #_name, so any code in onStart() that reads this.name throws
- Any state set up in onStart() (connection tracking, persistence callbacks, etc.) is missing
The root cause is that #_name is only restored through two paths, both of which require HTTP:
- fetch(): reads x-partykit-room from request headers
- webSocketMessage/Close/Error: reads connection.server from the WS attachment
RPC and alarm bypass both paths. The DO wakes, constructs a fresh instance, and the method executes immediately with no name and no initialization.
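The difference between the entry points can be illustrated with a small self-contained model. This is only a sketch of the behavior described above, not partyserver's source: `ModelServer`, `roomName`, and `attempt` are hypothetical names, and a "wake after hibernation" is modeled simply as constructing a fresh instance with empty memory.

```typescript
// Simplified model of the wake-up paths. NOT partyserver code; all names are hypothetical.
class ModelServer {
  // In-memory only, so a fresh instance (a wake after hibernation) starts without it.
  private roomName: string | undefined;
  started = false;

  get name(): string {
    if (this.roomName === undefined) {
      throw new Error("name read before it was set");
    }
    return this.roomName;
  }

  // HTTP path: the request carries the name in a header, so it can be restored.
  fetch(headers: Record<string, string>): string {
    const room = headers["x-partykit-room"];
    if (room !== undefined) {
      this.roomName = room;
      this.started = true; // stands in for onStart()
    }
    return this.name;
  }

  // RPC path: nothing in the call carries the name, so nothing restores it.
  getData(): string {
    return this.name;
  }

  // Alarm path: same situation as RPC in this model.
  alarm(): string {
    return this.name;
  }
}

// Run a call and turn a thrown error into a readable string.
function attempt(fn: () => string): string {
  try {
    return fn();
  } catch (err) {
    return "error: " + (err as Error).message;
  }
}

// Wake via fetch: the name is restored, and later RPC calls on the same instance work.
const viaFetch = new ModelServer();
const fetchResult = attempt(() => viaFetch.fetch({ "x-partykit-room": "room-123" }));
const rpcAfterFetch = attempt(() => viaFetch.getData());

// Wake via RPC or alarm: fresh instance, no name, the call fails.
const viaRpc = new ModelServer();
const rpcResult = attempt(() => viaRpc.getData());
const viaAlarm = new ModelServer();
const alarmResult = attempt(() => viaAlarm.alarm());

console.log({ fetchResult, rpcAfterFetch, rpcResult, alarmResult });
```

In this model only the instance woken through `fetch` ends up with a name; the instances woken through `getData()` or `alarm()` fail on the first read of `name` and never reach the started state.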
Reproduction
    import { Server } from "partyserver"

    class MyServer extends Server {
      static options = { hibernate: true }

      async onStart() {
        console.log("initialized:", this.name)
        // Set up connection tracking, persistence, etc.
      }

      // Public RPC method
      async getData() {
        // After hibernation, this.name throws:
        // "Attempting to read .name on MyServer before it was set"
        return { room: this.name, data: "..." }
      }
    }

    // Caller:
    const id = env.MY_SERVER.idFromName("room-123")
    const stub = env.MY_SERVER.get(id)
    await stub.getData() // Fails after hibernation

alarm() variant
    class MyServer extends Server {
      static options = { hibernate: true }

      async onStart() {
        // this.name is undefined here when waking from alarm
        await this.ctx.storage.setAlarm(Date.now() + 60_000)
      }

      onAlarm() {
        console.log("alarm for room:", this.name) // throws
      }
    }

The alarm() handler at index.ts:795-802 calls #initialize() without setting #_name first.
Current workaround
Use getServerByName() before making RPC calls. This sends a fetch with x-partykit-room to trigger setName(). But this:
- Requires an extra HTTP round-trip before every RPC interaction
- Forces callers to import from partyserver, which can cause bundler issues (e.g., Vite SSR pre-bundling pulls in the agents package, which extends DurableObject from cloudflare:workers, failing in non-Worker contexts)
- Defeats the purpose of RPC (direct method calls without HTTP overhead)
Suggested fix
Persist the room name to ctx.storage and restore it on cold start:
    const ROOM_KEY = "__partyserver:room"

    class Server extends DurableObject {
      constructor(ctx, env) {
        super(ctx, env)
        ctx.blockConcurrencyWhile(async () => {
          const name = await ctx.storage.get(ROOM_KEY)
          if (name) await this.setName(name)
        })
      }

      async setName(name) {
        // ... existing logic ...
        this.#_name = name
        // Persist for cold-start recovery
        if (this.#status !== "started") {
          await this.ctx.storage.put(ROOM_KEY, name)
          await this.#initialize()
        }
      }
    }

This:
- Gates all entry points (RPC, alarm, fetch, WebSocket) via blockConcurrencyWhile until initialization completes
- setName() is already idempotent for the same name, so WebSocket handlers that also call it are a harmless no-op
- Cost is one storage.get() per cold start
- Makes getServerByName()'s fetch hack unnecessary for callers that only need RPC
- Fixes the alarm() bug as well
This aligns with the existing TODO comments in the codebase:
- Line 60: // TODO: fix this to use RPC
- Line 415: // TODO: this is a hack to set the server name, it'll be replaced with RPC later
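The persist-and-restore idea from the suggested fix can be exercised outside the Workers runtime with a small model. This is a sketch under stated assumptions, not a patch for partyserver: `FakeStorage` is a hypothetical in-memory stand-in for ctx.storage, `PersistedNameServer` is a hypothetical class, and the `ready` promise only approximates the gating that blockConcurrencyWhile provides in a real Durable Object.

```typescript
const ROOM_KEY = "__partyserver:room";

// Hypothetical stand-in for ctx.storage. It outlives server instances,
// the way durable storage outlives a hibernated Durable Object.
class FakeStorage {
  private data = new Map<string, unknown>();
  reads = 0;

  async get<T>(key: string): Promise<T | undefined> {
    this.reads++;
    return this.data.get(key) as T | undefined;
  }

  async put(key: string, value: unknown): Promise<void> {
    this.data.set(key, value);
  }
}

class PersistedNameServer {
  private storage: FakeStorage;
  private roomName: string | undefined;
  private status: "idle" | "started" = "idle";
  // Approximates blockConcurrencyWhile: entry points wait on this before running.
  private ready: Promise<void>;
  startCount = 0;

  constructor(storage: FakeStorage) {
    this.storage = storage;
    this.ready = (async () => {
      const saved = await storage.get<string>(ROOM_KEY);
      if (saved !== undefined) await this.setName(saved);
    })();
  }

  async setName(name: string): Promise<void> {
    if (this.roomName !== undefined && this.roomName !== name) {
      throw new Error("name already set to a different value");
    }
    this.roomName = name;
    if (this.status !== "started") {
      this.status = "started";
      await this.storage.put(ROOM_KEY, name); // persist for cold-start recovery
      this.startCount++; // stands in for onStart()
    }
  }

  // RPC-style entry point: no header, no fetch, only the restored name.
  async getData(): Promise<{ room: string }> {
    await this.ready;
    if (this.roomName === undefined) throw new Error("name not set");
    return { room: this.roomName };
  }
}

async function demo() {
  const storage = new FakeStorage();

  // First activation: the name arrives once, e.g. through the initial fetch.
  const first = new PersistedNameServer(storage);
  await first.setName("room-123");
  const firstRoom = (await first.getData()).room;

  // Simulated hibernation: a new instance over the same storage, woken by RPC only.
  const woken = new PersistedNameServer(storage);
  const wokenRoom = (await woken.getData()).room;

  return { firstRoom, wokenRoom, wokenStarts: woken.startCount, reads: storage.reads };
}

const demoResult = demo();
demoResult.then((r) => console.log(r));
```

In this model the second instance recovers its name from storage without any fetch, at the cost of one storage read per construction. A real implementation would rely on blockConcurrencyWhile rather than a hand-rolled promise, so that alarm and WebSocket handlers are gated as well.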
Our local workaround
We implemented the storage-based fix in our own Server subclass (YServer), which lets us use plain idFromName + .get() again:
    import { Server } from "partyserver"

    const ROOM_KEY = "__partyserver:room"

    class YServer extends Server {
      constructor(ctx, env) {
        super(ctx, env)
        // Restore the persisted room name before any entry point runs
        ctx.blockConcurrencyWhile(async () => {
          const name = await ctx.storage.get(ROOM_KEY)
          if (name) await this.setName(name)
        })
      }

      async onStart() {
        await this.ctx.storage.put(ROOM_KEY, this.name)
        // ... rest of initialization (load doc, rehydrate connections, etc.)
      }
    }

This works well but ideally belongs in Server itself, so that all subclasses benefit and alarm() is also fixed without every consumer needing to implement the same pattern.
Relation to workerd#2240
The partyserver source references cloudflare/workerd#2240 as the reason the x-partykit-room header hack exists — Durable Objects don't expose their name via the runtime API. If ctx.id.name were available, Server could read it directly and none of this would be needed.
Until that platform fix lands, persisting the name to ctx.storage is the pragmatic workaround. It's the standard DO pattern (in-memory state is a cache, storage is the source of truth) and costs one storage.get() per cold start.
Environment
- partyserver: 0.1.5
- Cloudflare Workers with hibernatable WebSockets