
Memory + listener leak in firebase-frameworks Next.js adapter — nextApp.prepare() called per request causes process listener accumulation and OOM on long-lived Cloud Run instances #574

@MarkiErik


Listener and heap accumulation in firebase-frameworks Next.js adapter — nextApp.prepare() re-initialization on every request

A. Problem

firebase-frameworks/dist/next.js/index.js calls await nextApp.prepare() on every request inside its handle() callback. On long-lived Cloud Run instances (Firebase Hosting frameworksBackend) we observe:

  • Linear growth of process.listenerCount('uncaughtException') and process.listenerCount('unhandledRejection'), ~1.12 each per request
  • Linear growth of heap, ~0.65 MB per request
  • Consistent "Reached heap limit" OOM crashes after ~5500 requests on a 4 GiB instance

Before any mitigation, our service saw roughly 50 OOM crashes per 24 hours in production.

Telemetry from production (Next 14.2.35, firebase-frameworks 0.11.8)

We wrapped process.on, process.addListener, and process.prependListener to count each listener registration and capture its stack trace, and we took periodic process.memoryUsage() snapshots:

| Requests | unhandledRejection listeners | uncaughtException listeners | Heap used | RSS |
|---|---|---|---|---|
| 50 | 52 | 52 | 173 MB | 327 MB |
| 100 | 102 | 102 | 172 MB | 353 MB |
| 200 | 202 | 202 | 265 MB | 427 MB |
| 500 | 530 | 530 | 452 MB | 619 MB |
| 1450 | 1573 | 1573 | 994 MB | 1187 MB |

Listener-source stack trace, repeating identically on every request:

```
at initialize (node_modules/next/dist/server/lib/router-server.js:438:13)
at async NextCustomServer.prepare (node_modules/next/dist/server/next.js:242:28)
at async handle (firebase-frameworks/dist/next.js/index.js:13:5)
at async handle (firebase-frameworks/dist/next.js/firebase-aware.js:9:5)
```

Listener registrations are the most easily observable signal. They grow linearly alongside the heap, which suggests that the retained state comes from each fresh initialize() call constructing the RenderServer, FsChecker, route resolver, and request-handler tree without the previous one being fully released. We have not produced a retain graph from heap snapshots; our evidence so far is the linear correlation between listener count and heap size.
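A minimal sketch of the kind of instrumentation described above, assuming a Node.js runtime with `@types/node` available. The names `instrumentProcessListeners`, `registrations`, and `snapshot` are hypothetical and are not part of the adapter or of Next.js; the sketch wraps the three registration methods on `process`, records one stack trace per registration for the watched events, and reads `process.listenerCount()` and `process.memoryUsage()` for each periodic snapshot.

```typescript
// Hypothetical instrumentation helpers; not part of firebase-frameworks or Next.js.
type RegisterFn = (event: string | symbol, listener: (...args: unknown[]) => void) => unknown;
type RegisterMethod = "on" | "addListener" | "prependListener";

// One captured stack trace per listener registration, keyed by event name.
const registrations = new Map<string, string[]>();

function instrumentProcessListeners(events: readonly string[]): void {
  const target = process as unknown as Record<RegisterMethod, RegisterFn>;
  const methods: RegisterMethod[] = ["on", "addListener", "prependListener"];
  for (const method of methods) {
    const original = target[method].bind(process);
    target[method] = (event, listener) => {
      if (typeof event === "string" && events.includes(event)) {
        const traces = registrations.get(event) ?? [];
        // The stack shows which caller (e.g. a per-request init path) registered the listener.
        traces.push(new Error(`process.${method}('${event}')`).stack ?? "");
        registrations.set(event, traces);
      }
      return original(event, listener);
    };
  }
}

interface Snapshot {
  requests: number;
  unhandledRejection: number;
  uncaughtException: number;
  heapUsedMB: number;
  rssMB: number;
}

// Periodic snapshot: current listener counts plus heap and RSS, rounded to MB.
function snapshot(requests: number): Snapshot {
  const { heapUsed, rss } = process.memoryUsage();
  return {
    requests,
    unhandledRejection: process.listenerCount("unhandledRejection"),
    uncaughtException: process.listenerCount("uncaughtException"),
    heapUsedMB: Math.round(heapUsed / 1024 / 1024),
    rssMB: Math.round(rss / 1024 / 1024),
  };
}
```

Calling `instrumentProcessListeners(["uncaughtException", "unhandledRejection"])` early at startup and logging `snapshot(n)` at chosen request counts yields rows in the shape of the table above.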

B. Root cause hypothesis

We believe the listener growth is caused by repeated prepare() re-initialization in the current adapter flow.

firebase-frameworks calls prepare() in a hot request path. In recent Next.js versions, prepare() performs much heavier initialization than it did in earlier releases: NextCustomServer.prepare() invokes router-server's initialize(), which attaches process.on('uncaughtException', …) and process.on('unhandledRejection', …) handlers and constructs the full router-server tree on each call. Next.js does not memoize this in the custom-server flow, so each await nextApp.prepare() from the adapter triggers a fresh initialization.

In short: we read this as a lifecycle/contract mismatch between the adapter and Next.js' custom-server prepare() semantics, rather than an isolated bug in either side.
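The accumulation mechanism can be reproduced in isolation with a toy model. Everything here is a stand-in: a local `EventEmitter` named `bus` plays the role of `process`, and `initialize()`, `prepare()`, and `handleRequest()` are hypothetical simplifications, not Next.js or adapter code. Because `prepare()` is not memoized and runs inside every request, each request adds one listener per event, which is the linear pattern in the telemetry table.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for `process`, so the demo does not touch real process-level handlers.
const bus = new EventEmitter();
bus.setMaxListeners(0); // silence the default 10-listener warning for this demo

// Hypothetical stand-in for a heavy init step that registers global handlers
// and builds per-call state, loosely modeled on the stack trace above.
function initialize(): { routes: string[] } {
  bus.on("uncaughtException", () => {});
  bus.on("unhandledRejection", () => {});
  return { routes: ["/", "/api"] }; // placeholder for the router/handler tree
}

// Un-memoized: every call re-runs initialize().
async function prepare(): Promise<void> {
  initialize();
}

// Simulates an adapter that calls prepare() inside every request.
async function handleRequest(): Promise<void> {
  await prepare();
  // ...request handling would happen here...
}

// Runs `requests` sequential requests and reports the resulting listener count.
async function simulate(requests: number): Promise<number> {
  for (let i = 0; i < requests; i++) {
    await handleRequest();
  }
  return bus.listenerCount("unhandledRejection");
}
```

Memoizing `prepare()` so that `initialize()` runs only once would keep both counts at 1 regardless of request volume.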

C. Why this regressed in Next.js 13.4.15

In Next.js 13.4.12 and earlier, NextCustomServer.prepare() was lightweight in production mode — it intentionally short-circuited because "we shouldn't prepare the server in production". There was no router-server.js and no per-call listener registration. Calling prepare() repeatedly was effectively a no-op.

Next.js 13.4.15 introduced the new router-server architecture and the NextCustomServer subclass, whose prepare() runs the heavy getRequestHandlers() / router-server initialize() cycle on every call. From that release onward, repeated prepare() calls accumulate listeners and retained initialization state.

This also matches the user reports in vercel/next.js#54104 and firebase/firebase-tools#6349 (both closed without resolution), where downgrading Next.js to 13.4.12 was the only known fix.

This repo also previously had an LRU cache of nextApp instances (packages/firebase-frameworks/src/next.js/index.ts before #122), which would have masked the issue. PR #122 (Nov 2023) removed it as "no longer needed", but with the post-13.4.15 prepare() semantics the cache was not actually redundant.

D. Production workaround (not proposed as upstream implementation)

To stop the bleeding while a clean upstream fix is discussed, we monkey-patch NextCustomServer.prototype.prepare from inside the generated server.js so that the prepare-Promise is memoized per NextCustomServer instance via a WeakMap. This relies on undocumented Next.js prototype access and is intentionally local to our deployment. We are not proposing this approach upstream.

With the workaround in place and the same instrumentation, a single warm instance sustained 4400 requests over 39 minutes:

| Metric | Before | After workaround |
|---|---|---|
| unhandledRejection listeners | 1573 (linear growth) | 3 (constant) |
| uncaughtException listeners | 1573 (linear growth) | 3 (constant) |
| Heap used | 994 MB and growing | 149 MB, oscillating with GC |
| RSS | 1187 MB | 327 MB |
| OOM events | continuous | 0 |

hitRatio (the fraction of prepare() calls served from the memoized Promise) reached 0.9998: exactly one cold initialization per Cloud Run instance, with every subsequent call returning the memoized Promise.
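The shape of the workaround, applied to a stand-in class rather than to `NextCustomServer`, looks roughly like this. `FakeCustomServer`, `prepared`, `stats`, and `hitRatio()` are hypothetical names; the sketch replaces the prototype's `prepare()` with a wrapper that caches the first returned Promise per instance in a `WeakMap` and counts cache hits and misses.

```typescript
// Stand-in for a server class whose prepare() is expensive; NOT Next.js code.
class FakeCustomServer {
  initCount = 0;
  async prepare(): Promise<void> {
    this.initCount++; // one "heavy" initialization per real call
  }
}

// Memoize the prepare() Promise per instance. The WeakMap does not keep an
// instance alive on its own, so an unreferenced server can still be collected.
const prepared = new WeakMap<FakeCustomServer, Promise<void>>();
const stats = { hits: 0, misses: 0 };

const originalPrepare = FakeCustomServer.prototype.prepare;
FakeCustomServer.prototype.prepare = function (this: FakeCustomServer): Promise<void> {
  const cached = prepared.get(this);
  if (cached) {
    stats.hits++;
    return cached;
  }
  stats.misses++;
  const promise = originalPrepare.call(this);
  prepared.set(this, promise);
  return promise;
};

// Fraction of prepare() calls served from the memoized Promise.
function hitRatio(): number {
  const total = stats.hits + stats.misses;
  return total === 0 ? 0 : stats.hits / total;
}
```

With one instance and 4400 calls there is one miss and 4399 hits, so `hitRatio()` is 4399 / 4400 ≈ 0.99977, which rounds to the 0.9998 reported above.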

E. Proposed adapter-level fix

We propose a minimal module-level memoization in packages/firebase-frameworks/src/next.js/index.ts. It does not touch any Next.js internals and uses only the public custom-server API:

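```ts
import { parse } from "url";
import createNextServer from "next";
// Request, Response, and incomingMessageFromExpress: unchanged from the adapter's existing imports.

const nextApp = createNextServer({
    dev: false,
    dir: process.cwd(),
    hostname: "0.0.0.0",
    port: 8080,
});

// Started once at module load; every request awaits the same Promise.
const preparePromise = nextApp.prepare();

export const handle = async (req: Request, res: Response): Promise<void> => {
    await preparePromise;
    const parsedUrl = parse(req.url, true);
    const incomingMessage = incomingMessageFromExpress(req);
    await nextApp.getRequestHandler()(incomingMessage, res, parsedUrl);
};
```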

This shape:

  • aligns with the Next.js custom-server usage pattern shown in the documented examples, where app.prepare() is awaited once before the server starts handling requests
  • aligns with the historical behavior of this adapter prior to #122 ("Possible fix for readable being spent in functions"), whose per-instance LRU cache effectively memoized the prepared state
  • aligns with the singleton server model used by Next.js' own standalone runtime

It does change two things relative to the current adapter, both of which we think are improvements but want to call out explicitly:

  • prepare() is now invoked once at module load, not per request. This is the whole intent of the change.
  • Rejection of prepare() propagates to all subsequent requests on the same instance (since they all await the same Promise), rather than each request retrying prepare() independently. We think this is desirable on Cloud Run / Cloud Functions: a failed cold init means the instance is unhealthy, and the platform will replace it. Per-request retry of a fundamentally broken init would risk init-storm CPU spikes and obscure the underlying failure. Happy to discuss alternative error semantics if the team has a different view.
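The error semantics in the second bullet can be illustrated with a small generic helper. `prepareOnce` is a hypothetical name; like the proposed module-level `preparePromise`, it starts initialization eagerly and hands every caller the same Promise, so a failed init is observed by all awaiting requests and is never re-run. One assumption worth flagging: in Node.js, a Promise that rejects before any handler is attached is reported as an unhandled rejection (in Node 15 and later, raised as an uncaught exception unless an `unhandledRejection` listener is installed), so the sketch attaches a no-op `catch`; callers that `await` the Promise still see the rejection.

```typescript
// Hypothetical helper: start `init` once, eagerly, and share its Promise.
// Assumes `init` reports failure by rejecting rather than throwing synchronously.
function prepareOnce(init: () => Promise<void>): () => Promise<void> {
  const promise = init();
  // Mark the rejection as handled so an early failure is not reported as an
  // unhandled rejection before any request awaits it. Awaiters still see it.
  promise.catch(() => {});
  return () => promise;
}
```

A per-request retry variant would instead clear the stored Promise when it rejects, trading the fail-fast behavior described above for repeated init attempts.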

An equivalent alternative is restoring the LRU cache from before #122, which would also preserve multi-tenant ergonomics where multiple nextApp instances exist per process. Either shape eliminates the per-request init.
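For comparison, the LRU alternative can be sketched as a small cache of prepared app instances. `LruCache`, `PreparedApp`, and `getPreparedApp` are hypothetical and are not the code removed in #122; the cache key (an app directory) and the capacity of 2 are illustrative assumptions. A JavaScript `Map` iterates in insertion order, so re-inserting a key on access keeps the least recently used key first.

```typescript
// Minimal LRU keyed by a string; hypothetical, not the pre-#122 implementation.
class LruCache<V> {
  private readonly entries = new Map<string, V>();
  private readonly capacity: number;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  get(key: string): V | undefined {
    const value = this.entries.get(key);
    if (value === undefined) return undefined;
    // Re-insert to mark as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
  }

  has(key: string): boolean {
    return this.entries.has(key);
  }
}

// Hypothetical shape of a cached app: the instance's identity plus its prepare() Promise.
interface PreparedApp {
  dir: string;
  ready: Promise<void>;
}

const apps = new LruCache<PreparedApp>(2);

// Each app is created (and prepared) once while cached; later requests reuse it.
function getPreparedApp(dir: string, create: (dir: string) => PreparedApp): PreparedApp {
  let app = apps.get(dir);
  if (!app) {
    app = create(dir);
    apps.set(dir, app);
  }
  return app;
}
```

One caveat follows from the analysis above: evicting an entry does not undo any process-level listeners its prepare() registered, so the capacity should be at least the number of distinct apps served by one process.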

Happy to open a PR once the team has a preference on direction.
