
perf: Optimize page splitting and relayout, so both per-iteration costs drop from O(N^2) to O(N) #3390

Open
exoego wants to merge 5 commits into diegomura:master from exoego:optimize-pagenation


@exoego
Contributor

@exoego exoego commented Apr 16, 2026

This PR drastically optimizes the pagination bottleneck mentioned in

#3367 (review):
The major performance bottleneck is pagination, but the real improvement won't come from avoiding 1 or 2 layout steps; at this point I need to fully redesign the algorithm to be O(N). But I've been struggling to do so.

I've added a Vitest benchmark for pagination
(yarn vitest bench packages/layout/tests/steps/resolvePagination.bench.ts)
and compared p999 durations in ms:

| Num of elements | Before (ms) | After (ms) | Speedup | Scaling (Before) | Scaling (After) |
|---|---|---|---|---|---|
| 100 (~10 pages) | 5.75 | 2.49 | 2.3x | 1x | 1x |
| 500 (~50 pages) | 127 | 12.9 | 9.8x | 22x ≒ 25 (5^2) x K | 5.18x |
| 1000 (~100 pages) | 635 | 34.2 | 17.9x | 110x ≒ 100 (10^2) x K | 13.7x |
| 2000 (~200 pages) | 3630 | 89.2 | 40.7x | 631x ≒ 400 (20^2) x 1.5 x K | 35.8x |

Scaling (Before) is ~O(N^2 * K).
It grows quadratically (1→22→110→631), roughly an N^2 pattern but worse, since K also grows with N.

Scaling (After) is ~O(N * K).
It grows linearly, tracking close to the ideal 1→5→10→20 but slightly worse, with the extra overhead seemingly coming from the O(C) Yoga relayout performed on each page.
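One way to check these scaling claims against the table is to compute the empirical exponent k in t ≈ c·N^k between consecutive rows, k = log(t2/t1) / log(n2/n1), where k ≈ 1 means linear and k ≈ 2 means quadratic. The sketch below only reuses the p999 numbers from the table; the helper names are made up for illustration.

```typescript
// p999 durations in ms, copied from the benchmark table.
const sizes = [100, 500, 1000, 2000];
const beforeMs = [5.75, 127, 635, 3630];
const afterMs = [2.49, 12.9, 34.2, 89.2];

// Empirical exponent k in t ≈ c * N^k between two measurements.
function scalingExponent(n1: number, t1: number, n2: number, t2: number): number {
  return Math.log(t2 / t1) / Math.log(n2 / n1);
}

// Exponent between each pair of consecutive table rows.
function exponents(times: number[]): number[] {
  const result: number[] = [];
  for (let i = 1; i < sizes.length; i++) {
    result.push(scalingExponent(sizes[i - 1], times[i - 1], sizes[i], times[i]));
  }
  return result;
}

const fmt = (ks: number[]) => ks.map((k) => k.toFixed(2)).join(", ");
console.log(`before: ${fmt(exponents(beforeMs))}`);
console.log(`after:  ${fmt(exponents(afterMs))}`);
```

With these numbers the Before slopes climb above 2 for the larger sizes, while the After slopes stay between 1 and 1.5, which matches the "linear plus some per-page overhead" reading.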

I've also added several pagination test cases to ensure the existing pagination behavior is preserved.

@changeset-bot

changeset-bot Bot commented Apr 16, 2026

🦋 Changeset detected

Latest commit: ffaa476

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 7 packages:

| Name | Type |
|---|---|
| @react-pdf/layout | Minor |
| @react-pdf/renderer | Patch |
| @react-pdf/math | Patch |
| @react-pdf/mermaid | Patch |
| next-14 | Patch |
| next-15 | Patch |
| @react-pdf/vite-example | Patch |


@exoego exoego force-pushed the optimize-pagenation branch 2 times, most recently from 77efc9b to f25b95e on April 16, 2026 23:52
@diegomura
Owner

Thanks @exoego. Can you please explain the logic? I'm not sure I get what it does.

I think pagination needs to change more drastically though in order to support things like text columns. Right now it's hard to do because we first compute text in a big column and from there breaking is much harder

@exoego
Contributor Author

exoego commented Apr 17, 2026

The breakdown below explains my understanding of the "Before" logic and how the "After" logic optimizes it:

splitNodes
- Before (O(N^2)): each iteration performed O(N) operations, yielding O(N^2) total:
  - nodes.slice(i + 1): O(N - i) copy
  - futureNodes.filter(isFixed): O(N - i) scan
  - shouldNodeBreak(...) internally re-filtered futureNodes O(N - i) + scanned previousElements O(i)
- After (O(N)): O(N) pre-computation reduces per-iteration cost to O(1):
  - computeSuffixFurthestEnd: backward pass builds suffix-max array, replacing futureNodes filter+aggregation
  - collectFixedIndices: pre-collects fixed node indices, replacing slice+filter
  - hasNonFixedPrevious: boolean flag replaces previousElements scan

splitPage
- Before (O(N^2)): called relayout(nextPage) after each split: a full Yoga layout pass over all remaining children. Since nextPage was only used as input to the next split (never in the final output), this redundant relayout compounded to O(N^2) across all pages.
- After (O(N)): ~~Removed relayout entirely.~~ Skipped relayout on nextPage. splitNodes already adjusts box.top, and nextPage is never in the final output; only each currentPage gets properly relaid out.
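To make the splitNodes changes concrete, here is a minimal TypeScript sketch of two of the pre-computations (collectFixedIndices and the running hasNonFixedPrevious flag) next to the per-iteration scans they replace. The `SketchNode` shape and both scan functions are hypothetical simplifications, not react-pdf's actual code; they only compute the per-index values the breaking logic needs (here, a count of future fixed nodes rather than the nodes themselves).

```typescript
interface SketchNode {
  fixed: boolean; // e.g. a header/footer repeated on every page
}

interface ScanResult {
  futureFixed: number; // number of fixed nodes after index i
  hasNonFixedPrevious: boolean; // any non-fixed node before index i?
}

// Before: each index copies and rescans the arrays, O(N) per index, O(N^2) total.
function naiveScan(nodes: SketchNode[]): ScanResult[] {
  return nodes.map((_, i) => {
    const futureNodes = nodes.slice(i + 1);
    const previousElements = nodes.slice(0, i);
    return {
      futureFixed: futureNodes.filter((n) => n.fixed).length,
      hasNonFixedPrevious: previousElements.some((n) => !n.fixed),
    };
  });
}

// One pass that records the (sorted) indices of fixed nodes.
function collectFixedIndices(nodes: SketchNode[]): number[] {
  const indices: number[] = [];
  nodes.forEach((n, i) => {
    if (n.fixed) indices.push(i);
  });
  return indices;
}

// After: a forward-only pointer into fixedIndices plus a running flag, O(N) total.
function precomputedScan(nodes: SketchNode[]): ScanResult[] {
  const fixedIndices = collectFixedIndices(nodes);
  let firstFuture = 0; // position in fixedIndices of the first fixed index > i
  let hasNonFixedPrevious = false;
  const result: ScanResult[] = [];
  for (let i = 0; i < nodes.length; i++) {
    while (firstFuture < fixedIndices.length && fixedIndices[firstFuture] <= i) {
      firstFuture++;
    }
    result.push({ futureFixed: fixedIndices.length - firstFuture, hasNonFixedPrevious });
    if (!nodes[i].fixed) hasNonFixedPrevious = true; // takes effect from i + 1 on
  }
  return result;
}
```

Because fixedIndices is sorted, the pointer only ever moves forward, so the whole loop costs O(N) instead of O(N) per index.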

I think pagination needs to change more drastically though in order to support things like text columns. Right now it's hard to do because we first compute text in a big column and from there breaking is much harder

Don't worry.
I already have a WIP branch for text columns, which works perfectly with this optimized pagination 😉

@diegomura
Owner

backward pass builds suffix-max array, replacing futureNodes filter+aggregation

I'm not sure I fully understand how a suffix array is useful here

Removed relayout entirely.

Yoga (re)layout should not add much overhead. Also, how are dynamic nodes handled if there's no relayout?

I already have a WIP branch for text columns

Haha cool! I'd be curious to see how that looks, but still, I think the current pagination solution has become a bit too complex... I'm hesitant to keep adding complexity to it for perf improvements + new features. At this point I think it has to be redesigned completely, as I feel there has to be a simpler solution. Most open issues are due to pagination errors, whether it's text node layout, dynamic nodes, breaks, etc. Everything I add to the current solution is an extra layer of complexity, or something that can break in a future migration.

Right now the algorithm works by rendering everything as if it were one big, long page, and then breaking it into pieces. This is mostly due to the nature of Yoga and flexbox. But some nodes are dynamic, and others have unknown geometries (like text) that depend on the page size, not on a full-height layout. So I feel we should move toward a "streaming" solution where pages are filled one by one, rather than a node-breaking solution. I'm not yet sure if it's possible.

@exoego
Contributor Author

exoego commented Apr 18, 2026

I understand that you are more inclined toward a full rewrite of the algorithm, so that it not only boosts performance but also simplifies introducing new features.
But I think such a complete rewrite would take weeks or months to mature.

I'd highly appreciate it if this optimization could be reviewed and merged as a short-term solution 🙇, since many users, including myself, seem to be facing performance issues.

@diegomura
Owner

diegomura commented Apr 18, 2026

I understand, and I think it's a reasonable ask :) Can you confirm, though, that dynamic nodes work as expected? I'm a bit scared of removing the re-layout step. Even I sometimes don't get all the weird quirks of the current pagination algorithm :)

@diegomura
Owner

@exoego just checked out the branch and tested the examples repo for a quick visual regression test. There's something odd in the mermaid example (if you run yarn dev and select the vite project -> http://localhost:5173/#mermaid). It has completely blank pages

exoego added 5 commits April 19, 2026 15:08
…aling behavior of resolvePagination

Baseline results show worse-than-quadratic scaling:
- 100 children: 5.1ms
- 500 children: 126ms (25x)
- 1000 children: 628ms (123x, worse than 100x)
- 2000 children: 3,597ms (707x, worse than 400x)
- Remove per-iteration nodes.slice() and futureNodes.filter() calls
- Add shouldBreakOptimized() that accepts pre-computed scalar values
  instead of scanning arrays each call (O(N) → O(1))
- The original shouldBreak is now a thin wrapper that computes
  pre-computed values from arrays and delegates to shouldBreakOptimized.
- Pre-compute suffix max array for furthest end of non-fixed future
  nodes in a single right-to-left pass (O(N))
- Pre-collect fixed node entries once instead of filtering per iteration
- Track hasNonFixedPrevious as a running boolean instead of filtering
  previousElements array each iteration
Skip the expensive relayoutPage() call on nextPage since it's only used
as input to the next splitPage iteration, never added to final output.
The currentPage (which IS in the output) is still fully relaid out.

Also fix splitNode to always set box.height on the next half, even for
auto-height nodes. Without relayout, an auto-height node would keep its
original (too large) box.height, causing infinite splitting loops.
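The box.height fix in this commit can be illustrated with a simplified split of a single tall node. `Box`, `splitBox`, `splitBoxKeepingHeight` and `countPages` are hypothetical names; the sketch assumes the node starts above the page bottom and ignores padding, margins and break rules. The variant that keeps the original height never gets shorter, which is the infinite-splitting loop described above; `maxPages` exists only so the sketch can detect it.

```typescript
interface Box {
  top: number; // offset from the top of the page being split
  height: number;
}

type SplitFn = (box: Box, pageHeight: number) => [Box, Box];

// Split at the page bottom; assumes box.top < pageHeight.
// The next half always gets an explicit, reduced height.
const splitBox: SplitFn = (box, pageHeight) => {
  const currentHeight = pageHeight - box.top;
  return [
    { top: box.top, height: currentHeight },
    { top: 0, height: box.height - currentHeight },
  ];
};

// Buggy variant for contrast: the next half keeps the original height,
// like an auto-height node whose box is never relaid out.
const splitBoxKeepingHeight: SplitFn = (box, pageHeight) => [
  { top: box.top, height: pageHeight - box.top },
  { top: 0, height: box.height },
];

// Count the pages one tall node spans; maxPages detects non-termination.
function countPages(box: Box, pageHeight: number, split: SplitFn, maxPages = 1000): number {
  let pages = 1;
  let rest = box;
  while (rest.top + rest.height > pageHeight) {
    if (pages >= maxPages) throw new Error("splitting did not terminate");
    [, rest] = split(rest, pageHeight);
    pages++;
  }
  return pages;
}
```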
…t relayout

When a node triggers a split in splitNodes, the remaining siblings
(via nodes.slice(i+1)) were pushed to nextChildren with their original
box.top values.

Previously relayout on nextPage corrected these positions, but after the
relayout removal optimization, nodes like footers with marginTop:'auto'
retained large top values and were incorrectly classified as "outside"
the next page, causing them to appear alone on a separate page.
@exoego exoego force-pushed the optimize-pagenation branch from f25b95e to ffaa476 on April 19, 2026 06:08
@exoego
Contributor Author

exoego commented Apr 19, 2026

@diegomura

1. mermaid example ... has completely blank pages

Good catch. I fixed it in ffaa476, and performance is still very good.

Cause

When a node triggered a split in splitNodes, the remaining siblings were pushed to nextChildren via nodes.slice(i + 1) with their original box.top values.
Previously relayout on nextPage corrected these, but after the optimization, nodes like the footer (with marginTop: 'auto', positioned near the page bottom) retained their large top values.
On the next splitNodes pass, they were classified as isOutside (wrapArea <= top) and pushed to yet another page, producing pages with only a footer.

Fix

The fix adds adjustRemaining() which subtracts height from box.top for all non-fixed remaining siblings, consistent with how the breaking node itself and the isOutside path already adjust positions.
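The adjustRemaining() fix can be sketched as follows, assuming `height` is the height consumed by the current page and using the isOutside condition quoted above (wrapArea <= top). The node shape is a simplified, hypothetical one, and this version returns shifted copies instead of mutating the nodes, which may differ from the actual implementation.

```typescript
interface SketchNode {
  fixed: boolean;
  box: { top: number; height: number };
}

// Shift non-fixed siblings up by the height consumed on the current page so their
// tops are relative to the next page; fixed nodes keep their positions.
// Returns copies rather than mutating the input.
function adjustRemaining(remaining: SketchNode[], height: number): SketchNode[] {
  return remaining.map((n) =>
    n.fixed ? n : { ...n, box: { ...n.box, top: n.box.top - height } },
  );
}

// The isOutside condition: the node starts at or below the wrap area.
function isOutside(wrapArea: number, node: SketchNode): boolean {
  return wrapArea <= node.box.top;
}
```

With a 1000-unit wrap area, a footer left at top 1900 counts as "outside" and would be pushed to yet another page; after shifting by 1000 its top is 900, so it stays on the next page, while fixed nodes keep their positions.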

2. I'm not sure I fully understand how a suffix array is useful here

Problem

The original shouldNodeBreak needed to know: what is the furthest bottom edge among all future non-fixed siblings?
Previously this was computed per iteration by futureNodes.filter(isFixed) + Math.max(…map(n => n.box.top + n.box.height)), which is an O(N) scan each time, yielding O(N^2) total.

Solution (how suffix array helps)

computeSuffixFurthestEnd replaces this with a single right-to-left pass: suffixFurthestEnd[i] stores the max (top + height) of all non-fixed nodes after index i.
Then shouldBreakOptimized can look up the pre-computed value in O(1) instead of scanning the array each time.
It's basically the same data, just pre-computed.
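A minimal sketch of the suffix-max idea, assuming a simplified node shape with a `fixed` flag and a `box` of `top`/`height` (the real react-pdf node types differ). It builds `suffixFurthestEnd` in one right-to-left pass and includes the naive per-index scan for comparison; using `-Infinity` for "no non-fixed node after i" is an assumption of this sketch.

```typescript
interface SketchNode {
  fixed: boolean;
  box: { top: number; height: number };
}

// suffixFurthestEnd[i] = max(top + height) over non-fixed nodes with index > i,
// or -Infinity if there are none. One right-to-left pass: O(N).
function computeSuffixFurthestEnd(nodes: SketchNode[]): number[] {
  const suffix = new Array<number>(nodes.length).fill(-Infinity);
  let furthest = -Infinity;
  for (let i = nodes.length - 1; i >= 0; i--) {
    suffix[i] = furthest; // so far only nodes after i have been folded in
    const { fixed, box } = nodes[i];
    if (!fixed) furthest = Math.max(furthest, box.top + box.height);
  }
  return suffix;
}

// Naive equivalent, recomputed per index: O(N - i) each time, O(N^2) overall.
function naiveFurthestEnd(nodes: SketchNode[], i: number): number {
  const ends = nodes
    .slice(i + 1)
    .filter((n) => !n.fixed)
    .map((n) => n.box.top + n.box.height);
  return Math.max(-Infinity, ...ends);
}
```

For non-fixed nodes ending at 100, 600 and 450, with a tall fixed node in second position, the array is [600, 600, 450, -Infinity]: the fixed node never contributes, and each entry only looks at later nodes.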

3. how are dynamic nodes handled if there's no relayout?

Sorry, my explanation "Removed relayout entirely" was confusing.
I meant "Skipped relayout on nextPage".

Dynamic nodes are still fully relaid out.
The optimization only skips relayout on the intermediate nextPage.

At the top of each splitPage call, resolveDynamicPage checks for dynamic nodes and, if found, executes their render props and calls relayoutPage. So the new flow is:

  1. nextPage is created without relayout (the optimization)
  2. On the next iteration, splitPage(nextPage, ...) is called
  3. resolveDynamicPage triggers full relayout if dynamic nodes exist
  4. currentPage is always relaid out before being added to output

So, dynamic nodes are never output without a fresh Yoga pass.
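The four-step flow above can be simulated with a toy pagination loop. Everything here is a hypothetical stand-in: `SketchPage` holds only child heights and a dynamic flag, `relayoutPage` just counts how many children it touches as a rough cost proxy instead of running Yoga, and `splitPage` is a greedy split. The `relayoutNextPage` flag switches between the old behavior (relayout nextPage after every split) and the new one.

```typescript
interface SketchPage {
  heights: number[]; // child heights, stacked top to bottom
  hasDynamic: boolean; // does the page contain dynamic (render-prop) nodes?
}

// Cost proxy: total number of children touched by relayout passes.
let relayoutWork = 0;

// Stand-in for a Yoga pass; a real one would recompute child positions.
function relayoutPage(page: SketchPage): SketchPage {
  relayoutWork += page.heights.length;
  return page;
}

// Stand-in for resolveDynamicPage: relayout only when dynamic nodes exist.
function resolveDynamicPage(page: SketchPage): SketchPage {
  return page.hasDynamic ? relayoutPage(page) : page;
}

// Greedy split: keep children while they fit (always at least one per page).
function splitPage(page: SketchPage, pageHeight: number): [SketchPage, SketchPage | null] {
  let used = 0;
  let count = 0;
  while (
    count < page.heights.length &&
    (count === 0 || used + page.heights[count] <= pageHeight)
  ) {
    used += page.heights[count];
    count++;
  }
  const rest = page.heights.slice(count);
  const current = { ...page, heights: page.heights.slice(0, count) };
  return [current, rest.length > 0 ? { ...page, heights: rest } : null];
}

function paginate(page: SketchPage, pageHeight: number, relayoutNextPage: boolean): SketchPage[] {
  const output: SketchPage[] = [];
  let next: SketchPage | null = page;
  while (next !== null) {
    const resolved = resolveDynamicPage(next); // fresh pass if dynamic nodes exist
    const [current, rest] = splitPage(resolved, pageHeight);
    output.push(relayoutPage(current)); // currentPage is always relaid out
    if (rest !== null && relayoutNextPage) {
      next = relayoutPage(rest); // old behavior: relayout every intermediate nextPage
    } else {
      next = rest; // new behavior: nextPage goes straight into the next split
    }
  }
  return output;
}
```

For ten 300-high children on 1000-high pages, the sketch produces four pages; skipping the nextPage relayout touches 10 children in total, while also relaying out nextPage touches 22, and the gap widens as the document grows because each nextPage relayout covers every remaining child.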

Development

Successfully merging this pull request may close these issues:
- Slow generation time for PDF with many pages
- Performance problem with creating multiple pages

2 participants