Eager on pathologically oversized string literals (#478) #479

Open
mgajda wants to merge 1 commit into haskell-suite:master from mgajda:bench/A-strict-accum

mgajda commented Apr 20, 2026

Summary

Addresses #478 — heap-usage explosion when parseModuleWithComments encounters a multi-MB single-line string literal of \NN-escaped bytes (reported as ~95× source size).

Three surgical changes in InternalLexer.hs, no API or dependency changes:

  1. Uncurry lexString's accumulator tuple and add bang patterns (loop !s !raw). The original loop (s,raw) built a chain of lazy tuple thunks, one per input character.
  2. Replace reverse xs ++ ys with a local tail-recursive revAppend in the two escape-handling branches. Avoids allocating an intermediate reversed list plus a ++ thunk.
  3. Strict parseInteger. The foldl1 (\n d -> n*radix + d) built a deep chain of Integer-multiplication thunks for long numeric literals; rewritten as go !acc.
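The three changes can be illustrated in isolation with a minimal, pure sketch. This is not the code from InternalLexer.hs: the real lexString runs inside the lexer monad, and the names `collect`, `parseIntegerLazy`, and `parseIntegerStrict` below are hypothetical stand-ins chosen for illustration. Only `revAppend` and the `loop !s !raw` / `go !acc` shapes are taken from the description above.

```haskell
{-# LANGUAGE BangPatterns #-}

import Data.Char (digitToInt)

-- | revAppend xs ys == reverse xs ++ ys, computed in one tail-recursive
-- pass, without building the intermediate reversed list.
revAppend :: [a] -> [a] -> [a]
revAppend []     ys = ys
revAppend (x:xs) ys = revAppend xs (x : ys)

-- | Toy stand-in for a lexer loop (hypothetical, not lexString itself).
-- It keeps two reversed accumulators as separate strict arguments instead
-- of one lazy tuple: the text with backslashes dropped, and the raw source.
collect :: String -> (String, String)
collect = loop [] []
  where
    -- The bangs force both accumulators to weak head normal form on
    -- every iteration, so no chain of suspended updates builds up.
    loop !s !raw []          = (reverse s, reverse raw)
    loop !s !raw ('\\':c:cs) = loop (c : s) (revAppend ['\\', c] raw) cs
    loop !s !raw (c:cs)      = loop (c : s) (c : raw) cs

-- | Lazy fold in the style of the original: without optimisation this can
-- suspend one multiplication per digit. Fails on an empty digit string.
parseIntegerLazy :: Integer -> String -> Integer
parseIntegerLazy radix ds =
  foldl1 (\n d -> n * radix + d) (map (toInteger . digitToInt) ds)

-- | Strict variant: the accumulator is evaluated at every step.
-- Unlike the foldl1 version, it returns 0 for an empty digit string.
parseIntegerStrict :: Integer -> String -> Integer
parseIntegerStrict radix = go 0
  where
    go !acc []     = acc
    go !acc (d:ds) = go (acc * radix + toInteger (digitToInt d)) ds
```

Loaded into GHCi, `revAppend "abc" "de"` and `reverse "abc" ++ "de"` should agree, and `parseIntegerStrict 16 "ff"` should match `parseIntegerLazy 16 "ff"`. The results are the same as in the lazy versions; only the evaluation order and the amount of intermediate allocation differ.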

Measurements

Pathological input: 2.9 MB Haskell source, one string literal of 1 M \25 escapes. GHC 9.10.3, +RTS -s, /usr/bin/time -v, best of 3:

| variant | bytes allocated | max residency | max RSS | wall |
|---|---|---|---|---|
| master | 1 284 MB | 382 MB | 877 MB | 0.892s |
| this PR | 1 212 MB | 304 MB | 662 MB | 0.746s |

−20 % max residency, −24 % RSS, −16 % wall time. 15 insertions / 8 deletions, one file.
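For anyone wanting to reproduce a comparable input, here is a minimal sketch of a generator. The exact file used for the measurements is not shown in this PR, so the module name `Big`, the binding name `big`, and the function name `pathologicalSource` are assumptions made for illustration. At 3 bytes per `\25` escape, one million escapes give roughly the 2.9 MB quoted above.

```haskell
-- | Source text of a small module whose only content is one string
-- literal made of n "\25" escapes on a single line.
-- The module and binding names are hypothetical.
pathologicalSource :: Int -> String
pathologicalSource n = unlines
  [ "module Big where"
  , "big :: String"
  , "big = \"" ++ concat (replicate n "\\25") ++ "\""
  ]
```

In GHCi, `writeFile "Big.hs" (pathologicalSource 1000000)` would write such a file, which can then be fed to a small driver that calls parseModuleWithComments.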

This does not fully close #478 — the remaining 304 MB peak is dominated by (:) cons cells inherent to the String representation of the AST (~168 MB at peak by -hT profile). The principled fix is to carry Text/ByteString in StringTok/Literal, which is a breaking change and properly belongs to a future major release. This PR is the non-breaking portion worth shipping now.

Development

Successfully merging this pull request may close these issues.

Pathological memory on oversize string literals (~95x source size)
