Load PHP intl as a runtime-optional C++ side module #839
Open
mho22 wants to merge 2 commits into
## Why
- PHP's `intl` extension needs ICU, but base `php.wasm` must stay
ICU-free: ~30 MB of ICU should not burden every PHP boot and download.
So `intl` must ship as a runtime-optional side module loaded via
`extension=intl.so` (the way `opcache` is), not be statically compiled
in.
- This is the first self-contained C++ *side module* in the repo
(mariadb is C++ but a main module), and loading it exposed a real host
dynamic-linker gap. A self-contained C++ side module both defines and
imports its weak COMDAT symbols — virtual destructors, template
instantiations, replaceable `operator new`/`delete` — because wasm-ld
routes default-visibility weak symbols through `env` so a main module
could interpose them. A pure-C main module (`php.wasm`) exports none of
them, so instantiation failed with "function import requires a
callable". That is a host-runtime contract that must hold on Node and
browser, not an intl-specific quirk.
## What
- **host/src/dylink.ts** — resolve a side module's weak C++ self-imports:
an unresolved `env` function import is satisfied from the module's own
post-instantiation exports via a trampoline. A genuinely absent symbol
throws loudly at call time instead of returning a silent 0, so a real
ABI gap surfaces as an error rather than being masked. Also provide the
`env.__cpp_exception` tag that `-fwasm-exceptions` modules import.
- **packages/registry/icu** — new ICU4C 74.2 wasm32 library package
(two-stage host/cross build, static PIC, common data staged as
`share/icu.dat`).
- **packages/registry/libcxx** — emit position-independent
`libc++-pic.a` / `libc++abi-pic.a` variants for `-shared` PIC side
modules; the non-PIC pair is unchanged for main-module consumers.
- **packages/registry/php** — link `intl.so` by hand with
`wasm32posix-cc -shared` (the libtool workaround `opcache` uses),
naming the libc++ PIC archives; force the 25 pure-musl symbols
`intl.so` imports but base PHP never references (allocator, wide-char,
math, and the pthread mutex/cond/TLS ICU's UMutex uses) into
`php.wasm` via `-u` so the side module shares one libc state with the
main module. ICU is kept out of the main link entirely; its ~30 MB
common data ships as the separate `icu.dat`, fed to ICU at load time by
a constructor in `intl-icu-data-loader.c`.
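
The trampoline idea from the `dylink.ts` item can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `instantiateWithSelfImports` and `demoSideModule` are hypothetical names, and the hand-assembled module is a stand-in for a real side module. It imports one `env` function, exports `f` (returns 42), and exports `call_f`, which calls the import.

```typescript
type AnyFn = (...args: any[]) => any;

// Hypothetical helper (not the PR's API): give every `env` function import
// that the host cannot resolve a trampoline that forwards to the module's
// own export of the same name once instantiation has finished.
function instantiateWithSelfImports(
  module: WebAssembly.Module,
  hostEnv: Record<string, AnyFn> = {},
): WebAssembly.Instance {
  let ownExports: WebAssembly.Exports | undefined;
  const env: Record<string, AnyFn> = { ...hostEnv };

  for (const imp of WebAssembly.Module.imports(module)) {
    if (imp.module !== "env" || imp.kind !== "function") continue;
    if (Object.prototype.hasOwnProperty.call(env, imp.name)) continue;
    const name = imp.name;
    env[name] = (...args: any[]) => {
      const target = ownExports?.[name];
      if (typeof target !== "function") {
        // Fail loudly at call time instead of returning a silent 0.
        throw new Error(`unresolved symbol '${name}'`);
      }
      return (target as AnyFn)(...args);
    };
  }

  const instance = new WebAssembly.Instance(module, { env });
  ownExports = instance.exports; // trampolines can route to self from here on
  return instance;
}

// Hypothetical test fixture: a tiny wasm binary that imports `env.<importName>`
// with type () -> i32, exports `f` (returns 42) and `call_f` (calls the import).
function demoSideModule(importName: "f" | "g"): WebAssembly.Module {
  const bytes = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
    0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type 0: () -> i32
    0x02, 0x09, 0x01, 0x03, 0x65, 0x6e, 0x76, // import from "env"
    0x01, importName.charCodeAt(0), 0x00, 0x00, // one-letter name, func, type 0
    0x03, 0x03, 0x02, 0x00, 0x00, // two defined functions of type 0
    0x07, 0x0e, 0x02, 0x01, 0x66, 0x00, 0x01, // export "f" = func 1
    0x06, 0x63, 0x61, 0x6c, 0x6c, 0x5f, 0x66, 0x00, 0x02, // export "call_f" = func 2
    0x0a, 0x0b, 0x02, // code section, two bodies
    0x04, 0x00, 0x41, 0x2a, 0x0b, // func 1: i32.const 42
    0x04, 0x00, 0x10, 0x00, 0x0b, // func 2: call func 0 (the import)
  ]);
  return new WebAssembly.Module(bytes);
}

// Route-to-self: the import `f` is satisfied by the module's own export `f`.
const selfLinked = instantiateWithSelfImports(demoSideModule("f"));
const viaTrampoline = (selfLinked.exports.call_f as AnyFn)(); // 42

// Missing symbol: instantiation succeeds, the first call throws.
const dangling = instantiateWithSelfImports(demoSideModule("g"));
```

Deferring the lookup to call time is what lets instantiation succeed before the module's exports exist. The cost is one extra JavaScript hop per call through a trampolined import.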
## Validation
- clean rebuild: `bash build-php.sh` reproduces `php.wasm`
(35,590,340 B) and `intl.so` (5,867,877 B) byte-for-byte from the
committed sources (icu rev3 + libcxx rev6 resolver caches).
- `host/test/dylink.test.ts`: 19 passed, including a new "weak
self-import handling" suite that assembles synthetic WAT side modules
to cover the trampoline route-to-self, the loud missing-symbol
failure, and the `__cpp_exception` tag — no full PHP/ICU build needed.
- `packages/registry/php/test/php-intl.test.ts`: 4 passed — base
`php.wasm` lacks intl; `extension=intl.so` loads;
`Locale::getDisplayLanguage("fr","en")`→"French"; `Collator` sort off
`icu.dat`.
- no regression: `dlopen-e2e` (3), `fork-dlopen-replay-e2e` (1),
`opcache-prewarm` (2) pass; `opcache.so` is pure C so the trampoline
branch never fires for it.
## Host parity
`dylink.ts` is shared Node/browser host logic and uses only standard
WebAssembly APIs, so the fix covers both hosts; the tests were run on
Node only. No browser demo is added here (no VFS-image staging yet), so
no browser run was required.
## ABI
No ABI change — host-side linking plus package artifacts, not
kernel/process ABI, syscall, memory layout, or fd semantics. Conformance
suites (libc/posix) were not run because no syscall or kernel behavior
changed.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase B-1 matrix build status:

| Package | Arch | Status | Sha |
|---|---|---|---|
| icu | wasm32 | built | aa03c107 |
| libcxx | wasm32 | built | 35eec6d6 |
| libcxx | wasm64 | built | 9364e8f4 |
| dinit | wasm32 | built | 4fbff30d |
| kandelo-sdk | wasm32 | built | 22d40b0f |
| mariadb | wasm32 | built | 4ac2398c |
| mariadb | wasm64 | built | 265a4b97 |
| php | wasm32 | built | 889c740b |
| spidermonkey | wasm32 | built | e7d3a887 |
| lamp | wasm32 | built | 9d840407 |
| mariadb-test | wasm32 | built | b7195948 |
| mariadb-vfs | wasm32 | built | 86c048cf |
| mariadb-vfs | wasm64 | built | b94b0a28 |
| node | wasm32 | built | 77ec2f34 |
| spidermonkey-node | wasm32 | built | 5db86c0e |
| wordpress | wasm32 | built | 99a0763d |
| node-vfs | wasm32 | built | b2901d6c |
Auto-generated; replaced on each push. Raw data in the publish-status workflow artifact.
## Why
- The ICU package's Stage-1 HOST build compiles the native data tools
(genrb/pkgdata/icupkg/…) with the dev-shell clang++. On the Nix Linux CI
runner those tools link the GNU C++/GCC runtime dynamically, but
`libstdc++.so.6` is not on the runner's loader path, so they abort at
exec with "error while loading shared libraries: libstdc++.so.6: cannot
open shared object file".
- Stage 2's `make` invokes icupkg/pkgdata to package the ICU common
data, so an unrunnable host tool fails the ICU build itself, which
reddened `lib-matrix-build (icu, wasm32)` and every job that depends on
icu (`matrix-build (wasm32, php)`, `lamp`, `wordpress`) on PR #839.
- macOS did not surface this: local clang links a self-contained libc++,
so a macOS from-source build (or a reused macOS cache) never exercises
the Linux host-tool loader linkage.
## What
- **packages/registry/icu/build-icu.sh**: on Linux, pass
`-static-libstdc++ -static-libgcc` as the Stage-1 host LDFLAGS so the
C++/GCC runtime is folded into each data tool and no runtime `.so` is
needed. The flags are Linux-guarded because macOS clang links libc++ and
rejects them; `runConfigureICU` re-exports the pre-set LDFLAGS to
configure, so setting them here reaches the host tool link.
- **packages/registry/icu/build.toml**: bump revision 3 → 4 so CI
rebuilds and re-stages icu from source.
## Validation
- `bash -n packages/registry/icu/build-icu.sh` passes.
- Linux from-source validation is deferred to CI (matrix-build rebuilds
icu from source on the Nix Linux runner); it was not run locally because
this Mac cannot reproduce the Linux host-tool linkage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>