Load PHP intl as a runtime-optional C++ side module #839

Open: mho22 wants to merge 2 commits into `main` from `explore-intl-side-module`

mho22 (Collaborator) commented Jul 2, 2026

## Why

- PHP's `intl` extension needs ICU, but base `php.wasm` must stay
  ICU-free: ~30 MB of ICU should not burden every PHP boot and download.
  So `intl` must ship as a runtime-optional side module loaded via
  `extension=intl.so` (the way `opcache` is), not be statically compiled
  in.
- This is the first self-contained C++ *side module* in the repo
  (mariadb is C++ but a main module), and loading it exposed a real host
  dynamic-linker gap. A self-contained C++ side module both defines and
  imports its weak COMDAT symbols — virtual destructors, template
  instantiations, replaceable `operator new`/`delete` — because wasm-ld
  routes default-visibility weak symbols through `env` so a main module
  could interpose them. A pure-C main module (`php.wasm`) exports none of
  them, so instantiation failed with "function import requires a
  callable". That is a host-runtime contract that must hold on Node and
  browser, not an intl-specific quirk.

## What

- **host/src/dylink.ts** — resolve a side module's weak C++ self-imports:
  an unresolved `env` function import is satisfied from the module's own
  post-instantiation exports via a trampoline. A genuinely absent symbol
  throws at call time instead of silently returning 0, so a real ABI gap
  is surfaced rather than masked. Also provide the `env.__cpp_exception` tag that
  `-fwasm-exceptions` modules import.
- **packages/registry/icu** — new ICU4C 74.2 wasm32 library package
  (two-stage host/cross build, static PIC, common data staged as
  `share/icu.dat`).
- **packages/registry/libcxx** — emit position-independent
  `libc++-pic.a` / `libc++abi-pic.a` variants for `-shared` PIC side
  modules; the non-PIC pair is unchanged for main-module consumers.
- **packages/registry/php** — link `intl.so` by hand with
  `wasm32posix-cc -shared` (the libtool workaround `opcache` uses),
  naming the libc++ PIC archives; force the 25 pure-musl symbols
  `intl.so` imports but base PHP never references (allocator, wide-char,
  math, and the pthread mutex/cond/TLS ICU's UMutex uses) into
  `php.wasm` via `-u` so the side module shares one libc state with the
  main module. ICU is kept out of the main link entirely; its ~30 MB
  common data ships as the separate `icu.dat`, fed to ICU at load time by
  a constructor in `intl-icu-data-loader.c`.
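
The self-import resolution described for `host/src/dylink.ts` can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `instantiateSideModule`, `sideModuleBytes`, and the `mainExports` parameter are hypothetical names, the module is a hand-assembled stand-in that imports `env.foo` and also exports its own `foo`, and the `['i32']` tag signature follows the convention Emscripten uses for wasm32 rather than anything stated here.

```typescript
// WebAssembly is read off globalThis so the sketch compiles without DOM or Node typings.
const WA = (globalThis as any).WebAssembly;

type Callable = (...args: unknown[]) => unknown;

// Hypothetical side module: imports env.foo, exports its own foo (returns 42)
// and callFoo (calls the *imported* foo).
const sideModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f,       // type 0: () -> i32
  0x02, 0x0b, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import section: "env" ...
  0x03, 0x66, 0x6f, 0x6f, 0x00, 0x00,             // ... "foo", kind func, type 0
  0x03, 0x03, 0x02, 0x00, 0x00,                   // two local functions of type 0
  0x07, 0x11, 0x02,                               // export section, 2 entries
  0x03, 0x66, 0x6f, 0x6f, 0x00, 0x01,             // "foo" -> func 1
  0x07, 0x63, 0x61, 0x6c, 0x6c, 0x46, 0x6f, 0x6f, 0x00, 0x02, // "callFoo" -> func 2
  0x0a, 0x0b, 0x02,                               // code section, 2 bodies
  0x04, 0x00, 0x41, 0x2a, 0x0b,                   // func 1: i32.const 42
  0x04, 0x00, 0x10, 0x00, 0x0b,                   // func 2: call 0 (the import)
]);

function instantiateSideModule(
  bytes: Uint8Array,
  mainExports: Record<string, unknown>,
): Record<string, Callable> {
  const module = new WA.Module(bytes);
  // Filled in after instantiation; trampolines read it lazily at call time.
  let ownExports: Record<string, unknown> | null = null;
  const env: Record<string, unknown> = {};

  const imports = WA.Module.imports(module) as Array<{ module: string; name: string; kind: string }>;
  for (const imp of imports) {
    if (imp.module !== "env") continue;
    const name = imp.name;
    if (Object.prototype.hasOwnProperty.call(mainExports, name)) {
      env[name] = mainExports[name]; // the main module interposes the symbol
    } else if (imp.kind === "tag" && name === "__cpp_exception") {
      env[name] = new WA.Tag({ parameters: ["i32"] }); // assumed wasm32 signature
    } else if (imp.kind === "function") {
      // Weak self-import: route the call to the module's own export.
      env[name] = (...args: unknown[]): unknown => {
        const target = ownExports ? ownExports[name] : undefined;
        if (typeof target !== "function") {
          throw new Error(`unresolved symbol "${name}": exported by neither main nor side module`);
        }
        return (target as Callable)(...args);
      };
    }
  }

  const instance = new WA.Instance(module, { env });
  ownExports = instance.exports as Record<string, unknown>;
  return instance.exports as Record<string, Callable>;
}
```

With an empty `mainExports`, `callFoo()` goes through the trampoline to the module's own `foo`; passing `{ foo: () => 7 }` lets the main module interpose instead. The trampoline is needed because the import object must be complete before instantiation, while the module's own exports only exist afterwards.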

## Validation

- clean rebuild: `bash build-php.sh` reproduces `php.wasm`
  (35,590,340 B) and `intl.so` (5,867,877 B) byte-for-byte from the
  committed sources (icu rev3 + libcxx rev6 resolver caches).
- `host/test/dylink.test.ts`: 19 passed, including a new "weak
  self-import handling" suite that assembles synthetic WAT side modules
  to cover the trampoline route-to-self, the loud missing-symbol
  failure, and the `__cpp_exception` tag — no full PHP/ICU build needed.
- `packages/registry/php/test/php-intl.test.ts`: 4 passed — base
  `php.wasm` lacks intl; `extension=intl.so` loads;
  `Locale::getDisplayLanguage("fr","en")`→"French"; `Collator` sort off
  `icu.dat`.
- no regression: `dlopen-e2e` (3), `fork-dlopen-replay-e2e` (1),
  `opcache-prewarm` (2) pass; `opcache.so` is pure C so the trampoline
  branch never fires for it.

## Host parity

`dylink.ts` is shared Node/browser host logic and uses only standard
WebAssembly APIs, so the fix applies to both hosts; the tests were run on
Node only. No browser demo is added here (no VFS-image staging yet), so
no browser run was required.

## ABI

No ABI change — host-side linking plus package artifacts, not
kernel/process ABI, syscall, memory layout, or fd semantics. Conformance
suites (libc/posix) were not run because no syscall or kernel behavior
changed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions (bot) commented Jul 2, 2026

Phase B-1 matrix build status — pr-839-staging

ABI v16. 17 built, 0 failed, 17 total.

| Package | Arch | Status | Sha |
|---|---|---|---|
| icu | wasm32 | built | aa03c107 |
| libcxx | wasm32 | built | 35eec6d6 |
| libcxx | wasm64 | built | 9364e8f4 |
| dinit | wasm32 | built | 4fbff30d |
| kandelo-sdk | wasm32 | built | 22d40b0f |
| mariadb | wasm32 | built | 4ac2398c |
| mariadb | wasm64 | built | 265a4b97 |
| php | wasm32 | built | 889c740b |
| spidermonkey | wasm32 | built | e7d3a887 |
| lamp | wasm32 | built | 9d840407 |
| mariadb-test | wasm32 | built | b7195948 |
| mariadb-vfs | wasm32 | built | 86c048cf |
| mariadb-vfs | wasm64 | built | b94b0a28 |
| node | wasm32 | built | 77ec2f34 |
| spidermonkey-node | wasm32 | built | 5db86c0e |
| wordpress | wasm32 | built | 99a0763d |
| node-vfs | wasm32 | built | b2901d6c |

Auto-generated; replaced on each push. Raw data in the publish-status workflow artifact.

## Why

- The ICU package's Stage-1 HOST build compiles the native data tools
  (genrb/pkgdata/icupkg/…) with the dev-shell clang++. On the Nix Linux
  CI runner those tools link the GNU C++/GCC runtime dynamically, but
  `libstdc++.so.6` is not on the runner's loader path, so they abort at
  exec with "error while loading shared libraries: libstdc++.so.6:
  cannot open shared object file".
- Stage 2's `make` invokes icupkg/pkgdata to package the ICU common
  data, so an unrunnable host tool fails the ICU build itself. That
  turned `lib-matrix-build (icu, wasm32)` red, along with every job that
  depends on icu (`matrix-build (wasm32, php)`, `lamp`, `wordpress`), on
  PR #839.
- macOS did not surface this: local clang links a self-contained libc++,
  so a macOS from-source build (or a reused macOS cache) never exercises
  the Linux host-tool loader linkage.

## What

- **packages/registry/icu/build-icu.sh** — on Linux, pass
  `-static-libstdc++ -static-libgcc` as the Stage-1 host LDFLAGS so the
  C++/GCC runtime is folded into each data tool and no runtime `.so` is
  needed. The flags are Linux-guarded because macOS clang links libc++
  and rejects them; `runConfigureICU` re-exports the pre-set LDFLAGS to
  configure, so setting them here reaches the host tool link.
- **packages/registry/icu/build.toml** — bump revision 3 → 4 so CI
  rebuilds and re-stages icu from source.

## Validation

- `bash -n packages/registry/icu/build-icu.sh` passes.
- Linux from-source validation is deferred to CI (matrix-build rebuilds
  icu from source on the Nix Linux runner); it was not run locally
  because this Mac cannot reproduce the Linux host-tool linkage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mho22 marked this pull request as ready for review July 2, 2026 21:57