
Qt5/armhf webview: intermittent heap-corruption crash during Chromium init (Sentry ANTHIAS-D/F) #3022

@vpetersson


Symptom

On armv7 Qt5 boards (linuxfb stream), AnthiasViewer sometimes dies within ~1s of spawn, before the D-Bus handshake:

This plugin does not support createPlatformOpenGLContext!
malloc(): unaligned tcache chunk detected

Sentry: ANTHIAS-D (all 30 startup retries exhausted), ANTHIAS-F (inline 3-attempt budget exhausted). On most fleet devices the crash is rare and load_browser()'s retry absorbs it; on a few devices it fires on every spawn, so the display stays dark.

What we established (on-device, Pi 4 testbed running the latest-pi3 armhf image)

  • Reproduced the exact fleet signature (GL-probe warning + malloc(): unaligned tcache chunk detected, SIGABRT rc=134) in a spawn-loop harness; high crash rate in this environment (~93% of 30 spawns; an earlier pass with MALLOC_CHECK_=3 + libc_malloc_debug crashed ~23%, so allocator timing matters).

  • 35 core dumps captured. Mix of SIGSEGV (faulting PC inside stripped libQt5WebEngineCore.so.5, ~3 recurring PC offsets across ASLR bases) and SIGABRT (glibc malloc abort) — i.e. ANTHIAS-D ("exited before handshake", no malloc line) and ANTHIAS-F (malloc line) are two faces of one bug. Crash happens mid-Chromium-init with ~18 threads live, before D-Bus registration.

  • Mitigation matrix — all negative (20-30 spawns per arm):

    Chromium flags                                   ok/spawns   crashes
    (none, baseline)                                 2/30        28
    --disable-gpu                                    8/30        22
    --no-zygote --no-sandbox                         5/20        15
    --single-process --no-sandbox                    0/20        20
    --js-flags=--jitless                             2/20        18
    --js-flags=--jitless --no-zygote --no-sandbox    0/20        20

    So it is not the zygote fork, the sandbox, the GPU process/GL probe, or the V8 JIT individually.

  • Environment note: the repro host runs the armhf userland under an aarch64 kernel. Inside the armhf container, uname -m reports aarch64, and /proc/cpuinfo lists aarch64-style feature flags with none of the neon/vfp strings that armhf feature detectors look for. Any fleet device gets the same configuration when Pi OS boots its default 64-bit kernel with the 32-bit Anthias stream. The install/upgrade scripts select pi3-64 when uname -m is aarch64, so the current armhf-under-arm64 exposure is limited to pre-pi3-64 installs that haven't re-run the upgrade script. The crash signature is nevertheless identical to what 32-bit-kernel fleet devices report, just amplified.
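
The armhf-under-arm64 condition from the environment note can be checked on a device with a few lines of shell. This is a minimal sketch, not the logic the install/upgrade scripts use: the function name classify_stack and its output labels are made up for illustration, and it assumes getconf LONG_BIT reports the userland word size (32 for armhf).

```shell
#!/bin/sh
# Hypothetical helper (not part of Anthias): classify a kernel/userland pair.
# $1: kernel machine as printed by `uname -m`
# $2: userland word size as printed by `getconf LONG_BIT`
classify_stack() {
  case "$1:$2" in
    aarch64:32)          echo "armhf-under-arm64" ;;  # the amplified repro configuration
    aarch64:64)          echo "native-arm64" ;;
    armv7l:32|armv6l:32) echo "native-armhf" ;;
    *)                   echo "other" ;;
  esac
}

# Usage on the device or inside the viewer container:
classify_stack "$(uname -m)" "$(getconf LONG_BIT)"
```

Run inside the armhf container on the Pi 4 testbed, this would be expected to print armhf-under-arm64, since uname -m reports aarch64 there.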

Repro harness

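# Spawn AnthiasViewer 30 times under linuxfb; host /tmp/cores is mounted as the working directory.
docker run --rm --privileged -v /tmp/cores:/cores -w /cores \
  -e QT_QPA_PLATFORM=linuxfb --user viewer --entrypoint bash \
  ghcr.io/screenly/anthias-viewer:latest-pi3 -c '
# Give Qt a private runtime dir and lift the core-size limit.
export XDG_RUNTIME_DIR=/tmp/rt; mkdir -p "$XDG_RUNTIME_DIR"; chmod 700 "$XDG_RUNTIME_DIR"
export HOME=/tmp; ulimit -c unlimited
# rc=134 is SIGABRT (128+6), rc=139 is SIGSEGV (128+11), rc=124 means timeout(1) expired.
for i in $(seq 1 30); do timeout 12 dbus-run-session -- AnthiasViewer; echo "rc=$?"; done'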

Cores from the existing runs are on the Pi 4 testbed under /tmp/anthias-cores and /tmp/ab*/cores-*.
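
The rc= lines printed by the harness can be tallied into per-arm ok/crash counts like those in the mitigation matrix. The sketch below is an assumption about how such a tally could be done, not the script that produced the numbers above: tally_rc is a hypothetical name, and it assumes that rc=124 (timeout(1) expiring after 12 s) marks a viewer that survived, while 134 and 139 are the usual 128+signal codes for SIGABRT and SIGSEGV.

```shell
#!/bin/sh
# Hypothetical helper: read harness output on stdin and count outcomes.
# Lines that do not start with "rc=" (Qt warnings, malloc messages) are ignored.
tally_rc() {
  awk -F= '
    /^rc=/ {
      total++
      if ($2 == 124)      ok++     # timeout expired: viewer was still running
      else if ($2 == 134) abrt++   # 128 + 6  (SIGABRT, glibc malloc abort)
      else if ($2 == 139) segv++   # 128 + 11 (SIGSEGV)
      else                other++
    }
    END {
      printf "ok=%d sigabrt=%d sigsegv=%d other=%d total=%d\n",
             ok, abrt, segv, other, total
    }
  '
}
```

To use it, save the repro command to a file (harness.sh is a placeholder name) and pipe its output through the function: sh harness.sh 2>&1 | tally_rc.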

Next steps

  1. Tag Sentry events with device_type / kernel_release / kernel_machine / board_model (see #3021, "feat(sentry): tag events with device type, host kernel, and board model"). Once a release ships, ANTHIAS-D/F events will tell us whether the always-crashing devices boot a 64-bit kernel (→ steering them to the pi3-64 stream fixes them) or a 32-bit one (→ the bug needs chasing inside the Qt 5.15 WebEngine build).
  2. Build libQt5WebEngineCore with symbols (qt5-webview-builder) and symbolize the 3 recurring crash PCs from the captured cores.
  3. Validate on the real Pi 3 testbed (32-bit kernel) to measure the un-amplified crash rate.
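
For step 2, the faulting PCs have to be made comparable across ASLR bases before they can be symbolized. A minimal sketch of that normalization follows; pc_offset is a hypothetical helper, and the addresses in the usage comment are invented for illustration, not taken from the captured cores.

```shell
#!/bin/sh
# Hypothetical helper: convert an absolute faulting PC into an offset
# relative to the load base of libQt5WebEngineCore.so.5 in the same core.
# $1: faulting PC, $2: load base (both 0x-prefixed hex)
pc_offset() {
  printf '0x%x\n' "$(( $1 - $2 ))"
}

# Illustrative values only: two cores with different ASLR bases that
# reduce to the same module-relative offset.
#   pc_offset 0xb5a12340 0xb4000000
#   pc_offset 0xa9a12340 0xa8000000
```

Once a build with symbols exists, an offset like this can typically be passed to a tool such as addr2line -f -C -e against the symbolized library, assuming the library's first load segment sits at virtual address 0.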

🤖 Generated with Claude Code
