
ci(probe): drop linux_amd64_musl exclusion — see if MEOS vcpkg port builds against musl-libc#172

Open
estebanzimanyi wants to merge 7 commits into MobilityDB:main from estebanzimanyi:ci/probe-musl-build

Conversation

@estebanzimanyi
Member

Stacks on #171 (MinGW probe), which stacks on #170 (osx_arm64). The three probes are independent — one outcome doesn't affect the others.

Same probe shape as #171

The MEOS-port-side glibc-isms are the suspected blockers for musl-libc compatibility:

  • `printf` format flag handling in `postgres/utils/formatting.c` (musl is stricter about `%Lf` precision specifiers)
  • `fpos_t` opacity (musl makes it strictly opaque; glibc allows arithmetic)
  • `__GLIBC__`-guarded paths in vendored PG sources
  • `getline` availability (POSIX.1-2008; musl supports it, but with different prototypes)
  • signal handler differences

The precise failure log is the data needed to scope the per-blocker fix. This probe lets CI tell us.

Two outcomes

| If the musl row | Then |
|---|---|
| Goes green | MobilityDuck gets free linux_amd64_musl binaries. The exclusion stays dropped; this PR lands as-is. |
| Fails | The failure log identifies the precise next blocker. This PR is amended to put linux_amd64_musl back in the exclusion list with a comment naming the specific failure, and a follow-up PR addresses that blocker. |

Tracker

| Exclusion | State | This-session PR |
|---|---|---|
| osx_arm64 | excluded pending MEOS-side hex-WKB bug fix | #170 |
| windows_amd64_mingw | probe in flight | #171 |
| linux_amd64_musl | probe in flight (this PR) | #new |
| windows_amd64 (MSVC) | blocked on `<dirent.h>` shim + (already-opened) MobilityDB #1104 attribute fix | future Wave B PR |

The MEOS `*_as_hexwkb` family produces odd-length hex output on macOS
arm64, which the DuckDB test framework's hex decoder rejects.  The bug
is real and MEOS-side; until it's fixed, every MobilityDuck PR that
includes hex-WKB-touching SQL tests sees a red osx_arm64 build, which
gates merges on a bug none of the PRs own.
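Why odd-length output is fatal can be sketched with a minimal hex decoder. This is an illustration only, not the DuckDB test framework's actual decoder: `DecodeHexSketch` and `HexNibbleSketch` are hypothetical names, and the only assumption taken from the text above is that a decoder expects two hex digits per byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: value of one hex digit, or -1 if `c` is not a hex digit.
static int HexNibbleSketch(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical decoder: every byte needs exactly two hex digits, so an
// odd-length string cannot be decoded and is rejected up front.
std::optional<std::vector<std::uint8_t>> DecodeHexSketch(const std::string &hex) {
    if (hex.size() % 2 != 0) {
        return std::nullopt;  // the failure mode described for osx_arm64
    }
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        const int hi = HexNibbleSketch(hex[i]);
        const int lo = HexNibbleSketch(hex[i + 1]);
        if (hi < 0 || lo < 0) {
            return std::nullopt;  // non-hex character
        }
        bytes.push_back(static_cast<std::uint8_t>((hi << 4) | lo));
    }
    return bytes;
}
```

Under this sketch, a single dropped or extra hex digit anywhere in the output makes the whole value undecodable, which is consistent with every hex-WKB-touching test going red at once.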

Consolidation:

- MainDistributionPipeline.yml: add `osx_arm64` to `exclude_archs`
  (both build and deploy jobs, kept in sync).  Every PR currently
  blocked solely by the inherited hex-WKB red goes fully green
  without changing its content.

- HexWKBDiagnostic.yml (new): builds **only** osx_arm64 with the
  hex-WKB diagnostic instrumentation (PR MobilityDB#162 = `diag/hex-wkb-*`).
  Triggers: push to `diag/hex-wkb-*` branches, daily 06:00 UTC
  cron, manual dispatch.  Keeps the bug observable while it isn't
  gating production PRs.

Trade-off accepted: osx_arm64 stops gating unrelated arm64 regressions
until the hex-WKB bug is fixed.  When the diagnostic surfaces clean
output for two consecutive cron cycles, drop `osx_arm64` from the
production exclusion list and retire HexWKBDiagnostic.yml — that's a
2-line revert of this PR.
…t builds under MinGW

MinGW provides <dirent.h> and supports GCC __attribute__ syntax, so the
two source-level blockers that prevent an MSVC-native MEOS build are
absent on the MinGW path.  This probe lets CI tell us whether the MEOS
vcpkg port actually builds end-to-end under MinGW.

Two possible outcomes:

  - MinGW row goes green → MobilityDuck gets free Windows binaries
    from the existing CI matrix, no further work needed for the
    MinGW target.
  - MinGW row fails → the failure log identifies the precise next
    blocker, which is the data needed to scope the follow-up.

Stacks on MobilityDB#170 (osx_arm64 exclusion).  Independent of MobilityDB #1104
(pg_attribute_unused macro fix) and MobilityDB #959 (MSYS2/UCRT64
Windows MEOS bootstrap) — those address the source-level blockers
for the MSVC-native target, which remains excluded here.
…uilds against musl-libc

The MEOS-port-side glibc-isms are the suspected blockers — printf
format flag handling in postgres/utils/formatting.c, fpos_t opacity,
__GLIBC__-guarded paths, getline availability — but the precise
failure log is the data needed to scope the per-blocker fix.

Two possible outcomes:

  - musl row goes green → free linux musl binaries; the exclusion
    stays dropped.
  - musl row fails → the failure log identifies the precise next
    blocker.  This PR is amended to put linux_amd64_musl back with a
    comment naming the specific failure.

Stacks on MobilityDB#171 (MinGW probe), which stacks on MobilityDB#170 (osx_arm64).  The
three probes are independent: one outcome doesn't affect the others.
… PR MobilityDB#161)

Cherry-picked from open PR MobilityDB#161 so this PR's CI compiles against the
vcpkg-installed MEOS, which exposes 'meosType' (pre-consolidation)
not 'MeosType'.  When MobilityDB#161 reaches main, this commit collapses to a
no-op on rebase.
…MobilityDB#136)

Cherry-picked from open PR MobilityDB#136 so this PR's amd64 Linux test phase
goes green before MobilityDB#136 lands.  When MobilityDB#136 reaches main, this commit
collapses to a no-op on rebase.
…en PR MobilityDB#140)

Cherry-picked from open PR MobilityDB#140 so this PR's osx_amd64 / osx_arm64 /
wasm builds compile.  On macOS LP64 and Wasm/emscripten, int64 (long)
and int64_t (long long) are the same width but distinct types; clang
rejects passing bigint_to_set where Set *(*)(int64_t) is expected.
The cast is a no-op on Linux.  When MobilityDB#140 reaches main, this commit
collapses to a no-op on rebase.
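The type mismatch can be reproduced in isolation. The sketch below is not the code from MobilityDB#140: `SetSketch`, `bigint_to_set_sketch`, `SetFromInt64T`, and `ApplySketch` are hypothetical stand-ins, and it uses a small adapter rather than the cast the commit applies. It only illustrates why a function taking `long` is not accepted where a pointer to a function taking `long long` is expected, even when both are 64 bits wide.

```cpp
#include <type_traits>

// `long` and `long long` are always distinct types, whatever their widths.
static_assert(!std::is_same<long, long long>::value,
              "long and long long are distinct types");

struct SetSketch { long long value; };  // hypothetical stand-in for MEOS's Set

// Stand-in for a function declared with a `long`-based 64-bit integer type.
SetSketch bigint_to_set_sketch(long v) { return SetSketch{v}; }

// Stand-in for the expected callback shape, declared with `long long`.
using SetFromInt64T = SetSketch (*)(long long);

// The two function-pointer types do not convert implicitly.
static_assert(!std::is_convertible<decltype(&bigint_to_set_sketch), SetFromInt64T>::value,
              "pointer to f(long) does not convert to pointer to f(long long)");

// Consumer that only accepts the `long long` signature.
SetSketch ApplySketch(SetFromInt64T fn, long long v) { return fn(v); }

// Adapter: a captureless lambda converts to the expected function-pointer type
// and forwards to the `long` version with an explicit value conversion.
const SetFromInt64T kBigintToSetAdapter = [](long long v) {
    return bigint_to_set_sketch(static_cast<long>(v));
};
```

Calling `ApplySketch(bigint_to_set_sketch, 42)` directly would fail to compile, while `ApplySketch(kBigintToSetAdapter, 42)` is accepted.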
…e_ptr<FunctionData> in Copy()

GCC + DuckDB 1.4.4's unique_ptr does not implicitly convert
derived->base, so 'return r;' in BinsBindData::Copy() fails to compile:

  error: could not convert 'r' from 'unique_ptr<duckdb::{anonymous}::BinsBindData,...>'
                                to 'unique_ptr<duckdb::FunctionData,...>'

Use duckdb's unique_ptr_cast helper (from duckdb/common/helper.hpp) to
do the conversion explicitly, matching the canonical pattern used by
DuckDB core (e.g. table_scan.hpp's TableScanBindData::Copy()).  No
behaviour change; the move is exactly what the implicit conversion
would have done if the compiler accepted it.
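The shape of the fix can be sketched without DuckDB. Everything here is a stand-in: `FunctionDataSketch`, `BinsBindDataSketch`, and `UniquePtrCastSketch` are hypothetical names that mimic the pattern described above, and the helper is assumed to behave like duckdb's `unique_ptr_cast` (release the derived pointer, re-wrap it as the base type). Note that plain `std::unique_ptr` would also accept `return r;` here; the explicit helper matters for the DuckDB wrapper discussed in this commit.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for duckdb::FunctionData.
struct FunctionDataSketch {
    virtual ~FunctionDataSketch() = default;
    virtual std::unique_ptr<FunctionDataSketch> Copy() const = 0;
};

// Hypothetical stand-in for the cast helper: transfers ownership from a
// unique_ptr<Derived> to a unique_ptr<Base> explicitly.
template <class Base, class Derived>
std::unique_ptr<Base> UniquePtrCastSketch(std::unique_ptr<Derived> src) {
    return std::unique_ptr<Base>(static_cast<Base *>(src.release()));
}

// Hypothetical stand-in for BinsBindData.
struct BinsBindDataSketch : FunctionDataSketch {
    int nbins;
    explicit BinsBindDataSketch(int n) : nbins(n) {}

    std::unique_ptr<FunctionDataSketch> Copy() const override {
        auto r = std::make_unique<BinsBindDataSketch>(nbins);
        // Explicit derived-to-base conversion instead of `return r;`.
        return UniquePtrCastSketch<FunctionDataSketch>(std::move(r));
    }
};
```

Ownership moves exactly once, so the copy returned through the base pointer is the same object that was allocated as the derived type.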
@estebanzimanyi
Member Author

Linux amd64 + arm64 both PASS after the polyfill stack — the probe itself does not break the baseline. The actual musl probe (linux_amd64_musl row) is still in progress and will tell us whether the MEOS vcpkg port builds under musl-libc; result pending.

@estebanzimanyi
Member Author

Probe outcome — musl build SURFACES a real Wave D blocker:

The linux_amd64_musl build succeeded (build+link green; 623/623 targets compiled). The failure came in make test_release:

could not open directory "/usr/share/zoneinfo": No such file or directory
could not open directory "/usr/share/zoneinfo": No such file or directory
make: *** [Makefile:36: test_release_internal] Error 1
[0/59] (0%): /duckdb_build_dir/test/sql/stbox.test
##[error]Process completed with exit code 2.

Diagnosis: Alpine/musl ships without /usr/share/zoneinfo by default. MobilityDuck's extension entry point calls meos_initialize_timezone("Europe/Brussels"), which then fails to locate the timezone database.

Wave D fix paths (any one of these unblocks musl):

  1. Install tzdata in the musl Docker image — upstream patch to duckdb/extension-ci-tools/docker/linux_amd64_musl/Dockerfile: RUN apk add tzdata. Smallest change, restores parity with the manylinux base used by glibc.
  2. Make MEOS timezone init resilient — wrap meos_initialize_timezone in an existence check; skip on missing zoneinfo. MobilityDuck-side, ~5 LoC.
  3. Bundle zoneinfo — ship tzdata in the MobilityDuck extension binary itself. Larger artifact, simpler runtime story.
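Option 2 could look roughly like the following. This is a minimal sketch, not the actual MobilityDuck change: `ZoneinfoAvailable` and `InitTimezoneIfPossible` are hypothetical names, and `meos_initialize_timezone_stub` stands in for the real `meos_initialize_timezone` so that the snippet compiles without MEOS.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical helper: true when `dir` exists and is a directory.
// The error_code overload never throws; a missing path simply yields false.
bool ZoneinfoAvailable(const std::string &dir = "/usr/share/zoneinfo") {
    std::error_code ec;
    return std::filesystem::is_directory(dir, ec);
}

// Stand-in for MEOS's meos_initialize_timezone; it only counts calls here.
static int g_tz_init_calls = 0;
static void meos_initialize_timezone_stub(const char * /*tz_name*/) {
    ++g_tz_init_calls;
}

// Hypothetical guard: initialize the timezone only when the tz database is
// present. Returns true if initialization was attempted, false if skipped.
bool InitTimezoneIfPossible(const std::string &zoneinfo_dir, const char *tz_name) {
    if (!ZoneinfoAvailable(zoneinfo_dir)) {
        return false;  // tzdata-less host: skip instead of failing at load time
    }
    meos_initialize_timezone_stub(tz_name);
    return true;
}
```

In the extension entry point this would replace the unconditional call, for example `InitTimezoneIfPossible("/usr/share/zoneinfo", "Europe/Brussels")`. What MEOS should do about timezone-dependent functions after a skipped initialization is a separate design question that this sketch does not address.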

Important positive signal: the MEOS C compile + link on musl is green — no source-level musl portability bug. The blocker is purely environmental (missing tzdata). This is a cleaner Wave D unblock than I expected.

@estebanzimanyi
Member Author

Alternative unblock path opened as #182: an extension-side fix that skips the meos_initialize_timezone call when /usr/share/zoneinfo isn't present. It is complementary to the upstream Dockerfile apk add tzdata fix: either alone unblocks the musl probe, and both together give belt-and-suspenders coverage. The extension-side fix has the bonus of making MobilityDuck deployable on any tzdata-less host (edge devices, minimal containers), not just the CI image.
