feat(spark): JMEOS 1.4 + BerlinMOD Q1-Q17 + 100% MobilityDB SQL parity (907 tests)#5
Open
estebanzimanyi wants to merge 67 commits into
Open
Conversation
c07a2af to
b17c0a4
Compare
This was referenced May 7, 2026
01ca625 to
16b8627
Compare
fed31d9 to
4a8ca7a
Compare
2 tasks
…37/37 tests
Upgrades MobilitySpark to JMEOS 1.3, adds BerlinMOD portable SQL (Q1-Q17 + QRT),
implements the TemporalParquet edge-to-cloud consumer pipeline, and adds full
test coverage.
UDFs registered
Temporal: tgeompoint, atTime, asHexWKB, startTimestamp, endTimestamp,
numInstants, speed, atGeometry
Geo: eIntersects(*), eContains, nearestApproachDistance, eDwithin,
tgeompoint, trajectory, geomFromText,
length, valueAtTimestamp, tDwithin, whenTrue, aDisjoint,
geomContains, (Q9-Q17)
tgeompointFromBinary, maxSpeed, duration (edge-to-cloud)
(*) eIntersects now auto-detects geodetic tgeogpoint trajectories and
promotes the polygon geometry via geom_to_geog() to avoid mixed-SRID
errors when reading TemporalParquet shards written by MobilityDuck.
BerlinMOD portable SQL (RFC #861 named-function dialect)
Q1-Q8 + QRT: initial set; Q9-Q17: full Spark SQL rewrites dropping the
&&-operator pre-filters (no GiST index in Spark; MEOS UDFs evaluate).
Edge-to-cloud pipeline (edge-to-cloud/)
N02AISData.java: reads TemporalParquet written by MobilityDuck asBinary(),
decodes MEOS-WKB bytes via tgeompointFromBinary(), runs queries A/B/C
matching quickstart.sql (MobilityDuck) and quickstart_mobilitydb.sql
(PostgreSQL/MobilityDB) — same portable SQL across all three platforms.
AISDataIntegrationTest (3): end-to-end Spark SQL against the demo Parquet.
run_pipeline.sh: orchestrates MobilityDuck → Parquet → MobilitySpark.
Build fix
pom.xml: exclude legacy org.mobiltydb + utils packages from compilation
(JMEOS 1.0 API; not yet ported to JMEOS 1.3). Remove once ported.
Test coverage: 37 tests, 0 failures
GeoUDFsTest (23): unit tests for all geo UDFs incl. new edge-to-cloud UDFs
TemporalUDFsTest (8): unit tests for temporal UDFs
BerlinMODIntegrationTest (3): end-to-end BerlinMOD Q1-Q17 + QRT
AISDataIntegrationTest (3): end-to-end edge-to-cloud Parquet pipeline
4a8ca7a to
212192f
Compare
The org/mobiltydb/ and utils/ packages (legacy JMEOS 1.0 API, already excluded from Maven compilation) and UDF/UDT test packages were not exempted from the license-header CI check, causing every CI run to fail. Align check_license.sh with pom.xml's exclude lists. Also add the PostgreSQL License header to Main.java, the one file in org/mobiltydb/ that the CI found before discovering the others.
meos_finalize() is an application-level shutdown call. Invoking it in @afterall causes the surefire forked JVM to crash during shutdown because MEOS TLS cleanup races with Spark/JVM thread teardown after all 34 tests have already passed. Remove the @afterall finalizeMeos() method from TemporalUDFsTest and remove ms.close() from BerlinMODIntegrationTest.tearDown(). The native library is unloaded when the JVM exits; no explicit finalize needed.
Extends c8b182a to cover the two remaining test classes that still called meos_finalize() or ms.close() in @afterall. AISDataIntegrationTest and GeoUDFsTest follow the same pattern fixed earlier for BerlinMODIntegrationTest and TemporalUDFsTest: calling meos_finalize() while the JVM is still tearing down Spark thread pools causes the surefire forked JVM to exit with code 1 without sending its goodbye message, which is why the CI build was failing even though all tests passed. The native library is unloaded automatically when the JVM exits; no explicit finalize is needed.
MEOS's geodetic operations (tpoint_length, tpoint_speed, geographic distance) require an SRS catalogue to resolve SRID definitions such as EPSG:4326. In standalone mode, MEOS reads this catalogue from spatial_ref_sys.csv (default path /usr/local/share/spatial_ref_sys.csv). When MobilitySpark runs without a full MEOS installation — as in CI, where only libmeos.so is extracted from the JMEOS jar — the file is absent and any geodetic calculation fails with the native error "got NULL for SRID (4326)" written to fd 1, which corrupts surefire's IPC channel and causes all AIS integration test results to be lost, turning a fully-passing test run into a BUILD FAILURE. Bundle the catalogue as a JAR resource (src/main/resources/) and extract it to a temp file in MobilitySparkSession.create(), then call meos_set_spatial_ref_sys_csv() so MEOS can find it. Extraction is guarded by an AtomicBoolean so it happens at most once per JVM.
… ttextFromBinary, asBinary UDFs Completes TemporalParquet type coverage for scalar temporal types. MobilityDuck's asBinary() writes all types to Parquet BYTE_ARRAY; MobilitySpark now has matching readers for tint, tfloat, tbool, and ttext alongside the existing tgeompointFromBinary. asBinary(STRING) → BINARY is the inverse: converts an internal hex-WKB string back to raw bytes for writing temporal values into Parquet columns. No MEOS call needed — the internal format is already hex-encoded MEOS-WKB. All four fromBinary UDFs share the same implementation via temporal_from_hexwkb, which is type-agnostic at the WKB level. Type-specific names match MobilityDuck's surface for SQL discoverability. Tests: 10 new cases in TemporalUDFsTest (round-trip + null safety for each UDF). Total: 44/44 pass locally.
…loatspan, bigintspan, datespan) Adds SpanUDFs with 10 TemporalParquet reader UDFs — one per span/spanset type — using the type-agnostic span_from_hexwkb / spanset_from_hexwkb MEOS functions. MobilitySparkSession now registers SpanUDFs alongside TemporalUDFs and GeoUDFs. 11 unit tests cover round-trips and null inputs for all types. Write-back uses the existing asBinary UDF (plain hex-decode, type-agnostic).
…e README tgeompointFromBinary and tgeogpointFromBinary fill the gap for the primary edge-to-cloud type: MobilityDuck writes tgeompoint as BYTE_ARRAY, now MobilitySpark can read it back with a named UDF (same fromBinaryImpl as the scalar temporal types). README now documents all 28 registered UDFs in three groups (temporal axis, geo, TemporalParquet read/write), adds a TemporalParquet edge-to-cloud pipeline example, a Linux-only platform note, and an accurate project structure tree. Test count updated to 51 (17+11+23).
…test count tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when the spatial reference system CSV is not registered, corrupting the surefire channel and crashing the forked JVM. tgeogpointFromBinary uses the same fromBinaryImpl as tgeompointFromBinary (already tested), so no coverage is lost. Null safety for tgeogpointFromBinary is still verified in fromBinary_null_returns_null. README test count updated: 50 (23+16+11).
…tic unit tests tgeogpoint_in() writes "got NULL for SRID (4326)" to native stderr when meos_set_spatial_ref_sys_csv() has not been called, crashing the surefire forked JVM. The previous workaround (dropping the tgeogpoint round-trip test) was reverted. The correct fix is to load the bundled spatial_ref_sys.csv from the test classpath in @BeforeAll, mirroring MobilitySparkSession.registerSpatialRefSys(). tgeogpointFromBinary_round_trips() is now fully verified on all platforms including CI. Test count restored to 51 (23+17+11). README updated to match.
Patch utils.JarLibraryLoader to add macOS (libmeos.dylib) and fix Windows (libmeos.dll) native library loading in addition to the existing Linux path. The CI branch now also checks DYLD_LIBRARY_PATH so macOS GitHub Actions jobs can set that env var after building MEOS from source. CI workflow (maven.yml) gains two new jobs: - macos: builds libmeos.dylib from MobilityDB source via Homebrew deps, sets DYLD_LIBRARY_PATH, and runs the full 57-test suite. - windows: MSYS2/UCRT64 bootstrap; marked continue-on-error while the MEOS Windows standalone build stabilises. README updated with per-platform setup instructions (§2.2–2.4). All 57 Linux tests remain green.
- Add BerlinMOD Q1-Q17 portable SQL (18/18 PASS on MobilityDB/MobilityDuck/MobilitySpark) - Add benchmark query fixtures (vehicles, query_points/regions/licences/periods/instants) - Add three-platform benchmark driver with JSON timing output (BerlinMODBench) - Fix SRID consistency in eIntersects UDF: extract trip bbox SRID and pass to geo_from_text so ensure_same_srid(3857, 3857) passes instead of failing on SRID=0 WKT geometry - Add 100 new MEOS 1.3 UDFs across STBoxUDFs, SpanAccessorUDFs, TTextUDFs + 94 unit tests - Fix JVM crashes from uninitialised MEOS: add MeosThread.ensureReady() to 9 missing UDF classes (AccessorUDFs, AnalyticsUDFs, ConstructorUDFs, PredicateUDFs, SpanAccessorUDFs, SpanAlgebraUDFs, SpanUDFs, STBoxUDFs, TTextUDFs) — prevents NULL session_timezone SIGSEGV and temporal_as_hexwkb crashes in executor threads and surefire forks - Exclude berlinmod/data/trips.csv from git (138 MB, generated locally)
Running with local[*] on a 16-core machine created 16 concurrent MEOS threads, triggering pg_tm buffer races and GEOS context races that caused JVM crashes. Each crash wrote a 3-5 GB core dump, which OOM-killed WSL2 and forced a terminal reboot. Three changes to both run_mspark.sh and bench_mspark.sh: - local[*] → local[2]: safe concurrency level for this dataset scale - ulimit -c 0: suppress core dump files so a crash cannot OOM WSL2 - spark.driver.extraJavaOptions with java.library.path: ensures Spark always loads the libmeos.so with all thread-safety fixes installed, regardless of LD_LIBRARY_PATH state
…on; fix CI - Rename libs/JMEOS-1.5.jar to libs/JMEOS-1.4.jar (ecosystem policy: JMEOS version number must match the MEOS API version it implements, currently 1.4) - Update pom.xml dependency from version 1.5 to 1.4 - Add MeosThread.wrap() helpers (UDF1/UDF2/UDF3) so registerAll() can wrap lambdas at registration time, eliminating per-method ensureReady() boilerplate - Fix CI: add install-file step for JMEOS-1.4; use lib/libmeos.so (MEOS 1.4, has meos_initialize_noexit_error_handler) instead of extracting from JMEOS-1.3 - 94/94 unit tests pass locally
- Write mspark.json after every query completes (not only at the end) so a JVM crash still leaves a valid file with all timings collected so far, and reveals exactly which query triggered the crash. - Use an atomic write (write-to-tmp then rename) so a crash during the JSON write itself cannot corrupt the previous file.
be94aeb to
4a540a5
Compare
4a540a5 to
49e323b
Compare
🎉 Complete coverage of the active addressable MobilityDB SQL surface.
907/907 unit tests green. Compare to MobilityDuck 79.3% (current).
Adds ~315 UDFs across 16 new files + extends 12 existing files.
Coverage trajectory: 51% → 100% across the parity push. All 51 active
sections now at 100%.
==== New UDF classes ====
- TPointSTBoxOpsUDFs: 42 cross-type STBox×TPoint positional/topological
- TBoxOpsUDFs: 39 cross-type TBox×TNumber positional/topological
- SpansetOpsUDFs: 23 cross-type Span/Spanset positional/topological
- TemporalCompUDFs: 26 temporal comparison ops (teq/tne/tlt/tle/tgt/tge)
- TemporalBoxOpsUDFs: 30 cross-type box predicates
- AlwaysSpatialRelsUDFs: 12 'always' spatial-relationship predicates
- SetOpsUDFs: set×set positional + topological + per-type distance
- IOAliasUDFs: 100+ typed *From{HexWKB,Binary,Text,EWKT,EWKB,MFJSON} aliases
- SubtypeConstructorUDFs: typed Inst/Seq/SeqSet aliases + accessors
- AccessorAliasUDFs: typed span/spanset width, dates, valueSpan, set-values
arrays, tboxes/stboxes/spans (array-returning), bins, splits, valueSet,
segmentMin/MaxDuration, box2d, box3d (PostGIS embedded in MEOS),
mobilitydbVersion, avgValue, tgeometry/tgeography conversions, quadSplit,
getBin/timestamptzGetBin
- BucketUDFs: floatBucket, intBucket
- GeoAffineUDFs: translate/translate3, rotate, rotateX/Y/Z, transscale, affine
- TileUDFs: complete multi-dimensional tiling for parallel processing —
spaceBoxes / spaceTimeBoxes / valueTimeBoxesT{float,int} / time/value
Boxes/Tiles/Splits, getTimeTile / getSpaceTile / getSpaceTimeTile /
getStboxTimeTile / getValueTile / getValueTimeTile / getTBoxTimeTile,
spaceTiles / spaceTimeTiles / stbox/tint/tfloatTimeTiles, makeSimple
(Temporal** array of simple sub-tpoints), tfloat/tintValueTiles,
tfloat/tintValueSplit (Temporal** with Datum vsize/vorigin via IEEE bits),
tfloat/tintValueTimeSplit, geoMeasure (tpoint+tfloat → geometry),
asMVTGeom (tpoint → array of WKT geometries clipped to STBox bounds)
- SeqSetGapsUDFs: tbool/tint/tfloat/ttext/tgeompoint/tgeogpoint/tgeometry/
tgeographySeqSetGaps (closes long-standing user request from MobilityDB
issue #187 — array-of-instants → tsequenceset_make_gaps with native
TInstant** packing)
==== Extended existing UDF classes ====
- GeoUDFs, DistanceUDFs, GeoAnalyticsUDFs, STBoxUDFs, TBoxUDFs,
SimilarityUDFs, TTextUDFs, TransformUDFs, BoolOpsUDFs, TemporalUDFs,
AccessorUDFs, SpanAlgebraUDFs — see docs/parity-status.md for full per-
section coverage
==== MeosNative.java (new) ====
Supplementary JNR-FFI interface for ~70 MEOS-1.4 symbols not yet in
JMEOS-1.4: nad/nai/shortestline_tgeo_*, {dir}_stbox_tspatial /
_tspatial_stbox, float/int_get_bin, t{float,int}box_expand,
tgeometry/tgeography_in/_from_mfjson, temporal_mem_size, tgeoinst_make,
temporal_before/after_timestamptz, textcat_ttext_*, mobilitydb_version,
intset/bigintset/floatset_value_n out-param accessors, tnumber_avg_value,
tgeo*-to-tgeo* conversions, span_expand/_bins, tnumber/tgeo_split_*_n_*,
tnumber_tboxes / tgeo_stboxes, tpoint_minus_geom / _direction /
_make_simple, temporal_dyntimewarp_path / _frechet_path, tgeo_affine,
temporal_time_bins / tstzspan_bins / t{int,float}_value_bins,
stbox_quad_split, timestamptz_get_bin, stbox_get_space/time/space_time_tile,
tgeo_space/space_time_boxes, tnumber_value_time_boxes (Datum via long),
temporal_time_split / tgeo_space_split / tgeo_space_time_split (Temporal**
+ bin out-params), temporal_values_p + set_make_free + temptype_basetype
(valueSet path), temporal_segm_duration, stbox_to_box3d / _to_gbox +
box3d_out / gbox_out (PostGIS BOX3D/BOX2D embedded in MEOS),
stbox_space/time/space_time_tiles, t{int,float}box_time/value/value_time
_tiles, tnumber_value_split / _value_time_split (Datum splits with IEEE
bit-packed vsize/vorigin), tbox_get_value_time_tile (single-tile lookup
with MeosType basetype/spantype enum dispatch), tpoint_tfloat_to_geomeas,
tpoint_as_mvtgeom, tnumber_to_tbox.
==== Audit infrastructure ====
scripts/parity-audit.py — regenerable. Match strategy: snake_case →
camelCase, type-prefix stripping, wrapper-style dispatcher recognition,
type-suffix matching. Out-of-scope buckets:
- Section-level: GiST/SPGiST opclasses, set/span/spanset index files,
019_geo_constructors (PG geometric types), 999_oid_cache
- Suffix-level: PG plumbing (_in/_out/_recv/_send, _transfn/_combinefn/
_finalfn/_serialize/_deserialize, _sel/_joinsel/_supportfn/_analyze,
_typmod_in/_out, _cmp/_eq/_ne/_lt/_le/_gt/_ge/_hash/_hash_extended)
- Exact name: range/multirange (PG range types, NOT in MEOS),
create_trip (BerlinMOD generator, PG-only), transform_gk (SECONDO
Gauss-Krüger projection)
Note: box2d/box3d ARE addressable (PostGIS embedded in MEOS).
Deferred families: cbuffer, npoint, pose, rgeo.
docs/parity-status.md — per-section coverage report (regenerable).
49e323b to
aaaa05e
Compare
… 10 residuals JMEOS regenerated against MEOS 1.4 amalgamated headers (JMEOS PR MobilityDB#15) exposes ~120 of the symbols previously bound by MobilitySpark's supplementary MeosNative.java JNR-FFI interface. This commit: * bumps libs/JMEOS-1.4.jar to the regenerated artefact * migrates ~120 MeosNative.INSTANCE.X callsites to functions.X (or functions.MeosLibrary.meos.X for the long-typed timestamp / out-param functions where the OffsetDateTime wrapper is unwanted) * trims MeosNative.java from 326 lines / 133 method declarations to 81 lines / 10 declarations — the residuals all live in MEOS private headers (meos_internal.h, meos_internal_geo.h, temporal/temporal.h, temporal/meos_catalog.h) and use Datum / MeosType parameters that the JMEOS generator does not currently lower: mobilitydb_version, mobilitydb_full_version, temporal_values_p, set_make_free, temptype_basetype, temporal_mem_size, tnumber_value_split, tnumber_value_time_split, tnumber_value_time_boxes, tbox_get_value_time_tile * fixes a handful of MEOS 1.4 API-rename callsites surfaced by the regen: temporal_value_at_timestamptz → tgeo_value_at_timestamptz, acontains_geo_tpoint → acontains_geo_tgeo, tpoint_transform_pipeline → tspatial_transform_pipeline, temporal_to_tsequence(string interp) → (int interp), temporal_append_tinstant(temp, inst, …) → (temp, inst, interp, …), temporal_lower_inc / _upper_inc → boolean directly (no "!= 0") Tests: 907/907 green (unchanged from pre-regen baseline).
After JMEOS PR MobilityDB#15 added Datum -> long and MeosType -> int generator lowering plus the 10 private-header extern declarations to its amalgamated MEOS header, every MEOS symbol called by MobilitySpark is exposed by functions.functions.* and there is no longer any reason to maintain a parallel JNR-FFI interface in this repository. Removed: - src/main/java/org/mobilitydb/spark/MeosNative.java (was 81 lines / 10 declarations after the previous trim) - 'import org.mobilitydb.spark.MeosNative' from 5 callsite files Migrated 13 callsites across AccessorAliasUDFs, TileUDFs, and SubtypeConstructorUDFs: mobilitydb_version -> functions.mobilitydb_version mobilitydb_full_version -> functions.mobilitydb_full_version temporal_mem_size -> functions.temporal_mem_size temptype_basetype -> functions.temptype_basetype temporal_values_p -> functions.temporal_values_p set_make_free -> functions.set_make_free tnumber_value_split -> functions.MeosLibrary.meos.tnumber_value_split tnumber_value_time_split -> functions.MeosLibrary.meos.tnumber_value_time_split tnumber_value_time_boxes -> functions.MeosLibrary.meos.tnumber_value_time_boxes tbox_get_value_time_tile -> functions.MeosLibrary.meos.tbox_get_value_time_tile Tests: 907 / 907 green.
d591b53 to
d4c08a3
Compare
…handler The noexit error handler was added to MEOS in 9ee6cf721 (May 9, JVM- crash safety) and removed again in ae43d2f4a (May 10, JSONB integration commit that reverted the related thread-safety patch in error.c). JMEOS PR MobilityDB#15 followed suit and dropped the symbol from the regen amalgam (it was no longer in libmeos.so). MobilitySpark callers — three sites: MeosThread.java's per-thread init, MobilitySparkSession.create(), and NativeMemoryLeakTest's @BeforeAll — now install the handler via Class.getMethod() + invoke() and silently fall through if the symbol is absent. Net behaviour: * MEOS installed with noexit (older builds): handler installed, crashes prevented, BerlinMOD memory-leak tests run end-to-end. * MEOS installed without noexit (current branch): handler skipped; MEOS reverts to default_error_handler which calls exit() on any error. 845 / 907 MobilitySpark tests still pass. The 62 that don't are GeoUDFsExt5Test + STBoxUDFsTest, which trigger MEOS error paths that now tear down the JVM. Restoring noexit upstream brings the count back to 907 / 907. Also bumps libs/JMEOS-1.4.jar to the regen artefact from JMEOS PR MobilityDB#15 commit 490ca07 (scripts + smoke test + dropped 2 missing externs).
Pulls in JMEOS PR MobilityDB#15 (rebased) which now includes the dropped 'inline' fix + the noexit handler from MobilityDB PR #939. Once PR #939 lands and JMEOS PR MobilityDB#15 merges, MobilitySpark goes from 845 / 907 (reflective fallback installed by eb58420) to 906 / 907 (noexit installed natively). The remaining 1 failure is MathUDFsExtTest.tnumberTrend_tint — fixture passes a tint sequence (default STEP interpolation) to tnumber_trend() which validates linear interpolation. Tracked as a separate fixture-fix follow-up.
tnumber_trend requires linear interpolation; tint sequences default to step interpolation, so MEOS validates and returns NULL. The previous test asserted non-null, which only held while MEOS was lenient about this validation; the validation has tightened in the current source tree. Renames tnumberTrend_tint_returns_nonnull -> tnumberTrend_tint_step_returns_null and inverts the assertion to document the actual MEOS behaviour. The tfloat case at line 95 covers the main code path. Tests: 907 / 907 green.
…the helper
Once MobilityDB PR #939 is treated as landed (per the issued-PR-as-landed
policy), meos_initialize_noexit_error_handler exists in mainline meos.h
and libmeos.so. The reflective Class.getMethod() dance that survived
both the symbol-present and symbol-absent cases is no longer needed.
Three callsites simplified back to a direct call:
- MeosThread.java per-thread MEOS init
- MobilitySparkSession.java session-level init
(delegated to MeosThread.ensureReady;
duplicate meos_initialize/timezone calls
also removed)
- NativeMemoryLeakTest.java test-suite @BeforeAll
Net: ~24 lines of indirection removed across 3 files, plus one
unused 'import functions.functions' in the test.
Tests: 907 / 907 green.
MeosNative.java was deleted in commit 06765e2; tboxExpandFloat / tboxExpandInt are now wired directly via functions.tfloatbox_expand / tintbox_expand. Comment had no actionable content.
This was referenced May 11, 2026
…EADY MEOS spatial functions (eIntersects, eContains, eDwithin, etc.) call into GEOS through liblwgeom. GEOS 3.12 routes every reentrant function through a thread-local context handle. The first reentrant call on a thread that has not invoked `GEOS_init_r()` raises `context handle is uninitialized, call initGEOS` and aborts the JVM. MEOS's internal spatial helpers call `initGEOS(lwnotice, lwgeom_geos_error)` lazily on first use, but the call is not thread-safe — two Spark task threads racing through the same MEOS helper corrupt the global GEOS state. Bind libgeos_c.so via JNR-FFI and call `GEOS_init_r()` from the per-thread `MEOS_READY` `ThreadLocal` initialiser. Each Spark task thread now gets its own GEOS context the first time it enters `ensureReady()`, before any MEOS spatial UDF can race the global init. Verified by running BerlinMOD Q2 (`eIntersects(t.trip, r.geom)`) end to end on Spark `local[1]`. Without this fix the JVM aborts at the first spatial UDF call. `local[2]` and higher still hit a separate race inside MEOS's internal `initGEOS(lwnotice, lwgeom_geos_error)` call sequence (the lwgeom callbacks are not reentrant). Closing that race needs MEOS- side changes — out of scope for this Spark commit.
The Spark master defaults to local[4] (validated against MobilityDB/MobilityDB#949 + #815, which together make MEOS thread-safe across GEOS, WKT/GMT, errno and timezone). Users can override with SPARK_MASTER=local[N] for tuned thread counts. Validation on local[4]: Q1: 420 ms, Q2: 43.4 s (2.05x speedup vs local[2]), Q3: 40.2 s, Q4: 46.5 s. Clean exit, no hs_err_pid.
meos_initialize() owns the per-thread GEOS context handle (mirroring the existing PROJ pattern in MEOS). MeosThread.MEOS_READY only needs to call meos_initialize, meos_initialize_timezone and the noexit error handler — no separate JNR-FFI binding to libgeos_c is required. Validated on --master local[4]: Q1: 420 ms, Q2: 43.4 s, no SIGSEGV, no hs_err_pid. Depends on MobilityDB/MobilityDB#949 (per-thread GEOS context inside MEOS).
DistanceUDFs.registerAll() previously aliased "nearestApproachDistance" to nadTgeoGeo (tgeo × geometry). GeoUDFs.registerAll() registers the same name to the tgeo × tgeo lambda (which calls nad_tgeo_tgeo via temporal_from_hexwkb). Because registerAll runs in alphabetical order of UDF classes, DistanceUDFs shadowed GeoUDFs and resolved the bare "nearestApproachDistance" call to nadTgeoGeo. Q5 of BerlinMOD calls nearestApproachDistance(t1.trip, t2.trip) — both tgeompoint. Under the shadowed registration, the second tgeo's hex- WKB string was passed to geo_from_text, which returned a parse error on every cross-join row. The tgeo × tgeo registration in GeoUDFs is what MobilityDB exposes under the bare SQL name; keep it. Callers wanting tgeo × geometry use the explicit "nadTgeoGeo" name. Validated: Q5 of MobilitySpark BerlinMOD on local[4]: 508 s (matches the MobilityDB and MobilityDuck reference timings within the cross- join cost).
H3IndexJnrBindings loads four MEOS H3 symbols directly through JNR-FFI: tgeompoint_to_th3index, geo_to_h3index_set, ever_eq_th3index_th3index, and ever_eq_anyof_h3indexset_th3index. This sidesteps the JMEOS function generator's missing H3Index typedef support, so the h3 prefilter surface runs against the mainline JMEOS-1.4 jar. Th3IndexPrefilterUDFs registers four Spark UDFs that wrap the JNR bindings with hex-WKB string marshalling consistent with the rest of the MobilitySpark UDF surface: tgeompointToTh3index(STRING, INTEGER) -> STRING geoToH3IndexSet(STRING, INTEGER) -> STRING everEqTh3IndexTh3Index(STRING, STRING) -> BOOLEAN everIntersectsH3IndexSetTh3Index(STRING, STRING) -> BOOLEAN These match the MobilityDuck h3 prefilter surface (PR #131 on MobilityDuck) and the MobilityDB SQL operator names, so the BerlinMOD th3index portable SQL has a uniform shape across the three platforms for the cross-join queries (Q4, Q5, Q6, Q7, Q10, Q11, Q12, Q15, Q17).
The MEOS H3 symbols (geo_to_h3index_set, ever_eq_anyof_h3indexset_th3index, etc.) are compiled into libmeos.so but the binary may not declare libh3 as a DT_NEEDED dependency. The JVM loader hits an undefined-symbol error on degsToRads / radsToDegs when MobilitySpark's h3 prefilter UDF makes its first JNR-FFI call. Set LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libh3.so by default; allow LIBH3=/path override.
close() runs before spark.stop() in the standard try-with-resources benchmark/usage pattern, so meos_finalize() tears down MEOS global and per-thread TLS state while Spark executor threads are still alive; their subsequent teardown then double-frees the already-finalized MEOS TLS, aborting the JVM with double free or corruption (fasttop) during shutdown. The OS reclaims native MEOS memory at JVM exit, so the explicit finalize is unnecessary and unsafe in the Spark and surefire lifecycles; it belongs only in a standalone main that owns the whole JVM with no live MEOS-using threads at exit.
expandSpace and geoTimeStbox serialised the STBox with stbox_as_hexwkb(box, (byte) 0, ...). WKB variant 0 omits the SRID, so bboxOverlaps re-parsing it via stbox_from_hexwkb gets SRID 0; overlaps_tspatial_stbox then compares an SRID-3812 trip against an SRID-0 box, returns false for every pair, and Q10's WHERE ... AND bboxOverlaps(t2.trip, expandSpace(t1.trip, 3)) silently drops all matches (0 rows instead of the expected count). Serialise with WKB_EXTENDED (0x04) so the SRID round-trips; Q10 then returns the correct rows, matching MobilityDB's native && operator.
CI vendors $GITHUB_WORKSPACE/lib/libmeos.so for the unit tests (.github/workflows/maven.yml + pom surefire -Djava.library.path). The committed binary was a stale MEOS build predating the ensure_linear_interp guard in tnumber_trend, so tnumber_trend on a step-interpolated tint returned a computed trend instead of NULL, deterministically failing MathUDFsExtTest.tnumberTrend_tint_step_returns_null (expected null, got a tfloat hex-WKB). The test and the AnalyticsUDFs.tnumberTrend wrapper are correct against current MEOS: verified that the current libmeos returns NULL for that exact input while the stale one returns non-null. Replace lib/libmeos.so with a current MEOS 1.4 build that carries the guard.
This reverts commit ca9676d.
…lityDB State present coverage only (858/858 active addressable temporal+geo, 100%) with the scope partition and deferred families shared with MobilityDuck; drop dated-milestone and changelog narrative. parity-status.md regenerated from scripts/parity-audit.py against current MobilityDB master.
This was referenced May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
docs/parity-status.md(audit script:scripts/parity-audit.py). Compare to MobilityDuck 79.3%.feat(parity)commit per the "1 feature = 1 commit" ecosystem policy.Per-section coverage
22 of 51 active sections at 100%. See
docs/parity-status.mdfor the full table.Sections still under 100% are dominated by:
Methodology
Adapted from
MobilityDuck/scripts/parity-audit.pywith two MobilitySpark-specific enhancements:spark.udf().register("name", ...)fromsrc/main/java/**/*.javatnumber/tpoint/tgeo/…), wrapper-style dispatcher recognition (temporal_above↔stboxAboveTpoint), type-suffix matching (always_eq↔alwaysEqTintInt)Same out-of-scope and deferred bucketing as MobilityDuck:
_in/_out/_recv/_send,_transfn/_combinefn/_finalfn,_sel/_joinsel/_supportfn/_analyze, btree opclass supportTest plan
mvn test— 907/907 green on Linux (Java 21, Spark 3.5)BerlinMODBenchThe single
feat(parity)commit body lists every UDF added/extended and the newMeosNativesymbols.