88 commits
e17f8ca
PeriodUDT simple implementation.
Aug 7, 2023
822ae6d
Added comments and definitions for PeriodUDT class.
Aug 7, 2023
88d7b1b
Deleted JMEOS jars from git tracking, added to gitignore. Added Regis…
Aug 7, 2023
69b8234
Partially implemented UDFs for PeriodUDT class.
Aug 8, 2023
5975da9
Add tgeompointinst UDF
Aug 2, 2023
97cd188
Add PeriodSet
Aug 4, 2023
747c016
Update pom
Aug 4, 2023
62a51c9
Modify main
Aug 7, 2023
f5de6a6
Add period set UDT
Aug 8, 2023
6639c3f
Finished UDFs and UDF registrator.
Aug 9, 2023
2fdc6cd
Finished UDFs and UDF registrator.
Aug 9, 2023
0e192f9
Added sample tests and testing utility.
Aug 10, 2023
8e699f2
Started working on TimestampSet UDTs
Aug 10, 2023
3cf33a6
Merge pull request #2 from satriabw/period-implementation
Action52 Aug 10, 2023
3b1c81c
Add PeriodSet registrator for UDT and UDF
Aug 8, 2023
c36010b
Add the rest of PointUDF implemenation
Aug 9, 2023
3fa4056
Implement Period Set UDF
Aug 10, 2023
c6240cb
Implemented basic version of TimestampSetUDT. Also modified examples …
Aug 11, 2023
c9b4e45
Implement PeriodSet
Aug 11, 2023
c33e269
Skip the test for now
Aug 11, 2023
a3b2309
Merge pull request #3 from satriabw/satria/poc
Aug 11, 2023
0505997
Implemented changes on Period using Binaries.
Aug 16, 2023
24b5271
Implemented new structure with MeosDatatype as parent class for Spark…
Aug 16, 2023
f290586
Reformated Factory to minimize reduncancies
Aug 17, 2023
45d193f
Merge pull request #4 from satriabw/meos-datatype
Action52 Aug 17, 2023
de5ae39
Merge branch 'develop' into timestampset-implementation
Action52 Aug 17, 2023
ca441df
Merge pull request #6 from satriabw/timestampset-implementation
Aug 17, 2023
78180d4
Add implementation for ais dataset
Aug 25, 2023
91afa4c
Tidy up implementation
Aug 25, 2023
7724506
Finish AISDataExample implementation
Aug 28, 2023
7b1109e
Merge pull request #7 from satriabw/feature/tpoint
Aug 28, 2023
212192f
feat(spark): JMEOS 1.3 + BerlinMOD Q1-Q17 + edge-to-cloud pipeline — …
estebanzimanyi May 7, 2026
3e17bda
ci: exclude legacy sources from license check; add header to Main.java
estebanzimanyi May 8, 2026
c8b182a
test: don't call meos_finalize in unit test teardown
estebanzimanyi May 8, 2026
f13da5c
test: remove all meos_finalize/ms.close calls from test teardown
estebanzimanyi May 8, 2026
59ebace
fix(meos): bundle spatial_ref_sys.csv and register it on session create
estebanzimanyi May 8, 2026
e939f29
feat(parquet): add tintFromBinary, tfloatFromBinary, tboolFromBinary,…
estebanzimanyi May 8, 2026
9a45b95
feat(parquet): add span/spanset fromBinary UDFs (tstzspan, intspan, f…
estebanzimanyi May 8, 2026
d026fde
feat(parquet): add tgeompointFromBinary + tgeogpointFromBinary; updat…
estebanzimanyi May 8, 2026
5a5ff76
fix(test): drop tgeogpoint unit test requiring SRS setup; fix README …
estebanzimanyi May 8, 2026
a25da95
fix(test): register spatial_ref_sys.csv in @BeforeAll to enable geode…
estebanzimanyi May 8, 2026
092ee60
feat(platform): add macOS and Windows support via patched JMEOS-1.4.jar
estebanzimanyi May 8, 2026
33b6d2d
fix(ci): remove invalid shell: pwsh on uses: step in Windows job
estebanzimanyi May 8, 2026
e07dc3f
feat(spark): BerlinMOD Q1-Q17 + UDFs + benchmark + JVM crash fixes
estebanzimanyi May 9, 2026
c2b93c4
fix(bench): use local[2], ulimit -c 0, and pin java.library.path
estebanzimanyi May 9, 2026
522cf6c
fix(build): pin java.library.path to /usr/local/lib in surefire; upda…
estebanzimanyi May 9, 2026
473404f
refactor(jmeos): rename JMEOS-1.5 → JMEOS-1.4 to match MEOS API versi…
estebanzimanyi May 9, 2026
2892eb0
fix(bench): flush results to JSON after each query; use atomic write
estebanzimanyi May 9, 2026
1b853e6
feat(bench): add --quick flag (--runs 1) for crash-safety verification
estebanzimanyi May 9, 2026
cbf79ff
feat(bench): add --queries range selector for targeted crash bisection
estebanzimanyi May 9, 2026
77dbe5e
fix(memory): free MEOS native objects in all UDFs to prevent OOM crash
estebanzimanyi May 9, 2026
8420c09
test(memory): add NativeMemoryLeakTest — VmRSS-based native leak dete…
estebanzimanyi May 9, 2026
1a47ed9
fix(bench): use tdwithin_tgeo_tgeo in tDwithin UDF (q10 fix)
estebanzimanyi May 9, 2026
c7d55e4
fix(berlinmod): ORDER BY alias in q12 + richer error output in bench
estebanzimanyi May 9, 2026
85f915d
feat(udfs): add 5 UDF groups for full operator parity — 166 tests green
estebanzimanyi May 10, 2026
62d2c28
feat(udfs): add 4 UDF groups + 13 UDAFs — 235 tests green
estebanzimanyi May 10, 2026
c449e17
fix(build): prioritise bundled lib/libmeos.so in surefire java.librar…
estebanzimanyi May 10, 2026
18d380b
feat(udfs): add DistanceUDFs, extend RestrictionUDFs and TransformUDF…
estebanzimanyi May 10, 2026
5e3c5c0
feat(udfs): add transcendental math, trend, tboolWhenTrue, tpointIsSi…
estebanzimanyi May 10, 2026
d6409ff
feat(udfs): add span/spanset/stbox/elevation restriction UDFs — 265 t…
estebanzimanyi May 10, 2026
176c03d
feat(udfs): add tintAtValue, tnumber span/spanset restriction, tgeoMi…
estebanzimanyi May 10, 2026
0a11203
feat(udfs): add cumulative length, traversed area, shift/scale time —…
estebanzimanyi May 10, 2026
2536c74
feat(geo): add StaticGeoUDFs — 17 static geometry predicates/metrics/…
estebanzimanyi May 10, 2026
c8bbb7e
feat(temporal): add 10 UDFs — temporal comparisons, tintToTfloat, tpr…
estebanzimanyi May 10, 2026
5569168
feat(geo): add 6 STBox analytics UDFs — area, perimeter, volume, isGe…
estebanzimanyi May 10, 2026
17528f8
feat(temporal): add TBoxUDFs — 13 TBox accessor/span-conversion UDFs …
estebanzimanyi May 10, 2026
382eaf4
feat(geo): add ever/always scalar predicates + tgeo×tgeo temporal rels
estebanzimanyi May 10, 2026
548755c
feat(temporal): MFJSON I/O, text output, and tint shift/scale UDFs
estebanzimanyi May 10, 2026
1e4b7a9
feat(temporal): ever_ne/always_ne predicates + value_at_timestamptz a…
estebanzimanyi May 10, 2026
c887dc4
feat(temporal): tintValueN, tintMinusValue, temporalDeleteTimestamptz…
estebanzimanyi May 10, 2026
45e91a1
feat(temporal): parity batch — 85 new UDFs, 642 tests green
estebanzimanyi May 10, 2026
2f9ccea
feat(udfs): set value accessors, ttext_values, geo I/O UDFs (701 tests)
estebanzimanyi May 10, 2026
6af23fe
feat(udfs): tstzspanset extra accessors + tpointFromBaseTemp construc…
estebanzimanyi May 10, 2026
bbdbd58
feat(udfs): parity batch — Transform/Restriction/Similarity/SpanAlgeb…
estebanzimanyi May 10, 2026
3c93c3a
fix(safety): replace local[*] with local[2] in all configs and docs
estebanzimanyi May 10, 2026
5965800
chore(libs): remove stale JMEOS jars — only JMEOS-1.4.jar is active
estebanzimanyi May 10, 2026
e77ecf1
feat(geo): add tpoint I/O, SRID, round, bounding-box, and convex-hull…
estebanzimanyi May 10, 2026
18d6b66
fix(demo): update BerlinMOD UDFs for MEOS 1.4 renamed symbols
estebanzimanyi May 10, 2026
841dc22
feat(bench): resumable BerlinMOD benchmark + --queries selector
estebanzimanyi May 10, 2026
8fda0fd
feat(parity): MobilityDB SQL surface parity at 100% (858/858)
estebanzimanyi May 10, 2026
038501e
feat(perf): th3index spatial prefilter for cross-join queries (Stage 2)
estebanzimanyi May 10, 2026
a986758
feat(perf): extend th3index prefilter to trip×trip cross-joins (Q5/Q6…
estebanzimanyi May 10, 2026
b133a57
feat(perf): cross-platform th3index prefilter — portable SQL + PG GiS…
estebanzimanyi May 10, 2026
e9871d6
feat(perf): include trip_h3 in setup/generate_data.sh trips.csv override
estebanzimanyi May 10, 2026
c12e257
feat(h3): 100% public-API parity in Th3IndexUDFs (86 UDFs)
estebanzimanyi May 10, 2026
6b238bc
feat(perf): polygon-side prefilter — adopt MobilityDB #938's static-g…
estebanzimanyi May 11, 2026
8b5c612
Add minDistance UDFs and adopt spatial-min Q5 form
estebanzimanyi May 14, 2026
e7101af
Merge remote-tracking branch 'upstream/main' into feat/mindistance-ud…
estebanzimanyi May 15, 2026
72 changes: 72 additions & 0 deletions .github/workflows/maven.yml
@@ -0,0 +1,72 @@
name: Maven CI

on:
  push:
    branches: ["main", "feat/**", "fix/**"]
    paths-ignore:
      - "**/*.md"
      - "doc/**"
  pull_request:
    branches: ["main", "feat/**", "fix/**"]
    paths-ignore:
      - "**/*.md"
      - "doc/**"
  workflow_dispatch:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  build:
    name: Build and test (Java 21 / Spark 3.5)
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      - name: Set up Java 21
        uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: "21"
          cache: maven

      - name: Install libmeos runtime dependencies
        run: |
          sudo apt-get update -qq
          sudo apt-get install -y libjson-c5 libgeos-c1t64 libproj25 libgsl27

      - name: Set up libmeos.so and LD_LIBRARY_PATH
        run: |
          mkdir -p /tmp/libmeos
          cp "$GITHUB_WORKSPACE/lib/libmeos.so" /tmp/libmeos/libmeos.so
          echo "LD_LIBRARY_PATH=/tmp/libmeos" >> "$GITHUB_ENV"

      - name: Install JMEOS 1.4 to local Maven repository
        run: |
          mvn install:install-file \
            -Dfile=libs/JMEOS-1.4.jar \
            -DgroupId=org.jmeos \
            -DartifactId=jmeos \
            -Dversion=1.4 \
            -Dpackaging=jar \
            -q

      - name: License header check
        run: bash tools/scripts/check_license.sh

      - name: Compile
        run: mvn -B compile

      - name: Unit tests
        run: mvn -B test

      - name: Package (fat jar)
        run: mvn -B package -DskipTests

      - name: Upload fat jar
        uses: actions/upload-artifact@v4
        with:
          name: mobilityspark-spark.jar
          path: target/*-spark.jar
11 changes: 9 additions & 2 deletions .gitignore
@@ -3,14 +3,21 @@
.project
.settings/

# Intellij
# IntelliJ IDEA
.idea/
*.iml
*.iws
*.ipr

# Mac
# macOS
.DS_Store
**/.DS_Store

# Maven
log/
target/

# Large BerlinMOD benchmark data (generated locally — too large for GitHub)
berlinmod/data/trips.csv
dependency-reduced-pom.xml
hs_err_pid*.log
22 changes: 22 additions & 0 deletions LICENSE
@@ -0,0 +1,22 @@
-------------------------------------------------------------------------------
This MobilityDB code is provided under The PostgreSQL License.

Copyright (c) 2020-2025, Université libre de Bruxelles and MobilityDB
contributors

Permission to use, copy, modify, and distribute this software and its
documentation for any purpose, without fee, and without a written agreement is
hereby granted, provided that the above copyright notice and this paragraph and
the following two paragraphs appear in all copies.

IN NO EVENT SHALL UNIVERSITE LIBRE DE BRUXELLES BE LIABLE TO ANY PARTY FOR
DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS DOCUMENTATION,
EVEN IF UNIVERSITE LIBRE DE BRUXELLES HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGE.

UNIVERSITE LIBRE DE BRUXELLES SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE SOFTWARE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
UNIVERSITE LIBRE DE BRUXELLES HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE,
SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
192 changes: 192 additions & 0 deletions berlinmod/README.md
@@ -0,0 +1,192 @@
# BerlinMOD Portable SQL — Cross-Platform Verification

This directory contains BerlinMOD benchmark queries in the **RFC #861 portable
dialect** — using named functions only, no MobilityDB-specific operator symbols.
The same SQL files run unchanged on all three platforms.

| Platform | Engine | Extension |
|---|---|---|
| [MobilityDB](https://github.com/MobilityDB/MobilityDB) | PostgreSQL | `CREATE EXTENSION mobilitydb` |
| [MobilityDuck](https://github.com/MobilityDB/MobilityDuck) | DuckDB | `LOAD mobilitydb` (community) |
| [MobilitySpark](https://github.com/MobilityDB/MobilitySpark) | Apache Spark | `MobilitySparkSession.create(spark)` |

---

## Schema

All three platforms use the same schema:

```
Vehicles (vehId INT, licence TEXT, type TEXT, model TEXT)
Trips (tripId INT, vehId INT, trip TEXT) -- tgeompoint hex-WKB
QueryLicences (licenceId INT, licence TEXT)
QueryInstants (instantId INT, instant TIMESTAMPTZ)
QueryPoints (pointId INT, geom TEXT) -- geometry WKT, SRID 0
QueryRegions (regionId INT, geom TEXT) -- polygon WKT, SRID 0
QueryPeriods (periodId INT, period TEXT) -- tstzspan literal
```

**Storage conventions:**
- `tgeompoint` values → hex-WKB STRING (`temporal_as_hexwkb` / `temporal_from_hexwkb`)
- `geometry` / polygon values → WKT STRING, parsed via `geo_from_text` / `ST_GeomFromText`
- `tstzspan` values → literal STRING `"[t1,t2]"`, cast to `tstzspan` by each platform

Storing temporal and geometry values as text (hex-WKB, WKT, or span literals)
keeps the CSV files identical and loadable on all platforms without requiring
platform-specific binary encoding.
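
As an illustration of the third convention, the following minimal sketch parses a `tstzspan` literal of the form used in `query_periods.csv` into two timezone-aware timestamps. It is not the MEOS parser: the names `Span` and `parse_tstzspan` are hypothetical, and it assumes timestamps written as in this README (`YYYY-MM-DD HH:MM:SS+00`), with `[`/`]` marking inclusive bounds and `(`/`)` exclusive ones.

```python
import re
from datetime import datetime
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical container for a parsed tstzspan literal."""
    lower: datetime
    upper: datetime
    lower_inc: bool
    upper_inc: bool


# A short offset such as "+00" directly after a time component; it is
# expanded to "+00:00" so datetime.fromisoformat accepts it on older Pythons.
_SHORT_OFFSET = re.compile(r"(:\d{2}(?:\.\d+)?)([+-]\d{2})$")


def _parse_timestamp(text: str) -> datetime:
    normalized = _SHORT_OFFSET.sub(r"\1\2:00", text.strip())
    return datetime.fromisoformat(normalized)


def parse_tstzspan(literal: str) -> Span:
    """Parse '[t1, t2]' (or a variant with round brackets) into a Span."""
    text = literal.strip()
    if len(text) < 2 or text[0] not in "[(" or text[-1] not in "])":
        raise ValueError(f"not a span literal: {literal!r}")
    lower_text, sep, upper_text = text[1:-1].partition(",")
    if not sep:
        raise ValueError(f"span literal needs two bounds: {literal!r}")
    lower = _parse_timestamp(lower_text)
    upper = _parse_timestamp(upper_text)
    if lower > upper:
        raise ValueError("lower bound is after upper bound")
    return Span(lower, upper, text[0] == "[", text[-1] == "]")


period = parse_tstzspan("[2020-01-01 00:02:00+00, 2020-01-01 00:08:00+00]")
print(period.upper - period.lower)
```

Each platform performs this cast natively; the sketch only shows that the literal carries enough information to be reconstructed without any binary encoding.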

---

## Queries

| File | Query | Temporal operations |
|------|-------|---------------------|
| `q01.sql` | Vehicle models for query licences | none (baseline relational join) |
| `q02.sql` | Licence plates of vehicles that ever entered a query region | `eIntersects(tgeompoint, geometry)` |
| `q03.sql` | Position of query-licence vehicles at each query instant | `atTime(tgeompoint, timestamptz)` |
| `q04.sql` | Vehicles that ever passed a query point | `eIntersects(tgeompoint, geometry)` |
| `q05.sql` | Min nearest-approach distance between query-licence pairs | `nearestApproachDistance(tgeompoint, tgeompoint)` |
| `q06.sql` | Truck pairs within 10 m | `eDwithin(tgeompoint, tgeompoint, float)` |
| `q07.sql` | Trip portions of query-licence vehicles during each query period | `atTime(tgeompoint, tstzspan)` |
| `q08.sql` | Trajectory geometry of each trip | `trajectory(tgeompoint)` |
| `q09.sql` | Longest distance driven by any vehicle in each query period | `atTime`, `length` |
| `q10.sql` | When did query-licence vehicles meet others (within 3 m)? | `expandSpace`, `tDwithin`, `whenTrue` |
| `q11.sql` | Vehicles passing a query point at a query instant | `valueAtTimestamp`, `stbox` |
| `q12.sql` | Vehicle pairs at the same query point at the same query instant | `valueAtTimestamp`, `stbox` |
| `q13.sql` | Vehicles that travelled within a query region during a query period | `atTime`, `eIntersects`, `stbox` |
| `q14.sql` | Vehicles inside a query region at a query instant | `valueAtTimestamp`, `ST_Contains`, `stbox` |
| `q15.sql` | Vehicles that passed a query point during a query period | `atTime`, `eIntersects`, `stbox` |
| `q16.sql` | Query-licence vehicle pairs in same region+period but always disjoint | `atTime`, `eIntersects`, `aDisjoint` |
| `q17.sql` | Query points visited by the most distinct vehicles | `eIntersects` |
| `qrt.sql` | Binary roundtrip — all trips serialised as hex-WKB | `asHexWKB(tgeompoint)` |

`atTime` is polymorphic: pass a `TIMESTAMPTZ` (Q3) or a `tstzspan` literal (Q7)
and the platform routes to the appropriate MEOS function.
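
The dispatch described above can be sketched in a few lines. This is a simplified model, not the MobilitySpark or MEOS implementation: a trip is assumed to be a time-ordered list of at least two `(time, x, y)` instants with linear interpolation between them, a span is assumed to be a closed `(lower, upper)` pair, and the name `at_time` is only borrowed from the portable dialect.

```python
from datetime import datetime, timezone


def _interpolate(trip, t):
    """Linearly interpolated (x, y) at time t, or None outside the trip."""
    for (t0, x0, y0), (t1, x1, y1) in zip(trip, trip[1:]):
        if t0 <= t <= t1:
            fraction = (t - t0) / (t1 - t0)  # timedelta / timedelta -> float
            return (x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0))
    return None


def at_time(trip, when):
    """Restrict a trip to a timestamp or to a closed (lower, upper) span.

    The argument type selects the branch, mirroring the polymorphic
    atTime(tgeompoint, timestamptz) / atTime(tgeompoint, tstzspan) pair.
    """
    if isinstance(when, datetime):
        position = _interpolate(trip, when)
        return [] if position is None else [(when, *position)]

    lower, upper = when
    start = max(lower, trip[0][0])
    end = min(upper, trip[-1][0])
    if start > end:
        return []  # the span does not overlap the trip at all
    inner = [instant for instant in trip if start < instant[0] < end]
    clipped = [(start, *_interpolate(trip, start)), *inner]
    if end > start:
        clipped.append((end, *_interpolate(trip, end)))
    return clipped


def minute(m):
    return datetime(2020, 1, 1, 0, m, tzinfo=timezone.utc)


# trip1 from the toy dataset, assuming one instant at each end of the trip.
trip1 = [(minute(0), 0.0, 0.0), (minute(10), 100.0, 0.0)]
print(at_time(trip1, minute(5)))               # timestamp branch (as in Q3)
print(at_time(trip1, (minute(2), minute(8))))  # span branch (as in Q7)
```

Under these assumptions the timestamp branch places `trip1` at `(50.0, 0.0)` at 00:05, which matches the Q11 row of the expected results.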

---

## Shared dataset

`data/` contains CSV files that all three platforms load:

| File | Description |
|------|-------------|
| `data/vehicles.csv` | 5 vehicles (3 passenger, 2 truck) |
| `data/trips.csv` | 5 trips, each as a tgeompoint hex-WKB string (SRID 0) |
| `data/query_licences.csv` | 2 query licences |
| `data/query_instants.csv` | 1 query instant |
| `data/query_points.csv` | 2 query points (WKT) |
| `data/query_regions.csv` | 1 query polygon region (WKT) |
| `data/query_periods.csv` | 1 query period (tstzspan literal) |

**Dataset design (SRID 0, planar):**

```
trip1 (B-AA 100): (0,0) → (100,0) y = 0
trip2 (B-BB 200): (0,5) → (100,5) y = 5
trip3 (B-CC 300): (0,3) → (100,3) y = 3 (truck)
trip4 (B-DD 400): (0,4) → (100,4) y = 4 (truck, 1 unit from trip3)
trip5 (B-EE 500): far away (not near others)

QueryPoints: POINT(50 0), POINT(50 5)
QueryRegions: POLYGON((40 -1,60 -1,60 6,40 6,40 -1)) covers x=40..60, y=-1..6
QueryPeriods: [2020-01-01 00:02:00+00, 2020-01-01 00:08:00+00]
All trips active during: 2020-01-01 00:00 – 00:10 UTC
```

**Expected results (verified on MobilityDuck/DuckDB with the toy dataset):**

| Query | Result |
|-------|--------|
| Q1 | B-AA 100 → Sedan ; B-CC 300 → Lorry |
| Q2 | B-AA 100, B-BB 200, B-CC 300, B-DD 400 (all 4 non-remote vehicles) |
| Q3 | 2 rows — MEOS hex-WKB of position at 00:05 UTC |
| Q4 | B-AA 100, B-BB 200 |
| Q5 | B-AA 100 ↔ B-CC 300 : 3.0 (nearest approach distance) |
| Q6 | B-CC 300 ↔ B-DD 400 (trucks within 10 m) |
| Q7 | 2 rows — hex-WKB of trip portions during the query period |
| Q8 | 5 rows — WKT trajectory geometry for each trip |
| Q9 | 1 row — vehicle 5 (B-EE 500) covers max distance (600 units) in the query period |
| Q10 | 4 meetings — B-AA 100 meets vehicle 3; B-CC 300 meets vehicles 1, 2, and 4 |
| Q11 | 2 rows — B-AA 100 at POINT(50 0); B-BB 200 at POINT(50 5) at 00:05 |
| Q12 | 0 rows — no two vehicles at the same point at the same instant |
| Q13 | 4 rows — vehicles AA/BB/CC/DD all traverse the query region in the query period |
| Q14 | 4 rows — same 4 vehicles inside the query region at 00:05 |
| Q15 | 2 rows — B-AA 100 passes POINT(50 0); B-BB 200 passes POINT(50 5) in the period |
| Q16 | 1 row — query-licence pair AA/CC in region during period but always spatially disjoint |
| Q17 | 2 rows — both query points tied at 1 vehicle visit each |
| QRT | 5 rows — MEOS hex-WKB of all 5 trips (binary roundtrip) |

Expected CSV files for all queries are in `expected/`.
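
The distance-based rows of the table (Q5, Q6, Q10) can be sanity-checked by hand from the dataset design. The sketch below does that arithmetic under one assumption that the README does not state explicitly: trips 1 to 4 move in lockstep along x, so the distance between any two of them is the difference of their y offsets at every instant. `trip5` is left out because its coordinates are not given, and all names in the code are illustrative.

```python
from itertools import combinations

# y offset of each lane, taken from the dataset design.
LANE_Y = {
    "B-AA 100": 0.0,
    "B-BB 200": 5.0,
    "B-CC 300": 3.0,
    "B-DD 400": 4.0,
}
TRUCKS = ["B-CC 300", "B-DD 400"]
QUERY_LICENCES = ["B-AA 100", "B-CC 300"]


def distance(a, b):
    """Constant separation of two lockstep trips (assumption, see above)."""
    return abs(LANE_Y[a] - LANE_Y[b])


def q5_nearest_approach():
    """Nearest-approach distance between each pair of query licences."""
    return {pair: distance(*pair) for pair in combinations(QUERY_LICENCES, 2)}


def q6_truck_pairs(limit=10.0):
    """Truck pairs that ever come within `limit` units of each other."""
    return [pair for pair in combinations(TRUCKS, 2) if distance(*pair) <= limit]


def q10_meetings(limit=3.0):
    """(query vehicle, other vehicle) pairs that come within `limit` units."""
    return [
        (query, other)
        for query in QUERY_LICENCES
        for other in LANE_Y
        if other != query and distance(query, other) <= limit
    ]


print(q5_nearest_approach())
print(q6_truck_pairs())
print(len(q10_meetings()))
```

With these offsets the sketch yields a nearest-approach distance of 3.0 for the B-AA 100 / B-CC 300 pair, a single truck pair, and four meetings, in line with the Q5, Q6, and Q10 rows.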

**Cross-platform portability design:**
- Q3 / Q7 / QRT: use `asHexWKB()` → `temporal_as_hexwkb(ptr, 0)` — byte-for-byte identical
- Q8: uses `trajectory()` → `geo_as_hexewkb(ptr, NULL)` (PostgreSQL COPY, DuckDB COPY, and MobilitySpark UDF all produce the same little-endian WKB hex)
- Q11/Q12/Q15: use `p.geomWKT` (original WKT text from CSV) instead of `ST_AsText(geom)` to avoid `POINT(x y)` vs `POINT (x y)` format divergence between PostGIS and DuckDB spatial
- All other queries: boolean / integer / float / text outputs — identical across platforms
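
To make the Q8 point concrete, here is a small sketch of why byte order matters when comparing WKB hex across platforms. It encodes a two-point LineString in plain OGC WKB (1-byte order flag, 4-byte geometry type, 4-byte point count, then pairs of doubles) and decodes it again. It assumes a geometry without an SRID flag and uppercase hex output; the function names are hypothetical and no MEOS or PostGIS call is involved.

```python
import struct

WKB_LINESTRING = 2  # OGC WKB geometry type code for LineString


def linestring_to_hexwkb(points, little_endian=True):
    """Encode [(x, y), ...] as hex WKB in the requested byte order."""
    order = "<" if little_endian else ">"
    flag = 1 if little_endian else 0  # 1 = little-endian (NDR), 0 = big-endian (XDR)
    raw = struct.pack(order + "BII", flag, WKB_LINESTRING, len(points))
    for x, y in points:
        raw += struct.pack(order + "dd", float(x), float(y))
    return raw.hex().upper()


def hexwkb_to_linestring(hex_text):
    """Decode hex WKB of a LineString, honouring its byte-order flag."""
    raw = bytes.fromhex(hex_text)
    order = "<" if raw[0] == 1 else ">"
    geom_type, count = struct.unpack_from(order + "II", raw, 1)
    if geom_type != WKB_LINESTRING:
        raise ValueError(f"unsupported WKB geometry type {geom_type}")
    # Header is 9 bytes; each point is two 8-byte doubles.
    return [struct.unpack_from(order + "dd", raw, 9 + 16 * i) for i in range(count)]


trajectory = [(0.0, 0.0), (100.0, 0.0)]  # trip1 in the toy dataset
little = linestring_to_hexwkb(trajectory, little_endian=True)
big = linestring_to_hexwkb(trajectory, little_endian=False)
print(little)
print(big)
print(hexwkb_to_linestring(little) == hexwkb_to_linestring(big))
```

Both strings decode to the same geometry, but they differ as text, so a byte-for-byte comparison of expected CSV files only works when every platform emits the same byte order.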

---

## Running on MobilityDB (PostgreSQL)

```bash
# Create a database and run the comparison:
createdb berlinmod_portability
./berlinmod/run_mbdb.sh berlinmod_portability
```

---

## Running on MobilityDuck (DuckDB)

```bash
# Run from the repository root:
./berlinmod/run_mduck.sh [path/to/duckdb]
```

---

## Running on MobilitySpark (Apache Spark)

```bash
./berlinmod/run_mspark.sh [spark-submit-binary]
```

Or manually:

```bash
ulimit -c 0 # suppress multi-GB core dumps on native-library crashes
spark-submit \
  --class org.mobilitydb.spark.demo.BerlinMODDemo \
  --master "local[2]" \
  --conf "spark.driver.extraJavaOptions=-Djava.library.path=/usr/local/lib" \
  target/mobilityspark-*-spark.jar \
  berlinmod/data \
  berlinmod/expected
```

---

## Replacing the synthetic dataset with real BerlinMOD data

The shared CSV format is produced directly by
[MobilityDB-BerlinMOD](https://github.com/MobilityDB/MobilityDB-BerlinMOD)
via `berlinmod_portability_export()`:

```sql
-- In a PostgreSQL database with generated BerlinMOD data:
\i BerlinMOD/berlinmod_export.sql
SELECT berlinmod_portability_export('/path/to/output/');
```

This writes `vehicles.csv`, `trips.csv`, `query_licences.csv`,
`query_instants.csv`, `query_points.csv`, `query_regions.csv`, and
`query_periods.csv` in exactly the schema expected by the comparison scripts.

Replace `data/*.csv` with the generated files and re-run:

```bash
./berlinmod/run_mbdb.sh berlinmod_portability # MobilityDB
./berlinmod/run_mduck.sh # MobilityDuck
./berlinmod/run_mspark.sh # MobilitySpark
```
6 changes: 6 additions & 0 deletions berlinmod/bench/.gitignore
@@ -0,0 +1,6 @@
# Machine-specific benchmark results — do not commit
results/*.json
results/report.md

# DuckDB scratch database
/tmp/berlinmod_bench.duckdb