From 13b02d7c8f34984ef3bd9473acc79781ce6bf59d Mon Sep 17 00:00:00 2001 From: Esteban Zimanyi Date: Wed, 20 May 2026 20:23:18 +0200 Subject: [PATCH] doc(spark-version): document Spark 3.5 LTS as the target; Spark 4 future-tracked Adds doc/spark-version.md as the canonical position statement on Spark-version targeting: - MobilitySpark targets Apache Spark 3.5.x (LTS through 2026). - Spark 4.0 (May 2025 GA) is early-adoption, not production-default. - The MEOS-C-library-version axis is orthogonal and handled inside JMEOS (one JMEOS jar per MEOS version; MobilitySpark pins one JMEOS jar). - Five trigger signals enumerated for when to commit to Spark 4 support; commit when two or more fire. - The future Spark 4 effort, if/when needed, will mirror MobilityDuck's multi-DuckDB-version foundation (Maven profile per Spark version, CI matrix, per-version jar artefacts). README updated to bump the Spark requirement from the stale 3.4.0 line to 3.5.x with a pointer to the new doc. The position is asymmetric to MobilityDuck's multi-DuckDB-version work: DuckDB v1.4/v1.5 are both production with split adopter base today, whereas Spark 3.5 is the production line and 4.0 is the early-adoption line. --- README.md | 2 +- doc/spark-version.md | 68 ++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 69 insertions(+), 1 deletion(-) create mode 100644 doc/spark-version.md diff --git a/README.md b/README.md index 0e2d118..84ca828 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ More information about MobilityDB, including publications, presentations, etc., - 🚀 MobilityDB installed with MEOS - 🔧 JMEOS working version -- ⚡ Spark 3.4.0 +- ⚡ Apache Spark 3.5.x (LTS); see [`doc/spark-version.md`](doc/spark-version.md) for the Spark-version target and the rationale for not yet supporting Spark 4 - 📝 Maven 4 - ☕ Java 17 (recommended) diff --git a/doc/spark-version.md b/doc/spark-version.md new file mode 100644 index 0000000..aea2067 --- /dev/null +++ b/doc/spark-version.md @@ -0,0 +1,68 @@ +# Spark version target + +MobilitySpark targets **Apache Spark 3.5.x** (the current LTS line). One artefact, one supported Spark major. + +## Position table + +| Axis | Choice | Why | +|---|---|---| +| Spark engine version | **3.5.x** only | LTS line through 2026; the production-line default in every managed Spark distribution (Databricks DBR 15, AWS EMR 7.0-7.4, Google Dataproc image 2.2, Azure Synapse pool default). One MobilitySpark jar runs on the whole 3.x line because the Java/Scala API surface of `spark-core` is stable across 3.0 → 3.5. | +| MEOS C-library version | follows JMEOS | The two existing MEOS lines (v1.3 on MobilityDB master, v1.4 as the in-flight 15-PR bump) flow through JMEOS as separate jar builds. MobilitySpark pins one JMEOS jar; JMEOS pins one MEOS SHA. Version discipline lives in JMEOS, not here. | +| Scala version | 2.12 / 2.13 (as Spark 3.5 supports) | The Maven profile selects against `spark-core_2.13` by default; `_2.12` is available for adopters on the older Scala line. | + +## Why not Spark 4.0 today + +Spark 4.0 (May 2025 GA) is **early-adoption**, not production-default. Concrete signals: + +| Distribution / ecosystem | Default Spark version | Spark 4 status | +|---|---|---| +| Databricks Runtime | DBR 15 = Spark 3.5 | DBR 16 (Spark 4) available, not the UI default | +| AWS EMR | EMR 7.0-7.4 = Spark 3.5 | EMR 7.5+ has Spark 4, not the default image | +| Google Dataproc | Image 2.2 = Spark 3.5 LTS | Image 2.3 (Spark 4) in preview | +| Azure Synapse / HDInsight | Spark 3.5 pool | Spark 4 not yet GA in the managed runtime | +| Cloudera CDP | Spark 3.5 | Spark 4 not yet shipped | +| Delta Lake | Delta 3.x → Spark 3.4 / 3.5 | Delta 4.0 → Spark 4 (separate release line) | +| Apache Iceberg | `iceberg-spark-runtime-3.5` stable | `iceberg-spark-runtime-4.0` in Iceberg 1.7+, recently-stabilised | +| Apache Hudi | Spark 3.x supported across recent releases | Spark 4 support rolling out in the 1.x line | +| Apache Sedona (spatial, closest analogue) | Spark 3.4 / 3.5 first-class | Spark 4 added in Sedona 1.7 (late 2025), some modules still flagged experimental | + +Spark 4 also introduces behavioural breakers (ANSI-by-default, type widening, parameterised SQL) that **break existing 3.x SQL**, so enterprise data teams typically wait 12-18 months before migrating because every SQL workload needs revalidation. + +## Trigger signals that switch this position + +Commit to Spark 4 support when **two or more** of these fire: + +| Signal | Why it matters | +|---|---| +| Databricks DBR 16 becomes the default runtime in the Databricks UI | Mainstream tipping point — production workloads start moving | +| AWS EMR 7.5+ becomes the default in the EMR console | Same, for AWS | +| Iceberg makes Spark 4 the **recommended** runtime (not just supported) | Data-lake stack consensus | +| Apache Sedona drops the "experimental" qualifier on Spark 4 modules | Direct spatial-analogue signal | +| A MobilityDB user opens a Spark-4 issue against MobilitySpark | Concrete demand | + +None have fired as of this writing. Track them. + +## What the version axis looks like if/when Spark 4 is added + +The pattern would mirror MobilityDuck's multi-DuckDB-version foundation (`MobilityDuck/doc/multi-duckdb-version.md`): + +- Maven profile per Spark version (`mvn -Pspark-3.5` / `mvn -Pspark-4.0`); each profile pins the matching `spark-core` dependency. +- `if (sparkVersionAtLeast("4.0.0")) { … } else { … }` preprocessor-style branches for the small set of API divergences (`udf.register` shape, `DataType.fromName`, deprecated `ScalaUDF` constructors). +- CI matrix dimension `spark_version: [3.5.4, 4.0.0]`; each row builds a separate jar. +- Per-Spark-version jar in the release artefacts (`mobilityspark-1.0.0-spark-3.5.jar` / `mobilityspark-1.0.0-spark-4.0.jar`). + +This is **future work**, not current. The estimated effort is small-to-medium because the Spark Java API divergences between 3.5 and 4.0 are narrow compared to the DuckDB v1.4-vs-v1.5 ABI break. + +## Relationship to MEOS-version churn + +The MEOS-version axis is **orthogonal** to the Spark-version axis. MEOS versions (v1.3, v1.4) flow through JMEOS regenerations; MobilitySpark depends on one JMEOS jar at a time. Bumping MEOS means swapping the JMEOS dependency, not touching anything Spark-version-specific. + +## Conclusion + +| Question | Answer | +|---|---| +| What Spark version does MobilitySpark target? | 3.5.x (current LTS) | +| Does MobilitySpark support Spark 4 today? | No | +| Is the lack of Spark 4 support a parity gap with the standard stack? | No — Spark 4 is early-adoption; the production stack is 3.5 | +| When will Spark 4 support be added? | When the trigger signals fire (see table above) | +| Is there a MEOS-version-axis problem MobilitySpark needs to solve? | No — JMEOS owns it; MobilitySpark pins one JMEOS jar at a time |