Skip to content

test(geography): full numeric + NULL + tabular round-trip matrix#177

Open
estebanzimanyi wants to merge 12 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/geography-test-matrix
Open

test(geography): full numeric + NULL + tabular round-trip matrix#177
estebanzimanyi wants to merge 12 commits into
MobilityDB:mainfrom
estebanzimanyi:feat/geography-test-matrix

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

Summary

Extends test/sql/geography.test to 37 assertions across the full GEOGRAPHY surface:

Section Assertions Notes
Type alias registration 1 NULL handling
ST_GeogFromText / ST_AsText round-trip 2 POINT, LINESTRING
ST_AsBinary / ST_GeogFromBinary round-trip 1 POINT
GEOMETRYGEOGRAPHY casts 2 Both directions
ST_Length numeric 2 Zero-length + 5 562 m Brussels segment
ST_Area numeric 2 Zero-area POINT + 9.79 km² Brussels triangle
eIntersects / nearestApproachDistance over TGEOGPOINT × GEOGRAPHY 3 Hit / miss / NAD-zero
NULL propagation (new) 5 Every UDF honours NULL in -> NULL out
Tabular round-trip (new) 5 INSERT/SELECT on multi-row mixed-NULL table
Large-polygon numeric (new) 2 Belgium-sized polygon (~61 538 km²), 264 km Brussels-Paris LINESTRING
Semantic EWKB round-trip (new) 1 EWKT-level equality after geog_in -> EWKB -> geo_from_ewkb

Numeric ground truth

The expected values are MEOS-on-Postgres ground truth with use_spheroid=true (WGS84 ellipsoidal geodesics):

  • 5 km Brussels LINESTRING: 5562 m
  • 9.79 km² Brussels triangle: 9 792 944 m²
  • ~61 538 km² Belgium quadrilateral
  • 264 km Brussels → Paris LINESTRING

Step in the GEOGRAPHY roadmap

This is step 4 of 4 — final step of the ~430-LoC follow-up plan in doc/geography-boundary.md:

  1. I/O UDFs (4 functions) — feat(geography): ST_GeogFromText / ST_AsText / ST_AsBinary / ST_GeogFromBinary #174
  2. GEOMETRY ⇄ GEOGRAPHY casts — feat(geography): explicit GEOMETRY <-> GEOGRAPHY casts #175
  3. Operations (4 functions) — feat(geography): ST_Length / ST_Area / eIntersects / nearestApproachDistance #176
  4. Full numeric test matrix (this PR) — 37 assertions

Depends on

Local verification

$ ./build/release/test/unittest "test/sql/geography.test"
All tests passed (37 assertions in 1 test case)

estebanzimanyi and others added 12 commits May 20, 2026 20:29
Adds doc/geography-boundary.md as the canonical write-up of how
MobilityDuck represents geodetic geography values across the
MEOS<->DuckDB columnar boundary.

Covers:
- The closed-algebra property in MEOS and why it doesn't survive
  the columnar boundary without a dedicated LogicalType.
- The GEOGRAPHY LogicalType registration: a BLOB alias carrying
  MEOS-WKB with the geodetic flag preserved in the type tag, with
  no dependence on a DuckDB upstream change or on a third-party
  duckdb-geography extension.
- The I/O surface (ST_GeogFromText, ST_AsText, ST_AsBinary,
  ST_GeogFromBinary), all thin shims over existing MEOS exports.
- The operation surface (length, area, eIntersects, etc.) — every
  call delegates to a MEOS function that takes geodetic input and
  returns the correct type; DuckDB never sees a non-geodetic
  representation of a geodetic value during a computation.
- The complete inter-type cast matrix (GEOMETRY / GEOGRAPHY /
  TGEOGPOINT / TGEOMPOINT), mirroring the MobilityDB-on-Postgres
  surface.
- TemporalParquet round-trip preservation via the footer JSON's
  base_type / geodetic / srid fields.
- Pitfalls a binding implementation must avoid (using
  ST_GeomFromText to construct a GEOGRAPHY value, reusing DuckDB
  Spatial Cartesian functions on a GEOGRAPHY blob, stripping the
  geodetic flag in Parquet output, etc.).
- Current state of the implementation and the bounded pending
  work (~430 LoC, single PR) to register the LogicalType, the
  I/O UDFs, the casts, and the tests.

README updated with a single-paragraph pointer in the
parity-gaps neighbourhood so adopters land here when looking for
geography semantics on the DuckDB side.
…yDB#168 design)

Foundation for the DuckDB GEOGRAPHY boundary documented in MobilityDB#168.
Registers the LogicalType alias (BLOB carrying MEOS-WKB with the
geodetic flag preserved in the type tag) so DuckDB columns can be
declared as `GEOGRAPHY`.

Files:

- src/include/geo/geography.hpp — GeographyType struct declaration
  (LogicalType getter + RegisterType entry point), mirroring the
  shape of StboxType / TgeogpointType.

- src/geo/geography.cpp — implementation: BLOB with SetAlias
  ("GEOGRAPHY") and loader.RegisterType("GEOGRAPHY", GEOGRAPHY()).

- src/mobilityduck_extension.cpp — Load() wires GeographyType::RegisterType
  after StboxType.

- CMakeLists.txt — adds src/geo/geography.cpp to EXTENSION_SOURCES.

- test/sql/geography.test — CREATE TABLE / INSERT NULL / SELECT
  round-trip sanity test verifying the alias is registered.

Follow-up PRs build on this:

  - I/O UDFs:    ST_GeogFromText, ST_AsText, ST_AsBinary,
                 ST_GeogFromBinary (~150 LoC)
  - Casts:       GEOMETRY <-> GEOGRAPHY, GEOGRAPHY <-> TGEOGPOINT
                 (~80 LoC)
  - Operations: length / area / eIntersects / nearestApproachDistance
                 on GEOGRAPHY columns dispatching to MEOS geog_*
                 functions
  - Tests:      round-trip, cast-matrix, numeric checks against
                MEOS-on-Postgres ground truth (~200 LoC)

Total target surface (from MobilityDB#168): ~430 LoC across the four follow-ups.
Each follow-up is independently mergeable on top of this skeleton.

Stacks on MobilityDB#168 (geography boundary design doc).
… PR MobilityDB#161)

Cherry-picked from open PR MobilityDB#161 so this PR's CI compiles against the
vcpkg-installed MEOS, which exposes 'meosType' (pre-consolidation)
not 'MeosType'.  When MobilityDB#161 reaches main, this commit collapses to a
no-op on rebase.
…MobilityDB#136)

Cherry-picked from open PR MobilityDB#136 so this PR's amd64 Linux test phase
goes green before MobilityDB#136 lands.  When MobilityDB#136 reaches main, this rebase
collapses to a no-op.
…en PR MobilityDB#140)

Cherry-picked from open PR MobilityDB#140 so this PR's osx_amd64 / osx_arm64 /
wasm builds compile.  On macOS LP64 and Wasm/emscripten, int64 (long)
and int64_t (long long) are the same width but distinct types; clang
rejects passing bigint_to_set where Set *(*)(int64_t) is expected.
The cast is a no-op on Linux.  When MobilityDB#140 reaches main, this rebase
collapses to a no-op.
The first geography.cpp landing had a 'reference to Interval is ambiguous'
arm64 build error because the duckdb header chain pulled in duckdb::Interval
before meos.h's extern "C" block scoped MEOS's Interval to global linkage.
Reorder includes to put meos_wrapper_simple.hpp first, mirroring the
existing static-type pattern in stbox.cpp / tgeogpoint.cpp.
…e_ptr<FunctionData> in Copy()

GCC + DuckDB 1.4.4's unique_ptr does not implicitly convert
derived->base, so 'return r;' in BinsBindData::Copy() fails to compile:

  error: could not convert 'r' from 'unique_ptr<duckdb::{anonymous}::BinsBindData,...>'
                                to 'unique_ptr<duckdb::FunctionData,...>'

Use duckdb's unique_ptr_cast helper (from duckdb/common/helper.hpp) to
do the conversion explicitly, matching the canonical pattern used by
DuckDB core (e.g. table_scan.hpp's TableScanBindData::Copy()).  No
behaviour change; the move is exactly what the implicit conversion
would have done if the compiler accepted it.
…romBinary

Four I/O UDFs over the GEOGRAPHY LogicalType, each a thin shim over a
MEOS export:

  - ST_GeogFromText(VARCHAR)  -> GEOGRAPHY   via geog_in
  - ST_AsText(GEOGRAPHY)      -> VARCHAR     via geo_as_ewkt
  - ST_AsBinary(GEOGRAPHY)    -> BLOB        via geo_as_ewkb
  - ST_GeogFromBinary(BLOB)   -> GEOGRAPHY   via geo_from_ewkb

The BLOB payload of a GEOGRAPHY value is the raw GSERIALIZED struct
(varlena VARSIZE bytes), which preserves the geodetic flag in the type
tag across the DuckDB columnar boundary; standard EWKB does not carry
the flag, so ST_GeogFromBinary re-sets it explicitly via
MEOS_FLAGS_SET_GEODETIC.

test/sql/geography.test covers EWKT round-trip + EWKB round-trip on
POINT and LINESTRING.

Step 1 of the ~430-LoC GEOGRAPHY follow-up roadmap.  Casts
(GEOMETRY <-> GEOGRAPHY, GEOGRAPHY <-> TGEOGPOINT) and operations
(length / area / eIntersects / nearestApproachDistance) land in
follow-up PRs.

See doc/geography-boundary.md.
Two cast functions, both reusing the same GSERIALIZED with the geodetic
flag toggled:

  - GEOMETRY -> GEOGRAPHY: lift via GeometryToGSerialized, set the
    geodetic flag on the resulting GSERIALIZED, store as GEOGRAPHY BLOB.
  - GEOGRAPHY -> GEOMETRY: deserialize the GEOGRAPHY BLOB, clear the
    geodetic flag, emit sgl GEOMETRY via GSerializedToGeometry.

DuckDB-Spatial GEOMETRY has no SRID slot, so a round-trip via GEOMETRY
drops the SRID prefix; an EWKT / EWKB round-trip preserves SRID. The
EWKT and EWKB paths are already tested in test/sql/geography.test.

The existing TGEOGPOINT(GEOMETRY, …) constructors transparently accept
GEOGRAPHY arguments via the new implicit cast — no extra overloads
needed for the temporal geographic family.

Step 2 of the GEOGRAPHY follow-up roadmap.  Operations (length / area /
eIntersects / nearestApproachDistance) and the full test matrix land in
steps 3 and 4.

test/sql/geography.test: 9 assertions pass locally.
…istance

Four geodetic scalar operations, each a thin shim over the MEOS export:

  - ST_Length(GEOGRAPHY)                                 -> DOUBLE
    via geog_length(gs, use_spheroid=true)
  - ST_Area(GEOGRAPHY)                                   -> DOUBLE
    via geog_area(gs, use_spheroid=true)
  - eIntersects(TGEOGPOINT, GEOGRAPHY)                   -> BOOLEAN
    via eintersects_tgeo_geo(temp, gs)
  - nearestApproachDistance(TGEOGPOINT, GEOGRAPHY)       -> DOUBLE
    via nad_tgeo_geo(temp, gs)

use_spheroid=true matches the MEOS-on-Postgres default for the
geography flavour (WGS84 ellipsoidal geodesics).

test/sql/geography.test: 16 assertions pass — including numeric checks
against MEOS-on-Postgres ground truth (a Brussels-area POLYGON area =
9 792 944 m^2, a 5 km LINESTRING = 5 562 m, plus eIntersects /
nearestApproachDistance smoke checks against a TGEOGPOINT trajectory).

Step 3 of the GEOGRAPHY follow-up roadmap.  The full numeric test
matrix lands in step 4.
Extends test/sql/geography.test to 37 assertions covering:

  - NULL propagation across every UDF (ST_AsText, ST_Length, ST_Area,
    ST_AsBinary, ST_GeogFromText)
  - Tabular round-trip via INSERT/SELECT on a multi-row table with
    mixed NULL and non-NULL GEOGRAPHY values; COUNT(*) and COUNT(col)
  - Numeric ground truth: Belgium-sized polygon (~61 538 km^2 on the
    WGS84 ellipsoid) and a 264 km Brussels-Paris LINESTRING
  - Semantic round-trip equality through EWKB at the EWKT level
    (underlying GSERIALIZED byte layout may differ between geog_in and
    geo_from_ewkb; semantic value is preserved)

Step 4 of the GEOGRAPHY follow-up roadmap.  All 37 assertions pass
locally.
The stage_icu helper mapped only the Linux uname values, so on the
macOS arm64 test runner uname -m returned "arm64" and the icu
extension was copied to .duckdb/extensions/v1.4.4/arm64 instead of
.../osx_arm64, where DuckDB's autoload looks. The hub fallback is not
reliably resolvable on that runner, so the osx_arm64 Test step failed
to load the extension. Map the OS and architecture to the DuckDB
platform string (linux_amd64, linux_arm64, osx_amd64, osx_arm64) so
the locally built icu is staged at the path autoload expects on every
tested platform; the Linux mapping is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant