Skip to content

Prepend a stdbool stub when invoking libclang on MEOS headers#1

Open
estebanzimanyi wants to merge 2 commits into
masterfrom
fix/stdbool-stub
Open

Prepend a stdbool stub when invoking libclang on MEOS headers#1
estebanzimanyi wants to merge 2 commits into
masterfrom
fix/stdbool-stub

Conversation

@estebanzimanyi
Copy link
Copy Markdown
Member

MEOS public headers reference bool but do not include stdbool.h; when libclang parses them without system header paths it silently demotes every bool return and bool * pointer to int / int , losing the distinction that every binding depends on. Prepending a minimal stub with #define bool _Bool (plus the size_t typedef libclang needs to honour the WKB-size accessors) keeps the spelling intact, and a regex-based normaliser in _c_spelling rewrites the resulting Bool * back to bool * so the IDL matches the source code. Before the fix all 641 bool-returning functions and the bool * out-parameters of tbox_inc / tbool_value_at_timestamptz / tbool_value_n / tbool_values landed in meos-idl.json as int.

MEOS public headers reference bool but do not include stdbool.h; when
libclang parses them without system header paths it silently demotes
every bool return and bool * pointer to int / int *, losing the
distinction that every binding depends on.  Prepending a minimal stub
with `#define bool _Bool` (plus the size_t typedef libclang needs to
honour the WKB-size accessors) keeps the spelling intact, and a
regex-based normaliser in _c_spelling rewrites the resulting `_Bool *`
back to `bool *` so the IDL matches the source code.  Before the fix
all 641 bool-returning functions and the bool * out-parameters of
tbox_*_inc / tbool_value_at_timestamptz / tbool_value_n / tbool_values
landed in meos-idl.json as int.
Without a real pg_config.h, postgres/c.h never typedefs int64, so
int64 and every type built on it (TimestampTz, Timestamp, TimeADT,
DateADT) collapsed to implicit int and timestamp parameters were
emitted 32-bit. Extend the system-header stub with the postgres
integer family, mirroring postgres/c.h (LP64 branch), timestamp_def.h
and date.h exactly.
@Nyuke235
Copy link
Copy Markdown
Collaborator

The stub-plus-regex approach is the right fix, and the split between the two commits is clean (typedef stubbing + token-level normalisation are logically separate).

What I verified

  • parser/parser.py : _SYSTEM_HEADER_STUBS covers bool/true/false, size_t, and the full PostgreSQL integer family (int8uint64, Timestamp, TimestampTz, TimeADT, TimeOffset, DateADT). Prepended at the top of the entry-point in build_entry_point, with #ifndef bool guards so a real stdbool.h could coexist later.
  • parser/extractors.py : _BOOL_TOKEN_RE = re.compile(r"\b_Bool\b") correctly switches _c_spelling from a full-string equality check to token-level substitution, so compound types (_Bool *, _Bool[]) get normalised too. import re already present at the top of the file, word boundary protects against accidental matches in identifiers.
  • The _BOOL_SPELLINGS = {"bool", "_Bool"} set already anticipates both representations (typedef char bool vs. #define bool _Bool) : good defensive choice.

Needs tests

This PR changes the typing of 641 catalog functions with no regression test. Given how load-bearing it is for every downstream binding (PyMEOS-CFFI, GoMEOS, MEOS.NET, JMEOS, MEOS.js, all of which will refresh against the new IDL), even a minimal test that asserts the boolean shape would protect against silent regressions. Pattern matches what #8 introduced (plain unittest, no pytest dependency):

# tests/test_bool_typing.py
import json, unittest
from pathlib import Path

IDL = Path(__file__).resolve().parents[1] / "output" / "meos-idl.json"

class BoolTypingTests(unittest.TestCase):
    def setUp(self):
        if not IDL.exists():
            self.skipTest(f"{IDL} not generated; run `python run.py` first")
        idl = json.loads(IDL.read_text())
        self.by_name = {f["name"]: f for f in idl["functions"]}

    def test_known_bool_returners_typed_bool(self):
        # Sanity sample — pre-fix these all came back as `int`.
        for n in ("meos_initialized", "temporal_eq",
                  "tbox_overlaps_tbox", "span_eq"):
            self.assertIn(n, self.by_name)
            self.assertEqual(self.by_name[n]["returnType"], "bool", n)

    def test_bool_pointer_out_params_preserved(self):
        # Compound-type regex fix — pre-fix these arrived as `int *`.
        f = self.by_name["tbool_value_at_timestamptz"]
        out = [p for p in f["params"] if p.get("isOutput")]
        self.assertTrue(any("bool" in p["cType"] for p in out),
                        f"no bool* out-param found: {out}")

    def test_no_bool_demoted_to_int(self):
        # Hard regression guard: catch any future drift back to `int`.
        # Replace this with the exact pre-fix count once you have it.
        bool_returners = [f for f in self.by_name.values()
                          if f["returnType"] == "bool"]
        self.assertGreater(len(bool_returners), 500,
                           "bool returners collapsed back to int — "
                           "stdbool stub regression?")

The third test in particular is what catches a future contributor accidentally removing the stub or breaking the regex without anyone noticing until a downstream binding crashes.

Non-blocking notes

  1. LP64 assumption : typedef long int int64 mirrors MobilityDB's LP64 branch (Linux/macOS). Fine since MEOS builds under Docker Linux; if Windows ever becomes a target, the stub will need an _LP64/_WIN64 discriminator. Worth a one-line note in README.md or a # LP64 only comment above the stub so the assumption is documented at the site.

  2. mergeable_state: unknown : GitHub hasn't recomputed since feat: canonical portable bare-name mapping as the codegen source of truth #8 merged. feat: canonical portable bare-name mapping as the codegen source of truth #8 only added parser/portable.py (new file) so no real conflict expected, but a quick rebase on the latest master would refresh the state and make the merge UI green.

  3. Stub maintenance : when MEOS upstream adds new public typedefs (e.g., a future Interval type), the stub will need a parallel update. Maybe worth a brief tests/test_stub_coverage.py later that grep-counts typedef occurrences in MEOS headers vs. the stub, but that's out of scope here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants