MobilityDB · estebanzimanyi · May 18, 2026
diff --git a/README.md b/README.md
@@ -10,6 +10,7 @@ This catalog is the foundation for generating language bindings (Python, Java, R
 - [Getting started](#getting-started)
 - [Output format](#output-format)
 - [Adding metadata](#adding-metadata)
+- [MCP generation](#mcp-generation)
 
 ## How it works
 
@@ -83,3 +84,28 @@ A typical function entry looks like this:
 ## Adding metadata
 
 Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. The merger applies them on top of the libclang-parsed structure when generating the final catalog.
+
+## MCP generation
+
+The enriched catalog also projects onto a **Model Context Protocol (MCP)**
+tool manifest, so an LLM/agent can call the MEOS value algebra directly:
+
+```bash
+python run.py                 # produce the enriched catalog
+python generate_mcp.py        # output/meos-idl.json -> output/meos-mcp.json
+```
+
+Every *stateless-exposable* MEOS function becomes one MCP tool with a
+**self-contained** JSON Schema (2020-12) — enums and opaque-type schemas are
+inlined, since MCP clients don't resolve external `$ref`s. Spatiotemporal
+values are passed as serialized strings (text/WKT, MF-JSON, HexWKB);
+`annotations` mark the tools read-only/idempotent; `x-meos.{decode,encode}`
+give a runtime everything it needs to dispatch a call.
+
+Against the live MobilityDB `master` catalog this yields **1952 tools**
+(90% of the 2161 public functions; the internal `meos_internal*.h` API is
+excluded by policy), with array parameters rendered as JSON arrays.
+The generator is a pure `dict` → `dict` transformation (no libclang, no MEOS
+runtime). See [`docs/mcp.md`](docs/mcp.md) for the projection rules and
+roadmap, and [`tests/test_mcp.py`](tests/test_mcp.py) for worked examples
+(`python3 tests/test_mcp.py`).
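For orientation, an agent invokes one of these tools with a standard MCP `tools/call` JSON-RPC 2.0 request. The sketch below builds such a request with the standard library only. The tool name `temporal_num_instants`, its parameter name `temp`, and the input value are illustrative assumptions, not entries copied from the generated manifest.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical call: a temporal integer passed in MEOS text form, since
# spatiotemporal arguments travel as serialized strings.
request = tools_call_request(
    1,
    "temporal_num_instants",
    {"temp": "[1@2001-01-01, 2@2001-01-02]"},
)
print(request)
```

The `arguments` object has to satisfy the tool's `inputSchema`; because every schema sets `additionalProperties: false`, a validating client or server rejects a misspelled parameter name instead of silently ignoring it.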
diff --git a/docs/mcp.md b/docs/mcp.md
@@ -0,0 +1,64 @@
+# MCP tool-manifest projection
+
+`generator/mcp.py` turns the **enriched** catalog (`network` / `wire` /
+`typeEncodings`, see [`enrichment.md`](enrichment.md)) into a **Model Context
+Protocol (MCP)** tool manifest — one tool per stateless-exposable MEOS
+function, so an LLM/agent can call the MEOS value algebra directly.
+
+```bash
+python run.py                 # enriched catalog -> output/meos-idl.json
+python generate_mcp.py        #              -> output/meos-mcp.json
+```
+
+The projection is a pure `dict` → `dict` transformation (no libclang, no MEOS
+runtime) and is deterministic (tools are sorted by name), so diffs of the
+generated manifest stay reviewable.
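The determinism claim comes down to sorting before serializing: Python dicts keep insertion order, so once tools are ordered by name, the same set of functions always yields byte-identical JSON. A minimal sketch of that property, using a hypothetical `render` helper rather than the real `build_mcp`:

```python
import json
import random


def render(functions: list) -> str:
    """Hypothetical stand-in for the generator: sort, then serialize."""
    ordered = sorted(functions, key=lambda f: f["name"])
    return json.dumps({"tools": [{"name": f["name"]} for f in ordered]},
                      indent=2)


catalog = [{"name": n} for n in ("tint_in", "tbool_in", "temporal_out")]
shuffled = random.sample(catalog, k=len(catalog))  # same items, random order

# Identical output regardless of the input order.
print(render(catalog) == render(shuffled))  # True
```

Without the sort, any change in the order functions arrive in the catalog would show up as a diff of the whole manifest.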
+
+## Why a separate generator (not the OpenAPI one)
+
+MCP `inputSchema` must be a **self-contained** JSON Schema per tool: MCP
+clients do not resolve `$ref`s that point outside the tool, such as the shared
+`#/components/...` definitions an OpenAPI document relies on. Enums and
+opaque-type schemas are therefore **inlined** into each tool rather than
+referenced. Otherwise the projection follows the same model as
+[`openapi.md`](openapi.md); only the rendering differs.
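The inlining idea can be illustrated by rewriting OpenAPI-style `#/components/schemas/<Name>` references into copies of their targets. This is only a sketch of the concept: `generator/mcp.py` builds each schema inline from the start rather than rewriting references, and the component name and `interpType` values below are illustrative.

```python
def inline_refs(schema, components: dict):
    """Return a copy of `schema` in which every `#/components/schemas/<Name>`
    reference is replaced by its (recursively inlined) target."""
    prefix = "#/components/schemas/"
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith(prefix):
            # Assumes no reference cycles; a cyclic schema would recurse forever.
            return inline_refs(components[ref[len(prefix):]], components)
        return {key: inline_refs(val, components) for key, val in schema.items()}
    if isinstance(schema, list):
        return [inline_refs(item, components) for item in schema]
    return schema  # strings, numbers, booleans, None


# Illustrative shared component and a tool schema that references it.
components = {
    "interpType": {
        "type": "string",
        "enum": ["INTERP_NONE", "DISCRETE", "STEP", "LINEAR"],
    },
}
tool_schema = {
    "type": "object",
    "properties": {"interp": {"$ref": "#/components/schemas/interpType"}},
    "required": ["interp"],
}
inlined = inline_refs(tool_schema, components)
print(inlined["properties"]["interp"])
```

Because every dict and list is rebuilt, each tool gets its own copy of the enum, which is what makes the per-tool schemas independent of one another.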
+
+## Projection rules
+
+| MEOS concept | MCP tool |
+|---|---|
+| stateless-exposable function | one tool, `name = function` |
+| `doc` (or synthesized) | `description`; serialized args add a "passed as serialized strings" hint so the model formats them correctly |
+| parameter | `inputSchema.properties` entry (all `required`, `additionalProperties:false`, JSON Schema 2020-12) |
+| `wire` scalar / enum | inline `{"type": …}` / `{"type":"string","enum":[real C constant names]}` |
+| `wire` serialized | `{"type":"string"}` + a description naming the type and its encodings (text/MF-JSON/HexWKB) |
+| `wire` array (builder `(Elem **,count)`) | `{"type":"array","items":<element schema>}`; the C `count` is the array length |
+| out-parameter result (`from_outparam`) | the out-param value is the tool result (scalar or serialized); `presence_return` false ⇒ no value |
+| result | `outputSchema` = `{type:object, properties:{result:…}}`; `void` ⇒ no `outputSchema` |
+| purity | `annotations`: `readOnlyHint`/`idempotentHint` true, `destructiveHint`/`openWorldHint` false |
+| dispatch metadata | `x-meos.category`, `x-meos.decode` (param → MEOS parse fn), `x-meos.encode` (result serialize fn) |
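Every schema in the table uses a small JSON Schema vocabulary (`type`, `enum`, `items`, `required`, `additionalProperties: false`), so a server could pre-check `arguments` without a full validator. The sketch below covers only that subset and is an assumption about how a runtime might validate, not part of this PR; a general JSON Schema library is the robust option. The parameter names in the example schema are hypothetical.

```python
# JSON Schema primitive type -> accepted Python types after json.loads.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
}


def check_value(value, schema: dict, path: str) -> list:
    """Return error messages for one value against one property schema."""
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) and kind != "boolean":
        return [f"{path}: expected {kind}, got boolean"]
    if not isinstance(value, _JSON_TYPES[kind]):
        return [f"{path}: expected {kind}"]
    if "enum" in schema and value not in schema["enum"]:
        return [f"{path}: {value!r} is not one of {schema['enum']}"]
    errors = []
    if kind == "array":
        for i, item in enumerate(value):
            errors += check_value(item, schema["items"], f"{path}[{i}]")
    return errors


def check_arguments(args: dict, input_schema: dict) -> list:
    """Validate a tools/call `arguments` object; [] means valid."""
    props = input_schema["properties"]
    errors = [f"missing required argument {name!r}"
              for name in input_schema.get("required", []) if name not in args]
    if input_schema.get("additionalProperties") is False:
        errors += [f"unexpected argument {name!r}"
                   for name in args if name not in props]
    for name, value in args.items():
        if name in props:
            errors += check_value(value, props[name], name)
    return errors


# Example schema in the shape the table describes (parameter names hypothetical).
schema = {
    "type": "object",
    "properties": {
        "values": {"type": "array", "items": {"type": "string"}},
        "interp": {"type": "string", "enum": ["DISCRETE", "STEP", "LINEAR"]},
    },
    "required": ["values", "interp"],
    "additionalProperties": False,
}
print(check_arguments({"values": ["a"], "interp": "CUBIC"}, schema))
```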
+
+A runtime serves a call by JSON-decoding the arguments, running each
+`x-meos.decode` function on its serialized string, invoking the MEOS
+function, and running `x-meos.encode` on the result. Nothing beyond this
+manifest is needed.
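The decode → call → encode loop can be sketched in a few lines. This is a hedged illustration, not the planned server: the MEOS C functions are replaced by hypothetical Python stand-ins in `registry` (a real runtime would bind the C library, for example through `ctypes`), and positional arguments are assumed to follow the order of `inputSchema.properties`, which `_tool` builds from the wire parameters.

```python
def dispatch(tool: dict, arguments: dict, registry: dict):
    """Serve one tools/call from a manifest entry: decode -> call -> encode.

    `registry` maps C function names to callables; in this sketch they are
    hypothetical Python stand-ins, not real MEOS bindings.
    """
    meta = tool.get("x-meos", {})
    decoders = meta.get("decode", {})
    # 1. Parse each serialized argument with its named MEOS input function.
    args = {name: registry[decoders[name]](value) if name in decoders else value
            for name, value in arguments.items()}
    # 2. Call the function, passing arguments in inputSchema property order.
    order = tool["inputSchema"]["properties"]
    result = registry[tool["name"]](*(args[name] for name in order))
    # 3. Serialize a MEOS result; plain JSON results pass through unchanged.
    if "encode" in meta:
        result = registry[meta["encode"]](result)
    # Void functions have no outputSchema and produce no value.
    return {"result": result} if "outputSchema" in tool else None


# Hypothetical stand-ins: a toy "parser" and an instant counter.
registry = {
    "tint_in": lambda text: text.strip("[]").split(", "),
    "temporal_num_instants": len,
}
tool = {
    "name": "temporal_num_instants",
    "inputSchema": {"type": "object",
                    "properties": {"temp": {"type": "string"}},
                    "required": ["temp"], "additionalProperties": False},
    "outputSchema": {"type": "object",
                     "properties": {"result": {"type": "integer"}},
                     "required": ["result"]},
    "x-meos": {"category": "accessor", "decode": {"temp": "tint_in"}},
}
print(dispatch(tool, {"temp": "[1@2001-01-01, 2@2001-01-02]"}, registry))
# {'result': 2}
```

In an actual MCP response this object would be returned as `structuredContent` (typically with a text copy in `content`); that envelope is omitted here.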
+
+## Coverage (live MobilityDB `master`)
+
+2161 **public** functions → **1952 tools (90%)**. The internal
+`meos_internal*.h` programmer API (511 functions, `Datum`-generic) is
+excluded by policy. The exposed tools span the `predicate`, `transformation`,
+`accessor`, `io`, `setop`, `conversion`, `constructor`, and `aggregate`
+categories. Each remaining public function records an explicit `reason` for
+its exclusion; the decision can be overridden via `meta/meos-meta.json`.
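The percentage is exposed tools over public functions (1952 / 2161 ≈ 90.3%). A sketch that recomputes it and tallies exclusion reasons from an enriched catalog, assuming the caller has already filtered to public functions and that each reason lives at `network.reason` (the generator only reads `network.exposable`, so that location is a guess). The reason strings in the example are made up.

```python
from collections import Counter


def coverage(functions: list) -> tuple:
    """Return (exposed, total, percent, Counter of exclusion reasons)."""
    def net(f):
        return f.get("network", {})

    exposed = sum(1 for f in functions if net(f).get("exposable"))
    # Assumed location of the exclusion reason: network.reason.
    reasons = Counter(net(f).get("reason", "unspecified")
                      for f in functions if not net(f).get("exposable"))
    percent = round(100 * exposed / len(functions)) if functions else 0
    return exposed, len(functions), percent, reasons


# Synthetic public-function list with the published counts.
public = ([{"network": {"exposable": True}}] * 1952
          + [{"network": {"exposable": False, "reason": "example"}}] * 209)
print(coverage(public)[:3])  # (1952, 2161, 90)
```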
+
+## Limitations / roadmap
+
+- `x-meos` is a namespaced extension key on each MCP tool object (clients
+  ignore unknown keys); apart from it, every tool uses only fields defined
+  by the MCP spec.
+- No MCP **server** here: this PR delivers the manifest (the contract); a
+  generated stdio/HTTP MCP server (decode → call → encode) is the next unit
+  of work.
+- Encoding uses the generic root (`temporal_out`, correct for every
+  subtype); decoding a polymorphic argument uses a *typed* wrapper
+  (`tbool_in`) because the generic `temporal_in` needs a semantic type tag.
+  A mismatched subtype yields a clean error, never a wrong result; carrying
+  the subtype on the wire for universal decode is the remaining gap.
+- The tool count (1952) exceeds what some clients comfortably list; a curated
+  subset, or namespacing by `category`, is a sensible later refinement.
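For the polymorphic-decode gap, one possible direction (purely hypothetical, not in the current manifest) is to send a subtype tag alongside each polymorphic value and pick the typed input function from it, so a mismatch is rejected before any MEOS call. Apart from `tbool_in`, the decoder names below are assumed to follow the same `<subtype>_in` pattern, and the `{"subtype", "value"}` wire shape is invented for illustration.

```python
# Hypothetical tag -> typed MEOS input function name.
TYPED_DECODERS = {
    "tbool": "tbool_in",
    "tint": "tint_in",
    "tfloat": "tfloat_in",
    "ttext": "ttext_in",
    "tgeompoint": "tgeompoint_in",
    "tgeogpoint": "tgeogpoint_in",
}


def pick_decoder(arg: dict) -> tuple:
    """Return (decoder name, serialized text) for a tagged argument."""
    subtype = arg.get("subtype")
    if subtype not in TYPED_DECODERS:
        # Fail early with a clear message instead of calling MEOS.
        raise ValueError(f"unknown or missing temporal subtype: {subtype!r}")
    return TYPED_DECODERS[subtype], arg["value"]


print(pick_decoder({"subtype": "tfloat", "value": "[1.5@2001-01-01]"}))
```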
diff --git a/generate_mcp.py b/generate_mcp.py
@@ -0,0 +1,39 @@
+# Generate an MCP tool manifest from the enriched MEOS catalog.
+#
+# Usage:
+#     python run.py                    # first, to produce the catalog
+#     python generate_mcp.py           # output/meos-idl.json -> output/meos-mcp.json
+#     python generate_mcp.py in.json [out.json]
+
+import json
+import sys
+from pathlib import Path
+
+from generator.mcp import build_mcp
+
+IN_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("output/meos-idl.json")
+OUT_PATH = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("output/meos-mcp.json")
+
+
+def main() -> None:
+    if not IN_PATH.exists():
+        sys.exit(f"Catalog not found: {IN_PATH} — run `python run.py` first.")
+
+    catalog = json.loads(IN_PATH.read_text(encoding="utf-8"))
+    if "functions" not in catalog or not any(
+        "network" in f for f in catalog["functions"]
+    ):
+        sys.exit(f"{IN_PATH} is not enriched (no `network` fields). "
+                 "Run the enrichment pass first.")
+
+    manifest = build_mcp(catalog)
+
+    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
+    OUT_PATH.write_text(json.dumps(manifest, indent=2))
+
+    print(f"[mcp] {len(manifest['tools'])} tools → {OUT_PATH}",
+          file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
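Once `output/meos-mcp.json` exists, it can be inspected with the standard library alone. The sketch below counts tools per `x-meos.category`, the natural key for the category namespacing mentioned in the roadmap. `load_manifest` and `tools_per_category` are illustrative helpers, not part of this PR.

```python
import json
from collections import Counter
from pathlib import Path


def load_manifest(path: str = "output/meos-mcp.json") -> dict:
    """Read a generated manifest (run generate_mcp.py first)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def tools_per_category(manifest: dict) -> Counter:
    """Count tools by their x-meos.category."""
    return Counter(t["x-meos"]["category"] for t in manifest["tools"])


# Usage, after generating the manifest:
#     print(tools_per_category(load_manifest()).most_common())
```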
diff --git a/generator/mcp.py b/generator/mcp.py
@@ -0,0 +1,154 @@
+"""MCP tool-manifest generator.
+
+Projects the *enriched* MEOS catalog (`network` / `wire` / `typeEncodings`
+from the service-projection pass) onto a Model Context Protocol (MCP) tool
+manifest: one tool per stateless-exposable function, so an LLM/agent can
+call the MEOS value algebra directly.
+
+Unlike the OpenAPI projection, every tool is **self-contained** — its
+`inputSchema` inlines all definitions (no shared `$ref`s), which is what MCP
+clients expect. `x-meos` carries the decode/encode function names and
+category so a runtime can dispatch a call without any extra metadata.
+
+Pure `dict` → `dict`; no libclang and no MEOS runtime. Deterministic
+(tools sorted by name) so generated diffs stay reviewable.
+"""
+
+import re
+
+_QUAL_RE = re.compile(r"\b(const|volatile|struct|union|enum)\b")
+_PRIM = {"integer": "integer", "number": "number",
+         "boolean": "boolean", "string": "string"}
+
+
+def _clean_type(c_type: str) -> str:
+    """``const struct Temporal *`` -> ``Temporal``."""
+    return " ".join(_QUAL_RE.sub(" ", c_type).replace("*", " ").split())
+
+
+def _enum_values(name: str, enums: list) -> list:
+    """C constant names of the catalog enum ``name`` (``[]`` if unknown)."""
+    for e in enums:
+        if e["name"] == name:
+            return [v["name"] for v in e.get("values", [])]
+    return []
+
+
+def _param_schema(p: dict, enums: list) -> dict:
+    """Inline JSON Schema for one wire parameter (json, array or serialized)."""
+    if p["kind"] == "json":
+        if p.get("enum"):
+            s = {"type": "string", "title": p["enum"]}
+            vals = _enum_values(p["enum"], enums)
+            if vals:
+                s["enum"] = vals
+            return s
+        return {"type": _PRIM.get(p.get("json", "string"), "string")}
+    if p["kind"] == "array":               # builder (Elem **, count)
+        return {"type": "array",
+                "items": _param_schema(p["element"], enums)}
+    # serialized
+    t = _clean_type(p["cType"])
+    encs = ", ".join(p.get("encodings", [])) or "text"
+    return {
+        "type": "string",
+        "title": t,
+        "description": (
+            f"A MEOS {t} value, serialized as {encs} "
+            f"(e.g. WKT / MF-JSON / HexWKB)."
+        ),
+    }
+
+
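+def _is_serialized(p: dict) -> bool:
+    """True for a serialized param or an array of serialized elements."""
+    if p["kind"] == "array":
+        return _is_serialized(p["element"])
+    return p["kind"] == "serialized"
+
+
+def _describe(fn: dict) -> str:
+    # Use the catalog doc when present and non-blank, else synthesize one.
+    text = (fn.get("doc") or "").strip() or (
+        f"MEOS {fn['category']} operation `{fn['name']}`."
+    )
+    if any(_is_serialized(p) for p in fn["wire"]["params"]):
+        text += (" Spatiotemporal arguments are passed as serialized strings "
+                 "(text/WKT, MF-JSON, or HexWKB).")
+    return text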
+
+
+def _result_schema(result: dict, enums: list):
+    """Schema for the wire result, or ``None`` when the function is void."""
+    if result["kind"] == "json":
+        if result.get("enum"):
+            s = {"type": "string"}
+            vals = _enum_values(result["enum"], enums)
+            if vals:
+                s["enum"] = vals
+            return s
+        return {"type": _PRIM.get(result.get("json", "string"), "string")}
+    if result["kind"] == "serialized":
+        return {"type": "string", "title": _clean_type(result["cType"])}
+    return None  # void
+
+
+def _tool(fn: dict, enums: list) -> dict:
+    wire = fn["wire"]
+    props, required = {}, []
+    for p in wire["params"]:
+        props[p["name"]] = _param_schema(p, enums)
+        required.append(p["name"])
+
+    tool = {
+        "name": fn["name"],
+        "description": _describe(fn),
+        "inputSchema": {
+            "$schema": "https://json-schema.org/draft/2020-12/schema",
+            "type": "object",
+            "properties": props,
+            "required": required,
+            "additionalProperties": False,
+        },
+        "annotations": {
+            "title": fn["name"],
+            "readOnlyHint": True,
+            "idempotentHint": True,
+            "destructiveHint": False,
+            "openWorldHint": False,
+        },
+        "x-meos": {"category": fn["category"]},
+    }
+
+    rs = _result_schema(wire["result"], enums)
+    if rs is not None:
+        tool["outputSchema"] = {
+            "type": "object",
+            "properties": {"result": rs},
+            "required": ["result"],
+        }
+    if wire["result"]["kind"] == "serialized":
+        tool["x-meos"]["encode"] = wire["result"]["encode"]
+
+    decode = {p["name"]: p["decode"] for p in wire["params"]
+              if p["kind"] == "serialized"}
+    if decode:
+        tool["x-meos"]["decode"] = decode
+    return tool
+
+
+def build_mcp(catalog: dict, *, server_name: str = "meos") -> dict:
+    """Build an MCP tool manifest from an enriched catalog."""
+    functions = sorted(
+        (f for f in catalog.get("functions", [])
+         if f.get("network", {}).get("exposable")),
+        key=lambda f: f["name"],
+    )
+    enums = catalog.get("enums", [])
+    tools = [_tool(f, enums) for f in functions]
+    return {
+        "x-meos": {
+            "server": server_name,
+            "description": (
+                "MEOS spatiotemporal value algebra exposed as MCP tools, "
+                "generated from the MEOS-API catalog. One tool per "
+                "stateless-exposable function; spatiotemporal values are "
+                "passed as serialized strings. Generated, do not edit."
+            ),
+            "coverage": {
+                "functions": len(catalog.get("functions", [])),
+                "exposed": len(tools),
+            },
+        },
+        "tools": tools,
+    }