diff --git a/README.md b/README.md
index fb0a8d0..b38970c 100644
--- a/README.md
+++ b/README.md
@@ -10,6 +10,7 @@ This catalog is the foundation for generating language bindings (Python, Java, R
 - [Getting started](#getting-started)
 - [Output format](#output-format)
 - [Adding metadata](#adding-metadata)
+- [MCP generation](#mcp-generation)
 
 ## How it works
 
@@ -83,3 +84,28 @@ A typical function entry looks like this:
 ## Adding metadata
 
 Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. The merger applies them on top of the libclang-parsed structure when generating the final catalog.
+
+## MCP generation
+
+The enriched catalog also projects onto a **Model Context Protocol (MCP)**
+tool manifest, so an LLM/agent can call the MEOS value algebra directly:
+
+```bash
+python run.py            # produce the enriched catalog
+python generate_mcp.py   # output/meos-idl.json -> output/meos-mcp.json
+```
+
+Every *stateless-exposable* MEOS function becomes one MCP tool with a
+**self-contained** JSON Schema (2020-12) — enums and opaque-type schemas are
+inlined, since MCP clients don't resolve external `$ref`s. Spatiotemporal
+values are passed as serialized strings (text/WKT, MF-JSON, HexWKB);
+`annotations` mark the tools read-only/idempotent; `x-meos.{decode,encode}`
+give a runtime everything it needs to dispatch a call.
+
+Against the live MobilityDB `master` catalog this yields **1952 tools**
+(90% of the public API; internal `meos_internal*.h` policy-excluded),
+array params rendered as JSON arrays.
+Pure `dict` → `dict` (no libclang, no MEOS runtime); see
+[`docs/mcp.md`](docs/mcp.md) for the projection rules and roadmap, and
+[`tests/test_mcp.py`](tests/test_mcp.py) for worked examples
+(`python3 tests/test_mcp.py`).
diff --git a/docs/mcp.md b/docs/mcp.md
new file mode 100644
index 0000000..b55852b
--- /dev/null
+++ b/docs/mcp.md
@@ -0,0 +1,64 @@
+# MCP tool-manifest projection
+
+`generator/mcp.py` turns the **enriched** catalog (`network` / `wire` /
+`typeEncodings`, see [`enrichment.md`](enrichment.md)) into a **Model Context
+Protocol (MCP)** tool manifest — one tool per stateless-exposable MEOS
+function, so an LLM/agent can call the MEOS value algebra directly.
+
+```bash
+python run.py            # enriched catalog -> output/meos-idl.json
+python generate_mcp.py   # -> output/meos-mcp.json
+```
+
+Pure `dict` → `dict` (no libclang, no MEOS runtime); deterministic
+(tools sorted by name) so generated diffs are reviewable.
+
+## Why a separate generator (not the OpenAPI one)
+
+MCP `inputSchema` must be a **self-contained** JSON Schema per tool — MCP
+clients do not resolve external `#/components/...` `$ref`s. So enums and
+opaque-type schemas are **inlined** into each tool rather than referenced.
+The projection rules are otherwise the same model as
+[`openapi.md`](openapi.md); only the rendering differs.
+
+## Projection rules
+
+| MEOS concept | MCP tool |
+|---|---|
+| stateless-exposable function | one tool, `name = function` |
+| `doc` (or synthesized) | `description`; serialized args add a "passed as serialized strings" hint so the model formats them correctly |
+| parameter | `inputSchema.properties` entry (all `required`, `additionalProperties:false`, JSON Schema 2020-12) |
+| `wire` scalar / enum | inline `{"type": …}` / `{"type":"string","enum":[real C constant names]}` |
+| `wire` serialized | `{"type":"string"}` + a description naming the type and its encodings (text/MF-JSON/HexWKB) |
+| `wire` array (builder `(Elem **,count)`) | `{"type":"array","items":<element schema>}`; the C `count` is the array length |
+| out-parameter result (`from_outparam`) | the out-param value is the tool result (scalar or serialized); `presence_return` false ⇒ no value |
+| result | `outputSchema` = `{type:object, properties:{result:…}}`; `void` ⇒ no `outputSchema` |
+| purity | `annotations`: `readOnlyHint`/`idempotentHint` true, `destructiveHint`/`openWorldHint` false |
+| dispatch metadata | `x-meos.category`, `x-meos.decode` (param → MEOS parse fn), `x-meos.encode` (result serialize fn) |
+
+A runtime serves a call by JSON-decoding the arguments, running each
+`x-meos.decode` on the serialized strings, invoking the function, and
+`x-meos.encode` on the result — nothing beyond this manifest is needed.
+
+## Coverage (live MobilityDB `master`)
+
+2161 **public** functions → **1952 tools (90%)**; the internal
+`meos_internal*.h` programmer API (511 fns, `Datum`-generic) is
+policy-excluded. Spans `predicate`, `transformation`, `accessor`, `io`,
+`setop`, `conversion`, `constructor`, `aggregate`. The remaining public
+functions carry a truthful `reason` and are overridable via
+`meta/meos-meta.json`.
+
+## Limitations / roadmap
+
+- `x-meos` is a namespaced extension to the MCP tool object (clients ignore
+  unknown keys); the `tools` array itself is spec-pure.
+- No MCP **server** here — this PR delivers the manifest/contract; a
+  generated stdio/HTTP MCP server (decode → call → encode) is the next unit.
+- Encoding uses the generic root (`temporal_out`, correct for every
+  subtype); decoding a polymorphic argument uses a *typed* wrapper
+  (`tbool_in`) because the generic `temporal_in` needs a semantic type tag.
+  A mismatched subtype yields a clean error, never a wrong result; carrying
+  the subtype on the wire for universal decode is the remaining gap.
+- Tool count (1952) exceeds what some clients comfortably list; a curated
+  subset / namespacing by `category` is a sensible later refinement.
diff --git a/generate_mcp.py b/generate_mcp.py
new file mode 100644
index 0000000..2b43be8
--- /dev/null
+++ b/generate_mcp.py
@@ -0,0 +1,39 @@
+# Generate an MCP tool manifest from the enriched MEOS catalog.
+#
+# Usage:
+#   python run.py            # first, to produce the catalog
+#   python generate_mcp.py   # output/meos-idl.json -> output/meos-mcp.json
+#   python generate_mcp.py in.json [out.json]
+
+import json
+import sys
+from pathlib import Path
+
+from generator.mcp import build_mcp
+
+IN_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("output/meos-idl.json")
+OUT_PATH = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("output/meos-mcp.json")
+
+
+def main() -> None:
+    if not IN_PATH.exists():
+        sys.exit(f"Catalog not found: {IN_PATH} — run `python run.py` first.")
+
+    catalog = json.loads(IN_PATH.read_text())
+    if "functions" not in catalog or not any(
+        "network" in f for f in catalog["functions"]
+    ):
+        sys.exit(f"{IN_PATH} is not enriched (no `network` fields). "
+                 "Run the enrichment pass first.")
+
+    manifest = build_mcp(catalog)
+
+    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
+    OUT_PATH.write_text(json.dumps(manifest, indent=2))
+
+    print(f"[mcp] {len(manifest['tools'])} tools → {OUT_PATH}",
+          file=sys.stderr)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/generator/mcp.py b/generator/mcp.py
new file mode 100644
index 0000000..a97b0f3
--- /dev/null
+++ b/generator/mcp.py
@@ -0,0 +1,154 @@
+"""MCP tool-manifest generator.
+
+Projects the *enriched* MEOS catalog (`network` / `wire` / `typeEncodings`
+from the service-projection pass) onto a Model Context Protocol (MCP) tool
+manifest: one tool per stateless-exposable function, so an LLM/agent can
+call the MEOS value algebra directly.
+
+Unlike the OpenAPI projection, every tool is **self-contained** — its
+`inputSchema` inlines all definitions (no shared `$ref`s), which is what MCP
+clients expect. `x-meos` carries the decode/encode function names and
+category so a runtime can dispatch a call without any extra metadata.
+
+Pure `dict` → `dict`; no libclang and no MEOS runtime. Deterministic
+(tools sorted by name) so generated diffs stay reviewable.
+"""
+
+import re
+
+_QUAL_RE = re.compile(r"\b(const|volatile|struct|union|enum)\b")
+_PRIM = {"integer": "integer", "number": "number",
+         "boolean": "boolean", "string": "string"}
+
+
+def _clean_type(c_type: str) -> str:
+    """``const struct Temporal *`` -> ``Temporal``."""
+    return " ".join(_QUAL_RE.sub(" ", c_type).replace("*", " ").split())
+
+
+def _enum_values(name: str, enums: list) -> list:
+    for e in enums:
+        if e["name"] == name:
+            return [v["name"] for v in e.get("values", [])]
+    return []
+
+
+def _param_schema(p: dict, enums: list) -> dict:
+    if p["kind"] == "json":
+        if p.get("enum"):
+            s = {"type": "string", "title": p["enum"]}
+            vals = _enum_values(p["enum"], enums)
+            if vals:
+                s["enum"] = vals
+            return s
+        return {"type": _PRIM.get(p.get("json", "string"), "string")}
+    if p["kind"] == "array":  # builder (Elem **, count)
+        return {"type": "array",
+                "items": _param_schema(p["element"], enums)}
+    # serialized
+    t = _clean_type(p["cType"])
+    encs = ", ".join(p.get("encodings", [])) or "text"
+    return {
+        "type": "string",
+        "title": t,
+        "description": (
+            f"A MEOS {t} value, serialized as {encs} "
+            f"(e.g. WKT / MF-JSON / HexWKB)."
+        ),
+    }
+
+
+def _describe(fn: dict) -> str:
+    doc = fn.get("doc")
+    text = doc.strip() if doc else (
+        f"MEOS {fn['category']} operation `{fn['name']}`."
+    )
+    if any(p["kind"] == "serialized" for p in fn["wire"]["params"]):
+        text += (" Spatiotemporal arguments are passed as serialized strings "
+                 "(text/WKT, MF-JSON, or HexWKB).")
+    return text
+
+
+def _result_schema(result: dict, enums: list):
+    if result["kind"] == "json":
+        if result.get("enum"):
+            s = {"type": "string"}
+            vals = _enum_values(result["enum"], enums)
+            if vals:
+                s["enum"] = vals
+            return s
+        return {"type": _PRIM.get(result.get("json", "string"), "string")}
+    if result["kind"] == "serialized":
+        return {"type": "string", "title": _clean_type(result["cType"])}
+    return None  # void
+
+
+def _tool(fn: dict, enums: list) -> dict:
+    wire = fn["wire"]
+    props, required = {}, []
+    for p in wire["params"]:
+        props[p["name"]] = _param_schema(p, enums)
+        required.append(p["name"])
+
+    tool = {
+        "name": fn["name"],
+        "description": _describe(fn),
+        "inputSchema": {
+            "$schema": "https://json-schema.org/draft/2020-12/schema",
+            "type": "object",
+            "properties": props,
+            "required": required,
+            "additionalProperties": False,
+        },
+        "annotations": {
+            "title": fn["name"],
+            "readOnlyHint": True,
+            "idempotentHint": True,
+            "destructiveHint": False,
+            "openWorldHint": False,
+        },
+        "x-meos": {"category": fn["category"]},
+    }
+
+    rs = _result_schema(wire["result"], enums)
+    if rs is not None:
+        tool["outputSchema"] = {
+            "type": "object",
+            "properties": {"result": rs},
+            "required": ["result"],
+        }
+    if wire["result"]["kind"] == "serialized":
+        tool["x-meos"]["encode"] = wire["result"]["encode"]
+
+    decode = {p["name"]: p["decode"] for p in wire["params"]
+              if p["kind"] == "serialized"}
+    if decode:
+        tool["x-meos"]["decode"] = decode
+    return tool
+
+
+def build_mcp(catalog: dict, *, server_name: str = "meos") -> dict:
+    """Build an MCP tool manifest from an enriched catalog."""
+    functions = sorted(
+        (f for f in catalog.get("functions", [])
+         if f.get("network", {}).get("exposable")),
+        key=lambda f: f["name"],
+    )
+    enums = catalog.get("enums", [])
+    tools = [_tool(f, enums) for f in functions]
+    return {
+        "x-meos": {
+            "server": server_name,
+            "description": (
+                "MEOS spatiotemporal value algebra exposed as MCP tools, "
+                "generated from the MEOS-API catalog. One tool per "
+                "stateless-exposable function; spatiotemporal values are "
+                "passed as serialized strings. Generated, do not edit."
+            ),
+            "coverage": {
+                "functions": len(catalog.get("functions", [])),
+                "exposed": len(tools),
+            },
+        },
+        "tools": tools,
+    }
diff --git a/tests/test_mcp.py b/tests/test_mcp.py
new file mode 100644
index 0000000..e4c52ad
--- /dev/null
+++ b/tests/test_mcp.py
@@ -0,0 +1,158 @@
+"""Unit tests for generator/mcp.py.
+
+Runs without libclang or pytest: python3 tests/test_mcp.py
+"""
+
+import sys
+import unittest
+from pathlib import Path
+
+sys.path.insert(0, str(Path(__file__).resolve().parents[1]))
+
+from generator.mcp import build_mcp
+
+TEMP = "const struct Temporal *"
+
+
+def serialized(name, ctype, decode):
+    return {"name": name, "kind": "serialized", "cType": ctype,
+            "decode": decode, "encodings": ["mfjson", "text", "wkb"]}
+
+
+CATALOG = {
+    "functions": [
+        {
+            "name": "temporal_eq", "category": "predicate",
+            "network": {"exposable": True},
+            "wire": {
+                "params": [serialized("temp1", TEMP, "temporal_in"),
+                           serialized("temp2", TEMP, "temporal_in")],
+                "result": {"kind": "json", "json": "integer"},
+            },
+        },
+        {
+            "name": "temporal_set_interp", "category": "transformation",
+            "doc": "Set the interpolation of a temporal value.",
+            "network": {"exposable": True},
+            "wire": {
+                "params": [
+                    serialized("temp", TEMP, "temporal_in"),
+                    {"name": "interp", "kind": "json", "json": "string",
+                     "enum": "interpType"},
+                ],
+                "result": {"kind": "serialized", "cType": "struct Temporal *",
+                           "encode": "temporal_out",
+                           "encodings": ["text"]},
+            },
+        },
+        {
+            "name": "noop_op", "category": "transformation",
+            "network": {"exposable": True},
+            "wire": {"params": [], "result": {"kind": "void"}},
+        },
+        {
+            "name": "tsequence_make", "category": "constructor",
+            "network": {"exposable": False,
+                        "reason": "array-or-out-param:instants"},
+            "wire": {"params": [], "result": {"kind": "unsupported"}},
+        },
+        {
+            "name": "temporal_merge_array", "category": "transformation",
+            "network": {"exposable": True},
+            "wire": {
+                "params": [{"name": "temparr", "kind": "array",
+                            "count_param": "count",
+                            "element": {"kind": "serialized",
+                                        "cType": "struct Temporal *",
+                                        "decode": "temporal_in",
+                                        "encodings": ["text"]}}],
+                "result": {"kind": "serialized",
+                           "cType": "struct Temporal *",
+                           "encode": "temporal_out"}},
+        },
+    ],
+    "enums": [{"name": "interpType",
+               "values": [{"name": "STEP", "value": 0},
+                          {"name": "LINEAR", "value": 1}]}],
+    "structs": [],
+}
+
+
+class McpTests(unittest.TestCase):
+    def setUp(self):
+        self.m = build_mcp(CATALOG)
+        self.tools = {t["name"]: t for t in self.m["tools"]}
+
+    def test_envelope_and_exclusion(self):
+        self.assertEqual(self.m["x-meos"]["coverage"],
+                         {"functions": 5, "exposed": 4})
+        self.assertNotIn("tsequence_make", self.tools)
+        self.assertEqual(len(self.m["tools"]), 4)
+
+    def test_array_param_inlined(self):
+        t = self.tools["temporal_merge_array"]
+        a = t["inputSchema"]["properties"]["temparr"]
+        self.assertEqual(a["type"], "array")
+        self.assertEqual(a["items"]["type"], "string")  # serialized elem
+        self.assertIn("MEOS", a["items"]["description"])
+
+    def test_tools_sorted(self):
+        names = [t["name"] for t in self.m["tools"]]
+        self.assertEqual(names, sorted(names))
+
+    def test_input_schema_and_serialized_param(self):
+        t = self.tools["temporal_eq"]
+        s = t["inputSchema"]
+        self.assertEqual(s["$schema"],
+                         "https://json-schema.org/draft/2020-12/schema")
+        self.assertEqual(s["type"], "object")
+        self.assertEqual(s["required"], ["temp1", "temp2"])
+        self.assertFalse(s["additionalProperties"])
+        p = s["properties"]["temp1"]
+        self.assertEqual(p["type"], "string")
+        self.assertIn("MEOS Temporal", p["description"])
+        self.assertEqual(t["x-meos"]["decode"],
+                         {"temp1": "temporal_in", "temp2": "temporal_in"})
+        self.assertEqual(t["x-meos"]["category"], "predicate")
+        # scalar result -> wrapped outputSchema
+        self.assertEqual(
+            t["outputSchema"]["properties"]["result"], {"type": "integer"})
+
+    def test_enum_param_inlined(self):
+        t = self.tools["temporal_set_interp"]
+        interp = t["inputSchema"]["properties"]["interp"]
+        self.assertEqual(interp["type"], "string")
+        self.assertEqual(interp["enum"], ["STEP", "LINEAR"])
+        self.assertEqual(t["description"],
+                         "Set the interpolation of a temporal value. "
+                         "Spatiotemporal arguments are passed as serialized "
+                         "strings (text/WKT, MF-JSON, or HexWKB).")
+        self.assertEqual(t["x-meos"]["encode"], "temporal_out")
+        self.assertEqual(
+            t["outputSchema"]["properties"]["result"]["type"], "string")
+
+    def test_void_has_no_output_schema(self):
+        t = self.tools["noop_op"]
+        self.assertNotIn("outputSchema", t)
+        self.assertEqual(t["inputSchema"]["properties"], {})
+        self.assertEqual(t["inputSchema"]["required"], [])
+
+    def test_annotations(self):
+        a = self.tools["temporal_eq"]["annotations"]
+        self.assertTrue(a["readOnlyHint"])
+        self.assertTrue(a["idempotentHint"])
+        self.assertFalse(a["destructiveHint"])
+        self.assertFalse(a["openWorldHint"])
+
+    def test_all_tools_well_formed(self):
+        for t in self.m["tools"]:
+            self.assertTrue(t["name"])
+            self.assertTrue(t["description"])
+            self.assertEqual(t["inputSchema"]["type"], "object")
+            self.assertLessEqual(
+                set(t["inputSchema"]["required"]),
+                set(t["inputSchema"]["properties"]))
+
+
+if __name__ == "__main__":
+    unittest.main(verbosity=2)
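
The dispatch loop that `docs/mcp.md` describes (JSON-decode the arguments, run each `x-meos.decode` on the serialized strings, invoke the function, apply `x-meos.encode` to the result) could be sketched roughly as below. This is an illustration only and not part of the change: `dispatch`, `registry`, and the toy lambdas are hypothetical stand-ins for real MEOS bindings. The sketch assumes the order of `inputSchema.properties` matches the C parameter order, and it passes array parameters through unchanged, since the manifest lists only top-level serialized parameters under `x-meos.decode`.

```python
import json
from typing import Any, Callable, Dict


def dispatch(tool: dict, arguments_json: str,
             registry: Dict[str, Callable[..., Any]]) -> dict:
    """Serve one call against a manifest tool entry (hypothetical runtime)."""
    args = json.loads(arguments_json)
    schema = tool["inputSchema"]
    # Enforce `required` and `additionalProperties: false` by hand.
    missing = [n for n in schema["required"] if n not in args]
    extra = [n for n in args if n not in schema["properties"]]
    if missing or extra:
        raise ValueError(f"bad arguments: missing={missing} extra={extra}")

    meta = tool["x-meos"]
    decode = meta.get("decode", {})
    # Serialized parameters go through their named parse function;
    # scalars, enums, and arrays are passed through unchanged here.
    call_args = [
        registry[decode[name]](args[name]) if name in decode else args[name]
        for name in schema["properties"]  # assumed to follow parameter order
    ]
    value = registry[tool["name"]](*call_args)

    if "outputSchema" not in tool:  # void function: nothing to return
        return {}
    if "encode" in meta:  # serialized result
        value = registry[meta["encode"]](value)
    return {"result": value}


# Toy stand-ins so the sketch runs without MEOS; none are real bindings.
registry = {
    "temporal_in": lambda s: tuple(s.split(",")),
    "temporal_out": lambda v: ",".join(v),
    "temporal_eq": lambda a, b: int(a == b),
}
tool = {
    "name": "temporal_eq",
    "inputSchema": {
        "type": "object",
        "properties": {"temp1": {"type": "string"},
                       "temp2": {"type": "string"}},
        "required": ["temp1", "temp2"],
        "additionalProperties": False,
    },
    "outputSchema": {"type": "object",
                     "properties": {"result": {"type": "integer"}},
                     "required": ["result"]},
    "x-meos": {"category": "predicate",
               "decode": {"temp1": "temporal_in", "temp2": "temporal_in"}},
}
print(dispatch(tool, '{"temp1": "1,2", "temp2": "1,2"}', registry))
# → {'result': 1}
```

Keeping the callables in a plain name-to-function mapping mirrors how the manifest refers to decode and encode functions by name only; a real server would presumably populate that mapping from the MEOS bindings.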