26 changes: 26 additions & 0 deletions README.md
@@ -10,6 +10,7 @@ This catalog is the foundation for generating language bindings (Python, Java, R
- [Getting started](#getting-started)
- [Output format](#output-format)
- [Adding metadata](#adding-metadata)
- [MCP generation](#mcp-generation)

## How it works

@@ -83,3 +84,28 @@ A typical function entry looks like this:
## Adding metadata

Manual annotations (ownership rules, additional documentation, deprecation flags, etc.) live in `meta/meos-meta.json`. The merger applies them on top of the libclang-parsed structure when generating the final catalog.

## MCP generation

The enriched catalog also projects onto a **Model Context Protocol (MCP)**
tool manifest, so an LLM/agent can call the MEOS value algebra directly:

```bash
python run.py # produce the enriched catalog
python generate_mcp.py # output/meos-idl.json -> output/meos-mcp.json
```

Every *stateless-exposable* MEOS function becomes one MCP tool with a
**self-contained** JSON Schema (2020-12) — enums and opaque-type schemas are
inlined, since MCP clients don't resolve external `$ref`s. Spatiotemporal
values are passed as serialized strings (text/WKT, MF-JSON, HexWKB);
`annotations` mark the tools read-only/idempotent; `x-meos.{decode,encode}`
give a runtime everything it needs to dispatch a call.

Against the live MobilityDB `master` catalog this yields **1952 tools**
(90% of the public API; the internal `meos_internal*.h` headers are excluded
by policy), with array parameters rendered as JSON arrays.
The generator is a pure `dict` → `dict` transform (no libclang, no MEOS
runtime); see [`docs/mcp.md`](docs/mcp.md) for the projection rules and
roadmap, and [`tests/test_mcp.py`](tests/test_mcp.py) for worked examples
(`python3 tests/test_mcp.py`).
64 changes: 64 additions & 0 deletions docs/mcp.md
@@ -0,0 +1,64 @@
# MCP tool-manifest projection

`generator/mcp.py` turns the **enriched** catalog (`network` / `wire` /
`typeEncodings`, see [`enrichment.md`](enrichment.md)) into a **Model Context
Protocol (MCP)** tool manifest — one tool per stateless-exposable MEOS
function, so an LLM/agent can call the MEOS value algebra directly.

```bash
python run.py # enriched catalog -> output/meos-idl.json
python generate_mcp.py # -> output/meos-mcp.json
```

The projection is a pure `dict` → `dict` transform (no libclang, no MEOS
runtime) and is deterministic (tools are sorted by name), so generated diffs
are reviewable.
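
As a minimal sketch of why sorting by name keeps the output reproducible: the `render` helper and the function names below are made up for illustration and are not part of the generator.

```python
import json


def render(functions):
    """Serialize a toy manifest with its tools ordered by name."""
    ordered = sorted(functions, key=lambda f: f["name"])
    tools = [{"name": f["name"]} for f in ordered]
    return json.dumps({"tools": tools}, indent=2)


# The same (hypothetical) functions arriving in two different orders.
first = render([{"name": "fn_b"}, {"name": "fn_a"}, {"name": "fn_c"}])
second = render([{"name": "fn_c"}, {"name": "fn_b"}, {"name": "fn_a"}])

# Identical text regardless of input order, so a diff shows only real changes.
print(first == second)  # True
```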

## Why a separate generator (not the OpenAPI one)

MCP `inputSchema` must be a **self-contained** JSON Schema per tool — MCP
clients do not resolve external `#/components/...` `$ref`s. So enums and
opaque-type schemas are **inlined** into each tool rather than referenced.
The projection rules are otherwise the same model as
[`openapi.md`](openapi.md); only the rendering differs.
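
To make the difference concrete, here is a small sketch that turns a `$ref`-based schema into a self-contained one. It is only an illustration: `COMPONENTS`, `ExampleEnum`, and `inline_refs` are hypothetical names, and `generator/mcp.py` emits the inline form directly rather than resolving references after the fact.

```python
import copy
import json

# Hypothetical shared definitions, in the style of an OpenAPI `components` block.
COMPONENTS = {
    "schemas": {
        "ExampleEnum": {"type": "string", "enum": ["VALUE_A", "VALUE_B"]},
    }
}

PREFIX = "#/components/schemas/"


def inline_refs(schema, components):
    """Return a copy of `schema` with every local component $ref replaced by its target."""
    if isinstance(schema, dict):
        ref = schema.get("$ref")
        if isinstance(ref, str) and ref.startswith(PREFIX):
            target = components["schemas"][ref[len(PREFIX):]]
            # Copy the target so each tool owns its schema, then resolve nested refs.
            return inline_refs(copy.deepcopy(target), components)
        return {key: inline_refs(value, components) for key, value in schema.items()}
    if isinstance(schema, list):
        return [inline_refs(item, components) for item in schema]
    return schema


referenced = {
    "type": "object",
    "properties": {"mode": {"$ref": "#/components/schemas/ExampleEnum"}},
    "required": ["mode"],
    "additionalProperties": False,
}

inlined = inline_refs(referenced, COMPONENTS)
print(json.dumps(inlined["properties"]["mode"]))
# {"type": "string", "enum": ["VALUE_A", "VALUE_B"]}
```

The inlined schema can be handed to a client on its own, at the cost of repeating each enum in every tool that uses it.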

## Projection rules

| MEOS concept | MCP tool |
|---|---|
| stateless-exposable function | one tool, `name = function` |
| `doc` (or synthesized) | `description`; serialized args add a "passed as serialized strings" hint so the model formats them correctly |
| parameter | `inputSchema.properties` entry (all `required`, `additionalProperties:false`, JSON Schema 2020-12) |
| `wire` scalar / enum | inline `{"type": …}` / `{"type":"string","enum":[real C constant names]}` |
| `wire` serialized | `{"type":"string"}` + a description naming the type and its encodings (text/MF-JSON/HexWKB) |
| `wire` array (builder `(Elem **,count)`) | `{"type":"array","items":<element schema>}`; the C `count` is the array length |
| out-parameter result (`from_outparam`) | the out-param value is the tool result (scalar or serialized); `presence_return` false ⇒ no value |
| result | `outputSchema` = `{type:object, properties:{result:…}}`; `void` ⇒ no `outputSchema` |
| purity | `annotations`: `readOnlyHint`/`idempotentHint` true, `destructiveHint`/`openWorldHint` false |
| dispatch metadata | `x-meos.category`, `x-meos.decode` (param → MEOS parse fn), `x-meos.encode` (result serialize fn) |

A runtime serves a call by JSON-decoding the arguments, running each
`x-meos.decode` function on its serialized string, invoking the function, and
applying `x-meos.encode` to the result; nothing beyond this manifest is needed.
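
The decode → call → encode loop could look roughly like the sketch below. Everything in it is an assumption made for illustration: `serve_call`, the `REGISTRY` lookup table, and the `fake_*` functions stand in for a real binding to the MEOS library, which this manifest does not provide.

```python
import json


# Stand-ins for MEOS parse, compute, and serialize functions (hypothetical).
def fake_parse(text):
    return ("parsed", text)


def fake_op(temp, n):
    return ("parsed", f"{temp[1]}+{n}")


def fake_out(value):
    return f"out:{value[1]}"


# Assumed lookup from function name to callable; a real runtime would bind MEOS here.
REGISTRY = {"fake_parse": fake_parse, "fake_op": fake_op, "fake_out": fake_out}

# A toy tool entry carrying only the keys the dispatcher reads.
TOOL = {
    "name": "fake_op",
    "outputSchema": {"type": "object", "properties": {"result": {"type": "string"}}},
    "x-meos": {
        "category": "transformation",
        "decode": {"temp": "fake_parse"},
        "encode": "fake_out",
    },
}


def serve_call(tool, raw_arguments, registry):
    """Decode serialized arguments, invoke the function, and encode the result."""
    args = json.loads(raw_arguments)
    meta = tool["x-meos"]
    # Only serialized parameters appear in `decode`; scalars pass through unchanged.
    for param, parser in meta.get("decode", {}).items():
        args[param] = registry[parser](args[param])
    value = registry[tool["name"]](**args)
    if "outputSchema" not in tool:  # void function: nothing to return
        return {}
    if "encode" in meta:  # serialized result: turn it back into a string
        value = registry[meta["encode"]](value)
    return {"result": value}


print(serve_call(TOOL, '{"temp": "T", "n": 2}', REGISTRY))
# {'result': 'out:T+2'}
```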

## Coverage (live MobilityDB `master`)

2161 **public** functions → **1952 tools (90%)**; the internal
`meos_internal*.h` programmer API (511 fns, `Datum`-generic) is
policy-excluded. The tools span the `predicate`, `transformation`,
`accessor`, `io`, `setop`, `conversion`, `constructor`, and `aggregate`
categories. The remaining public functions carry an explicit `reason` for
their exclusion and are overridable via `meta/meos-meta.json`.
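
The ratio can be checked from the two counts quoted in this section. The variable names below are illustrative.

```python
# Counts quoted in this section for the live MobilityDB `master` catalog.
public_functions = 2161
tools = 1952

coverage = tools / public_functions
not_exposed = public_functions - tools

print(f"{coverage:.1%} exposed, {not_exposed} public functions not exposed")
# 90.3% exposed, 209 public functions not exposed
```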

## Limitations / roadmap

- `x-meos` is a namespaced extension to the MCP tool object (clients ignore
unknown keys); the `tools` array itself is spec-pure.
- No MCP **server** here — this PR delivers the manifest/contract; a
generated stdio/HTTP MCP server (decode → call → encode) is the next unit.
- Encoding uses the generic root (`temporal_out`, correct for every
subtype); decoding a polymorphic argument uses a *typed* wrapper
(`tbool_in`) because the generic `temporal_in` needs a semantic type tag.
A mismatched subtype yields a clean error, never a wrong result; carrying
the subtype on the wire for universal decode is the remaining gap.
- Tool count (1952) exceeds what some clients comfortably list; a curated
subset / namespacing by `category` is a sensible later refinement.
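
One way the curated-subset idea could be prototyped on top of the generated manifest is sketched below. It relies only on the `x-meos.category` key the manifest already carries; `group_by_category`, `subset`, and the tool names are hypothetical.

```python
from collections import defaultdict

# A toy manifest with made-up tool names; only the keys used below are present.
MANIFEST = {
    "tools": [
        {"name": "fn_a", "x-meos": {"category": "accessor"}},
        {"name": "fn_b", "x-meos": {"category": "predicate"}},
        {"name": "fn_c", "x-meos": {"category": "accessor"}},
    ]
}


def group_by_category(manifest):
    """Map each category to the names of its tools, preserving manifest order."""
    groups = defaultdict(list)
    for tool in manifest["tools"]:
        groups[tool["x-meos"]["category"]].append(tool["name"])
    return dict(groups)


def subset(manifest, categories):
    """Return a copy of the manifest restricted to the given categories."""
    keep = set(categories)
    tools = [t for t in manifest["tools"] if t["x-meos"]["category"] in keep]
    return {**manifest, "tools": tools}


print(group_by_category(MANIFEST))
# {'accessor': ['fn_a', 'fn_c'], 'predicate': ['fn_b']}
print([t["name"] for t in subset(MANIFEST, ["predicate"])["tools"]])
# ['fn_b']
```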
39 changes: 39 additions & 0 deletions generate_mcp.py
@@ -0,0 +1,39 @@
# Generate an MCP tool manifest from the enriched MEOS catalog.
#
# Usage:
# python run.py # first, to produce the catalog
# python generate_mcp.py # output/meos-idl.json -> output/meos-mcp.json
# python generate_mcp.py in.json [out.json]

import json
import sys
from pathlib import Path

from generator.mcp import build_mcp

IN_PATH = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("output/meos-idl.json")
OUT_PATH = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("output/meos-mcp.json")


def main() -> None:
    if not IN_PATH.exists():
        sys.exit(f"Catalog not found: {IN_PATH}. Run `python run.py` first.")

    # Read explicitly as UTF-8 so the result does not depend on the platform locale.
    catalog = json.loads(IN_PATH.read_text(encoding="utf-8"))
    if "functions" not in catalog or not any(
        "network" in f for f in catalog["functions"]
    ):
        sys.exit(f"{IN_PATH} is not enriched (no `network` fields). "
                 "Run the enrichment pass first.")

    manifest = build_mcp(catalog)

    OUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUT_PATH.write_text(json.dumps(manifest, indent=2), encoding="utf-8")

    print(f"[mcp] {len(manifest['tools'])} tools → {OUT_PATH}",
          file=sys.stderr)


if __name__ == "__main__":
    main()
154 changes: 154 additions & 0 deletions generator/mcp.py
@@ -0,0 +1,154 @@
"""MCP tool-manifest generator.

Projects the *enriched* MEOS catalog (`network` / `wire` / `typeEncodings`
from the service-projection pass) onto a Model Context Protocol (MCP) tool
manifest: one tool per stateless-exposable function, so an LLM/agent can
call the MEOS value algebra directly.

Unlike the OpenAPI projection, every tool is **self-contained** — its
`inputSchema` inlines all definitions (no shared `$ref`s), which is what MCP
clients expect. `x-meos` carries the decode/encode function names and
category so a runtime can dispatch a call without any extra metadata.

Pure `dict` → `dict`; no libclang and no MEOS runtime. Deterministic
(tools sorted by name) so generated diffs stay reviewable.
"""

import re

_QUAL_RE = re.compile(r"\b(const|volatile|struct|union|enum)\b")
_PRIM = {"integer": "integer", "number": "number",
         "boolean": "boolean", "string": "string"}


def _clean_type(c_type: str) -> str:
    """``const struct Temporal *`` -> ``Temporal``."""
    return " ".join(_QUAL_RE.sub(" ", c_type).replace("*", " ").split())


def _enum_values(name: str, enums: list) -> list:
    for e in enums:
        if e["name"] == name:
            return [v["name"] for v in e.get("values", [])]
    return []


def _param_schema(p: dict, enums: list) -> dict:
    if p["kind"] == "json":
        if p.get("enum"):
            s = {"type": "string", "title": p["enum"]}
            vals = _enum_values(p["enum"], enums)
            if vals:
                s["enum"] = vals
            return s
        return {"type": _PRIM.get(p.get("json", "string"), "string")}
    if p["kind"] == "array":  # builder (Elem **, count)
        return {"type": "array",
                "items": _param_schema(p["element"], enums)}
    # serialized
    t = _clean_type(p["cType"])
    encs = ", ".join(p.get("encodings", [])) or "text"
    return {
        "type": "string",
        "title": t,
        "description": (
            f"A MEOS {t} value, serialized as {encs} "
            f"(e.g. WKT / MF-JSON / HexWKB)."
        ),
    }


def _describe(fn: dict) -> str:
    doc = fn.get("doc")
    text = doc.strip() if doc else (
        f"MEOS {fn['category']} operation `{fn['name']}`."
    )
    if any(p["kind"] == "serialized" for p in fn["wire"]["params"]):
        text += (" Spatiotemporal arguments are passed as serialized strings "
                 "(text/WKT, MF-JSON, or HexWKB).")
    return text


def _result_schema(result: dict, enums: list):
    if result["kind"] == "json":
        if result.get("enum"):
            s = {"type": "string"}
            vals = _enum_values(result["enum"], enums)
            if vals:
                s["enum"] = vals
            return s
        return {"type": _PRIM.get(result.get("json", "string"), "string")}
    if result["kind"] == "serialized":
        return {"type": "string", "title": _clean_type(result["cType"])}
    return None  # void


def _tool(fn: dict, enums: list) -> dict:
    wire = fn["wire"]
    props, required = {}, []
    for p in wire["params"]:
        props[p["name"]] = _param_schema(p, enums)
        required.append(p["name"])

    tool = {
        "name": fn["name"],
        "description": _describe(fn),
        "inputSchema": {
            "$schema": "https://json-schema.org/draft/2020-12/schema",
            "type": "object",
            "properties": props,
            "required": required,
            "additionalProperties": False,
        },
        "annotations": {
            "title": fn["name"],
            "readOnlyHint": True,
            "idempotentHint": True,
            "destructiveHint": False,
            "openWorldHint": False,
        },
        "x-meos": {"category": fn["category"]},
    }

    rs = _result_schema(wire["result"], enums)
    if rs is not None:
        tool["outputSchema"] = {
            "type": "object",
            "properties": {"result": rs},
            "required": ["result"],
        }
    if wire["result"]["kind"] == "serialized":
        tool["x-meos"]["encode"] = wire["result"]["encode"]

    decode = {p["name"]: p["decode"] for p in wire["params"]
              if p["kind"] == "serialized"}
    if decode:
        tool["x-meos"]["decode"] = decode
    return tool


def build_mcp(catalog: dict, *, server_name: str = "meos") -> dict:
    """Build an MCP tool manifest from an enriched catalog."""
    functions = sorted(
        (f for f in catalog.get("functions", [])
         if f.get("network", {}).get("exposable")),
        key=lambda f: f["name"],
    )
    enums = catalog.get("enums", [])
    tools = [_tool(f, enums) for f in functions]
    return {
        "x-meos": {
            "server": server_name,
            "description": (
                "MEOS spatiotemporal value algebra exposed as MCP tools, "
                "generated from the MEOS-API catalog. One tool per "
                "stateless-exposable function; spatiotemporal values are "
                "passed as serialized strings. Generated, do not edit."
            ),
            "coverage": {
                "functions": len(catalog.get("functions", [])),
                "exposed": len(tools),
            },
        },
        "tools": tools,
    }