
feat: schema datatype annotations in N-Quads export #46

Merged: Haigutus merged 2 commits into main from feat/nquads-datatypes on Jun 11, 2026

@Haigutus (Owner)

Closes #31. Datatype support was the last item keeping that issue open.

What

When the export schema declares an xsd:type for a property, N-Quads literals carry the datatype:

<urn:uuid:...> <http://iec.ch/TC57/CIM100#Conductor.length> "44.84"^^<http://www.w3.org/2001/XMLSchema#float> <urn:uuid:...> .
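To make the target line format concrete, here is a minimal sketch of how one such statement could be serialized. The helper names `escape_literal`, `literal` and `quad` are hypothetical illustrations, not functions from this PR; only the output shape (quoted lexical form, `^^<datatype IRI>`, graph IRI, trailing ` .`) is taken from the example above.

```python
from typing import Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text: str) -> str:
    # Escape the characters N-Quads does not allow raw inside a quoted literal
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def literal(value: str, datatype: Optional[str] = None) -> str:
    # Hypothetical helper: quote the lexical form and append ^^<datatype> if needed
    quoted = '"' + escape_literal(value) + '"'
    if datatype is None or datatype == XSD + "string":
        # Plain literal: in RDF 1.1 a literal without a datatype already means xsd:string
        return quoted
    return f"{quoted}^^<{datatype}>"


def quad(subject: str, predicate: str, obj: str, graph: str) -> str:
    # subject, predicate and graph are bare IRIs; obj is an already serialized term
    return f"<{subject}> <{predicate}> {obj} <{graph}> ."
```

Calling `quad(subject_iri, "http://iec.ch/TC57/CIM100#Conductor.length", literal("44.84", XSD + "float"), graph_iri)` yields a line with the same shape as the example above.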

Rules

  • xsd:string keys → plain literals (RDF 1.1 default type, no annotation bloat)
  • xsd:anyURI keys (e.g. Model.DependentOn) are references, not typed literals — they keep IRI handling (<urn:uuid:...>); caught by a test after the first implementation turned them into literals
  • A key with a declared datatype is a literal by schema, so the check precedes the UUID heuristic — this also fixes IdentifiedObject.mRID being mis-exported as a urn:uuid reference instead of the string literal it is
  • Without a schema, output is byte-identical to before
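The order of these rules matters: the schema check has to run before the UUID heuristic. A minimal sketch of that decision follows. It assumes the key metadata is a plain `{property key: datatype IRI}` dict that already excludes xsd:anyURI keys, and it uses a hypothetical regex as the UUID heuristic. The function `make_object_sketch` is an illustration only; the real `make_object` signature and heuristic are not shown in this PR.

```python
import re
from typing import Mapping

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical UUID heuristic: a bare UUID or a urn:uuid: value is treated as a reference
UUID_RE = re.compile(
    r"^(urn:uuid:)?[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def _escape(text: str) -> str:
    # N-Quads literal escaping for backslash, quote and line breaks
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def make_object_sketch(key: str, value: str, key_datatypes: Mapping[str, str]) -> str:
    datatype = key_datatypes.get(key)
    # Rule 1: a declared datatype means "literal by schema", so it wins over the heuristic
    if datatype is not None:
        if datatype == XSD + "string":
            return f'"{_escape(value)}"'  # plain literal, no annotation
        return f'"{_escape(value)}"^^<{datatype}>'
    # Rule 2: no declared datatype (no schema, or an excluded xsd:anyURI key): old behaviour
    if UUID_RE.match(value):
        iri = value if value.startswith("urn:uuid:") else "urn:uuid:" + value
        return f"<{iri}>"
    return f'"{_escape(value)}"'
```

With an mRID declared as xsd:string, a UUID-looking value stays a string literal. With an empty metadata dict, the same value falls through to the heuristic, which is why no-schema output stays unchanged.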

Applies to both engines (pandas + polars) via the shared nquads_utils.build_key_metadata / make_object.
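Sharing the helpers means the schema lookup can be computed once and then consulted by either engine. Below is a minimal sketch of what a `build_key_metadata`-style step might do. It assumes the schema has already been reduced to a `{property key: xsd type}` dict, with types given either as `xsd:` prefixed names or as full IRIs. The real function's input and signature are not shown in this PR, so `build_key_metadata_sketch` is hypothetical.

```python
from typing import Dict, Mapping, Optional

XSD = "http://www.w3.org/2001/XMLSchema#"


def build_key_metadata_sketch(schema_types: Optional[Mapping[str, str]]) -> Dict[str, str]:
    """Map property key -> full xsd datatype IRI for keys the schema declares as literals."""
    if not schema_types:
        return {}  # no schema: nothing is typed, so the exporter keeps its old output
    meta: Dict[str, str] = {}
    for key, xsd_type in schema_types.items():
        # Expand the xsd: prefix to a full IRI; leave full IRIs untouched
        iri = XSD + xsd_type[len("xsd:"):] if xsd_type.startswith("xsd:") else xsd_type
        if iri == XSD + "anyURI":
            continue  # references, not literals: left to IRI handling
        meta[key] = iri  # xsd:string is kept so the key still counts as "literal by schema"
    return meta
```

Because pandas and polars would both read the same dict, a parity test only needs to compare the serialized output of the two engines.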

Tests

TestNquadsDatatypes: typed floats, strings stay plain, mRID-as-literal, anyURI stays IRI, no-schema output unchanged, polars/pandas parity, and rdflib validation. The rdflib check confirms that the export parses as N-Quads (rdflib.Dataset), that the statement count matches, and that typed literals round-trip with datatype == XSD.float and Python float values.
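The tests above use rdflib. For a quick check without that dependency, here is a rough standard-library stand-in for two of the checks: statement count and float round-trip. This is a swapped-in regex approach, not rdflib and not the project's test code. The helper names are hypothetical, and the pattern only handles simple lexical forms without escape sequences.

```python
import re
from typing import List, Tuple

XSD_FLOAT = "http://www.w3.org/2001/XMLSchema#float"

# Matches a quad whose object is a datatyped literal with a simple (unescaped) lexical form
TYPED = re.compile(r'^<[^>]*> <([^>]*)> "([^"\\]*)"\^\^<([^>]*)> <[^>]*> \.$')


def statement_count(nquads: str) -> int:
    # Count non-empty, non-comment lines; one statement per line in N-Quads
    return sum(1 for line in nquads.splitlines() if line.strip() and not line.lstrip().startswith("#"))


def typed_floats(nquads: str) -> List[Tuple[str, float]]:
    # Return (predicate IRI, Python float) for every xsd:float literal
    out = []
    for line in nquads.splitlines():
        m = TYPED.match(line.strip())
        if m and m.group(3) == XSD_FLOAT:
            out.append((m.group(1), float(m.group(2))))
    return out
```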

Full suite: 154 passed, 44 skipped, 1 known xfail.

Kristjan Vilgo added 2 commits June 11, 2026 15:20
When the export schema declares an xsd:type for a property, literals
get the datatype annotation: "1.5"^^<http://www.w3.org/2001/XMLSchema#float>.

- xsd:string keys stay plain literals (RDF 1.1 default type)
- xsd:anyURI keys (e.g. Model.DependentOn) are references — excluded,
  they keep IRI handling
- a key with a declared datatype is a literal by schema, so the check
  precedes the UUID heuristic — fixes IdentifiedObject.mRID being
  mis-exported as a urn:uuid reference instead of a string literal
- without a schema, output is unchanged (no annotations)

Tests: typed floats, plain strings, mRID-as-literal, anyURI references,
no-schema unchanged, polars/pandas parity, and rdflib validation — the
export parses as N-Quads and typed literals round-trip with the right
Python type.

Every urn:uuid object IRI must resolve to a subject in the dataset,
except references the source data itself knows are dangling — the
unresolved set is cross-checked against get_dangling_references()
(boundary objects, Model.DependentOn to other profiles).
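The cross-check in this commit reduces to set arithmetic. Below is a minimal sketch. It assumes quads are tuples of serialized terms and that the output of get_dangling_references() has already been normalized into a set of `<urn:uuid:...>` terms; its real return type is not shown here. The helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# (subject, predicate, object, graph) as serialized N-Quads terms
Quad = Tuple[str, str, str, str]


def unresolved_references(quads: Iterable[Quad]) -> Set[str]:
    quads = list(quads)
    subjects = {s for s, _, _, _ in quads}
    # Only urn:uuid IRIs in object position count as references
    refs = {o for _, _, o, _ in quads if o.startswith("<urn:uuid:")}
    return refs - subjects


def unexpected_dangling(quads: Iterable[Quad], known_dangling: Set[str]) -> Set[str]:
    """References that do not resolve and that the source data does not already flag."""
    return unresolved_references(quads) - known_dangling
```

A non-empty result from `unexpected_dangling` would point to an export bug rather than a gap in the source data.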
Haigutus merged commit b1185b7 into main on Jun 11, 2026
6 checks passed
Haigutus deleted the feat/nquads-datatypes branch on June 13, 2026 01:36
Development

Successfully merging this pull request may close these issues.

Add export of nquads - good fast input to qlever or other sparql engines
