fix: tolerate non-string VALUEs in cython cimxml export, check columns (#50) #53

Merged
Haigutus merged 1 commit into main from fix/export-input-validation on Jun 12, 2026

@Haigutus
Owner

The compiled export crashed with "Expected bytes, got a 'int' object"
when VALUE contained numbers, which happens naturally after tableview
roundtrips with string_to_number=True or after user edits. The lxml
engine formats such values silently, so the two engines disagreed.

_string_array now goes through astype("string[pyarrow]"): any input
dtype becomes text (matching lxml's formatting), nulls stay null, and
it is 14x faster than the old astype(object) path for clean arrow
columns (8 ms vs 112 ms per 1.14M-row column). Arrow-backed columns
arrive as ChunkedArray; they are now combined into the single
contiguous array the C++ extension requires.

export_to_cimxml and export_to_nquads now fail early with a clear
message when required triplet columns are missing.
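
A fail-early check of this kind could be sketched as follows. The function name `check_required_columns` and the default column set are assumptions made for illustration; this description names only VALUE, so the real required columns and message wording may differ.

```python
def check_required_columns(columns, required=("ID", "KEY", "VALUE")):
    """Hypothetical check: raise a clear error if any required column is absent."""
    # `required` is an assumed default; only VALUE is named in the PR text.
    present = list(columns)
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(
            f"Cannot export: missing required triplet columns {missing}; "
            f"got columns {present}"
        )


# Passes silently when everything is there.
check_required_columns(["ID", "KEY", "VALUE", "INSTANCE_ID"])

# Fails before any export work starts when a column is missing.
try:
    check_required_columns(["ID", "KEY"])
except ValueError as error:
    print(error)
```

With a pandas DataFrame, the call would be `check_required_columns(df.columns)` at the top of the export function, so the user sees the column problem instead of a later, less specific error.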

@Haigutus Haigutus merged commit e098e7a into main Jun 12, 2026
@Haigutus Haigutus deleted the fix/export-input-validation branch June 13, 2026 01:36