EncoDB

EncoDB is a lightweight artifact for studying encoding-related behavioral divergences across database systems and conversion tools.

Environment

PostgreSQL 18.3
MySQL 8.4.0
MariaDB 11.8.6
DuckDB 1.4.3, with the encodings extension (version b5a547e)
(GNU libc) iconv 2.43

Research Questions

RQ1: Measure how DBMSs behave after malformed bytes are admitted on the write path. The basic method is to enumerate candidate byte sequences for a target encoding, insert them into DBMS tables, read them back immediately, and record admission, readback success, and emitted character mappings. See RQ1/.
RQ2: Compare query-side validity, mapping, and failure semantics on common encodings. The basic method is to probe DuckDB and iconv, normalize the outputs into *-CHARS.txt files, and run pairwise differential analysis with RQ2/compare_chars.py. See RQ2/.
RQ3: Check whether observed divergences become compatibility bugs when one system claims equivalence with another. The basic method is to probe TiDB's GBK handling, compare the results against MySQL outputs, and then compare CONVERT behavior using the same differential analysis workflow. See RQ3/.
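The enumerate-then-compare idea shared by the research questions above can be sketched in a few lines. This is a minimal illustration and not the artifact's own code: it uses Python's built-in `gb2312` and `gbk` codecs as stand-ins for the systems under test, touches no database, and the names `probe` and `diff` are hypothetical. The actual probes and RQ2/compare_chars.py may be organized differently.

```python
from itertools import product


def probe(codec: str, max_len: int = 2) -> dict[bytes, str]:
    """Enumerate all byte sequences up to max_len and keep those that the
    codec decodes, in strict mode, to exactly one character."""
    accepted: dict[bytes, str] = {}
    for n in range(1, max_len + 1):
        for combo in product(range(256), repeat=n):
            raw = bytes(combo)
            try:
                text = raw.decode(codec)  # strict: raises on malformed input
            except UnicodeDecodeError:
                continue
            if len(text) == 1:  # skip e.g. two ASCII bytes -> two characters
                accepted[raw] = text
    return accepted


def diff(a: dict[bytes, str], b: dict[bytes, str]) -> dict[str, list[bytes]]:
    """Pairwise differential: validity divergences and mapping divergences."""
    report: dict[str, list[bytes]] = {"only_a": [], "only_b": [], "mismatch": []}
    for raw in sorted(a.keys() | b.keys()):
        if raw not in b:
            report["only_a"].append(raw)    # accepted by a, rejected by b
        elif raw not in a:
            report["only_b"].append(raw)    # accepted by b, rejected by a
        elif a[raw] != b[raw]:
            report["mismatch"].append(raw)  # both accept, different character
    return report


if __name__ == "__main__":
    # Stand-in "systems": two Python codecs (an assumption for illustration).
    report = diff(probe("gb2312"), probe("gbk"))
    for kind, seqs in report.items():
        print(kind, len(seqs))
```

Separating validity divergences (`only_a`, `only_b`) from mapping divergences (`mismatch`) mirrors the distinction the research questions draw between admission and emitted character mappings.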

The efficiency results are based on the wall-clock timings that the probe and comparison scripts already print. We did not apply any dedicated optimizations, but the scripts still run efficiently in practice.
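For readers who want to reproduce this kind of measurement around their own probe runs, a wall-clock timer can be as small as the sketch below. The helper name `wall_clock` and its `sink` parameter are hypothetical; the artifact's scripts may print their timings differently.

```python
import time
from contextlib import contextmanager
from typing import Optional


@contextmanager
def wall_clock(label: str, sink: Optional[dict] = None):
    """Print (and optionally record) the elapsed wall-clock time of a block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if sink is not None:
            sink[label] = elapsed  # keep the raw value for later reporting
        print(f"{label}: {elapsed:.3f} s")


if __name__ == "__main__":
    with wall_clock("enumerate two-byte candidates"):
        candidates = [bytes((hi, lo)) for hi in range(256) for lo in range(256)]
```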

We also implemented a PostgreSQL patch that tightens EUC-CN validation (https://github.com/SWUFE-DB-Group/postgresql-encoding-validation) and ran end-to-end tests against the patched PostgreSQL 18.3. See GB2312-PG-benchmark/.
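To make "tighter validation" concrete, the sketch below shows one possible structural check for EUC-CN, assuming the usual 94x94 EUC layout in which both bytes of a two-byte character lie in 0xA1..0xFE. It is an illustration only: the function name is hypothetical, the check does not test whether a code position is actually assigned in GB2312, and the linked patch may enforce different or stricter rules.

```python
def is_structurally_valid_euc_cn(data: bytes) -> bool:
    """Return True if data is a sequence of ASCII bytes and two-byte
    characters whose lead and trail bytes both fall in 0xA1..0xFE."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            i += 1  # single-byte ASCII
        elif 0xA1 <= lead <= 0xFE:
            # A lead byte needs a trail byte in the same range.
            if i + 1 >= len(data) or not (0xA1 <= data[i + 1] <= 0xFE):
                return False
            i += 2
        else:
            return False  # 0x80..0xA0 and 0xFF cannot start a character
    return True


if __name__ == "__main__":
    for sample in (b"\xb0\xa1", b"\xb0\x41", b"\xb0"):
        print(sample.hex(), is_structurally_valid_euc_cn(sample))
```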

Web Viewer

https://encodb-web.pages.dev/
