SWUFE-DB-Group/EncoDB

EncoDB

EncoDB is a lightweight artifact for studying encoding-related behavioral divergences across database systems and conversion tools.

Environment

  • PostgreSQL 18.3
  • MySQL 8.4.0
  • MariaDB 11.8.6
  • DuckDB 1.4.3, with the encodings extension (version b5a547e)
  • iconv (GNU libc) 2.43

Research Questions

  • RQ1: How do DBMSs behave once malformed bytes are admitted on the write path? We enumerate candidate byte sequences for a target encoding, insert them into DBMS tables, read them back immediately, and record whether each sequence was admitted, whether readback succeeded, and which characters were emitted. See RQ1/.
  • RQ2: How do query-side validity, mapping, and failure semantics compare on common encodings? We probe DuckDB and iconv, normalize the outputs into *-CHARS.txt files, and run pairwise differential analysis with RQ2/compare_chars.py. See RQ2/.
  • RQ3: Do the observed divergences become compatibility bugs when systems are claimed to be equivalent? We probe TiDB's GBK handling, compare the results against MySQL outputs, and then compare CONVERT behavior using the same differential analysis workflow. See RQ3/.
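
The enumerate, probe, and compare steps described in RQ1 and RQ2 can be sketched as follows. This is a minimal illustration and not the repository's code: it uses Python's built-in gb2312 and gbk codecs as stand-ins for two systems under test, and the helper names (enumerate_two_byte, probe, diff_mappings) are hypothetical.

```python
from itertools import product


def enumerate_two_byte(lead_range, trail_range):
    """Yield every two-byte candidate built from the given lead/trail ranges."""
    for lead, trail in product(lead_range, trail_range):
        yield bytes((lead, trail))


def probe(codec, candidates):
    """Map each candidate (as hex) to its decoded text, or None if rejected."""
    result = {}
    for raw in candidates:
        try:
            result[raw.hex()] = raw.decode(codec)
        except UnicodeDecodeError:
            result[raw.hex()] = None
    return result


def diff_mappings(a, b):
    """Pairwise differential analysis of two probe results."""
    only_a, only_b, mismatch = [], [], []
    for key in sorted(set(a) | set(b)):
        in_a, in_b = a.get(key), b.get(key)
        if in_a is not None and in_b is None:
            only_a.append(key)        # accepted by A, rejected by B
        elif in_a is None and in_b is not None:
            only_b.append(key)        # rejected by A, accepted by B
        elif in_a is not None and in_a != in_b:
            mismatch.append((key, in_a, in_b))  # both accept, mappings differ
    return {"only_a": only_a, "only_b": only_b, "mismatch": mismatch}


if __name__ == "__main__":
    # Assumed candidate space: lead 0x81..0xFE, trail 0x40..0xFE.
    candidates = list(enumerate_two_byte(range(0x81, 0xFF), range(0x40, 0xFF)))
    report = diff_mappings(probe("gb2312", candidates), probe("gbk", candidates))
    for category, items in report.items():
        print(category, len(items))
```

The three result categories keep validity divergences (one side rejects) separate from mapping divergences (both accept but emit different characters), which matches the distinction drawn in RQ2.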

Efficiency results are based on the wall-clock timings that the probe and comparison scripts already print. We did not apply any dedicated optimizations, but the scripts run efficiently in practice.
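
For readers who want to reproduce this kind of measurement, wall-clock timing around a probe step can be sketched as below. The wall_clock helper and its sink parameter are hypothetical and are not taken from the repository's scripts.

```python
import time
from contextlib import contextmanager


@contextmanager
def wall_clock(label, sink=None):
    """Print the elapsed wall-clock time of the enclosed block.

    If `sink` is a dict, the elapsed seconds are also stored under `label`.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f} s")
        if sink is not None:
            sink[label] = elapsed


if __name__ == "__main__":
    timings = {}
    with wall_clock("decode all two-byte candidates", timings):
        for lead in range(0x81, 0xFF):
            for trail in range(0x40, 0xFF):
                bytes((lead, trail)).decode("gbk", errors="replace")
```

time.perf_counter is used here because it is monotonic and intended for measuring short durations.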

We also implemented a PostgreSQL patch that tightens validation for EUC-CN (https://github.com/SWUFE-DB-Group/postgresql-encoding-validation) and ran end-to-end tests against the patched PostgreSQL 18.3. See GB2312-PG-benchmark.
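
The patch itself is not reproduced here. As a rough illustration of what stricter byte-level validation can look like, the sketch below checks only the conventional EUC-CN structure: single bytes below 0x80, or two-byte sequences with both bytes in 0xA1..0xFE. These ranges and the function name first_invalid_offset are assumptions for illustration; the actual patch may apply different or additional rules.

```python
def first_invalid_offset(data: bytes):
    """Return the offset of the first structurally invalid byte, or None.

    Assumed rules (illustrative only):
      - 0x00..0x7F is a single-byte character
      - 0xA1..0xFE must be followed by a trail byte in 0xA1..0xFE
      - any other byte is invalid
    """
    i = 0
    n = len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1
        elif 0xA1 <= lead <= 0xFE:
            # A lead byte needs a trail byte in the same range.
            if i + 1 >= n or not (0xA1 <= data[i + 1] <= 0xFE):
                return i
            i += 2
        else:
            return i
    return None


if __name__ == "__main__":
    for sample in (b"ok \xb0\xa1", b"\xb0\x41", b"a\xb0", b"\x81\x40"):
        print(sample.hex(), first_invalid_offset(sample))
```

Returning an offset rather than a boolean makes it easier to report where a rejected input first goes wrong.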

Web Viewer

Related Bug Reports and Discussion

About

On the Consistency of Non-UTF Character Encoding Behaviors in DBMSs
