
The data for the chardet tests, pulled out into its own repo since licensing can be an issue


chardet/test-data


chardet test data

Test data files for the chardet Universal Encoding Detector. Files are organized into directories named {encoding} or {encoding}-{language} (e.g., big5-chinese, utf-8-english, cp037-breton).
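Because some encoding names contain hyphens themselves (utf-8-english), splitting a directory name on the first hyphen is not enough. The following is a minimal sketch of one way to separate the two parts, assuming the encoding is a codec name that Python's `codecs` module recognizes. The helper `split_dir_name` is hypothetical and is not part of this repository; encodings that Python does not ship would need separate handling.

```python
import codecs


def split_dir_name(name):
    """Split '{encoding}' or '{encoding}-{language}' into (encoding, language).

    Hypothetical helper: assumes the encoding part is a codec name known
    to Python. Returns language=None when the name has no language suffix.
    """
    parts = name.split("-")
    # Try the longest prefix first so that "utf-8-english" resolves to
    # "utf-8" rather than to a shorter alias.
    for i in range(len(parts), 0, -1):
        candidate = "-".join(parts[:i])
        try:
            codecs.lookup(candidate)
        except LookupError:
            continue
        language = "-".join(parts[i:]) or None
        return candidate, language
    raise ValueError(f"no known codec prefix in directory name: {name!r}")


for directory in ("big5-chinese", "utf-8-english", "cp037-breton"):
    print(directory, "->", split_dir_name(directory))
```

Trying the longest prefix first is a design choice for this sketch; it could misfire if a language name happened to extend a valid codec alias, so CATALOG.md remains the authoritative record.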

See CATALOG.md for a full listing of every file's provenance and characteristics.

Data quality

Run check_test_data.py to verify that all files decode correctly with their labeled encoding and pass quality checks (mojibake, control characters, language/script mismatches):

python3 check_test_data.py .
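To illustrate the kind of check described above, here is a minimal sketch of a strict decode test plus a control-character scan. It is not the code in check_test_data.py, whose internals are not shown here; the function name `find_problems` and the specific rules (flagging U+FFFD and any control character other than tab, newline, and carriage return) are assumptions made for illustration.

```python
import unicodedata

# Control characters that are normal in text files (assumption for this sketch).
ALLOWED_CONTROLS = {"\t", "\n", "\r"}


def find_problems(data, encoding):
    """Return a list of human-readable problems found in `data`.

    Hypothetical helper: `data` is the raw bytes of one test file and
    `encoding` is the encoding it is labeled with.
    """
    try:
        # Strict decoding raises on any byte sequence invalid for the codec.
        text = data.decode(encoding)
    except (UnicodeDecodeError, LookupError) as exc:
        return [f"decode failed: {exc}"]

    problems = []
    # A literal replacement character usually means an earlier lossy conversion.
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement character")

    controls = sorted(
        {ch for ch in text
         if unicodedata.category(ch) == "Cc" and ch not in ALLOWED_CONTROLS}
    )
    if controls:
        listed = ", ".join(f"U+{ord(ch):04X}" for ch in controls)
        problems.append(f"contains control characters: {listed}")
    return problems


print(find_problems("héllo\n".encode("utf-8"), "utf-8"))
print(find_problems(b"\xff\xfe\xfd", "utf-8"))
print(find_problems(b"a\x00b", "ascii"))
```

Mojibake and language/script mismatch detection need heuristics beyond this sketch and are left out.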

Contributing

Contributions of openly licensed test data are welcome at https://github.com/chardet/chardet.

License

Each test file is copyrighted by its respective publisher.
