- HDF5 OS drivers (Arch Linux: hdf5-cpp-fortran, Mac Homebrew: hdf5)
- Python HDF5 support: python-h5py, or pip install h5py

Downloads the original handwritten MNIST digits from Yann LeCun's website. The data is transformed into a single HDF5 database file and partitioned into 50000 train, 10000 validation, and 10000 test entries. (The original data is 60000 train and 10000 test, with no separate validation data.)
python download_mnist.py
Result: mnist.h5, an HDF5 database with handwritten digits and labels, in the directory datasets.
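
The 50000/10000/10000 partition described above can be sketched as index ranges. This is only an illustration: the function name `partition_indices` is hypothetical, and taking the validation entries from the end of the original training set is an assumption, not something confirmed by download_mnist.py.

```python
# Sketch only: split boundaries are an assumption, not taken from
# download_mnist.py.
ORIG_TRAIN = 60000  # size of the original MNIST training set
ORIG_TEST = 10000   # size of the original MNIST test set
N_VALID = 10000     # entries set aside for validation


def partition_indices(n_orig_train=ORIG_TRAIN, n_valid=N_VALID, n_test=ORIG_TEST):
    """Return (train, valid, test) index ranges.

    train and valid index into the original training set;
    test indexes into the original test set.
    """
    if not 0 <= n_valid <= n_orig_train:
        raise ValueError("n_valid must be between 0 and n_orig_train")
    n_train = n_orig_train - n_valid
    train = range(0, n_train)
    valid = range(n_train, n_orig_train)  # assumed: last n_valid entries
    test = range(0, n_test)
    return train, valid, test


train_idx, valid_idx, test_idx = partition_indices()
print(len(train_idx), len(valid_idx), len(test_idx))
```

The train and validation ranges are disjoint and together cover the original 60000 training entries, so no test data leaks into either of them.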
Then use mnisttest:
mnisttest <path-to-database>/mnist.h5

Downloads the CIFAR-10 image database from a University of Toronto website. The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The data is transformed into a single HDF5 database file.
python download_cifar10.py
Result: cifar10.h5, an HDF5 database with images and labels, in the directory datasets.
Then use cifar10test:
cifar10test <path-to-database>/cifar10.h5

The recurrent text generator rnnreader is trained with a UTF-8 text file.
Part of the repository is a subset of Shakespeare's collected works, taken from Justin Johnson's torch-rnn repository.
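
Because rnnreader expects UTF-8 input, it can help to check a text file before training. The following is a minimal sketch using only the Python standard library; the helper names `char_stats` and `file_stats` are hypothetical and not part of this repository.

```python
from collections import Counter
from pathlib import Path


def char_stats(data: bytes):
    """Decode bytes as strict UTF-8; return (character count, distinct characters).

    Raises UnicodeDecodeError if the data is not valid UTF-8.
    """
    text = data.decode("utf-8")
    return len(text), len(Counter(text))


def file_stats(path):
    """Apply char_stats to the raw bytes of a file."""
    return char_stats(Path(path).read_bytes())


# In-memory demo: 'ë' is one character but two bytes in UTF-8.
sample = "Brontë".encode("utf-8")
n_chars, n_distinct = char_stats(sample)
print(len(sample), n_chars, n_distinct)
```

For a real file, `file_stats("<path-to-text>/tiny-shakespeare.txt")` would report the total number of characters and the number of distinct characters, or raise an error if the file is not valid UTF-8.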
To test text generation via RNNs, use rnnreader:
rnnreader <path-to-text>/tiny-shakespeare.txt
Use the Python script download_shakespeare.py to get the unabridged version, about 5x larger:
pip install -U ml-indie-tools
python download_shakespeare.py
Results in shakespeare.txt with the complete works.

Use the Python script download_women_writers.py to download a collection of about 20 books by
authors Emily Brontë, Jane Austen, and Virginia Woolf from Project Gutenberg:
pip install -U ml-indie-tools
python download_women_writers.py
Note: have a look at the download script; it can easily be modified for other authors, subjects, or collections.
The resulting file women_writers.txt contains all book texts concatenated (about 12 MB).
This uses ml-indie-tools to download the Complete Works from Project Gutenberg.
The library can be used to download arbitrary book collections from Project Gutenberg; see its documentation.
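
To illustrate the concatenation step only, here is a minimal standard-library sketch that joins several UTF-8 text files into one training file. It does not use ml-indie-tools and is not the code of the download scripts; the function name `concatenate_books` and the blank-line separator are assumptions.

```python
import tempfile
from pathlib import Path


def concatenate_books(paths, out_path, separator="\n\n"):
    """Join UTF-8 text files into a single UTF-8 file.

    Returns the number of characters written.
    """
    parts = [Path(p).read_text(encoding="utf-8").strip() for p in paths]
    combined = separator.join(parts) + "\n"
    Path(out_path).write_text(combined, encoding="utf-8")
    return len(combined)


# Demo with two tiny placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "book_a.txt"
    second = Path(tmp) / "book_b.txt"
    first.write_text("First book.\n", encoding="utf-8")
    second.write_text("Second book.\n", encoding="utf-8")
    out_file = Path(tmp) / "collection.txt"
    written = concatenate_books([first, second], out_file)
    result = out_file.read_text(encoding="utf-8")

print(written)
```

The resulting single file can then be passed to rnnreader in the same way as tiny-shakespeare.txt.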