PyTorch reimplementation of the paper "Generative Adversarial Text to Image Synthesis" (Reed et al., 2016).
| Architecture | GAN-CLS algorithm |
|---|---|
| ![]() | ![]() |
| Architecture of the text-conditional deep convolutional generative adversarial network. Taken from Reed et al., 2016. | GAN-CLS algorithm used to train the text-conditional deep convolutional generative adversarial network. Taken from Reed et al., 2016. |
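
The GAN-CLS update can be summarized in a few lines. The sketch below follows the loss definitions of Algorithm 1 in Reed et al., 2016, but it is not the code from this repository: the function name `gan_cls_losses` and the `eps` stabilizer are assumptions, and plain Python floats stand in for discriminator outputs so the example runs without PyTorch.

```python
import math


def gan_cls_losses(s_real, s_wrong, s_fake, eps=1e-8):
    """Return (discriminator_loss, generator_loss) for one GAN-CLS step.

    s_real  -- D(real image, matching text), a probability in (0, 1)
    s_wrong -- D(real image, mismatched text)
    s_fake  -- D(generated image, matching text)
    eps     -- small constant to avoid log(0); an assumption of this sketch
    """
    # Discriminator objective: accept the real/matching pair and reject the
    # other two pairs. The two "reject" terms are averaged.
    d_objective = math.log(s_real + eps) + 0.5 * (
        math.log(1.0 - s_wrong + eps) + math.log(1.0 - s_fake + eps)
    )
    # Generator objective: make the discriminator accept generated images
    # paired with their matching text.
    g_objective = math.log(s_fake + eps)
    # Negate both so the returned values are losses to minimize.
    return -d_objective, -g_objective
```

The mismatched-text term is what distinguishes GAN-CLS from a plain conditional GAN: the discriminator is penalized for accepting a realistic image whose caption does not match it.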
The flowers dataset can be downloaded from: 102 Category Flower Dataset.
The corresponding captions can be downloaded from: 102 Category Flower Captions.
Make sure the data directory has the following tree structure:
|-- 102flowers
| |-- images // image folder
| |-- captions
| | |-- class_0001 // captions of class_0001
| | |-- class_0002 // captions of class_0002
| | |-- class_0003 // captions of class_0003
. . .
. . .
. . .

All settings and hyperparameters can be adjusted in config.py.
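
Before running any of the scripts, the directory layout shown above can be sanity-checked. This is a minimal sketch and not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks for the `images` and `captions` folders shown in the tree.

```python
from pathlib import Path


def missing_dirs(root, required=("images", "captions")):
    """Return the names in `required` that are not directories under `root`."""
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]
```

Calling `missing_dirs("102flowers")` from the data directory returns an empty list when both folders exist. The same helper works for any dataset root that uses the same two folder names.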
Precompute 1024-dimensional pooling units from GoogLeNet by running the script below:
python3 ./flowers/precompute_image_features.py

Run the script below to train the Char-CNN-RNN text encoder, which minimizes the structured joint embedding loss:
python3 ./flowers/train_text_encoder.py

To precompute the 1024-dimensional text embeddings, run the script:
python3 ./flowers/precompute_image_features.py

Finally, we can train the text-conditional deep convolutional generative adversarial network by running:
python3 ./flowers/train_gan_int_cls.py

The notebook demo_flowers.py illustrates an example usage.
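
The name of the training script suggests the GAN-INT-CLS variant from Reed et al., 2016, which additionally trains the generator on interpolations between pairs of text embeddings, `beta * t1 + (1 - beta) * t2`, with `beta = 0.5` reported to work well in the paper. The sketch below only illustrates that interpolation; the function name is hypothetical, and plain Python lists stand in for embedding tensors.

```python
def interpolate_embeddings(t1, t2, beta=0.5):
    """Return beta * t1 + (1 - beta) * t2, computed element by element."""
    if len(t1) != len(t2):
        raise ValueError("embeddings must have the same length")
    return [beta * a + (1.0 - beta) * b for a, b in zip(t1, t2)]
```

The interpolated embeddings have no corresponding real image, so they contribute only to the generator objective and not to the real-image terms of the discriminator.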
The birds dataset can be downloaded from: Caltech-UCSD Birds-200-2011 (CUB-200-2011).
The corresponding captions can be downloaded from: CUB-200-2011 Captions.
Make sure the data directory has the following tree structure:
|-- CUB_200_2011
| |-- images // image folder
| |-- captions // caption folder

All settings and hyperparameters can be adjusted in config.py.
Precompute 1024-dimensional pooling units from GoogLeNet by running the script below:
python3 ./birds/precompute_image_features.py

Run the script below to train the Char-CNN-RNN text encoder, which minimizes the structured joint embedding loss:
python3 ./birds/train_text_encoder.py

To precompute the 1024-dimensional text embeddings, run the script:
python3 ./birds/precompute_image_features.py

Finally, we can train the text-conditional deep convolutional generative adversarial network by running:
python3 ./birds/train_gan_int_cls.py

The notebook demo_birds.py illustrates an example usage.
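
The text encoder step above minimizes a structured joint embedding loss. As a rough illustration of the idea, the sketch below computes a simplified minibatch surrogate: a symmetric hinge ranking loss over dot-product compatibility scores. It is not the loss code from this repository. The function names, the margin of 1, the division by the batch size, and the requirement that every pair in the batch comes from a different class are all assumptions of this sketch.

```python
def dot(u, v):
    """Dot product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def joint_embedding_loss(image_feats, text_feats, margin=1.0):
    """Symmetric hinge ranking loss over a batch of matching pairs.

    image_feats[i] and text_feats[i] are assumed to describe the same
    class, and different indices are assumed to be different classes.
    """
    n = len(image_feats)
    # scores[i][j] is the compatibility of image i with text j.
    scores = [[dot(v, t) for t in text_feats] for v in image_feats]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i should score higher with its own text than with text j.
            total += max(0.0, margin + scores[i][j] - scores[i][i])
            # Text i should score higher with its own image than with image j.
            total += max(0.0, margin + scores[j][i] - scores[i][i])
    return total / n
```

The loss is zero once every matching pair outscores all mismatched pairs by at least the margin, in both the image-to-text and text-to-image directions.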
- OS: Fedora Linux 42 (Workstation Edition) x86_64
- CPU: AMD Ryzen 5 2600X (12) @ 3.60 GHz
- GPU: NVIDIA GeForce RTX 3060 Ti (8 GB VRAM)
- RAM: 32 GB DDR4 3200 MHz
@misc{goodfellow2014generativeadversarialnetworks,
title={Generative Adversarial Networks},
author={Ian J. Goodfellow and Jean Pouget-Abadie and Mehdi Mirza and Bing Xu and David Warde-Farley and Sherjil Ozair and Aaron Courville and Yoshua Bengio},
year={2014},
eprint={1406.2661},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/1406.2661},
}

@misc{radford2016unsupervisedrepresentationlearningdeep,
title={Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks},
author={Alec Radford and Luke Metz and Soumith Chintala},
year={2016},
eprint={1511.06434},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1511.06434},
}

@misc{reed2016generativeadversarialtextimage,
title={Generative Adversarial Text to Image Synthesis},
author={Scott Reed and Zeynep Akata and Xinchen Yan and Lajanugen Logeswaran and Bernt Schiele and Honglak Lee},
year={2016},
eprint={1605.05396},
archivePrefix={arXiv},
primaryClass={cs.NE},
url={https://arxiv.org/abs/1605.05396},
}

@misc{reed2016learningdeeprepresentationsfinegrained,
title={Learning Deep Representations of Fine-grained Visual Descriptions},
author={Scott Reed and Zeynep Akata and Bernt Schiele and Honglak Lee},
year={2016},
eprint={1605.05395},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/1605.05395},
}

@techreport{WahCUB_200_2011,
title={The Caltech-UCSD Birds-200-2011 Dataset},
author={Wah, C. and Branson, S. and Welinder, P. and Perona, P. and Belongie, S.},
year={2011},
institution={California Institute of Technology},
number={CNS-TR-2011-001},
}