This is the top-level README. For more information, see src/README.md.
- CCE-reconstruction/
  - data/ -> Directory for corpus files
  - models/ -> Directory for model files
  - scripts/ -> Directory for bash scripts
  - src/ -> Directory for source code
  - visualizations/ -> Directory for visualization notebooks
  - run.sh -> Bash script to set up and run the system
  - README.md -> Top-level README
For convenience, we provide a bash script that simplifies using our system.
- Download the corpus files and place them in the data directory.
- Make the script executable: chmod +x run.sh
- Execute ./run.sh --help. This will help you set up the environment and train or evaluate models.
NOTE: This system requires CUDA. It should be possible to disable this dependency and run on the CPU instead, but runtimes will be much longer.
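As a rough illustration of what a CPU fallback could look like, the sketch below picks a device string depending on whether PyTorch reports a usable GPU. The function name `pick_device` and its `prefer_cuda` parameter are hypothetical and not part of this repository. How the code in src/ actually selects its device is not described here, so treat this as a starting point only.

```python
def pick_device(prefer_cuda: bool = True) -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu".

    Hypothetical helper for illustration; not part of this repository.
    """
    if not prefer_cuda:
        return "cpu"
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

Passing `prefer_cuda=False` forces the CPU path, which can be useful for a quick smoke test on a machine without a GPU.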
On my system, Python 3.10.12 and the following packages are installed:
Package Version
absl-py 2.1.0
accelerate 0.26.1
aiohttp 3.9.1
aiosignal 1.3.1
alembic 1.13.3
async-timeout 4.0.3
attrs 21.2.0
Automat 20.2.0
Babel 2.8.0
bcrypt 3.2.0
blinker 1.4
certifi 2020.6.20
chardet 4.0.0
charset-normalizer 3.3.2
click 8.0.3
cloud-init 24.3.1
colorama 0.4.4
colorlog 6.8.2
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
contourpy 1.1.1
cryptography 3.4.8
cycler 0.12.1
datasets 2.16.1
dbus-python 1.2.18
dill 0.3.7
distro 1.7.0
distro-info 1.1+ubuntu0.2
evaluate 0.4.1
filelock 3.12.4
fonttools 4.43.1
frozenlist 1.4.1
fsspec 2023.9.2
greenlet 3.1.1
grpcio 1.66.1
httplib2 0.20.2
huggingface-hub 0.20.2
hyperlink 21.0.0
idna 3.3
importlib-metadata 4.6.4
incremental 21.3.0
jeepney 0.7.1
Jinja2 3.0.3
joblib 1.3.2
jsonpatch 1.32
jsonpointer 2.0
jsonschema 3.2.0
kaleido 0.2.1
keyring 23.5.0
kiwisolver 1.4.5
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
lxml 5.1.0
Mako 1.3.5
Markdown 3.7
MarkupSafe 2.1.5
matplotlib 3.8.0
more-itertools 8.10.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
netifaces 0.11.0
networkx 3.2
nltk 3.8.1
numpy 1.26.1
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.2.140
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.0
optuna 4.0.0
packaging 23.2
pandas 2.1.4
pexpect 4.8.0
Pillow 10.1.0
pip 22.0.2
plotly 5.24.1
portalocker 2.8.2
protobuf 5.28.0
psutil 5.9.7
ptyprocess 0.7.0
pyarrow 14.0.2
pyarrow-hotfix 0.6
pyasn1 0.4.8
pyasn1-modules 0.2.1
PyGObject 3.42.1
PyHamcrest 2.0.2
PyJWT 2.3.0
pyOpenSSL 21.0.0
pyparsing 2.4.7
pyrsistent 0.18.1
pyserial 3.5
python-apt 2.4.0+ubuntu4
python-dateutil 2.8.2
python-debian 0.1.43+ubuntu1.1
python-magic 0.4.24
pytz 2022.1
PyYAML 5.4.1
regex 2023.12.25
requests 2.31.0
responses 0.18.0
sacrebleu 2.4.0
safetensors 0.4.1
scikit-learn 1.5.2
scipy 1.14.1
SecretStorage 3.3.1
sentencepiece 0.1.99
service-identity 18.1.0
setuptools 59.6.0
six 1.16.0
sos 4.5.6
SQLAlchemy 2.0.35
ssh-import-id 5.11
sympy 1.12
systemd-python 234
tabulate 0.9.0
tenacity 9.0.0
tensorboard 2.17.1
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
tokenizers 0.15.0
torch 2.1.0
tqdm 4.66.1
transformers 4.36.2
triton 2.1.0
Twisted 22.1.0
typing_extensions 4.8.0
tzdata 2023.4
ubuntu-drivers-common 0.0.0
ubuntu-pro-client 8001
ufw 0.36.1
unattended-upgrades 0.1
urllib3 1.26.5
wadllib 1.3.6
Werkzeug 3.0.4
wheel 0.37.1
xkit 0.0.0
xxhash 3.4.1
yarl 1.9.4
zipp 1.0.0
zope.interface 5.4.0
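To compare a local environment against the versions listed above, a small check along the following lines may help. It is a minimal sketch using only the standard library. The selection of packages in `EXPECTED` is an assumption about which ones matter most for training and evaluation, and the helper name `compare_versions` is hypothetical.

```python
from __future__ import annotations

from importlib import metadata

# A few pins copied from the list above. Which packages the system
# strictly needs is an assumption; extend or trim as required.
EXPECTED = {
    "torch": "2.1.0",
    "transformers": "4.36.2",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
    "sentencepiece": "0.1.99",
}


def compare_versions(expected: dict[str, str]) -> dict[str, tuple[str, str | None]]:
    """Map each mismatching package to (expected, installed).

    The installed entry is None when the package is not installed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

A differing version does not necessarily mean the system will fail; the list only documents one configuration that is known to work.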
This code accompanies the following paper. If you use it, please cite:
Schmidt, M., Harbusch, K., & Memmesheimer, D. (2024, September). Automatic Ellipsis Reconstruction in Coordinated German Sentences Based on Text-to-Text Transfer Transformers. In International Conference on Text, Speech, and Dialogue (pp. 171-183). Cham: Springer Nature Switzerland.