mariisaschmidt/CCE-reconstruction

Clausal Coordinate Ellipsis Reconstruction in German

This is the top-level README. For more information, see src/README.md.

Project Structure

  • CCE-reconstruction/
    • data/ -> Directory for corpus files
    • models/ -> Directory for model files
    • scripts/ -> Directory for bash scripts
    • src/ -> Directory for source code
    • visualizations/ -> Directory for visualization notebooks
    • run.sh -> Bash script to set up and run the system
    • README.md -> Top-level README

Quickly run the system

For convenience, we provide a bash script that simplifies using the system.

  1. Download the corpus files and place them in the data directory.
  2. Make the script executable: chmod +x run.sh
  3. Run ./run.sh --help. The help output explains how to set up the environment and how to train or evaluate models.
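
The first two steps can be checked before running the script. The following is a minimal sketch, not part of the repository: the function name `preflight` is hypothetical, and it only assumes the layout described under Project Structure (a data/ directory and run.sh at the repository root). It does not know which corpus files are expected, so it only checks that data/ is not empty.

```python
import os
from pathlib import Path


def preflight(repo_root):
    """Return a list of problems that would block the quick-start steps.

    Hypothetical helper: it checks the layout described in this README,
    not the specific corpus file names, which are not listed here.
    """
    root = Path(repo_root)
    problems = []

    # Step 1: corpus files are expected somewhere inside data/.
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(data_dir.iterdir()):
        problems.append("data/ is missing or empty")

    # Step 2: run.sh must exist and be executable (chmod +x run.sh).
    run_script = root / "run.sh"
    if not run_script.is_file():
        problems.append("run.sh not found")
    elif not os.access(run_script, os.X_OK):
        problems.append("run.sh is not executable")

    return problems


if __name__ == "__main__":
    # Run from the repository root; prints one line per detected problem.
    for problem in preflight("."):
        print("problem:", problem)
```

An empty result means the layout looks ready for ./run.sh --help; it says nothing about whether the corpus files themselves are complete.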

NOTE: This system requires CUDA. It should be possible to remove this dependency, but running without a GPU will result in much longer runtimes.
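
One common way to relax a CUDA requirement in PyTorch code is to select the device at runtime. The sketch below illustrates that pattern only; how the scripts in this repository actually select their device is not shown here, and the function name `pick_device` is an assumption made for this example.

```python
def pick_device(prefer_cuda=True):
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu".

    Hypothetical helper for illustration; not taken from this repository.
    """
    try:
        import torch  # torch 2.1.0 is listed in the environment below
    except ImportError:
        # Without PyTorch installed there is nothing to run on a GPU.
        return "cpu"
    if prefer_cuda and torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Running on: {device}")
```

The returned string can be passed to calls such as `model.to(device)`. On CPU, expect training and evaluation to take considerably longer, as noted above.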

Python Environment

On my system, Python 3.10.12 and the following packages are installed:

Package Version
absl-py 2.1.0
accelerate 0.26.1
aiohttp 3.9.1
aiosignal 1.3.1
alembic 1.13.3
async-timeout 4.0.3
attrs 21.2.0
Automat 20.2.0
Babel 2.8.0
bcrypt 3.2.0
blinker 1.4
certifi 2020.6.20
chardet 4.0.0
charset-normalizer 3.3.2
click 8.0.3
cloud-init 24.3.1
colorama 0.4.4
colorlog 6.8.2
command-not-found 0.3
configobj 5.0.6
constantly 15.1.0
contourpy 1.1.1
cryptography 3.4.8
cycler 0.12.1
datasets 2.16.1
dbus-python 1.2.18
dill 0.3.7
distro 1.7.0
distro-info 1.1+ubuntu0.2
evaluate 0.4.1
filelock 3.12.4
fonttools 4.43.1
frozenlist 1.4.1
fsspec 2023.9.2
greenlet 3.1.1
grpcio 1.66.1
httplib2 0.20.2
huggingface-hub 0.20.2
hyperlink 21.0.0
idna 3.3
importlib-metadata 4.6.4
incremental 21.3.0
jeepney 0.7.1
Jinja2 3.0.3
joblib 1.3.2
jsonpatch 1.32
jsonpointer 2.0
jsonschema 3.2.0
kaleido 0.2.1
keyring 23.5.0
kiwisolver 1.4.5
launchpadlib 1.10.16
lazr.restfulclient 0.14.4
lazr.uri 1.0.6
lxml 5.1.0
Mako 1.3.5
Markdown 3.7
MarkupSafe 2.1.5
matplotlib 3.8.0
more-itertools 8.10.0
mpmath 1.3.0
multidict 6.0.4
multiprocess 0.70.15
netifaces 0.11.0
networkx 3.2
nltk 3.8.1
numpy 1.26.1
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.2.140
nvidia-nvtx-cu12 12.1.105
oauthlib 3.2.0
optuna 4.0.0
packaging 23.2
pandas 2.1.4
pexpect 4.8.0
Pillow 10.1.0
pip 22.0.2
plotly 5.24.1
portalocker 2.8.2
protobuf 5.28.0
psutil 5.9.7
ptyprocess 0.7.0
pyarrow 14.0.2
pyarrow-hotfix 0.6
pyasn1 0.4.8
pyasn1-modules 0.2.1
PyGObject 3.42.1
PyHamcrest 2.0.2
PyJWT 2.3.0
pyOpenSSL 21.0.0
pyparsing 2.4.7
pyrsistent 0.18.1
pyserial 3.5
python-apt 2.4.0+ubuntu4
python-dateutil 2.8.2
python-debian 0.1.43+ubuntu1.1
python-magic 0.4.24
pytz 2022.1
PyYAML 5.4.1
regex 2023.12.25
requests 2.31.0
responses 0.18.0
sacrebleu 2.4.0
safetensors 0.4.1
scikit-learn 1.5.2
scipy 1.14.1
SecretStorage 3.3.1
sentencepiece 0.1.99
service-identity 18.1.0
setuptools 59.6.0
six 1.16.0
sos 4.5.6
SQLAlchemy 2.0.35
ssh-import-id 5.11
sympy 1.12
systemd-python 234
tabulate 0.9.0
tenacity 9.0.0
tensorboard 2.17.1
tensorboard-data-server 0.7.2
threadpoolctl 3.5.0
tokenizers 0.15.0
torch 2.1.0
tqdm 4.66.1
transformers 4.36.2
triton 2.1.0
Twisted 22.1.0
typing_extensions 4.8.0
tzdata 2023.4
ubuntu-drivers-common 0.0.0
ubuntu-pro-client 8001
ufw 0.36.1
unattended-upgrades 0.1
urllib3 1.26.5
wadllib 1.3.6
Werkzeug 3.0.4
wheel 0.37.1
xkit 0.0.0
xxhash 3.4.1
yarl 1.9.4
zipp 1.0.0
zope.interface 5.4.0
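
To compare a local environment against the versions listed above, the standard library's `importlib.metadata` can be used. This is a minimal sketch: the `PINNED` dictionary copies only a few of the machine-learning packages from the list, and that selection is an assumption about which ones matter most, not a statement from the authors.

```python
from importlib.metadata import PackageNotFoundError, version

# A few version pins copied from the package list above; extend as needed.
PINNED = {
    "torch": "2.1.0",
    "transformers": "4.36.2",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
    "sentencepiece": "0.1.99",
}


def check_pins(pins):
    """Map each package name to (expected, installed).

    `installed` is None when the package is not present in the environment.
    """
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        status = "ok" if installed == expected else "MISMATCH"
        print(f"{name:15} expected {expected:10} installed {installed!s:10} {status}")
```

A mismatch does not necessarily mean the system will fail; the list documents one working setup rather than strict requirements.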

Citation

This code accompanies the following paper. If you use it, please cite:

Schmidt, M., Harbusch, K., & Memmesheimer, D. (2024, September). Automatic Ellipsis Reconstruction in Coordinated German Sentences Based on Text-to-Text Transfer Transformers. In International Conference on Text, Speech, and Dialogue (pp. 171-183). Cham: Springer Nature Switzerland.

About

This repository contains the implementation of the paper "Automatic Ellipsis Reconstruction in Coordinated German Sentences Based on Text-to-Text Transfer Transformers", which was accepted at the 27th International Conference on Text, Speech and Dialogue.
