Machine Learning meets Biochemistry.
Tested on Linux. Expect bugs.
Required Python packages:
numpy numba scipy tqdm scikit-learn rdkit openbabel mordred matplotlib
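Before running anything, you can check that these packages are importable. This is a minimal sketch. The import names in `REQUIRED` are assumptions: scikit-learn imports as `sklearn`, and the Open Babel bindings are assumed to be importable as `openbabel`, although their module layout differs between Open Babel 2 and 3.

```python
import importlib.util

# Package names from the list above, mapped to the module names used to import them.
# The mapping is an assumption; adjust it to your installation.
REQUIRED = {
    "numpy": "numpy",
    "numba": "numba",
    "scipy": "scipy",
    "tqdm": "tqdm",
    "scikit-learn": "sklearn",
    "rdkit": "rdkit",
    "openbabel": "openbabel",
    "mordred": "mordred",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose module cannot be found on the import path."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", " ".join(missing))
    else:
        print("All required packages found.")
```

`find_spec` only locates the modules and does not import them. A package that is found can therefore still fail at import time, for example because of a broken native library.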
Create three separate directories to store the ananas database, the experiment results, and external data.
Set the following environment variables to the full paths of the newly created directories:
CRISPER_DATA_PATH ANANAS_RESULTS_PATH BANANAS_EXTERNAL_DATA_PATH
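A small sanity check for these variables can save a failed run. This is a minimal sketch that only covers what this README requires: each variable is set, holds an absolute path, and points to an existing directory.

```python
import os

# The three variables named in this README.
ENV_VARS = ("CRISPER_DATA_PATH", "ANANAS_RESULTS_PATH", "BANANAS_EXTERNAL_DATA_PATH")


def check_env(env=os.environ):
    """Return a list of problems with the configured directories (empty if all is well)."""
    problems = []
    for name in ENV_VARS:
        path = env.get(name)
        if not path:
            problems.append(f"{name} is not set")
        elif not os.path.isabs(path):
            problems.append(f"{name} is not an absolute path: {path}")
        elif not os.path.isdir(path):
            problems.append(f"{name} does not point to an existing directory: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_env():
        print(problem)
```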
Download the ChEMBL 24.1 database (SQLite version), extract the archive, and put the file chembl_24.db in the BANANAS_EXTERNAL_DATA_PATH directory.
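To confirm that the file ended up in the right place and is a readable SQLite database, you could use something like the following. It is a sketch under two assumptions: the file name is exactly chembl_24.db, and a read-only open is enough for a check. It tests the standard 16-byte SQLite 3 file header and lists the tables.

```python
import os
import sqlite3
from contextlib import closing
from pathlib import Path

# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_HEADER = b"SQLite format 3\x00"


def chembl_db_path(external_dir=None):
    """Return the expected location of chembl_24.db."""
    external_dir = external_dir or os.environ["BANANAS_EXTERNAL_DATA_PATH"]
    return os.path.join(external_dir, "chembl_24.db")


def is_sqlite_file(path):
    """True if the file exists and starts with the SQLite 3 header."""
    try:
        with open(path, "rb") as handle:
            return handle.read(len(SQLITE_HEADER)) == SQLITE_HEADER
    except OSError:
        return False


def list_tables(path):
    """Return the sorted table names, opening the database read-only."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    return sorted(name for (name,) in rows)
```

With BANANAS_EXTERNAL_DATA_PATH set, `is_sqlite_file(chembl_db_path())` should return True and `list_tables(chembl_db_path())` should list the ChEMBL tables. Opening in `mode=ro` avoids creating an empty database file if the path is wrong.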
Create a ZIP archive for a single ChEMBL target:
cd fruits/elderberries/benchmarks2018
./export_zip.py TARGET_ID
Running this script creates a ZIP archive TARGET_ID.zip in the current directory.
Replace TARGET_ID with a ChEMBL target ID, e.g. CHEMBL214 for the serotonin 1A (5-HT1A) receptor.
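Before exporting, you may want to confirm that a target ID exists in your local ChEMBL copy. This is a sketch that assumes ChEMBL's `target_dictionary` table has `chembl_id` and `pref_name` columns. Check the schema documentation for your release. The helper `target_name` is hypothetical and not part of the repository.

```python
import re
import sqlite3
from contextlib import closing

# ChEMBL identifiers look like "CHEMBL" followed by digits, e.g. CHEMBL214.
CHEMBL_ID = re.compile(r"^CHEMBL\d+$")


def target_name(db_path, target_id):
    """Return the preferred name of a ChEMBL target, or None if it is not found.

    Assumes the ChEMBL `target_dictionary` table with `chembl_id` and `pref_name` columns.
    """
    if not CHEMBL_ID.match(target_id):
        raise ValueError(f"not a ChEMBL ID: {target_id!r}")
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute(
            "SELECT pref_name FROM target_dictionary WHERE chembl_id = ?",
            (target_id,),
        ).fetchone()
    return row[0] if row else None
```

For example, `target_name(path_to_chembl_24_db, "CHEMBL214")` should return the preferred name of the 5-HT1A receptor. A result of None suggests a typo in the ID.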
Run the benchmarks:
cd fruits/elderberries/benchmarks2018
./run.py
You can run multiple instances of the script in parallel, e.g.:
for i in {1..43}; do ./run.py & done
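If you prefer to launch the instances from Python, the shell loop above can be sketched with `subprocess`. Unlike the loop, this version waits for all instances to finish and returns their exit codes. The function name `run_parallel` is hypothetical.

```python
import subprocess


def run_parallel(command, instances):
    """Start `instances` copies of `command` at once and wait for all of them.

    Returns the exit codes in launch order.
    """
    processes = [subprocess.Popen(command) for _ in range(instances)]
    return [process.wait() for process in processes]
```

Calling `run_parallel(["./run.py"], 43)` from inside fruits/elderberries/benchmarks2018 mirrors the shell example.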
Monitor the progress:
env CRISPER_MODE=MONITOR ./run.py
Results will be stored in the ANANAS_RESULTS_PATH directory.
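The monitor command only sets one environment variable, so it is easy to script. The sketch below is the Python equivalent of `env CRISPER_MODE=MONITOR ./run.py`. It copies the current environment, so the data and results paths stay visible to the script. The helper `run_with_mode` is hypothetical.

```python
import os
import subprocess


def run_with_mode(command, mode="MONITOR", **kwargs):
    """Run `command` with CRISPER_MODE set, keeping the rest of the environment.

    Extra keyword arguments are passed to subprocess.run (e.g. capture_output=True).
    """
    env = {**os.environ, "CRISPER_MODE": mode}
    return subprocess.run(command, env=env, check=True, **kwargs)
```

Calling `run_with_mode(["./run.py"])` from the benchmarks directory prints the same progress report. Wrapping the call in a loop with `time.sleep` gives periodic monitoring.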
Modify the set of targets:
Edit line 108 in fruits/elderberries/benchmarks2018/problem.py.
Modify the set of trained models:
Edit the dictionaries SOLUTIONS_C and SOLUTIONS_R (classification and regression, respectively) in fruits/elderberries/benchmarks2018/solutions.py.
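If you only want a subset of models, one option is to filter a dictionary rather than delete its entries. This sketch assumes that SOLUTIONS_C and SOLUTIONS_R are plain dicts keyed by model name. The `keep_only` helper and the keys `model_a`, `model_b`, and `model_c` are hypothetical placeholders, not the actual contents of solutions.py.

```python
def keep_only(solutions, names):
    """Return a copy of `solutions` restricted to `names`, in the given order.

    Raises KeyError for names that are not present, so typos are caught early.
    """
    unknown = set(names) - set(solutions)
    if unknown:
        raise KeyError(f"unknown solutions: {sorted(unknown)}")
    return {name: solutions[name] for name in names}


# Hypothetical example; the real keys and values live in solutions.py.
SOLUTIONS_C = {"model_a": object(), "model_b": object(), "model_c": object()}
SOLUTIONS_C = keep_only(SOLUTIONS_C, ["model_a", "model_c"])
```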
Modify the set of benchmarks to run:
Edit the list SUMMARIES in fruits/elderberries/benchmarks2018/run.py.
Sometimes you may need to unlock the database manually. Stop all running scripts, then:
- remove the CRISPER_DATA_PATH/cache directory
- remove the CRISPER_DATA_PATH/fingerprints/cache directory
- remove the CRISPER_DATA_PATH/fingerprints/pending directory
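The three removals can be scripted. This is a minimal sketch that deletes only the directories listed above and skips any that do not exist. It does not check for running scripts, so stop them first. The function name `unlock_database` is hypothetical.

```python
import os
import shutil

# Directories listed above, relative to CRISPER_DATA_PATH.
LOCK_DIRS = (
    "cache",
    os.path.join("fingerprints", "cache"),
    os.path.join("fingerprints", "pending"),
)


def unlock_database(data_path=None):
    """Delete the listed directories under `data_path` (default: CRISPER_DATA_PATH).

    Returns the paths that were actually removed.
    """
    data_path = data_path or os.environ["CRISPER_DATA_PATH"]
    removed = []
    for relative in LOCK_DIRS:
        path = os.path.join(data_path, relative)
        if os.path.isdir(path):
            shutil.rmtree(path)
            removed.append(path)
    return removed
```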