Rules
To participate, we ask for some level of standardization so that the process runs smoothly for everyone.
To keep the comparisons as fair as possible, all submissions must use the same dataset, although you may structure that data however works best for you. This challenge uses a publicly available dataset: the 1000 Genomes Project data. These files can be parsed and transformed into whatever format your database schema requires. All data must be loaded into your database, so no pre-filtering!
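As a rough illustration of loading every record without pre-filtering, the sketch below streams a VCF (plain text or gzip/bgzip-compressed) and turns each data line into a Python dict that a loader could then map onto your schema. The helpers `open_vcf` and `iter_vcf_records` are hypothetical names, and the dict layout is just one possible intermediate form, not a required format.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, TextIO


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text (hypothetical helper)."""
    # bgzip output is valid multi-member gzip, which the gzip module can read.
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_vcf_records(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield every data line of a VCF as a dict; no record is skipped."""
    samples: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # meta-information lines (INFO/FORMAT definitions, etc.)
        if line.startswith("#CHROM"):
            # Header line: 9 fixed columns, then one column per sample ID.
            samples = line.split("\t")[9:]
            continue
        cols = line.split("\t")
        yield {
            "chrom": cols[0],
            "pos": int(cols[1]),
            "id": cols[2],
            "ref": cols[3],
            "alt": cols[4].split(","),
            "qual": cols[5],
            "filter": cols[6],  # kept as-is: non-PASS records are NOT dropped
            "info": cols[7],
            "format": cols[8] if len(cols) > 8 else None,
            # Raw per-sample strings, keyed by sample ID from the header.
            "samples": dict(zip(samples, cols[9:])),
        }
```

A loader would typically use it as `with open_vcf(path) as fh:` followed by `for rec in iter_vcf_records(fh): ...`. Streaming line by line keeps memory flat regardless of file size, and records whose FILTER column is not `PASS` are kept on purpose, since the rules forbid pre-filtering.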
Special Note: the AD field (per-allele read depth, a per-sample FORMAT field) in the VCF files will need to be parsed in one of the challenges, so pay close attention to it! See the Home page for more details about each challenge.
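Because AD is a per-sample FORMAT field with one value per allele (REF first, then each ALT), its position has to be looked up through the FORMAT column of each record rather than read from a fixed offset. A minimal sketch, assuming the FORMAT and sample strings are taken verbatim from a VCF data line; `parse_ad` is a hypothetical helper, not part of any library:

```python
from typing import List, Optional


def parse_ad(format_field: str, sample_field: str) -> Optional[List[Optional[int]]]:
    """Return per-allele read depths (REF first, then each ALT) for one sample.

    Hypothetical helper: returns None when AD is absent or entirely missing.
    """
    keys = format_field.split(":")
    if "AD" not in keys:
        return None  # this record declares no AD for its samples
    values = sample_field.split(":")
    idx = keys.index("AD")
    if idx >= len(values):
        return None  # the VCF spec allows trailing sample fields to be dropped
    raw = values[idx]
    if raw == ".":
        return None  # AD missing for this sample
    # A single allele's depth may also be "." (unknown) inside the list.
    return [None if v == "." else int(v) for v in raw.split(",")]
```

For example, `parse_ad("GT:AD:DP", "0/1:3,4:7")` gives `[3, 4]`. Returning `None` instead of `0` keeps "not measured" distinct from "zero reads", which matters if a query later sums or averages depths.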
- Database Name
  - user/
    - README.md: This markdown document should explain how to run your code. Importantly, it must contain enough detail for someone else to replicate your results, including the exact database version you used and any instructions for sharding, indexing, etc.
    - SCHEMAs.md: This document should contain an example entry for each document or table in your database, so that users can understand your design at a glance.
    - scripts/: All scripts used, from the initial `curl` or `wget` calls through database import and querying, go here.
  - README.md
  - user/
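To catch a missing file before submitting, a small check like the one below could be run against each `user/` folder. It is a hypothetical helper, not something the challenge provides, and it only encodes the three entries listed for each contributor (`README.md`, `SCHEMAs.md`, `scripts/`).

```python
from pathlib import Path
from typing import List

# Hypothetical check: required entries taken from the layout listed above.
REQUIRED_FILES = ("README.md", "SCHEMAs.md")
REQUIRED_DIRS = ("scripts",)


def missing_entries(user_dir: str) -> List[str]:
    """List the required files/folders absent from one contributor's directory."""
    root = Path(user_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty list means the folder has everything the layout asks for; anything returned is the name of a file or folder still to be added.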
Note: 🐳 Docker containers are extremely helpful here, because they avoid assumptions about dependencies and ensure the toolkit will work in other people's hands!
All code must be made publicly available. We will also make the code available on this site so others can learn from your approach. No code, no contribution. Remember, this is a learning exercise, and openness is a critical part of it.