Rules

To participate, we ask for some level of standardization so that the process runs more smoothly for everyone.

Data

To keep the comparisons as fair as possible, everyone must use the same dataset. How you structure that data, however, is up to you. In this challenge, we will be using a publicly available dataset: the 1000 Genomes. These files can be parsed and transformed into any format necessary for your database schema. All data must be loaded into your database, so no pre-filtering!
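As a rough illustration of "transform into any format, but load everything", the sketch below reads VCF-style text and inserts every data line into a table. It is only one possible approach: SQLite, the `variants` table layout, the helper names `iter_variant_rows` and `load_variants`, and the three sample records are all assumptions made up for this example, not part of the challenge or the real dataset.

```python
import sqlite3

# Made-up records in VCF column order (NOT real 1000 Genomes data).
VCF_TEXT = """\
##fileformat=VCFv4.1
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
1\t1000\t.\tA\tC\t100\tPASS\tDP=30
1\t2000\t.\tG\tT\t15\tLowQual\tDP=4
2\t3000\t.\tT\tTA\t100\tPASS\tDP=28
"""


def iter_variant_rows(lines):
    """Yield one tuple per data line; only header and blank lines are skipped."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
        yield (chrom, int(pos), vid, ref, alt, qual, flt, info)


def load_variants(conn, lines):
    """Insert every record (no pre-filtering) and return the stored row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS variants ("
        "chrom TEXT, pos INTEGER, id TEXT, ref TEXT, alt TEXT, "
        "qual TEXT, filter TEXT, info TEXT)"
    )
    conn.executemany(
        "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        iter_variant_rows(lines),
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]


conn = sqlite3.connect(":memory:")
loaded = load_variants(conn, VCF_TEXT.splitlines())
print(loaded)
```

Note that the record whose FILTER value is not `PASS` is loaded along with the others; any filtering would happen at query time, not at import.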

Special Note: the AD field in the VCF file will need to be parsed in one of the challenges, so make sure you pay attention to that! See more details about each challenge on the Home page.
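For orientation, here is a minimal sketch of pulling AD out of a single sample column. It assumes AD appears as a per-sample key in the FORMAT column, holding comma-separated integer depths with `.` for a missing value, which is how the VCF specification describes genotype fields; check the actual files before relying on this. The function name `parse_ad` and the example strings are hypothetical.

```python
def parse_ad(format_field, sample_field):
    """Return the AD values for one sample as a list, or None if AD is absent.

    format_field: the FORMAT column, e.g. "GT:AD:DP"
    sample_field: one sample column, e.g. "0/1:12,7:19"
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "AD" not in keys:
        return None
    idx = keys.index("AD")
    # Trailing fields may be dropped from a sample column, so AD can be missing.
    if idx >= len(values):
        return None
    # "." marks a missing value; keep it as None rather than guessing 0.
    return [None if v == "." else int(v) for v in values[idx].split(",")]


print(parse_ad("GT:AD:DP", "0/1:12,7:19"))
```

Keeping missing values as `None` instead of `0` is a design choice in this sketch: it lets a later query distinguish "no reads observed" from "depth not reported".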

Directory structure

Database Name
- user/
  - README.md
    - This markdown document should explain how to run your code. Importantly, it must contain sufficient detail so that someone else can replicate your results. This includes the particular database version you use, any instructions for sharding or indexing, etc.
  - SCHEMAs.md
    - This document should contain an example entry for each document or table in your database, so that users can easily understand your design at a glance.
  - scripts/
    - All scripts used from the initial curl or wget calls to database import and query will go here.
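Before submitting, the layout above can be checked with a few lines of code. The sketch below is an optional helper, not an official part of the challenge: the function name `missing_entries` and the `ExampleDB` folder are hypothetical, and the demo builds a throwaway directory that deliberately omits `SCHEMAs.md`.

```python
from pathlib import Path
import tempfile

# Entries expected inside each "<Database Name>/user/" folder, per the layout above.
REQUIRED_FILES = ("README.md", "SCHEMAs.md")
REQUIRED_DIRS = ("scripts",)


def missing_entries(user_dir):
    """Return the required files and directories that are absent from user_dir."""
    root = Path(user_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing


# Demo on a temporary tree that has README.md and scripts/ but no SCHEMAs.md.
with tempfile.TemporaryDirectory() as tmp:
    user_dir = Path(tmp) / "ExampleDB" / "user"
    (user_dir / "scripts").mkdir(parents=True)
    (user_dir / "README.md").write_text("# How to run\n")
    report = missing_entries(user_dir)

print(report)
```

An empty list means all three required entries are present.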

Note: 🐳 Docker containers are extremely helpful here: they avoid assumptions about dependencies and help ensure the toolkit will work in other people's hands!

Transparency

All code must be made publicly available. In particular, we will make the code available on this site so others can learn from your approach. No code, no contribution. Remember, this is a learning exercise and being open is a critical component.
