Improvement of multidog #12

Hi, I've been using updog to genotype GBS data for a whole genome as part of my PhD thesis, so I have very large datasets (normally on the order of 300K variants, but one of them has 1.8M variants). When I started genotyping, multidog had not been implemented yet, so I wrote my own parallel implementation of flexdog.

Initially I had major issues with memory and time efficiency on my computer cluster, which I managed to solve with some nifty tricks. I have now seen that there's a new multidog, and I've been running some tests (not finished yet) in which my approach appears to be almost 1000x more memory efficient and about 40x faster than the current multidog implementation. According to profvis, running 5k markers on 25 cores took multidog 11332 MB and 28520 ms, whereas my implementation took 11.6 MB and 730 ms. I attach the profvis object for you to check. To view it, unzip the file, load the profvis library, read the object into R with `result <- readRDS()`, and run `print(result)`; then select the "Data" tab, not the "Flame Graph" tab.
[updog_test_profiling.zip](https://github.com/dcgerard/updog/files/5383855/updog_test_profiling.zip)

I'd be happy to collaborate on the code, but it would imply a major rewrite of how multidog works. Is that okay? Should I just open a pull request? (I haven't used GitHub much, so a bit of guidance would be helpful.)

Cheers,
Alejandro Thérèse Navarro
