Improvement of multidog #12

Description

@alethere

Hi, I've been using updog to genotype GBS data for a whole genome as part of my PhD thesis, so I have very big datasets (normally on the order of 300K variants, but one of them happens to have 1.8M variants). When I started genotyping, multidog had not been implemented yet, so I wrote my own parallel implementation of flexdog.

Initially I had major issues with memory use and run time on my computer cluster, which I managed to solve with some nifty tricks. I have now seen the new multidog and have been running some tests (not finished yet), which suggest that my approach is almost 1000x more memory efficient and about 40x faster than the current multidog implementation. According to the profvis readings, running 5k markers on 25 cores took multidog 11332 MB and 28520 ms, whereas my implementation took 11.6 MB and 730 ms. I attach the profvis object for you to check. To view it, unzip the file, load it into R with result <- readRDS(), and call print(result) after loading the profvis library; then select the "Data" tab, not the "Flame Graph" tab.
updog_test_profiling.zip
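
For convenience, the viewing steps described above could be scripted roughly as follows. This is only a sketch: the helper name `load_profile` is hypothetical, and it assumes the archive contains exactly one `.rds` file (the name of the file inside the zip is not stated here, so it is discovered rather than hard-coded).

```r
# Sketch: unpack the attachment and load the saved profvis object.
# Assumption: the archive holds exactly one .rds file.
load_profile <- function(zip_path, exdir = tempfile("profiling_")) {
  # utils::unzip() returns the paths of the extracted files
  files <- utils::unzip(zip_path, exdir = exdir)
  rds <- files[grepl("\\.rds$", files, ignore.case = TRUE)]
  stopifnot(length(rds) == 1)
  readRDS(rds)
}

# Usage in an interactive session:
# library(profvis)   # needed so that print() opens the profvis viewer
# result <- load_profile("updog_test_profiling.zip")
# print(result)      # then select the "Data" tab
```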

I'd be happy to collaborate on the code, but it would mean a major rewrite of how multidog works. Is that okay? Should I just open a pull request? (I haven't used GitHub much, so a bit of guidance would be helpful.)

Cheers,
Alejandro Thérèse Navarro
