It appears that Illumina's Hail tables of gnomAD data drop leading zeros from ClinGen Canonical Allele IDs (CAIDs). For instance, CA025094 is recorded in Illumina's table (or at least in the version of it that has been transformed and stored on S3 to be queried by Athena) as CA25094.
Since MaveDB uses CAIDs to annotate mapped variants with gnomAD minor allele frequencies, some annotations may be missed if we don't strip the leading zeros before matching against the table. Let's check on this before rolling out gnomAD features.
Example variant
URN: urn:mavedb:00001224-a-1#1
ClinGen allele ID: CA025094
gnomAD data: https://gnomad.broadinstitute.org/variant/13-32356440-G-A?dataset=gnomad_r4
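If the check confirms the mismatch, one possible fix is to normalize CAIDs before looking them up. Below is a minimal sketch, assuming that a CAID is always `CA` followed by decimal digits and that the table drops every leading zero of the numeric part (only a single dropped zero has been observed so far). The helper name `normalize_caid` is hypothetical and not an existing MaveDB function.

```python
import re

# Assumed CAID shape: the prefix "CA" followed by one or more decimal digits.
_CAID_PATTERN = re.compile(r"^CA([0-9]+)$")


def normalize_caid(caid: str) -> str:
    """Return the CAID with leading zeros removed from its numeric part.

    Hypothetical helper: it mirrors the form observed in the Illumina-derived
    table, e.g. CA025094 stored as CA25094.
    """
    match = _CAID_PATTERN.match(caid.strip())
    if match is None:
        raise ValueError(f"Not a recognized CAID: {caid!r}")
    # int() discards leading zeros; an all-zero numeric part becomes "0".
    return f"CA{int(match.group(1))}"


if __name__ == "__main__":
    print(normalize_caid("CA025094"))
```

Applying the same normalization to both the MaveDB side and the table side of the lookup would make the match independent of which form either one stores.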