Mark intronic variants with a dedicated consequence instead of raising a mapping error #99

@bencap

Summary

When a score set contains intronic variants (e.g. BAP1 SGE), the mapping pipeline currently raises an error for each one rather than producing a mapped output with a recognizable consequence label. This causes those variants to land in the error_message field with no post_mapped VRS allele, which in turn prevents the mutational consequence histogram from rendering (see VariantEffect/mavedb-ui#672).

Problem

In dcd_mapping/src/dcd_mapping/vrs_map.py, both _create_pre_mapped_hgvs_strings and _create_post_mapped_hgvs_strings detect intronic variants via is_intronic_variant() and immediately raise a ValueError:

msg = f"Variant is intronic and cannot be processed: {variant}"
raise ValueError(msg)

This exception propagates up through the VRS mapping logic and gets caught at the MappedScore level, setting error_message on the variant record. The result is that intronic variants look identical to genuine mapping failures — there is no way to distinguish "could not map because intronic" from "could not map due to a tooling error".

GA4GH hgvs_tools cannot translate intronic HGVS positions into VRS alleles (the downstream VRS allele translator would raise an error), so producing a post_mapped allele is not currently feasible. The correct behavior is to skip VRS translation but still produce a MappedScore that explicitly communicates "this variant is intronic".

Proposed behavior

Instead of raising a ValueError, intronic variants should produce a MappedScore with:

  • post_mapped = None (no VRS allele — expected and correct)
  • error_message = None
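  • A new dedicated field or a well-known sentinel value that downstream consumers can recognize as an intronic variant (for example, vep_functional_consequence set to "intron_variant", as in the acceptance criteria below)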

Downstream consumers could then use this marker to handle intronic variants as a known special case, while fully mapped variants continue to be represented completely.
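The proposed record shape can be sketched as follows. This is a minimal illustration, not the actual MappedScore model: MappedScoreSketch and classify are hypothetical names, the real model has more fields, and placing vep_functional_consequence on the same record is an assumption taken from the acceptance criteria.

```python
from dataclasses import dataclass
from typing import Optional

# SO term proposed in this issue for intronic variants.
INTRON_VARIANT = "intron_variant"


@dataclass
class MappedScoreSketch:
    """Hypothetical stand-in for MappedScore (only the fields discussed here)."""

    accession_id: str
    post_mapped: Optional[dict] = None  # VRS allele, or None when not translated
    error_message: Optional[str] = None  # set only for genuine mapping failures
    vep_functional_consequence: Optional[str] = None  # assumed location of the sentinel


def classify(record: MappedScoreSketch) -> str:
    """Distinguish genuine errors, intronic variants, and mapped variants."""
    if record.error_message is not None:
        return "error"
    if (
        record.post_mapped is None
        and record.vep_functional_consequence == INTRON_VARIANT
    ):
        return "intronic"
    return "mapped"


intronic = MappedScoreSketch(
    accession_id="example#1",
    vep_functional_consequence=INTRON_VARIANT,
)
failed = MappedScoreSketch(
    accession_id="example#2",
    error_message="tooling error",
)
```

With this shape, a consumer such as the histogram code can branch on classify(record) rather than inferring intent from a missing post_mapped allele.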

Acceptance criteria

  • Intronic variants in a score set no longer produce error_message values in mapped_variants
  • Intronic variants are assigned a vep_functional_consequence value that is distinct from mapping errors and recognizable as intronic (e.g. "intron_variant")
  • Non-intronic variants in the same score set are unaffected
  • score_set.mapping_state reflects incomplete (not failed) when the only unmapped variants are intronic, since this outcome is expected and not an error
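The mapping-state criterion above could be expressed roughly as below. MappingStateSketch and compute_mapping_state are hypothetical names, records are plain dicts for brevity, and the rule for when to report failed (nothing mapped and at least one genuine error) is an assumption; the existing job may define it differently.

```python
from enum import Enum
from typing import Iterable, Mapping

INTRON_VARIANT = "intron_variant"


class MappingStateSketch(Enum):
    """Hypothetical stand-in for the real MappingState enum."""

    complete = "complete"
    incomplete = "incomplete"
    failed = "failed"


def compute_mapping_state(records: Iterable[Mapping]) -> MappingStateSketch:
    mapped = intronic = errors = 0
    for record in records:
        if record.get("post_mapped") is not None:
            mapped += 1
        elif (
            record.get("error_message") is None
            and record.get("vep_functional_consequence") == INTRON_VARIANT
        ):
            # Expected non-error: intronic variants are skipped, not failed.
            intronic += 1
        else:
            errors += 1

    if errors == 0 and intronic == 0:
        return MappingStateSketch.complete
    # Assumption: "failed" means nothing mapped and a genuine error occurred.
    if mapped == 0 and errors > 0:
        return MappingStateSketch.failed
    return MappingStateSketch.incomplete


intronic_record = {
    "post_mapped": None,
    "error_message": None,
    "vep_functional_consequence": INTRON_VARIANT,
}
mapped_record = {"post_mapped": {"type": "Allele"}, "error_message": None}
error_record = {"post_mapped": None, "error_message": "tooling error"}
```

Under these assumptions, a score set made up entirely of intronic variants resolves to incomplete rather than failed.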

Implementation notes

  • The two is_intronic_variant guard blocks in vrs_map.py (one in _create_pre_mapped_hgvs_strings, one in _create_post_mapped_hgvs_strings) should return a sentinel or signal value instead of raising
  • The caller that builds MappedScore objects (in the map_score_set / _map_variants_for_score_set path) needs to handle the intronic signal and populate a sentinel accordingly
  • The MappingState.incomplete vs MappingState.failed logic in the mapping job (worker/jobs/variant_processing/mapping.py) should treat variants with intronic consequence as expected non-errors when computing the final score set mapping state
  • The Sequence Ontology (SO) term intron_variant is the standard VEP consequence term for intronic variants and is preferred for consistency
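One way the guard and its caller might fit together is sketched below. Everything here is illustrative: is_intronic_variant_sketch is a simplified regex heuristic standing in for the real is_intronic_variant(), HgvsResult and build_record are hypothetical names, and the post_mapped placeholder is not a real VRS allele.

```python
import re
from dataclasses import dataclass, field
from typing import List

INTRON_VARIANT = "intron_variant"

# Simplified heuristic (assumption): an intronic HGVS position has an offset
# such as 123+5 or 124-2, i.e. a digit followed by +/- and another digit.
_OFFSET = re.compile(r"\d[+-]\d")


def is_intronic_variant_sketch(hgvs: str) -> bool:
    """Stand-in for is_intronic_variant(); inspects only the part after ':'."""
    return _OFFSET.search(hgvs.rsplit(":", 1)[-1]) is not None


@dataclass(frozen=True)
class HgvsResult:
    """Hypothetical signal value returned instead of raising ValueError."""

    hgvs_strings: List[str] = field(default_factory=list)
    intronic: bool = False


def create_hgvs_strings_sketch(variant: str) -> HgvsResult:
    if is_intronic_variant_sketch(variant):
        # Previously: raise ValueError("Variant is intronic ...")
        return HgvsResult(intronic=True)
    return HgvsResult(hgvs_strings=[variant])


def build_record(variant: str) -> dict:
    """Caller side: turn the intronic signal into a sentinel, not an error."""
    result = create_hgvs_strings_sketch(variant)
    if result.intronic:
        return {
            "post_mapped": None,
            "error_message": None,
            "vep_functional_consequence": INTRON_VARIANT,
        }
    # Placeholder only; the real pipeline would translate to a VRS allele here.
    return {
        "post_mapped": {"hgvs": result.hgvs_strings[0]},
        "error_message": None,
        "vep_functional_consequence": None,
    }
```

Returning a small result object rather than None keeps "intronic" distinguishable from "no HGVS strings produced" at the call site.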

Metadata

    Labels

      • app: backend (Task implementation touches the backend)
      • app: mapper (Task implementation touches the mapper)
      • type: enhancement (New feature or request)
