
feat: Target Gene Mapping Table#719

Merged: bencap merged 4 commits into release-2026.2.0 from feature/bencap/mapping-table on May 6, 2026

@bencap (Collaborator) commented Apr 30, 2026

This pull request introduces a per-(target gene, alignment level) mapping QC and provenance model, refactors the mapping library into focused submodules, and updates the database schema and ORM models to record richer mapping provenance and annotation. Together, these changes let each mapped variant be traced back to the QC and provenance record for the alignment that produced it, laying the groundwork for downstream analysis and data-integrity checks.

Database schema and model enhancements:

  • Added a new target_gene_mappings table to store per-(target gene, alignment level) QC and provenance information, and extended the mapped_variants table with new columns (target_gene_mapping_id, alignment_level, at_mismatched_locus, near_gap) to link variants to their mapping QC and annotation details.
  • Introduced the TargetGeneMapping SQLAlchemy model and established relationships from TargetGene and MappedVariant to TargetGeneMapping for ORM-level access to mapping QC records. [1] [2] [3]
  • Added the AnnotationLayer enum to standardize annotation layer values and provide translation from dcd-mapping wire codes.
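As a rough illustration of the wire-code translation described above, here is a minimal Python sketch of an `AnnotationLayer` enum. The member names, their values, and the single-letter wire codes (`g`, `c`, `p`) are assumptions for illustration only, not the definitions in this PR or in dcd-mapping.

```python
from enum import Enum


class AnnotationLayer(str, Enum):
    """Hypothetical annotation-layer enum; member names and values are assumed."""

    GENOMIC = "genomic"
    CDNA = "cdna"
    PROTEIN = "protein"

    @classmethod
    def from_wire(cls, code: str) -> "AnnotationLayer":
        """Translate a (hypothetical) dcd-mapping wire code into an enum member."""
        try:
            return _WIRE_CODES[code]
        except KeyError:
            raise ValueError(f"unknown annotation layer wire code: {code!r}") from None


# Hypothetical wire-code table, kept outside the class body so Enum does not
# turn the dict into a member. The single-letter codes are an assumption.
_WIRE_CODES = {
    "g": AnnotationLayer.GENOMIC,
    "c": AnnotationLayer.CDNA,
    "p": AnnotationLayer.PROTEIN,
}
```

Because the enum mixes in `str`, members compare equal to their stored values, which keeps them convenient to write to a text column; unknown codes fail loudly instead of silently mapping to a default.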

Mapping library refactor:

  • Refactored the mapping library into submodules (client.py, constants.py, metadata.py, schema.py), with mapping/__init__.py re-exporting the public API so existing imports keep working. [1] [2] [3] [4] [5]
  • Updated the mapping API client and schema definitions to match the new dcd-mapping payload structure, including TypedDicts for wire-format documentation and validation. [1] [2]
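TypedDicts document a payload's shape for type checkers but are not enforced at runtime, so validating a response needs an explicit check. The sketch below assumes a payload with `metadata` and `mapped_scores` keys and hypothetical per-entry fields; the real dcd-mapping field names may differ.

```python
from typing import Any, Dict, List, Optional, TypedDict


class MappedScoreWire(TypedDict):
    """Hypothetical shape of one mapped-variant entry in a dcd-mapping payload."""

    mavedb_id: str
    annotation_layer: str
    pre_mapped: Optional[Dict[str, Any]]
    post_mapped: Optional[Dict[str, Any]]


class MappingPayloadWire(TypedDict):
    """Hypothetical top-level payload; the real field names come from dcd-mapping."""

    metadata: Dict[str, Any]
    mapped_scores: List[MappedScoreWire]


def _missing_keys(schema: Any, obj: Dict[str, Any]) -> List[str]:
    # __required_keys__ (Python 3.9+) lists the keys a TypedDict declares as required.
    return sorted(set(schema.__required_keys__) - set(obj))


def validate_payload(raw: Any) -> MappingPayloadWire:
    """Check declared keys at runtime, since TypedDicts are only enforced by type checkers."""
    if not isinstance(raw, dict):
        raise TypeError("payload must be a JSON object")
    missing = _missing_keys(MappingPayloadWire, raw)
    if missing:
        raise ValueError(f"payload missing keys: {missing}")
    for i, entry in enumerate(raw["mapped_scores"]):
        if not isinstance(entry, dict):
            raise TypeError(f"mapped_scores[{i}] must be an object")
        missing = _missing_keys(MappedScoreWire, entry)
        if missing:
            raise ValueError(f"mapped_scores[{i}] missing keys: {missing}")
    return raw
```

This only checks key presence; a schema library could also check value types, at the cost of an extra dependency.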

API and script updates:

  • Updated the mapped variant API endpoint to return the new MappedVariantWithMappingDetails model, exposing richer mapping QC and provenance information.
  • Changed job execution logic to commit the database session after job creation, improving transactional integrity.
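One plausible reason to commit right after creating a job is that uncommitted rows are invisible to other database sessions, such as a worker that picks the job up; this also holds under PostgreSQL's default READ COMMITTED isolation. The sketch below uses `sqlite3` as a stand-in, and the `jobs` table and `create_job` helper are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


def create_job(conn: sqlite3.Connection, name: str) -> int:
    """Insert a hypothetical job row and commit it before anything else runs."""
    cur = conn.execute("INSERT INTO jobs (name, status) VALUES (?, 'pending')", (name,))
    conn.commit()  # make the row visible to other connections (e.g. a worker)
    return cur.lastrowid


def visibility_demo() -> Tuple[int, List[Tuple[str]]]:
    """Show what a second connection sees before and after a commit."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "jobs.db")
        api = sqlite3.connect(path)
        worker = sqlite3.connect(path)
        try:
            api.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
            api.commit()

            # Uncommitted insert: the worker's connection does not see it.
            api.execute("INSERT INTO jobs (name, status) VALUES ('draft', 'pending')")
            uncommitted = worker.execute("SELECT COUNT(*) FROM jobs").fetchall()[0][0]
            api.rollback()

            # Committed insert: the worker can now read the job.
            job_id = create_job(api, "map_variants")
            committed = worker.execute(
                "SELECT status FROM jobs WHERE id = ?", (job_id,)
            ).fetchall()
        finally:
            api.close()
            worker.close()
    return uncommitted, committed
```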

Other improvements:

  • Added target_gene_mapping to the public model exports for easier access in other modules.
  • Removed the legacy monolithic mapping module, whose logic now lives in the focused submodules.

These changes collectively provide a robust foundation for tracking, querying, and analyzing variant mapping provenance and quality throughout the application.

@bencap (Collaborator, Author) commented Apr 30, 2026

API support for VariantEffect/dcd_mapping2#97

@bencap force-pushed the feature/bencap/627/job-traceability branch from 23cc43b to 97579c4 on April 30, 2026 15:53
@bencap force-pushed the feature/bencap/mapping-table branch 11 times, most recently from d40d913 to ddc7ec3 on May 4, 2026 16:20
@bencap requested review from jstone-dev and sallybg on May 5, 2026 18:31
@bencap marked this pull request as ready for review on May 5, 2026 18:31
@bencap force-pushed the feature/bencap/mapping-table branch 3 times, most recently from 7c372c3 to 4ad8368 on May 6, 2026 17:43
bencap and others added 3 commits May 6, 2026 11:19
…arated concerns

Move VRSMap client code, type schemas, metadata utilities, and constants
into separate modules within a mapping package. Maintain backward
compatibility through re-exports in __init__.py so existing imports
continue to work without changes.

Co-authored-by: Copilot <copilot@github.com>
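A minimal sketch of the re-export pattern: the package `__init__.py` imports names from its submodules so callers written against the old single-module layout keep the same import path. The package name `mapping_demo`, the `VRSMapClient` class, and `DEFAULT_TIMEOUT_SECONDS` are hypothetical; only the submodule file names follow the PR.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from types import ModuleType

# Hypothetical file contents: only the submodule file names come from the PR.
FILES = {
    "client.py": """
        class VRSMapClient:
            def __init__(self, base_url: str) -> None:
                self.base_url = base_url
    """,
    "constants.py": """
        DEFAULT_TIMEOUT_SECONDS = 30
    """,
    "__init__.py": """
        # Re-export public names so `from mapping_demo import X` keeps working
        # for code written against the old single-module layout.
        from .client import VRSMapClient
        from .constants import DEFAULT_TIMEOUT_SECONDS

        __all__ = ["VRSMapClient", "DEFAULT_TIMEOUT_SECONDS"]
    """,
}


def build_and_import(root: Path) -> ModuleType:
    """Write the demo package under `root` and import it by its top-level name."""
    pkg = root / "mapping_demo"
    pkg.mkdir()
    for name, body in FILES.items():
        (pkg / name).write_text(textwrap.dedent(body))
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("mapping_demo")
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mapping = build_and_import(Path(tmp))
        client = mapping.VRSMapClient("https://example.org")
        print(client.base_url, mapping.DEFAULT_TIMEOUT_SECONDS)
```

Declaring `__all__` keeps `from mapping_demo import *` limited to the intended public names.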
…) QC and provenance

Add a new `target_gene_mappings` table that records alignment QC and provenance
for each (target gene, annotation layer) pair produced by dcd-mapping. Replaces
flat QC fields on `mapped_variants` with a normalized FK relationship.

- Add `TargetGeneMapping` model, view model, and `AnnotationLayer` enum
- Extend `MappedVariant` with `target_gene_mapping_id`, `alignment_level`,
  `at_mismatched_locus`, and `near_gap` columns
- Update mapping worker to persist `TargetGeneMapping` rows and link variants
- Add Alembic migration (`8c4a2f1d9e6b`) for schema changes
- Add manual backfill script to populate new columns for existing mapped variants
- Drop `variants_failed_pre_layer_selection` and `variants_with_mapping_warnings`
  QC counts from the schema (not recoverable for existing data)

Co-authored-by: Copilot <copilot@github.com>
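The normalized relationship can be sketched with `sqlite3`: QC and provenance are stored once per (target gene, annotation layer) row, and each mapped variant points at that row through `target_gene_mapping_id`. Apart from the table names and the four new `mapped_variants` columns, every column, value, and the uniqueness constraint below is an assumption.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical DDL: only the table names and the four new mapped_variants
# columns come from the PR; other columns and constraints are assumptions.
SCHEMA = """
CREATE TABLE target_genes (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE target_gene_mappings (
    id INTEGER PRIMARY KEY,
    target_gene_id INTEGER NOT NULL REFERENCES target_genes (id),
    annotation_layer TEXT NOT NULL,
    UNIQUE (target_gene_id, annotation_layer)  -- assumed: one QC row per pair
);
CREATE TABLE mapped_variants (
    id INTEGER PRIMARY KEY,
    target_gene_mapping_id INTEGER REFERENCES target_gene_mappings (id),
    alignment_level TEXT,
    at_mismatched_locus INTEGER,
    near_gap INTEGER
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create the schema in memory and insert a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO target_genes (id, name) VALUES (1, 'GENE1')")
    conn.executemany(
        "INSERT INTO target_gene_mappings (id, target_gene_id, annotation_layer) VALUES (?, 1, ?)",
        [(10, "genomic"), (11, "protein")],
    )
    conn.executemany(
        "INSERT INTO mapped_variants"
        " (target_gene_mapping_id, alignment_level, at_mismatched_locus, near_gap)"
        " VALUES (?, ?, ?, ?)",
        [(10, "genomic", 0, 0), (10, "genomic", 1, 0), (11, "protein", 0, 1)],
    )
    conn.commit()
    return conn


def qc_summary(conn: sqlite3.Connection) -> List[Tuple[str, int, int]]:
    """Per annotation layer: linked variant count and how many sit at a mismatched locus."""
    return conn.execute(
        """
        SELECT m.annotation_layer, COUNT(v.id), SUM(v.at_mismatched_locus)
        FROM target_gene_mappings AS m
        JOIN mapped_variants AS v ON v.target_gene_mapping_id = m.id
        GROUP BY m.annotation_layer
        ORDER BY m.annotation_layer
        """
    ).fetchall()
```

With this layout, per-layer QC is read through a join instead of being duplicated on every variant row.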
@bencap force-pushed the feature/bencap/mapping-table branch from 4ad8368 to 9ba16ea on May 6, 2026 18:19
…d fallback

Replace `.get()` default parameter with `or` chaining to satisfy type
checking and add UUID fallback for cases where correlation_id is
unavailable in both pipeline_params and logging context. This improves
type safety and ensures all pipelines have a correlation_id for better
traceability in logs and external systems.
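The difference between `.get()` with a default and `or` chaining matters when the key exists but holds `None`: `.get("correlation_id", default)` returns `None` in that case, while `or` falls through to the next source. The helper name `resolve_correlation_id` and the plain-mapping logging context are hypothetical.

```python
import uuid
from typing import Any, Mapping


def resolve_correlation_id(
    pipeline_params: Mapping[str, Any], logging_context: Mapping[str, Any]
) -> str:
    """Return the first usable correlation ID, falling back to a fresh UUID.

    `.get("correlation_id", default)` would return None when the key exists but
    holds None; `or` chaining skips any falsy value and moves to the next source.
    """
    return (
        pipeline_params.get("correlation_id")
        or logging_context.get("correlation_id")
        or str(uuid.uuid4())
    )
```

Note that `or` also skips an empty string, which is usually desirable for an ID but differs from a strict `is None` check.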
Base automatically changed from feature/bencap/627/job-traceability to release-2026.2.0 May 6, 2026 18:41
@bencap merged commit 29e3e87 into release-2026.2.0 on May 6, 2026
6 checks passed
@bencap deleted the feature/bencap/mapping-table branch on May 6, 2026 19:18