
Deprecate and remove eutils dependency in favor of BioPython #725

@bencap


Summary

The eutils package is effectively unmaintained and has known bugs (see biocommons/eutils#205). BioPython provides equivalent, and in several respects better, functionality for NCBI Entrez database access. This issue tracks the migration from eutils to BioPython to improve maintainability and code quality.

Problem

  • eutils is unmaintained: Aside from limited recent work, the package has been effectively abandoned since 2014.
  • Known bugs: eutils has documented bugs that remain unfixed; Biopython does not have these problems.
  • BioPython is superior: BioPython provides comprehensive Entrez API access, active maintenance, broader ecosystem integration, and better documentation.

Proposed behavior

  1. Identify all uses of eutils in the MaveDB API codebase
  2. Refactor any eutils consumers in this project to use Biopython
  3. Remove the eutils dependency from pyproject.toml
  4. Verify all tests pass and functionality is preserved
  5. Document any API changes in relevant files
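Step 1 can be partly automated. The sketch below walks Python sources with the standard-library ast module, so only real import statements are reported and mentions in comments or strings are ignored. The helper names (find_eutils_imports, scan_tree) and the default src root are hypothetical and should be adjusted to the actual repository layout.

```python
import ast
from pathlib import Path


def find_eutils_imports(source: str) -> list[int]:
    """Return the line numbers of absolute imports of eutils (or its submodules) in source."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Matches "import eutils" and "import eutils.something"
            if any(a.name == "eutils" or a.name.startswith("eutils.") for a in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            # level == 0 skips relative imports such as "from .eutils import x";
            # module is None for "from . import x"
            module = node.module or ""
            if node.level == 0 and (module == "eutils" or module.startswith("eutils.")):
                lines.append(node.lineno)
    # ast.walk is breadth-first, so sort to report lines in file order
    return sorted(lines)


def scan_tree(root: str = "src") -> dict[str, list[int]]:
    """Map each .py file under root (hypothetical default) to the lines where it imports eutils."""
    hits = {}
    for path in sorted(Path(root).rglob("*.py")):
        found = find_eutils_imports(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits
```

Calling print(scan_tree("src")) from the repository root lists every import site. Because the acceptance criteria cover all references, not just imports, a plain text search for "eutils" is still worth running afterwards to catch configuration, docs, and string literals.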

Acceptance criteria

  • All references to eutils have been removed from the codebase
  • eutils is no longer listed as a dependency in pyproject.toml
  • Any code that previously used eutils now uses BioPython's Bio.Entrez API
  • All existing tests pass without modification to test expectations
  • No functional regressions in NCBI-related operations
  • Code review confirms that BioPython usage follows best practices

Implementation notes

  • Start by searching the codebase for "import eutils" and "from eutils" to identify all usage points
  • The primary use case appears to be NCBI Entrez queries, which BioPython handles via Bio.Entrez.esearch(), Bio.Entrez.efetch(), Bio.Entrez.elink(), and other e-utility functions
  • BioPython returns dict- and list-like objects parsed from the XML responses rather than the object-oriented facades that eutils provides; this may require minor refactoring but should be straightforward
  • Consider whether any helper functions would benefit from being wrapped to simplify calling code, but keep these lightweight
  • BioPython enforces NCBI's rate limits (at most 3 requests/sec by default, or 10 requests/sec when an API key is set), just as eutils does
  • Add BioPython as a dependency if not already present: the biopython package, with a version constraint compatible with the project's Python 3.11+ requirement

Metadata

Assignees: No one assigned
Labels: app: backend (Task implementation touches the backend); type: maintenance (Maintaining this project)
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
