Summary
The eutils package is effectively unmaintained and has known bugs (see biocommons/eutils#205). Biopython provides equivalent or better functionality for NCBI Entrez database access. This issue tracks the migration from eutils to Biopython to improve maintainability and code quality.
Problem
- eutils is unmaintained: Aside from limited recent work, the package has been effectively abandoned since 2014.
- Known bugs: eutils has documented bugs that remain unfixed and that do not occur in Biopython.
- Biopython is superior: Biopython provides comprehensive Entrez API access, active maintenance, broader ecosystem integration, and better documentation.
Proposed behavior
- Identify all uses of eutils in the MaveDB API codebase
- Refactor any eutils consumers in this project to use Biopython
- Remove the eutils dependency from pyproject.toml
- Verify all tests pass and functionality is preserved
- Document any API changes in relevant files
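The first step above, identifying all uses of eutils, could be sketched as a small scan over the repository's Python files. This is a rough illustration only: the function name find_eutils_imports and the regular expression are hypothetical and not part of the codebase, and the pattern only catches import statements that begin with eutils (it would miss, for example, `import os, eutils`).

```python
import re
from pathlib import Path

# Matches lines such as "import eutils", "import eutils.client",
# "from eutils import Client", and "from eutils.client import Client".
# It deliberately does not match look-alike names such as "eutils_helper".
EUTILS_IMPORT = re.compile(r"^\s*(?:import|from)\s+eutils(?:[.,\s]|$)")


def find_eutils_imports(root: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, stripped line) for each eutils import under root."""
    hits: list[tuple[str, int, str]] = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if EUTILS_IMPORT.match(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Example: print every hit found below the current directory.
# for path, lineno, line in find_eutils_imports("."):
#     print(f"{path}:{lineno}: {line}")
```

A plain text search for the same two patterns would work equally well; the script form is only convenient if the list of hits needs to be checked in CI until the dependency is removed.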
Acceptance criteria
Implementation notes
- Start by searching the codebase for import eutils and from eutils to identify all usage points
- The primary use case appears to be NCBI Entrez queries, which Biopython handles via Bio.Entrez.esearch(), Bio.Entrez.efetch(), Bio.Entrez.elink(), and other e-utility functions
- Biopython returns dictionary/list structures parsed from the XML responses rather than the object-oriented facades that eutils provides; this may require minor refactoring but is straightforward
- Consider whether any helper functions would benefit from being wrapped to simplify calling code, but keep these lightweight
- Biopython respects NCBI's rate limiting (3 requests/sec by default, 10 requests/sec with an API key), same as eutils
- Add Biopython as a dependency if not already present: the biopython package with a version constraint matching the Python 3.11+ requirement
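As an illustration of the lightweight helper mentioned in the notes above, a wrapper around an Entrez search might look like the sketch below. The helper name search_ids and its parameters are hypothetical. The Entrez module is passed in as an argument, which is a design assumption made here so that tests can substitute a stub and avoid network calls; calling code would pass Bio.Entrez itself.

```python
from typing import Any


def search_ids(entrez: Any, db: str, term: str, retmax: int = 20) -> list[str]:
    """Run an Entrez search and return the matching record IDs as strings.

    `entrez` is expected to behave like the Bio.Entrez module: esearch()
    returns a handle, and read() parses that handle into dict/list data.
    """
    handle = entrez.esearch(db=db, term=term, retmax=retmax)
    try:
        record = entrez.read(handle)
    finally:
        handle.close()
    # The parsed esearch result is dictionary-like; "IdList" holds the IDs.
    return list(record["IdList"])


# Usage with Biopython (performs a network request; the email is a placeholder):
# from Bio import Entrez
# Entrez.email = "you@example.org"
# ids = search_ids(Entrez, db="pubmed", term="BRCA1", retmax=5)
```

Callers that previously read attributes from eutils result objects would instead index into the parsed dictionary, which is the kind of minor refactoring the notes anticipate.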