🚀 Feature Request: Migrate Codebase from Pandas to Polars & Integrate MongoDB for Faster Analysis

The current codebase uses **pandas** for data processing and relies on local flat-file storage. As our data volume has grown, analysis runtimes have become noticeably longer.

---

### Describe the solution you'd like

- **Migrate data analysis code from `pandas` to [`polars`](https://pola-rs.github.io/polars/py-polars/html/):**
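  - Polars runs DataFrame operations across multiple threads and offers a lazy API with query optimization, which is typically much faster than pandas on large datasets.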
  - Update all scripts, modules, and notebooks to use `polars` syntax and idioms.
  - Ensure output and results remain consistent.

- **Integrate [`MongoDB`](https://www.mongodb.com/) as the primary data source and sink:**
  - Move relevant data storage/loading from CSV/Parquet/Excel to MongoDB collections.
  - Refactor data ingestion/extraction logic to use `pymongo` or appropriate async libraries.
  - Benchmark performance improvements for common analysis tasks.

---

### Describe alternatives you've considered

- Keeping pandas but optimizing with Dask or Vaex. We expect this to remain slower than Polars for most of our single-machine operations, which the benchmarks should confirm.
- Using a SQL database, but MongoDB offers more flexibility for semi-structured data.

---

### Additional context

- Existing pandas code is located in: `src/data_analysis/`
- We rely on reading/writing large CSVs and DataFrames (often 1M+ rows).
- Please ensure all tests pass and update documentation/examples as needed.
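To get a starting inventory of where pandas is imported under `src/data_analysis/`, something like the following could work. It is a sketch using only the standard library; the function names are made up for this issue, and it only covers `.py` files, so notebooks would need to be converted or checked separately.

```python
import ast
from pathlib import Path


def find_pandas_imports(source: str) -> list[int]:
    """Return the line numbers of statements that import pandas."""
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # Covers `import pandas`, `import pandas as pd`, `import pandas.testing`.
            if any(alias.name.split(".")[0] == "pandas" for alias in node.names):
                lines.append(node.lineno)
        elif isinstance(node, ast.ImportFrom):
            # Covers `from pandas import X`; level == 0 skips relative imports.
            if node.level == 0 and node.module and node.module.split(".")[0] == "pandas":
                lines.append(node.lineno)
    return sorted(lines)


def inventory(root: str) -> dict[str, list[int]]:
    """Map each .py file under `root` to the lines where it imports pandas."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        hits = find_pandas_imports(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    for filename, line_numbers in inventory("src/data_analysis").items():
        print(filename, line_numbers)
```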

---

### Tasks Checklist

- [ ] Inventory all pandas usages and data-loading code
- [ ] Convert scripts and modules to use `polars`
- [ ] Replace local file I/O with MongoDB queries where appropriate
- [ ] Add/modify tests to cover new code paths
- [ ] Update README and any usage docs
- [ ] Provide before/after benchmarks (runtime, memory usage)
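For the before/after benchmarks, a small harness along these lines might be enough to start with. This is a sketch using only the standard library, and the `benchmark` name and return format are assumptions. Timing and memory are measured in separate runs because `tracemalloc` adds overhead that would skew the timings.

```python
import time
import tracemalloc
from typing import Any, Callable


def benchmark(func: Callable[[], Any], repeats: int = 3) -> dict[str, float]:
    """Return the best wall-clock time (seconds) and peak traced memory (bytes)."""
    # Timing runs: take the best of `repeats` to reduce noise from other processes.
    best_seconds = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best_seconds = min(best_seconds, time.perf_counter() - start)

    # Memory run: a single pass with tracemalloc enabled.
    tracemalloc.start()
    try:
        func()
        _, peak_bytes = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    return {"seconds": best_seconds, "peak_bytes": float(peak_bytes)}


# Example with a stand-in workload; replace the lambda with the pandas and
# Polars versions of the same analysis step to compare them.
stats = benchmark(lambda: sorted(range(100_000), reverse=True))
print(stats)
```

A caveat on the memory figure: `tracemalloc` only sees allocations made through Python's allocator, so memory that Polars allocates natively will be under-reported. Comparing process-level memory (for example, peak RSS) is likely a fairer measure for the pandas vs. Polars comparison.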

---

### References

- [Polars Migration Guide](https://pola-rs.github.io/polars/py-polars/html/reference/migration.html)
- [PyPolars Documentation](https://pola-rs.github.io/polars/py-polars/html/)
- [MongoDB Python Driver (`pymongo`)](https://pymongo.readthedocs.io/en/stable/)
