Ancora is a focused news discovery and analysis tool designed to extract the "crux" of articles from major Indian news sources. It identifies the central, load-bearing claim of a piece rather than providing a simple summary.
- Multi-Source Discovery: Finds articles via RSS (The Hindu) and HTML scraping (Scroll.in).
- Clean Extraction: Uses `trafilatura` for high-quality extraction of article body text.
- Crux Identification: Uses Google's Gemini API (`gemini-3-flash-preview`) to isolate an article's core argument in 2-3 sentences.
- Modular Architecture: Keeps discovery and extraction logic in separate modules.
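The two discovery strategies can be sketched with the standard library alone: pull `<item>` links out of an RSS feed, and collect article links from a listing page's HTML. This is a minimal illustration, not Ancora's discovery module. The function names (`parse_rss`, `fetch_rss`, `scrape_links`) and the `path_prefix` filter are hypothetical, and no real feed URL or Scroll.in URL pattern is assumed.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


@dataclass
class FeedItem:
    title: str
    link: str


def parse_rss(xml_data: str | bytes) -> list[FeedItem]:
    """Return unique (title, link) pairs from an RSS 2.0 document."""
    root = ET.fromstring(xml_data)
    seen: set[str] = set()
    items: list[FeedItem] = []
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link or link in seen:  # skip items without links and duplicates
            continue
        seen.add(link)
        items.append(FeedItem((item.findtext("title") or "").strip(), link))
    return items


def fetch_rss(feed_url: str, timeout: float = 10.0) -> list[FeedItem]:
    """Download a feed and parse it; bytes are passed so the XML declaration sets the encoding."""
    with urlopen(feed_url, timeout=timeout) as resp:
        return parse_rss(resp.read())


class _LinkCollector(HTMLParser):
    """Collects every href found on <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(value for name, value in attrs if name == "href" and value)


def scrape_links(html: str, base_url: str, path_prefix: str) -> list[str]:
    """Resolve links against base_url and keep unique ones whose path starts with path_prefix."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    seen: set[str] = set()
    links: list[str] = []
    for href in collector.hrefs:
        url = urljoin(base_url, href)
        if urlparse(url).path.startswith(path_prefix) and url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

Keeping parsing separate from fetching lets each strategy be tested on saved pages without network access.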
Requirements:
- Python 3.10+
- A Google Gemini API key
Installation:
- Clone the repository: `git clone https://github.com/harshafaik/ancora.git`, then `cd ancora`.
- Create and activate a virtual environment: `python3 -m venv venv`, then `source venv/bin/activate`.
- Install dependencies: `pip install -r requirements.txt`
- Configure the environment: create a `.env` file in the root directory containing `GOOGLE_API_KEY=your_gemini_api_key_here`.
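How the key gets from `.env` into the process is not documented here. A common choice is `python-dotenv`, whose `load_dotenv()` reads `.env` without overriding variables already set. The stdlib sketch below mimics that behaviour for simple `KEY=VALUE` files and fails early with a clear message when the key is missing. Both `load_env_file` and `require_api_key` are hypothetical helpers, not part of Ancora.

```python
from __future__ import annotations

import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Minimal KEY=VALUE reader; variables already in the environment take precedence."""
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, value = line.split("=", 1)
        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


def require_api_key() -> str:
    """Return GOOGLE_API_KEY or raise before any API call is attempted."""
    key = os.environ.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env in the project root.")
    return key
```

Calling `load_env_file()` and then `require_api_key()` at startup surfaces a missing key before any articles are fetched.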
Run the main script to discover and analyze the latest articles: `python main.py`

Detailed project state and design decisions are documented in `docs/CONTEXT.md`.
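The discover, extract, identify-the-crux flow that `python main.py` runs might look roughly like the sketch below. Treat it as an assumption-laden illustration rather than Ancora's code. The prompt text, `build_crux_prompt`, `run_pipeline`, and the 20,000-character truncation are hypothetical. `trafilatura_extract` uses trafilatura's `fetch_url` and `extract`. `gemini_analyze` assumes the `google-genai` SDK (`genai.Client`, `client.models.generate_content`), which may not be the client library the project actually uses.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Article:
    title: str
    url: str


# Hypothetical prompt; the project's real prompt is not shown in this README.
CRUX_PROMPT = (
    "Identify the central, load-bearing claim of the article below. "
    "Do not summarize it; state the core argument in 2-3 sentences.\n\n"
    "Title: {title}\n\n{body}"
)


def build_crux_prompt(title: str, body: str, max_chars: int = 20_000) -> str:
    """Fill the prompt, truncating very long bodies to bound request size."""
    return CRUX_PROMPT.format(title=title, body=body[:max_chars])


def trafilatura_extract(url: str) -> Optional[str]:
    """Download a page and return its main text, or None on failure."""
    import trafilatura  # imported lazily so the sketch runs without it installed
    downloaded = trafilatura.fetch_url(url)
    return trafilatura.extract(downloaded) if downloaded else None


def gemini_analyze(prompt: str, model: str = "gemini-3-flash-preview") -> str:
    """Send the prompt to Gemini through the google-genai SDK (assumed client library)."""
    from google import genai
    client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text or ""


def run_pipeline(
    discover: Callable[[], Iterable[Article]],
    extract: Callable[[str], Optional[str]],
    analyze: Callable[[str], str],
) -> dict[str, str]:
    """Discover articles, extract their text, and map each URL to its crux."""
    results: dict[str, str] = {}
    for article in discover():
        body = extract(article.url)
        if not body:  # skip pages where extraction found no usable text
            continue
        results[article.url] = analyze(build_crux_prompt(article.title, body))
    return results
```

Passing the three stages in as callables, e.g. `run_pipeline(my_discover, trafilatura_extract, gemini_analyze)` with a hypothetical `my_discover`, mirrors the modular split described above and lets each stage be swapped or faked in tests.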