[FEATURE] Multimodal LLM Support — Image and document understanding #469

## Description

Extend LLM capabilities to handle multimodal inputs including images, PDFs,
charts, and structured documents for richer analysis.

## Scope

Build a multimodal processing pipeline for document understanding and visual analysis.

## Files to Touch/Create

- `astroml/llm/multimodal/__init__.py`
- `astroml/llm/multimodal/vision.py` — Vision model integration (GPT-4V, Claude)
- `astroml/llm/multimodal/ocr.py` — OCR for documents and images
- `astroml/llm/multimodal/charts.py` — Chart and graph understanding
- `astroml/llm/multimodal/processors.py` — Image preprocessing
- `astroml/llm/multimodal/prompts.py` — Multimodal prompt templates
- `api/routers/multimodal.py` — Multimodal API endpoints
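
A minimal sketch of how `vision.py` and `ocr.py` could fit together. Everything named here (`VisionBackend`, `AnalysisResult`, `analyze_image`, and the `ocr` / `text_llm` callables) is hypothetical and only illustrates one possible shape for the module boundary. No real provider SDK is called; a concrete backend would wrap whichever vision model is chosen.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


class VisionBackend(Protocol):
    """Hypothetical provider-agnostic interface that vision.py might expose."""

    def describe(self, image: bytes, prompt: str) -> str: ...


@dataclass
class AnalysisResult:
    text: str
    used_vision: bool  # False when the text-only fallback path ran


def analyze_image(
    image: bytes,
    prompt: str,
    vision: Optional[VisionBackend],
    ocr: Callable[[bytes], str],
    text_llm: Callable[[str], str],
) -> AnalysisResult:
    """Prefer the vision model; otherwise OCR the image and ask a text-only model."""
    if vision is not None:
        try:
            return AnalysisResult(vision.describe(image, prompt), used_vision=True)
        except Exception:
            # Assumption: any provider error falls through to the text-only path.
            pass
    extracted = ocr(image)
    fallback_prompt = f"{prompt}\n\nText extracted from the image:\n{extracted}"
    return AnalysisResult(text_llm(fallback_prompt), used_vision=False)
```

Passing the backends in as arguments keeps the routing logic testable with stubs and leaves the choice of vision and OCR providers open.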

## Supported Inputs

1. **Images**:
   - Transaction receipts
   - ID documents (KYC)
   - Screenshots of fraud alerts

2. **Documents**:
   - PDFs (invoices, reports)
   - Scanned documents
   - Excel/CSV files

3. **Charts**:
   - Model performance charts
   - Transaction volume graphs
   - Financial statements
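
One way the pipeline could route an upload to the image or document path is to inspect its leading bytes. The sketch below covers only PNG, JPG, and PDF (the formats named in the acceptance criteria) using their well-known file signatures. `InputKind` and `sniff_input` are hypothetical names. Charts cannot be told apart from other images at this level, and Excel/CSV inputs would need separate handling.

```python
from enum import Enum
from typing import Tuple


class InputKind(Enum):
    IMAGE = "image"
    DOCUMENT = "document"
    UNKNOWN = "unknown"


# Well-known leading-byte signatures for the three formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png", InputKind.IMAGE),
    (b"\xff\xd8\xff", "jpg", InputKind.IMAGE),
    (b"%PDF-", "pdf", InputKind.DOCUMENT),
)


def sniff_input(data: bytes) -> Tuple[str, InputKind]:
    """Return (format name, routing category) based on the file signature."""
    for magic, name, kind in _SIGNATURES:
        if data.startswith(magic):
            return name, kind
    return "unknown", InputKind.UNKNOWN
```

Checking content rather than the file extension avoids trusting client-supplied filenames.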

## Implementation Details

- GPT-4V or Claude 3 for vision tasks
- Tesseract or enterprise OCR for text extraction
- Image resizing and format conversion
- Prompt engineering for vision tasks
- Caching of extracted text/descriptions
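
The resizing and caching items could look roughly like the following. This is a sketch under assumptions: `fit_within` and `ExtractionCache` are hypothetical names, the cache is in-memory only, and keying by a SHA-256 hash of the raw bytes is one possible choice rather than a stated requirement. Actual pixel resampling would be done by an imaging library; only the target-size arithmetic is shown.

```python
import hashlib
from typing import Callable, Dict, Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale dimensions down so the longer side is at most max_side,
    preserving aspect ratio. Images already small enough are unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


class ExtractionCache:
    """In-memory cache keyed by a task label plus the SHA-256 of the input bytes."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0

    def get_or_compute(
        self, data: bytes, task: str, extract: Callable[[bytes], str]
    ) -> str:
        key = f"{task}:{hashlib.sha256(data).hexdigest()}"
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = extract(data)  # e.g. an OCR or vision call (not shown here)
        self._store[key] = result
        return result
```

Including the task label in the key keeps OCR text and vision descriptions for the same file from overwriting each other.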

## Acceptance Criteria

- Image classification accuracy >90%
- OCR text extraction accuracy >95%
- Chart data extraction matches ground truth
- Processing time <3s per image
- Supports common formats (PNG, JPG, PDF)
- Multimodal prompts fall back to a text-only path when image input cannot be processed
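
The criteria above do not say how accuracy or processing time are measured. As one possible reading, the sketch below scores OCR output as 1 minus the character error rate (Levenshtein distance divided by the ground-truth length) and times a single call with a wall-clock timer. The metric choice and the names `edit_distance`, `char_accuracy`, and `timed` are assumptions, not part of the issue.

```python
import time
from typing import Callable, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_accuracy(predicted: str, truth: str) -> float:
    """1 - character error rate, clamped to the range [0, 1]."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(predicted, truth) / len(truth))


def timed(fn: Callable[[bytes], str], data: bytes) -> Tuple[str, float]:
    """Run fn(data) and return its result with elapsed wall-clock seconds."""
    start = time.perf_counter()
    out = fn(data)
    return out, time.perf_counter() - start
```

A test suite could then assert `char_accuracy(...) > 0.95` and an elapsed time below 3 seconds over a labeled sample set, once such a set exists.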

## Use Cases

- Automated KYC document verification
- Receipt scanning for loyalty points
- Fraud evidence analysis (screenshots)
- Chart interpretation for reports

## Labels

`enhancement`, `llm`, `multimodal`, `vision`

