Description
Use LLMs to automatically label and annotate data for ML model training,
reducing manual labeling effort while maintaining quality.
Scope
Build an automated data-labeling pipeline with LLM validation.
Files to Touch/Create
astroml/llm/labeling/__init__.py
astroml/llm/labeling/labeler.py — Core labeling logic
astroml/llm/labeling/schemas.py — Label schema definitions
astroml/llm/labeling/validators.py — Label validation
astroml/llm/labeling/consensus.py — Multi-LLM consensus
astroml/llm/labeling/human.py — Human-in-the-loop integration
astroml/tasks/labeling.py — Batch labeling worker
Labeling Tasks
- Transaction Classification: fraud/suspicious/legitimate
- Alert Categorization: pattern type, severity
- Entity Resolution: match accounts across sources
- Sentiment Analysis: user feedback categorization
- Named Entity Recognition: extract entities from text
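
As a rough illustration of what schemas.py might hold for the transaction classification task, here is a minimal sketch. The names TransactionLabel, LabelResult, and parse_label are hypothetical and not part of any existing astroml API; only the three label values come from the task list above.

```python
from dataclasses import dataclass
from enum import Enum


class TransactionLabel(str, Enum):
    """Allowed labels for transaction classification (from the task list)."""
    FRAUD = "fraud"
    SUSPICIOUS = "suspicious"
    LEGITIMATE = "legitimate"


@dataclass(frozen=True)
class LabelResult:
    """One label produced by one model for one item (hypothetical shape)."""
    item_id: str
    label: TransactionLabel
    confidence: float  # expected in [0.0, 1.0]
    model: str         # identifier of the LLM that produced the label

    def __post_init__(self):
        # Reject confidences outside the documented range early.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def parse_label(raw: str) -> TransactionLabel:
    """Normalize a raw LLM answer into a known label, or raise ValueError."""
    return TransactionLabel(raw.strip().lower())
```

Rejecting unknown strings at parse time keeps free-form LLM output from leaking into the training set as an unexpected class.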
Implementation Details
- Prompt engineering for consistent labeling
- Confidence scoring per label
- Multi-LLM voting for uncertain cases
- Human review queue for low-confidence labels
- Active learning: prioritize informative samples
- Label versioning and audit trail
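
The voting and review-routing items above could fit together roughly as follows. This is a sketch under stated assumptions, not a settled design: Vote, Decision, decide, and both thresholds (min_agreement, min_confidence) are hypothetical placeholders that would need tuning against reviewed data.

```python
from collections import Counter
from typing import NamedTuple


class Vote(NamedTuple):
    model: str         # which LLM cast the vote
    label: str
    confidence: float  # model-reported confidence in [0.0, 1.0]


class Decision(NamedTuple):
    label: str
    agreement: float   # share of votes that went to the winning label
    needs_review: bool


def decide(votes: list[Vote], min_agreement: float = 0.66,
           min_confidence: float = 0.7) -> Decision:
    """Majority vote; flag the item for human review when the vote is weak."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(v.label for v in votes)
    label, n = counts.most_common(1)[0]
    agreement = n / len(votes)
    # Average confidence of only the models that backed the winning label.
    winners = [v.confidence for v in votes if v.label == label]
    mean_conf = sum(winners) / len(winners)
    needs_review = agreement < min_agreement or mean_conf < min_confidence
    return Decision(label, agreement, needs_review)
```

With three models, a 2-of-3 majority at high confidence passes automatically, while a three-way split or a unanimous but low-confidence vote is sent to the human review queue.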
Acceptance Criteria
- Label accuracy >85% without human review
- Labeling throughput >1000 items/hour
- Cost <$0.01 per item
- Low-confidence items routed to human review
- Consensus improves accuracy to >95%
- Complete audit trail of all labels
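
The throughput and cost targets interact with consensus, since every extra model call multiplies per-item cost. A back-of-the-envelope helper for checking a configuration against the targets could look like this. The function names are hypothetical, and any token counts or prices passed in are the caller's own estimates, not real provider pricing.

```python
import math


def cost_per_item(calls: int, input_tokens: int, output_tokens: int,
                  usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Estimated USD cost of labeling one item with `calls` LLM requests."""
    per_call = (input_tokens / 1000 * usd_per_1k_input
                + output_tokens / 1000 * usd_per_1k_output)
    return calls * per_call


def min_concurrency(items_per_hour: float, seconds_per_item: float) -> int:
    """Parallel workers needed to sustain a throughput at a given latency."""
    return math.ceil(items_per_hour * seconds_per_item / 3600)
```

For example, if one item takes 9 seconds end to end, min_concurrency(1000, 9.0) gives 3 workers to reach 1000 items/hour. Comparing cost_per_item with calls=1 against calls=3 shows how much of the $0.01 budget consensus consumes.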
Quality Assurance
- Inter-LLM agreement metrics
- Random sample human review
- Drift detection for label distributions
- Feedback loop from model performance
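
Two of the checks above, inter-LLM agreement and label-distribution drift, can be sketched with the standard library alone. This is one possible choice of metrics, assumed here for illustration: raw pairwise agreement and total variation distance. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(labels_by_model: dict[str, list[str]]) -> float:
    """Mean fraction of items on which each pair of models agrees."""
    models = list(labels_by_model)
    if len(models) < 2:
        raise ValueError("need at least two models")
    rates = []
    for a, b in combinations(models, 2):
        la, lb = labels_by_model[a], labels_by_model[b]
        if len(la) != len(lb) or not la:
            raise ValueError("label lists must be non-empty and aligned")
        rates.append(sum(x == y for x, y in zip(la, lb)) / len(la))
    return sum(rates) / len(rates)


def total_variation(reference: list[str], current: list[str]) -> float:
    """Total variation distance between two label distributions, in [0, 1]."""
    if not reference or not current:
        raise ValueError("both label samples must be non-empty")
    ref, cur = Counter(reference), Counter(current)
    keys = ref.keys() | cur.keys()
    return 0.5 * sum(abs(ref[k] / len(reference) - cur[k] / len(current))
                     for k in keys)
```

Raw agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa may be preferable when one label dominates. A drift alert would compare total_variation between a reference window and the latest batch against a threshold chosen from historical variation.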
Labels
enhancement, llm, data-labeling, ml