Skip to content

Add indic-tts skill - Indian TTS for everyone#28

Open
ankitjh4 wants to merge 1 commit intozocomputer:mainfrom
ankitjh4:add-indic-tts
Open

Add indic-tts skill - Indian TTS for everyone#28
ankitjh4 wants to merge 1 commit intozocomputer:mainfrom
ankitjh4:add-indic-tts

Conversation

@ankitjh4
Copy link
Copy Markdown

@ankitjh4 ankitjh4 commented Mar 4, 2026

Summary

Adds comprehensive Indian AI toolkit using Sarvam AI - a Bangalore-based AI company building models specifically for Indian languages.

About Sarvam AI

Sarvam AI is an Indian AI company building foundational models and APIs optimized for Indian languages. They provide state-of-the-art models for speech and text processing in 10+ Indian languages including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, and English.

Get API key: https://dashboard.sarvam.ai (free tier available)

Features

1. Text-to-Speech (TTS)

  • Model: Bulbul v3
  • Languages: 11 Indian languages
  • Speakers: 30+ voices (male and female)
  • Natural prosody and pronunciation

2. Document Intelligence

  • Extract text from PDFs and images (JPEG/PNG)
  • 23 supported languages including Hindi, Bengali, Tamil, Telugu, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Odia, Urdu, Assamese, Bodo, Dogri, Kashmiri, Konkani, Maithili, Manipuri, Nepali, Sanskrit, Santali, Sindhi, English
  • Output formats: Markdown, HTML, JSON
  • Full workflow: create job → upload → process → download results

3. Text Processing

  • Chat/Completion (sarvam-m model) - OpenAI-compatible API
  • Translation (mayura:v1, sarvam-translate:v1) - 23 Indian languages
  • Transliteration - Convert between scripts (e.g., Devanagari ↔ Roman)
  • Language Detection - Auto-detect language of input text

4. Speech-to-Text with Translation

Three modes for different use cases:

  • REST API - Quick transcription for audio < 30 seconds
  • WebSocket - Real-time streaming transcription with 4 output modes (translated text, original transcript, both, or bilingual)
  • Batch Processing - Process multiple audio files with speaker diarization for meeting transcription

Scripts

  • tts.py - Text-to-speech conversion
  • document_intelligence.py - PDF/image OCR extraction
  • text_processing.py - Chat, translation, transliteration, language detection
  • speech_to_text.py - STT via REST, WebSocket, or Batch

Setup

Add SARVAM_API_KEY to Zo secrets at Settings > Advanced

Changes

  • Added skills/sarvam-ai/ folder with SKILL.md and scripts/
  • 4 comprehensive Python scripts covering all Sarvam APIs
  • Full documentation with usage examples
  • Updated manifest.json with skill metadata

- Added sarvam-ai based TTS skill supporting 11 Indian languages
- Includes API key enforcement via SARVAM_API_KEY secret
- Features 30+ voices with Bulbul v3 model
- Closes: adding Indian language TTS support to Zo skills
@skeletor-js
Copy link
Copy Markdown
Collaborator

Please clean up the packaging and resubmit.

  • Put the skill in the correct registry structure. Right now the PR layout is not clean.
  • Remove the stray SKILL.md.bak file.
  • Revert the unrelated manifest.json churn and keep the diff scoped to the actual skill.
  • Keep the setup instructions aligned with Zo Secrets. Do not drift into generic env-export patterns.
  • Make sure the PR only includes the files required for this skill and nothing else.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants