An interactive voice-based Urdu language tutor built with the OpenAI Agents SDK. Learn Urdu through natural voice conversations with an AI tutor that provides vocabulary lessons, pronunciation guidance, cultural context, and interactive quizzes.
- 🎤 Continuous Voice Interaction: Talk naturally with automatic speech detection - no need to press buttons!
- 🤖 Smart Voice Activity Detection: Agent automatically detects when you start and stop speaking with configurable thresholds
- 📚 Vocabulary Teaching: Learn words in categories like greetings, family, and numbers
- 🗣️ Pronunciation Help: Get detailed pronunciation tips for Urdu words with phonetic guidance
- 🎯 Interactive Quizzes: Test your knowledge with beginner, intermediate, and advanced difficulty levels
- 🌍 Cultural Context: Understand the cultural background behind the language and social customs
- 📖 Practice Phrases: Learn common Urdu phrases for daily conversation
- 🎵 Natural Voice: Uses a friendly female voice (Sage) for natural language tutoring
- 🌐 Multiple Interfaces: Command-line, modern web interface, and desktop UI options
- ⚙️ Configurable Settings: Customize voice detection, audio quality, and tutor behavior via environment variables
- Python 3.12+ installed on your system
- OpenAI API Key - Get one from OpenAI Platform
- Microphone and speakers for voice interaction
- Install dependencies:

      uv sync

- Set up environment variables: Create a `.env` file in the `urdu-tutor-agent` directory:

      # OpenAI API Configuration
      OPENAI_API_KEY=your_actual_openai_api_key_here

      # Voice Configuration (optional)
      DEFAULT_VOICE=sage
      SPEECH_RATE=1.0
      AUDIO_SAMPLE_RATE=24000

      # Voice Activity Detection (optional - for fine-tuning)
      SILENCE_THRESHOLD=0.005
      SPEECH_THRESHOLD=0.02
      MAX_SILENCE_DURATION=1.5
      MIN_SPEECH_DURATION=0.3

      # Tutor Configuration (optional)
      OPENAI_MODEL=gpt-4o-mini

- Replace `your_actual_openai_api_key_here` with your real OpenAI API key
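
To illustrate how the settings above might be consumed, here is a minimal sketch that reads them from an environment mapping. `TutorSettings` and `load_settings` are hypothetical names, not taken from the project, and treating the sample values as defaults is an assumption. The project lists Python-dotenv in its stack, so the real code presumably loads `.env` first; this sketch only needs the standard library.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TutorSettings:
    """Hypothetical container mirroring the keys in the sample .env file."""
    openai_api_key: str
    default_voice: str
    speech_rate: float
    audio_sample_rate: int
    silence_threshold: float
    speech_threshold: float
    max_silence_duration: float
    min_speech_duration: float
    openai_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> TutorSettings:
    """Build settings from a mapping, falling back to the sample values (assumed defaults)."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key or api_key == "your_actual_openai_api_key_here":
        raise ValueError("OPENAI_API_KEY is missing or still set to the placeholder")
    return TutorSettings(
        openai_api_key=api_key,
        default_voice=env.get("DEFAULT_VOICE", "sage"),
        speech_rate=float(env.get("SPEECH_RATE", "1.0")),
        audio_sample_rate=int(env.get("AUDIO_SAMPLE_RATE", "24000")),
        silence_threshold=float(env.get("SILENCE_THRESHOLD", "0.005")),
        speech_threshold=float(env.get("SPEECH_THRESHOLD", "0.02")),
        max_silence_duration=float(env.get("MAX_SILENCE_DURATION", "1.5")),
        min_speech_duration=float(env.get("MIN_SPEECH_DURATION", "0.3")),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out without touching the real environment, and failing early on the placeholder key surfaces the most common setup mistake before any API call is made.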
Run the main launcher:

    cd urdu-tutor-agent
    python main.py

Launch the web UI directly:

    cd urdu-tutor-agent
    python launch_web.py

Launch the desktop UI directly:

    cd urdu-tutor-agent
    python launch_ui.py

Or, using uv:

    uv run main.py
    # or for direct web UI:
    uv run python launch_web.py
    # or for desktop UI:
    uv run python launch_ui.py

- Interactive Mode (Continuous Conversation): Just start speaking! The agent automatically detects when you start and stop talking
- Demo Mode: Test the agent with simulated input
- Desktop UI Mode: Clean tkinter-based desktop interface
- Web UI Mode (Recommended): Modern web-based interface with beautiful design
- No more pressing Enter to start recording
- Automatic speech detection and silence recognition
- Natural conversation flow
- Configurable voice activity detection thresholds
- Beautiful, modern design inspired by top voice assistants
- Works in any web browser (Chrome, Firefox, Safari, Edge)
- Large central microphone button for easy interaction
- Real-time status updates and visual feedback
- Gradient background and glassmorphism effects
- Conversation history panel with chat-like interface
- Responsive design that works on desktop and mobile
- Professional branding with UrduGPT logo
- Clean, minimal interface built with tkinter
- Visual conversation history with timestamps
- One-click start/stop conversation
- Real-time status indicators
- Clear instructions and easy-to-use controls
- Ask for vocabulary lessons: "Teach me some greetings in Urdu" or "Show me family words"
- Request pronunciation help: "How do I pronounce شکریہ?" or "Help me with آداب"
- Practice phrases: "Give me some phrases to practice" or "What are common Urdu expressions?"
- Take quizzes: "Quiz me on beginner level words" or "Test me on intermediate vocabulary"
- Learn culture: "Tell me about Urdu greetings culture" or "Explain family respect in Urdu culture"
- Get cultural context: Ask about respect, family traditions, or language formality
- Greetings: آداب (Adaab), السلام علیکم (Assalamu Alaikum), صبح بخیر (Subah Bakhair), شام بخیر (Shaam Bakhair), رات بخیر (Raat Bakhair)
- Family: والد (Walid - Father), والدہ (Walida - Mother), بھائی (Bhai - Brother), بہن (Behan - Sister), دادا (Dada - Grandfather), دادی (Dadi - Grandmother)
- Numbers: ایک (Aik - One), دو (Do - Two), تین (Teen - Three), چار (Char - Four), پانچ (Paanch - Five)
- `teach_vocabulary(category)`: Learn vocabulary by category (greetings, family, numbers, or random)
- `practice_phrases()`: Get common Urdu phrases with pronunciation and meaning
- `get_pronunciation_tip(word)`: Get detailed pronunciation guidance for specific words
- `quiz_me(difficulty)`: Take vocabulary quizzes with beginner, intermediate, or advanced difficulty
- `get_cultural_context(topic)`: Learn cultural background about greetings, family, respect, and more
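
As a rough illustration of what one of these tools could look like, the sketch below implements `teach_vocabulary` as a plain Python function over the sample vocabulary listed in this README. The `VOCABULARY` layout, the return format, and the English glosses for the greetings are assumptions made for the example. In the real project the function is presumably registered as a tool with the Agents SDK, which is not shown here.

```python
import random

# Sample words from this README as (Urdu, transliteration, meaning).
# The dictionary layout is an assumption, not the project's actual data model.
VOCABULARY = {
    "greetings": [
        ("آداب", "Adaab", "Respectful hello"),
        ("السلام علیکم", "Assalamu Alaikum", "Peace be upon you"),
        ("صبح بخیر", "Subah Bakhair", "Good morning"),
        ("شام بخیر", "Shaam Bakhair", "Good evening"),
        ("رات بخیر", "Raat Bakhair", "Good night"),
    ],
    "family": [
        ("والد", "Walid", "Father"),
        ("والدہ", "Walida", "Mother"),
        ("بھائی", "Bhai", "Brother"),
        ("بہن", "Behan", "Sister"),
        ("دادا", "Dada", "Grandfather"),
        ("دادی", "Dadi", "Grandmother"),
    ],
    "numbers": [
        ("ایک", "Aik", "One"),
        ("دو", "Do", "Two"),
        ("تین", "Teen", "Three"),
        ("چار", "Char", "Four"),
        ("پانچ", "Paanch", "Five"),
    ],
}


def teach_vocabulary(category: str = "random") -> str:
    """Return a formatted lesson for one category, or a hint if the category is unknown."""
    key = category.strip().lower()
    if key == "random":
        key = random.choice(sorted(VOCABULARY))
    if key not in VOCABULARY:
        options = ", ".join(sorted(VOCABULARY))
        return f"Unknown category '{category}'. Try one of: {options}, or random."
    lines = [f"{urdu} ({roman}): {meaning}" for urdu, roman, meaning in VOCABULARY[key]]
    return f"{key.title()} vocabulary:\n" + "\n".join(lines)
```

Returning a message for an unknown category, rather than raising, lets the tutor recover gracefully when the learner asks for a topic that has not been added yet.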
The agent uses advanced voice activity detection with configurable thresholds:
- Speech Detection: Automatically identifies when you start speaking
- Silence Detection: Recognizes when you finish speaking to process your input
- Adaptive Thresholds: Customizable sensitivity for different microphone setups
- Background Noise Handling: Filters out ambient noise while preserving speech
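
The detection behavior described above can be approximated with a simple energy-based detector. The following is a minimal sketch, not the project's actual implementation: it assumes audio arrives as fixed-length frames of float samples in the range [-1.0, 1.0], and `rms` and `capture_utterance` are hypothetical names. The parameter names and default values mirror the `.env` settings. The project lists NumPy and SciPy for audio processing; the sketch uses only the standard library so it runs anywhere.

```python
import math
from typing import Iterable, List, Sequence


def rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def capture_utterance(
    frames: Iterable[Sequence[float]],
    frame_duration: float,
    speech_threshold: float = 0.02,
    silence_threshold: float = 0.005,
    max_silence_duration: float = 1.5,
    min_speech_duration: float = 0.3,
) -> List[Sequence[float]]:
    """Return the frames of the first utterance, or [] if none was long enough."""
    recorded: List[Sequence[float]] = []
    speaking = False
    speech_time = 0.0
    silence_time = 0.0
    for frame in frames:
        level = rms(frame)
        if not speaking:
            if level >= speech_threshold:
                # Speech onset: start a new recording with this frame.
                speaking = True
                recorded = [frame]
                speech_time = frame_duration
                silence_time = 0.0
            continue
        recorded.append(frame)
        if level < silence_threshold:
            silence_time += frame_duration
            if silence_time >= max_silence_duration:
                if speech_time >= min_speech_duration:
                    return recorded
                # Too short to be real speech (for example a click): discard it.
                speaking, recorded = False, []
        else:
            # Anything at or above the silence threshold keeps the utterance alive.
            silence_time = 0.0
            speech_time += frame_duration
    return recorded if speaking and speech_time >= min_speech_duration else []
```

Using a higher threshold to start recording than to keep it going means quiet syllables in the middle of a sentence do not end the recording, while steady background noise below `speech_threshold` never triggers one.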
- Microphone not detected: Make sure your microphone and speakers are working
- No speech detected: Speak louder or closer to the microphone, adjust `SPEECH_THRESHOLD` in `.env`
- Too sensitive: Increase `SILENCE_THRESHOLD` to reduce background noise sensitivity
- Recording cuts off early: Increase `MAX_SILENCE_DURATION` for longer pauses
- Audio device issues: Check that `sounddevice` can access your audio devices
- Authentication error: Verify your OpenAI API key is correct in the `.env` file
- Rate limits: Check that you have sufficient API credits and aren't exceeding rate limits
- Network issues: Ensure a stable internet connection for OpenAI API calls
- Model errors: Try switching to `gpt-4o-mini` if using a different model
- Python version: Make sure you're using Python 3.12+ (`python --version`)
- Dependencies: Try `uv sync --refresh` to reinstall dependencies
- uv issues: Update uv with `pip install -U uv`
- Audio libraries: On Linux, you may need: `sudo apt-get install portaudio19-dev`
- Speech not detected: Lower `SPEECH_THRESHOLD` (try 0.01)
- Too much background noise: Raise `SILENCE_THRESHOLD` (try 0.01)
- Recording doesn't stop: Lower `MAX_SILENCE_DURATION` (try 1.0)
- Cuts off mid-sentence: Raise `MAX_SILENCE_DURATION` (try 2.5)
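
Some of the installation checks above (Python version, missing audio libraries) can be automated. The script below is a small sketch of such a diagnostic. `check_environment` is a hypothetical helper, not part of the project, and the module list is an assumption based on the libraries this README names.

```python
import importlib.util
import sys
from typing import List, Sequence, Tuple


def check_environment(
    min_python: Tuple[int, int] = (3, 12),
    modules: Sequence[str] = ("sounddevice", "numpy", "scipy"),
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: List[str] = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, min_python))
        problems.append(f"Python {wanted}+ required, found {found}")
    for name in modules:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Module '{name}' is not installed; try: uv sync --refresh")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("-", problem)
```

The script only confirms that the modules can be found. It does not open an audio device, so a clean result still leaves microphone permissions and PortAudio setup to be checked by hand.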
🎓 Welcome to your Urdu Tutor Voice Agent! 🎓
🗣️ Continuous Conversation Mode
📝 Instructions:
• Just start speaking - no need to press Enter!
• The agent will automatically detect when you finish
• Press Ctrl+C to exit
• Say 'goodbye' or 'exit' to end the session
🤖 Agent: Welcome! I'm your Urdu tutor. What would you like to learn today?
--- Conversation Turn 1 ---
🎤 Listening... Start speaking when ready!
💡 Tip: The agent will automatically detect when you finish speaking
🔴 Speech detected! Recording...
[You say: "Hello, can you teach me some Urdu greetings?"]
🟢 Speech ended. Processing...
🤔 Thinking...
🔊 Playing response...
[Agent responds: "آداب! Let me teach you some beautiful Urdu greetings..."]
--- Conversation Turn 2 ---
🎤 Listening... Start speaking when ready!
[Conversation continues...]
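
The sample session mentions that saying 'goodbye' or 'exit' ends the conversation. One way such a check could work on the transcribed text is sketched below. `should_end_session` and `EXIT_WORDS` are hypothetical names, and whole-word matching is an assumption about the intended behavior.

```python
import re

# Exit words taken from the session instructions above.
EXIT_WORDS = {"goodbye", "exit"}


def should_end_session(transcript: str) -> bool:
    """True when the utterance contains one of the exit words as a whole word."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return any(word in EXIT_WORDS for word in words)
```

Matching whole words avoids ending the session on words like "exited", though a two-word transcription such as "good bye" would need its own entry.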
- OpenAI Agents SDK: For intelligent voice conversation capabilities
- OpenAI GPT-4o-mini: As the underlying language model
- SoundDevice: For real-time audio recording and playback
- NumPy & SciPy: For audio processing and voice activity detection
- Flask & SocketIO: For the modern web interface
- Tkinter: For the desktop GUI interface
- Python-dotenv: For environment variable management
We welcome contributions! Areas for enhancement:
- Vocabulary expansion: Add more categories (colors, food, emotions, etc.)
- Pronunciation guides: Improve phonetic representations
- Cultural context: Add more cultural background information
- Voice models: Experiment with different TTS voices
- UI improvements: Enhance the web and desktop interfaces
- Language support: Add support for other regional languages
This project uses the OpenAI Agents SDK. Please refer to OpenAI's terms of service for API usage and licensing.