Abdulbasit110/urdu-tutor-agent

Urdu Tutor Voice Agent 🎓

An interactive voice-based Urdu language tutor built with the OpenAI Agents SDK. Learn Urdu through natural voice conversations with an AI tutor that provides vocabulary lessons, pronunciation guidance, cultural context, and interactive quizzes.

Features

  • 🎤 Continuous Voice Interaction: Talk naturally with automatic speech detection - no need to press buttons!
  • 🤖 Smart Voice Activity Detection: Agent automatically detects when you start and stop speaking with configurable thresholds
  • 📚 Vocabulary Teaching: Learn words in categories like greetings, family, and numbers
  • 🗣️ Pronunciation Help: Get detailed pronunciation tips for Urdu words with phonetic guidance
  • 🎯 Interactive Quizzes: Test your knowledge with beginner, intermediate, and advanced difficulty levels
  • 🌍 Cultural Context: Understand the cultural background behind the language and social customs
  • 📖 Practice Phrases: Learn common Urdu phrases for daily conversation
  • 🎵 Natural Voice: Uses a friendly female voice (Sage) for natural language tutoring
  • 🌐 Multiple Interfaces: Command-line, modern web interface, and desktop UI options
  • ⚙️ Configurable Settings: Customize voice detection, audio quality, and tutor behavior via environment variables

Setup

Prerequisites

  1. Python 3.12+ installed on your system
  2. OpenAI API Key - Get one from OpenAI Platform
  3. Microphone and speakers for voice interaction

Installation

  1. Install dependencies:

    uv sync

  2. Set up environment variables: Create a .env file in the urdu-tutor-agent directory:

    # OpenAI API Configuration
    OPENAI_API_KEY=your_actual_openai_api_key_here
    
    # Voice Configuration (optional)
    DEFAULT_VOICE=sage
    SPEECH_RATE=1.0
    AUDIO_SAMPLE_RATE=24000
    
    # Voice Activity Detection (optional - for fine-tuning)
    SILENCE_THRESHOLD=0.005
    SPEECH_THRESHOLD=0.02
    MAX_SILENCE_DURATION=1.5
    MIN_SPEECH_DURATION=0.3
    
    # Tutor Configuration (optional)
    OPENAI_MODEL=gpt-4o-mini

  3. Replace your_actual_openai_api_key_here with your real OpenAI API key
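
For orientation, here is a minimal sketch of how the optional settings above could be read in Python. The TutorSettings class and load_settings function are hypothetical names used only for illustration, and treating the example values as fallback defaults is an assumption; the project's own loading code may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TutorSettings:
    """Settings mirrored from the .env example (hypothetical class name)."""
    default_voice: str
    speech_rate: float
    audio_sample_rate: int
    silence_threshold: float
    speech_threshold: float
    max_silence_duration: float
    min_speech_duration: float
    openai_model: str


def load_settings(env=None) -> TutorSettings:
    """Read the optional variables, falling back to the example values.

    Assumption: the example values double as defaults when a variable is unset.
    """
    env = os.environ if env is None else env
    return TutorSettings(
        default_voice=env.get("DEFAULT_VOICE", "sage"),
        speech_rate=float(env.get("SPEECH_RATE", "1.0")),
        audio_sample_rate=int(env.get("AUDIO_SAMPLE_RATE", "24000")),
        silence_threshold=float(env.get("SILENCE_THRESHOLD", "0.005")),
        speech_threshold=float(env.get("SPEECH_THRESHOLD", "0.02")),
        max_silence_duration=float(env.get("MAX_SILENCE_DURATION", "1.5")),
        min_speech_duration=float(env.get("MIN_SPEECH_DURATION", "0.3")),
        openai_model=env.get("OPENAI_MODEL", "gpt-4o-mini"),
    )


if __name__ == "__main__":
    # With python-dotenv installed, calling load_dotenv() before this line
    # would copy the .env file into os.environ first.
    print(load_settings())
```

Passing a plain dict to load_settings makes it easy to try different values without editing the .env file.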

Usage

Running the Agent

Option 1: Main Menu (choose your mode)

cd urdu-tutor-agent
python main.py

Option 2: Modern Web Interface (recommended) 🌟

cd urdu-tutor-agent
python launch_web.py

Option 3: Desktop UI Launch

cd urdu-tutor-agent
python launch_ui.py

Option 4: Using uv (if you prefer)

uv run python main.py
# or for direct web UI:
uv run python launch_web.py
# or for desktop UI:
uv run python launch_ui.py

Available Modes

  1. Interactive Mode (Continuous Conversation): Just start speaking! The agent automatically detects when you start and stop talking
  2. Demo Mode: Test the agent with simulated input
  3. Desktop UI Mode: Clean tkinter-based desktop interface
  4. Web UI Mode (Recommended): Modern web-based interface with beautiful design

New in v2.0: Continuous Conversation! 🎉

  • No more pressing Enter to start recording
  • Automatic speech detection and silence recognition
  • Natural conversation flow
  • Configurable voice activity detection thresholds

New: Modern Web Interface! 🌐

  • Beautiful, modern design inspired by top voice assistants
  • Works in any web browser (Chrome, Firefox, Safari, Edge)
  • Large central microphone button for easy interaction
  • Real-time status updates and visual feedback
  • Gradient background and glassmorphism effects
  • Conversation history panel with chat-like interface
  • Responsive design that works on desktop and mobile
  • Professional branding with UrduGPT logo

Also Available: Desktop UI! 🖥️

  • Clean, minimal interface built with tkinter
  • Visual conversation history with timestamps
  • One-click start/stop conversation
  • Real-time status indicators
  • Clear instructions and easy-to-use controls

What You Can Do

  • Ask for vocabulary lessons: "Teach me some greetings in Urdu" or "Show me family words"
  • Request pronunciation help: "How do I pronounce شکریہ?" or "Help me with آداب"
  • Practice phrases: "Give me some phrases to practice" or "What are common Urdu expressions?"
  • Take quizzes: "Quiz me on beginner level words" or "Test me on intermediate vocabulary"
  • Learn culture: "Tell me about Urdu greetings culture" or "Explain family respect in Urdu culture"
  • Get cultural context: Ask about respect, family traditions, or language formality

Vocabulary Categories

  • Greetings: آداب (Adaab), السلام علیکم (Assalamu Alaikum), صبح بخیر (Subah Bakhair), شام بخیر (Shaam Bakhair), رات بخیر (Raat Bakhair)
  • Family: والد (Walid - Father), والدہ (Walida - Mother), بھائی (Bhai - Brother), بہن (Behan - Sister), دادا (Dada - Grandfather), دادی (Dadi - Grandmother)
  • Numbers: ایک (Aik - One), دو (Do - Two), تین (Teen - Three), چار (Char - Four), پانچ (Paanch - Five)

Available Learning Tools

  • teach_vocabulary(category): Learn vocabulary by category (greetings, family, numbers, or random)
  • practice_phrases(): Get common Urdu phrases with pronunciation and meaning
  • get_pronunciation_tip(word): Get detailed pronunciation guidance for specific words
  • quiz_me(difficulty): Take vocabulary quizzes with beginner, intermediate, or advanced difficulty
  • get_cultural_context(topic): Learn cultural background about greetings, family, respect, and more
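
To make the shape of these tools concrete, here is a hedged sketch of what a function like teach_vocabulary(category) might look like as plain Python. The VOCABULARY table holds a subset of the words listed under Vocabulary Categories; the data layout, the default argument, and the returned text format are assumptions, and how the project registers the function with the agent is not shown.

```python
import random

# Subset of the words from the Vocabulary Categories section.
# Each entry is (Urdu, romanization with meaning where the README gives one).
VOCABULARY = {
    "greetings": [
        ("آداب", "Adaab"),
        ("السلام علیکم", "Assalamu Alaikum"),
    ],
    "family": [
        ("والد", "Walid - Father"),
        ("والدہ", "Walida - Mother"),
        ("بھائی", "Bhai - Brother"),
    ],
    "numbers": [
        ("ایک", "Aik - One"),
        ("دو", "Do - Two"),
        ("تین", "Teen - Three"),
    ],
}


def teach_vocabulary(category: str = "random") -> str:
    """Return a short text lesson for one category.

    "random" picks any available category; unknown names produce a
    message listing the valid choices instead of raising an error.
    """
    key = category.strip().lower()
    if key == "random":
        key = random.choice(sorted(VOCABULARY))
    if key not in VOCABULARY:
        options = ", ".join(sorted(VOCABULARY))
        return f"Unknown category '{category}'. Try one of: {options}, random."
    lines = [f"Vocabulary: {key}"]
    for urdu, gloss in VOCABULARY[key]:
        lines.append(f"{urdu} ({gloss})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(teach_vocabulary("family"))
```

Returning a message for an unknown category, rather than raising, keeps the conversation going when the learner asks for a topic that has not been added yet.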

Voice Activity Detection

The agent uses threshold-based voice activity detection, tuned through the variables in your .env file:

  • Speech Detection: Automatically identifies when you start speaking (SPEECH_THRESHOLD)
  • Silence Detection: Recognizes when you finish speaking to process your input (MAX_SILENCE_DURATION)
  • Configurable Thresholds: Customizable sensitivity for different microphone setups
  • Background Noise Handling: Filters out ambient noise while preserving speech (SILENCE_THRESHOLD)
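
The interplay of the four settings can be illustrated with a small sketch. This is not the project's implementation: it assumes the thresholds are compared against the RMS level of each audio frame (samples scaled to the range -1.0 to 1.0), and the function names rms and detect_utterance are hypothetical. It uses only the standard library so it can be run without audio hardware.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples in [-1.0, 1.0]."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def detect_utterance(frames, frame_seconds,
                     speech_threshold=0.02, silence_threshold=0.005,
                     max_silence_duration=1.5, min_speech_duration=0.3):
    """Return (start, end) frame indices of the first utterance, or None.

    Recording starts when a frame reaches speech_threshold and ends after
    max_silence_duration seconds of frames below silence_threshold.
    Bursts shorter than min_speech_duration are discarded. The end index
    is exclusive. None is also returned if the frames run out before the
    closing silence is seen.
    """
    start = None
    last_voiced = None
    silent_frames = 0
    for i, frame in enumerate(frames):
        level = rms(frame)
        if start is None:
            if level >= speech_threshold:
                start = last_voiced = i
                silent_frames = 0
            continue
        if level < silence_threshold:
            silent_frames += 1
            if silent_frames * frame_seconds >= max_silence_duration:
                spoken = (last_voiced - start + 1) * frame_seconds
                if spoken >= min_speech_duration:
                    return (start, last_voiced + 1)
                # Too short to count as speech: wait for a new start.
                start = None
                silent_frames = 0
        else:
            # Levels between the two thresholds keep the recording open.
            last_voiced = i
            silent_frames = 0
    return None


if __name__ == "__main__":
    quiet, loud = [0.0] * 10, [0.1] * 10
    frames = [quiet] * 3 + [loud] * 5 + [quiet] * 20
    print(detect_utterance(frames, frame_seconds=0.1))
```

Using two thresholds instead of one means a brief dip in volume mid-sentence does not immediately count as silence, which is one plausible reason for configuring SPEECH_THRESHOLD and SILENCE_THRESHOLD separately.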

Troubleshooting

Audio Issues

  • Microphone not detected: Make sure your microphone is connected and enabled, and that your speakers are working
  • No speech detected: Speak louder or closer to the microphone, or lower SPEECH_THRESHOLD in .env
  • Too sensitive: Increase SILENCE_THRESHOLD to reduce background noise sensitivity
  • Recording cuts off early: Increase MAX_SILENCE_DURATION for longer pauses
  • Audio device issues: Check that sounddevice can access your audio devices (python -m sounddevice lists the devices it can see)

API Issues

  • Authentication error: Verify that the OpenAI API key in your .env file is correct
  • Rate limits: Check that you have sufficient API credits and aren't exceeding rate limits
  • Network issues: Ensure a stable internet connection for OpenAI API calls
  • Model errors: Try switching to gpt-4o-mini if using a different model

Installation Issues

  • Python version: Make sure you're using Python 3.12+ (python --version)
  • Dependencies: Try uv sync --refresh to reinstall dependencies
  • uv issues: Update uv with pip install -U uv
  • Audio libraries: On Linux, you may need: sudo apt-get install portaudio19-dev

Voice Activity Detection Issues

  • Speech not detected: Lower SPEECH_THRESHOLD (try 0.01)
  • Too much background noise: Raise SILENCE_THRESHOLD (try 0.01)
  • Recording doesn't stop: Lower MAX_SILENCE_DURATION (try 1.0)
  • Cuts off mid-sentence: Raise MAX_SILENCE_DURATION (try 2.5)

Example Conversation

🎓 Welcome to your Urdu Tutor Voice Agent! 🎓
🗣️ Continuous Conversation Mode
📝 Instructions:
   • Just start speaking - no need to press Enter!
   • The agent will automatically detect when you finish
   • Press Ctrl+C to exit
   • Say 'goodbye' or 'exit' to end the session

🤖 Agent: Welcome! I'm your Urdu tutor. What would you like to learn today?

--- Conversation Turn 1 ---
🎤 Listening... Start speaking when ready!
💡 Tip: The agent will automatically detect when you finish speaking
🔴 Speech detected! Recording...
[You say: "Hello, can you teach me some Urdu greetings?"]
🟢 Speech ended. Processing...
🤔 Thinking...
🔊 Playing response...
[Agent responds: "آداب! Let me teach you some beautiful Urdu greetings..."]

--- Conversation Turn 2 ---
🎤 Listening... Start speaking when ready!
[Conversation continues...]

Technologies Used

  • OpenAI Agents SDK: For intelligent voice conversation capabilities
  • OpenAI GPT-4o-mini: As the underlying language model
  • SoundDevice: For real-time audio recording and playback
  • NumPy & SciPy: For audio processing and voice activity detection
  • Flask & SocketIO: For the modern web interface
  • Tkinter: For the desktop GUI interface
  • Python-dotenv: For environment variable management

Contributing

We welcome contributions! Areas for enhancement:

  • Vocabulary expansion: Add more categories (colors, food, emotions, etc.)
  • Pronunciation guides: Improve phonetic representations
  • Cultural context: Add more cultural background information
  • Voice models: Experiment with different TTS voices
  • UI improvements: Enhance the web and desktop interfaces
  • Language support: Add support for other regional languages

License

This project uses the OpenAI Agents SDK. Please refer to OpenAI's terms of service for API usage and licensing.
