Whisper Transcript API

A FastAPI-based web service that generates transcripts and VTT subtitle files from audio and video files using OpenAI's Whisper model.

Features

  • 🎵 Support for multiple audio/video formats (MP3, MP4, WAV, M4A, FLAC, etc.)
  • 📝 Generate text transcripts
  • 🎬 Generate VTT subtitle files with timestamps
  • 🌍 Multi-language support with auto-detection
  • 🔄 Translation to English
  • ⚡ Multiple Whisper model sizes (tiny to large)
  • 📊 Detailed API responses with timestamps and segments
  • 🔍 Health check and model information endpoints

Installation

  1. Clone the repository (or create the project directory) and change into it:

    cd whisper-transcript
  2. Create a virtual environment (recommended):

    python -m venv venv
    # On Windows:
    venv\Scripts\activate
    # On macOS/Linux:
    source venv/bin/activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Install FFmpeg (required for audio/video processing):

    Windows: download a build from https://ffmpeg.org/download.html and add its bin directory to your PATH.

    macOS:

    brew install ffmpeg

    Linux:

    sudo apt update
    sudo apt install ffmpeg

Usage

Starting the Server

python main.py

The API will be available at http://localhost:8000

API Documentation

Once the server is running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Endpoints

1. Health Check

GET /health

2. Get Available Models

GET /models
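
The two read-only endpoints above are convenient for checking that the server is up before sending files. The following is a minimal sketch using only the Python standard library. It assumes the default address from "Starting the Server" and that both endpoints return JSON; the function names get_json and wait_until_healthy are illustrative and not part of this project.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # default address; adjust if API_PORT differs


def get_json(path, base_url=BASE_URL, timeout=5):
    """GET an endpoint and decode its JSON body (no response shape is assumed)."""
    with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(fetch=get_json, attempts=10, delay=1.0):
    """Poll /health until it answers without error; return True on success."""
    for _ in range(attempts):
        try:
            fetch("/health")
            return True
        except (urllib.error.URLError, OSError, ValueError):
            # Connection refused, HTTP error, or a non-JSON body: wait and retry.
            time.sleep(delay)
    return False
```

Once wait_until_healthy() returns True, get_json("/models") fetches the model list. Polling is useful here because the server may take a while to respond while the Whisper model loads.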

3. Transcribe Audio/Video

POST /transcribe

Parameters:

  • file: Audio/video file (multipart/form-data)
  • language: Language code such as "en" (optional; auto-detected if omitted)
  • task: "transcribe" or "translate" (default: "transcribe")
  • return_timestamps: Boolean (default: true)
  • return_vtt: Boolean (default: true)

Example Response:

{
  "filename": "audio.mp3",
  "language": "en",
  "text": "Hello, this is a test transcription...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test"
    }
  ],
  "vtt": "WEBVTT\n\n1\n00:00:00.000 --> 00:00:02.500\nHello, this is a test\n\n",
  "timestamp": "2024-01-01T12:00:00"
}
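
To make the relationship between the segments and vtt fields concrete, here is a small sketch that rebuilds the VTT string from the segment list in the example response. It is an illustration of the format only, not necessarily how the service produces it; format_timestamp and segments_to_vtt are hypothetical helper names.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_vtt(segments):
    """Build a WebVTT document with one numbered cue per segment."""
    lines = ["WEBVTT", ""]
    for index, seg in enumerate(segments, start=1):
        lines.append(str(index))
        lines.append(
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}"
        )
        lines.append(seg["text"].strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines) + "\n"


example_segments = [
    {"id": 0, "start": 0.0, "end": 2.5, "text": "Hello, this is a test"},
]
vtt = segments_to_vtt(example_segments)
```

For the single example segment, vtt equals the "vtt" value shown in the response above. Writing that string to a .vtt file gives a subtitle track that most video players accept.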

4. Transcribe with File Output

POST /transcribe-with-files

Creates separate files for transcript, VTT, and full JSON results.

Testing the API

Use the provided test script:

python test_api.py

Or test with cURL:

curl -X POST "http://localhost:8000/transcribe" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_audio_file.mp3" \
  -F "language=en" \
  -F "return_vtt=true"
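
The same request can be sent from Python. The sketch below uses only the standard library and encodes the multipart/form-data body by hand, so it mirrors what the cURL -F flags do. It assumes the field names listed under Parameters and a JSON response; build_multipart and transcribe are illustrative names, and field values are assumed not to contain quotes or line breaks.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one part named "file" as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, url="http://localhost:8000/transcribe", **fields):
    """POST a local media file to /transcribe and return the decoded JSON."""
    with open(path, "rb") as fh:
        body, content_type = build_multipart(
            fields, os.path.basename(path), fh.read()
        )
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, transcribe("your_audio_file.mp3", language="en", return_vtt="true") corresponds to the cURL command above. Form values are sent as strings, so booleans are written as "true" or "false".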

Supported File Formats

  • Audio: MP3, WAV, M4A, FLAC, AAC, OGG
  • Video: MP4, AVI, MOV, MKV, WebM
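
A client can check a file against this list before uploading it. The sketch below classifies a file by extension only, using the formats listed above; whether the server performs a similar check is not stated here, and media_kind is a hypothetical helper name.

```python
from pathlib import Path

# Extensions taken from the "Supported File Formats" list.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def media_kind(filename):
    """Return "audio", "video", or None based on the file extension."""
    ext = Path(filename).suffix.lower()
    if ext in AUDIO_EXTENSIONS:
        return "audio"
    if ext in VIDEO_EXTENSIONS:
        return "video"
    return None
```

An extension check is only a quick filter. FFmpeg decides whether the content can actually be decoded.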

Model Sizes

  • tiny: Fastest, least accurate (~32x the speed of large)
  • base: Good balance of speed and accuracy (~16x the speed of large) - Default
  • small: Better accuracy, slower (~6x the speed of large)
  • medium: High accuracy (~2x the speed of large)
  • large: Best accuracy, slowest (1x, the baseline for the figures above)

You can change the model size by modifying the MODEL_SIZE variable in main.py or by setting the WHISPER_MODEL_SIZE environment variable (see Configuration).

Configuration

Environment Variables

You can set these environment variables:

  • WHISPER_MODEL_SIZE: Model size (default: "base")
  • MAX_FILE_SIZE_MB: Maximum file size in MB (default: 25)
  • API_HOST: Host to bind to (default: "0.0.0.0")
  • API_PORT: Port to bind to (default: 8000)

Example with Environment Variables

# Windows PowerShell
$env:WHISPER_MODEL_SIZE="small"
$env:MAX_FILE_SIZE_MB="50"
python main.py

# Linux/macOS
export WHISPER_MODEL_SIZE="small"
export MAX_FILE_SIZE_MB="50"
python main.py
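
For reference, the settings above could be read in Python roughly as follows. This is a sketch under stated assumptions, not the code in main.py: the variable names and defaults come from the list above, while load_settings, the returned dictionary keys, and the validation step are illustrative.

```python
import os

# Model names taken from the "Model Sizes" section.
VALID_MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def load_settings(env=None):
    """Read the documented environment variables, applying their defaults."""
    env = os.environ if env is None else env
    model_size = env.get("WHISPER_MODEL_SIZE", "base")
    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(f"Unknown WHISPER_MODEL_SIZE: {model_size!r}")
    return {
        "model_size": model_size,
        "max_file_size_mb": int(env.get("MAX_FILE_SIZE_MB", "25")),
        "host": env.get("API_HOST", "0.0.0.0"),
        "port": int(env.get("API_PORT", "8000")),
    }
```

Environment variables are always strings, so the numeric settings are converted with int(). Passing a plain dictionary as env makes the function easy to test without touching the real environment.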

Docker Support (Optional)

Create a Dockerfile:

FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
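RUN apt-get update && apt-get install -y --no-install-recommends \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

# Copy requirements first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt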

COPY . .

EXPOSE 8000

CMD ["python", "main.py"]

Build and run:

docker build -t whisper-api .
docker run -p 8000:8000 whisper-api

Performance Notes

  • First request may be slower as the Whisper model loads
  • Model loading time depends on the selected model size
  • GPU acceleration is automatically used if available (CUDA/Metal)
  • Consider using smaller models for real-time applications

Troubleshooting

  1. FFmpeg not found: Ensure FFmpeg is installed and in your PATH
  2. CUDA out of memory: Use a smaller model size or reduce batch size
  3. File too large: Increase MAX_FILE_SIZE_MB or compress your file
  4. Import errors: Ensure all dependencies are installed correctly
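
The first and third items can be checked from a script before starting the server or uploading a file. This is a minimal sketch with hypothetical helper names; it assumes the size limit counts 1 MB as 1024 * 1024 bytes, which may differ from how the server measures it.

```python
import os
import shutil


def ffmpeg_on_path():
    """Return True if an ffmpeg executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


def size_within_limit(size_bytes, max_mb=25):
    """Return True if a size in bytes fits under the limit (default 25 MB)."""
    return size_bytes <= max_mb * 1024 * 1024


def file_within_limit(path, max_mb=25):
    """Check a file on disk against the same limit as size_within_limit."""
    return size_within_limit(os.path.getsize(path), max_mb)
```

If ffmpeg_on_path() returns False, install FFmpeg as described under Installation and open a new terminal so the updated PATH takes effect.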

License

MIT License
