Whisper Transcript API

A FastAPI-based web service that generates transcripts and VTT subtitle files from audio and video files using OpenAI's Whisper model.

Features

🎵 Support for multiple audio/video formats (MP3, MP4, WAV, M4A, FLAC, etc.)
📝 Generate text transcripts
🎬 Generate VTT subtitle files with timestamps
🌍 Multi-language support with auto-detection
🔄 Translation to English
⚡ Multiple Whisper model sizes (tiny to large)
📊 Detailed API responses with timestamps and segments
🔍 Health check and model information endpoints

Installation

Clone or create the project directory, then change into it:
```
cd whisper-transcript
```

Create a virtual environment (recommended):

```
python -m venv venv
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

Install dependencies:
```
pip install -r requirements.txt
```
Install FFmpeg (required for audio/video processing):

Windows:
- Download from https://ffmpeg.org/download.html
- Add to PATH environment variable
macOS:
```
brew install ffmpeg
```
Linux:
```
sudo apt update
sudo apt install ffmpeg
```

Usage

Starting the Server

```
python main.py
```

The API will be available at http://localhost:8000

API Documentation

Once the server is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

Endpoints

1. Health Check

GET /health

2. Get Available Models

GET /models

3. Transcribe Audio/Video

POST /transcribe

Parameters:

file: Audio/video file (multipart/form-data)
language: Language code (optional, auto-detect if not provided)
task: "transcribe" or "translate" (default: "transcribe")
return_timestamps: Boolean (default: true)
return_vtt: Boolean (default: true)

Example Response:

```
{
  "filename": "audio.mp3",
  "language": "en",
  "text": "Hello, this is a test transcription...",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 2.5,
      "text": "Hello, this is a test"
    }
  ],
  "vtt": "WEBVTT\n\n1\n00:00:00.000 --> 00:00:02.500\nHello, this is a test\n\n",
  "timestamp": "2024-01-01T12:00:00"
}
```
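
The `vtt` field mirrors the `segments` list: one numbered cue per segment, with `start` and `end` rendered as `HH:MM:SS.mmm`. The following is a minimal sketch of that conversion, not necessarily how main.py does it; the function names `format_timestamp` and `segments_to_vtt` are assumptions made for illustration.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def segments_to_vtt(segments: list) -> str:
    """Build a WebVTT document with one numbered cue per segment."""
    lines = ["WEBVTT", ""]
    for number, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(str(number))            # cue identifier
        lines.append(f"{start} --> {end}")   # cue timing line
        lines.append(segment["text"].strip())
        lines.append("")                     # blank line closes the cue
    return "\n".join(lines) + "\n"
```

Applied to the single segment in the example response, this should reproduce the `vtt` string shown there.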

4. Transcribe with File Output

POST /transcribe-with-files

Creates separate files for transcript, VTT, and full JSON results.
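
As a rough illustration of what "separate files" could look like, the sketch below writes the three outputs from a response dictionary shaped like the example above. The helper name `save_outputs` and the `<name>.txt` / `<name>.vtt` / `<name>.json` naming scheme are hypothetical; the actual file names produced by the endpoint may differ.

```python
import json
from pathlib import Path


def save_outputs(result: dict, out_dir: str) -> dict:
    """Write transcript, VTT, and full JSON side by side; return their paths."""
    stem = Path(result["filename"]).stem      # "audio.mp3" -> "audio"
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Hypothetical naming scheme, chosen only for this example.
    paths = {
        "transcript": target / f"{stem}.txt",
        "vtt": target / f"{stem}.vtt",
        "json": target / f"{stem}.json",
    }
    paths["transcript"].write_text(result["text"], encoding="utf-8")
    paths["vtt"].write_text(result["vtt"], encoding="utf-8")
    paths["json"].write_text(
        json.dumps(result, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return paths
```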

Testing the API

Use the provided test script:

```
python test_api.py
```

Or test with cURL:

```
curl -X POST "http://localhost:8000/transcribe" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@your_audio_file.mp3" \
  -F "language=en" \
  -F "return_vtt=true"
```
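
The same request can be assembled from Python. The sketch below uses only the standard library and builds the multipart/form-data body by hand; the field names come from the Parameters list above, while the helper name `build_transcribe_request` is an assumption made for this example. It returns an unsent request so it can be inspected before anything goes over the network.

```python
import mimetypes
import urllib.request
import uuid


def build_transcribe_request(filename, file_bytes, fields,
                             url="http://localhost:8000/transcribe"):
    """Return an unsent POST request carrying a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain form fields such as language, task, return_vtt.
    for name, value in fields.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )

    # The uploaded file goes in the "file" field.
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: {content_type}\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary

    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=b"".join(parts),
                                  headers=headers, method="POST")
```

To send it, read the audio file in binary mode, pass its bytes to the helper, hand the returned request to `urllib.request.urlopen`, and decode the JSON body of the response.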

Supported File Formats

Audio: MP3, WAV, M4A, FLAC, AAC, OGG
Video: MP4, AVI, MOV, MKV, WebM
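
A client can check the extension before uploading. This is a small sketch based only on the two lists above; the name `media_kind` is an assumption, and the server may accept formats not listed here, since FFmpeg does the decoding.

```python
from pathlib import Path

# Extensions taken from the lists above (lowercased, with leading dot).
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm"}


def media_kind(filename: str):
    """Return "audio", "video", or None when the extension is not listed."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```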

Model Sizes

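tiny: Fastest, least accurate (~32x the speed of large)
base: Good balance of speed and accuracy (~16x the speed of large), the default
small: Better accuracy, slower (~6x the speed of large)
medium: High accuracy (~2x the speed of large)
large: Best accuracy, slowest (1x, the baseline)

Speeds are approximate and relative to the large model, not to real time; actual throughput depends on your hardware.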

You can change the model size by modifying the MODEL_SIZE variable in main.py or by setting the WHISPER_MODEL_SIZE environment variable (see Configuration).

Configuration

Environment Variables

You can set these environment variables:

WHISPER_MODEL_SIZE: Model size (default: "base")
MAX_FILE_SIZE_MB: Maximum file size in MB (default: 25)
API_HOST: Host to bind to (default: "0.0.0.0")
API_PORT: Port to bind to (default: 8000)
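
For readers who want to see how such settings are typically consumed, here is a minimal sketch that reads the four variables with the defaults listed above. The `Settings` class and `load_settings` function are assumptions for illustration and may not match what main.py actually does.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_size: str
    max_file_size_mb: int
    host: str
    port: int


def load_settings(env=None) -> Settings:
    """Read configuration from a mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return Settings(
        model_size=env.get("WHISPER_MODEL_SIZE", "base"),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "25")),
        host=env.get("API_HOST", "0.0.0.0"),
        port=int(env.get("API_PORT", "8000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in a test.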

Example with Environment Variables

```
# Windows PowerShell
$env:WHISPER_MODEL_SIZE="small"
$env:MAX_FILE_SIZE_MB="50"
python main.py

# Linux/macOS
export WHISPER_MODEL_SIZE="small"
export MAX_FILE_SIZE_MB="50"
python main.py
```

Docker Support (Optional)

Create a Dockerfile:

```
FROM python:3.9-slim

WORKDIR /app

# Install system dependencies
RUN apt-get update && apt-get install -y \
    ffmpeg \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install -r requirements.txt

COPY . .

EXPOSE 8000

CMD ["python", "main.py"]
```

Build and run:

```
docker build -t whisper-api .
docker run -p 8000:8000 whisper-api
```

Performance Notes

First request may be slower as the Whisper model loads
Model loading time depends on the selected model size
GPU acceleration is automatically used if available (CUDA/Metal)
Consider using smaller models for real-time applications
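
The slow first request is typical of lazy loading: the model is loaded on first use and then kept in memory. The sketch below shows that pattern with a placeholder loader; whether main.py loads the model lazily or at startup is an assumption, and `get_model` is a hypothetical name.

```python
import functools
import time


@functools.lru_cache(maxsize=1)
def get_model(size: str = "base") -> dict:
    """Load the model once; repeated calls with the same size reuse it."""
    # Placeholder for an expensive load. A real service would likely call
    # whisper.load_model(size) here instead of sleeping.
    time.sleep(0.05)
    return {"size": size}
```

The first call pays the loading cost; later calls with the same size return the cached object immediately.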

Troubleshooting

FFmpeg not found: Ensure FFmpeg is installed and in your PATH
CUDA out of memory: Use a smaller model size or reduce batch size
File too large: Increase MAX_FILE_SIZE_MB or compress your file
Import errors: Ensure all dependencies are installed correctly
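
To confirm the first item quickly, you can check from Python whether the `ffmpeg` executable is visible on your PATH. This is a small sketch using the standard library; the helper name `find_executable` is an assumption.

```python
import shutil


def find_executable(name: str = "ffmpeg"):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    print(f"ffmpeg found at {path}" if path else "ffmpeg not found on PATH")
```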

License

MIT License
