abglnv/yolo-quantization

YOLO Quantization → C++ Inference

Quantize YOLOv8 to INT8 with TensorRT (Python), then run inference in C++.

Benchmarks

Tested on GTX 1650 4 GB, YOLOv8n, 640×640, 100 COCO val images.

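| Runtime           | Precision | Mean latency | P99 latency | Throughput |
|-------------------|-----------|--------------|-------------|------------|
| PyTorch (Python)  | FP32      | 39.45 ms     | 78.78 ms    | 25.3 FPS   |
| TensorRT (Python) | INT8      | 14.44 ms     | 16.28 ms    | 69.2 FPS   |
| TensorRT (C++)    | INT8      | 8.85 ms      | 12.57 ms    | 113.0 FPS  |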

All three rows time the full preprocess → forward → postprocess path per image, excluding disk I/O, after 10 warmup passes.
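The measurement protocol described above can be sketched as a small timing harness. This is a minimal illustration, not the code in python/benchmark.py: the `benchmark` function, the `pipeline` callable, and the nearest-rank P99 definition are all assumptions made for the example. Computing throughput as 1000 / mean latency is also an assumption, although it agrees with the table to within 0.1 FPS.

```python
import math
import statistics
import time
from typing import Callable, Sequence


def benchmark(pipeline: Callable[[object], object],
              images: Sequence[object],
              warmup: int = 10) -> dict:
    """Time `pipeline` once per image.

    `images` are assumed to be decoded and held in memory already,
    so disk I/O is not part of the measurement.
    """
    if not images:
        raise ValueError("need at least one image")

    # Warmup passes: run the pipeline without recording any timings.
    for i in range(warmup):
        pipeline(images[i % len(images)])

    latencies_ms = []
    for image in images:
        start = time.perf_counter()
        pipeline(image)  # preprocess -> forward -> postprocess
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank P99 (assumed definition): the smallest sample that has
    # at least 99% of all samples at or below it.
    rank = max(1, math.ceil(0.99 * len(latencies_ms)))
    mean_ms = statistics.fmean(latencies_ms)
    return {
        "mean_ms": mean_ms,
        "p99_ms": latencies_ms[rank - 1],
        "fps": 1000.0 / mean_ms,  # assumed: throughput derived from mean latency
    }
```

For GPU runtimes, `pipeline` should return only after the results are back on the host. GPU work is launched asynchronously, so a callable that returns early would under-report latency.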

  • INT8 TensorRT (Python) is 2.7× faster than FP32 PyTorch
  • INT8 TensorRT (C++) is 4.5× faster than FP32 PyTorch
  • The C++ path is ~5.6 ms faster per image than the Python TensorRT path (8.85 ms vs 14.44 ms mean), a gap attributed to Python/Ultralytics pipeline overhead
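The three figures above follow directly from the mean latencies in the table. A short check of the arithmetic (the dictionary keys and the `speedup` helper are illustrative names, not part of the repository):

```python
# Mean latencies in milliseconds, copied from the benchmark table.
MEAN_LATENCY_MS = {
    "pytorch_fp32": 39.45,
    "trt_python_int8": 14.44,
    "trt_cpp_int8": 8.85,
}


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


baseline = MEAN_LATENCY_MS["pytorch_fp32"]
py_speedup = speedup(baseline, MEAN_LATENCY_MS["trt_python_int8"])
cpp_speedup = speedup(baseline, MEAN_LATENCY_MS["trt_cpp_int8"])
# Per-image gap between the two INT8 TensorRT paths.
gap_ms = MEAN_LATENCY_MS["trt_python_int8"] - MEAN_LATENCY_MS["trt_cpp_int8"]

print(f"TensorRT (Python) INT8: {py_speedup:.1f}x faster than FP32 PyTorch")   # 2.7x
print(f"TensorRT (C++) INT8:    {cpp_speedup:.1f}x faster than FP32 PyTorch")  # 4.5x
print(f"C++ vs Python TensorRT: {gap_ms:.2f} ms less per image")               # 5.59 ms
```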

Project structure

├── python/
│   ├── quantize.py      # export .pt → .engine (INT8)
│   └── benchmark.py     # compare FP32 vs INT8 latency
├── cpp/
│   ├── CMakeLists.txt
│   ├── include/
│   │   ├── engine.hpp   # TensorRT engine wrapper
│   │   └── preprocess.hpp
│   └── src/
│       ├── main.cpp     # CLI inference app
│       ├── engine.cpp
│       └── preprocess.cpp
├── models/              # put .engine files here
├── data/
│   ├── calibration/     # ~100 representative images for INT8 calibration
│   └── test/            # images to benchmark on
└── pyproject.toml

Prerequisites

  • CUDA ≥ 12.4
  • TensorRT ≥ 10.0
  • OpenCV ≥ 4.6
  • CMake ≥ 3.18
  • uv (pip install uv or brew install uv)

Step 1 — Quantize (Python)

uv sync
# put ~100 representative images in data/calibration/
uv run python/quantize.py --model yolov8n.pt --data data/calibration/dataset.yaml
cp yolov8n.engine models/
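For orientation, an INT8 TensorRT export through the Ultralytics Python API might look like the sketch below. This is not the repository's python/quantize.py, and whether that script wraps the Ultralytics exporter is an assumption; `build_export_args` and `quantize` are hypothetical helper names. The `format`, `int8`, `data`, and `imgsz` arguments are standard Ultralytics export options.

```python
def build_export_args(data_yaml: str, imgsz: int = 640) -> dict:
    """Keyword arguments for an INT8 TensorRT export via Ultralytics."""
    return {
        "format": "engine",  # write a TensorRT .engine file
        "int8": True,        # enable INT8 post-training quantization
        "data": data_yaml,   # dataset YAML that points at the calibration images
        "imgsz": imgsz,      # 640 matches the benchmark resolution
    }


def quantize(model_path: str, data_yaml: str) -> str:
    """Export `model_path` to an INT8 engine and return the engine path."""
    # Imported lazily so build_export_args() can be used without the
    # GPU stack (ultralytics, TensorRT) installed.
    from ultralytics import YOLO

    model = YOLO(model_path)
    # export() returns the path of the file it wrote.
    return str(model.export(**build_export_args(data_yaml)))
```

Calling `quantize("yolov8n.pt", "data/calibration/dataset.yaml")` on a machine with CUDA and TensorRT installed would build the engine. TensorRT engines are built for the GPU and TensorRT version they were exported on, so the export generally needs to run on the deployment machine.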

Step 2 — Build C++ inference

mkdir -p cpp/build && cd cpp/build
cmake .. -DTRT_ROOT=/usr/local/tensorrt
make -j$(nproc)

Step 3 — Run inference

# run from the repository root
./cpp/build/infer models/yolov8n.engine data/test output/

Step 4 — Benchmark Python FP32 vs INT8

uv run python/benchmark.py --fp32 yolov8n.pt --int8 models/yolov8n.engine --images data/test

About

YOLOv8 INT8 quantization with TensorRT and C++ inference pipeline for edge deployment.
