YOLO Quantization → C++ Inference

Quantize YOLOv8 to INT8 with TensorRT (Python), then run inference in C++.

Benchmarks

Tested on GTX 1650 4 GB, YOLOv8n, 640×640, 100 COCO val images.

Runtime	Precision	Mean latency	P99 latency	Throughput
PyTorch (Python)	FP32	39.45 ms	78.78 ms	25.3 FPS
TensorRT (Python)	INT8	14.44 ms	16.28 ms	69.2 FPS
TensorRT (C++)	INT8	8.85 ms	12.57 ms	113.0 FPS

All three rows time the full preprocess → forward → postprocess pipeline, excluding disk I/O, after 10 warmup passes.
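The measurement approach described above can be sketched as follows. This is a minimal illustration, not the code in benchmark.py: the `benchmark` function, its `pipeline` callable, and the nearest-rank P99 definition are assumptions made for the example.

```python
import math
import statistics
import time


def benchmark(pipeline, inputs, warmup=10):
    """Time pipeline(x) for every input, in milliseconds, after warmup passes."""
    # Warmup passes are not recorded; they absorb one-time costs such as
    # lazy initialization and cache population.
    for i in range(warmup):
        pipeline(inputs[i % len(inputs)])

    latencies_ms = []
    for x in inputs:
        start = time.perf_counter()
        # For GPU work, pipeline() must synchronize before returning,
        # otherwise only the asynchronous launch is timed.
        pipeline(x)  # preprocess -> forward -> postprocess
        latencies_ms.append((time.perf_counter() - start) * 1000.0)

    latencies_ms.sort()
    # Nearest-rank P99 (assumed definition): the sample at rank ceil(0.99 * n).
    p99_index = max(0, math.ceil(0.99 * len(latencies_ms)) - 1)
    mean_ms = statistics.fmean(latencies_ms)
    fps = 1000.0 / mean_ms if mean_ms > 0 else float("inf")
    return {"mean_ms": mean_ms, "p99_ms": latencies_ms[p99_index], "fps": fps}
```

Images should be loaded into memory before calling `benchmark`, so that disk I/O stays outside the timed region.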

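INT8 TensorRT (Python) is 2.7× faster than FP32 PyTorch in mean latency (39.45 ms → 14.44 ms)
INT8 TensorRT (C++) is 4.5× faster than FP32 PyTorch in mean latency (39.45 ms → 8.85 ms)
C++ removes ~5.6 ms of Python/ultralytics pipeline overhead relative to the Python TensorRT path (14.44 ms → 8.85 ms mean)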
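For reference, the figures in this list follow directly from the mean latencies in the table. A short sketch of the arithmetic (the dictionary keys and function names are made up for the example; the numbers are copied from the table):

```python
# Mean latencies in milliseconds, copied from the benchmark table.
mean_ms = {
    "pytorch_fp32": 39.45,
    "trt_python_int8": 14.44,
    "trt_cpp_int8": 8.85,
}


def speedup(name, baseline="pytorch_fp32"):
    """How many times faster `name` is than the baseline, by mean latency."""
    return mean_ms[baseline] / mean_ms[name]


def throughput_fps(name):
    """Images per second implied by the mean latency (1000 ms / latency)."""
    return 1000.0 / mean_ms[name]


# Mean latency saved by moving from the Python TensorRT path to C++.
overhead_ms = mean_ms["trt_python_int8"] - mean_ms["trt_cpp_int8"]

print(f"TensorRT (Python): {speedup('trt_python_int8'):.1f}x")  # 2.7x
print(f"TensorRT (C++):    {speedup('trt_cpp_int8'):.1f}x")     # 4.5x
print(f"Python overhead:   {overhead_ms:.1f} ms")               # 5.6 ms
```

The throughput column agrees with `throughput_fps` to within rounding of the reported latencies.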

Project structure

├── python/
│   ├── quantize.py      # export .pt → .engine (INT8)
│   └── benchmark.py     # compare FP32 vs INT8 latency
├── cpp/
│   ├── CMakeLists.txt
│   ├── include/
│   │   ├── engine.hpp   # TensorRT engine wrapper
│   │   └── preprocess.hpp
│   └── src/
│       ├── main.cpp     # CLI inference app
│       ├── engine.cpp
│       └── preprocess.cpp
├── models/              # put .engine files here
├── data/
│   ├── calibration/     # ~100 representative images for INT8 calibration
│   └── test/            # images to benchmark on
└── pyproject.toml

Prerequisites

CUDA ≥ 12.4
TensorRT ≥ 10.0
OpenCV ≥ 4.6
CMake ≥ 3.18
uv (pip install uv or brew install uv)

Step 1 — Quantize (Python)

uv sync
# put ~100 representative images in data/calibration/
uv run python/quantize.py --model yolov8n.pt --data data/calibration/dataset.yaml
cp yolov8n.engine models/
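For orientation, an INT8 export of this kind can be done through the Ultralytics export API. The sketch below is an assumption about what such a script might look like, not the contents of quantize.py; the function names `export_kwargs` and `quantize` are hypothetical, and the export itself needs a CUDA GPU with TensorRT installed.

```python
def export_kwargs(data_yaml, imgsz=640):
    """Arguments for an INT8 TensorRT export (assumed; quantize.py may differ)."""
    return {
        "format": "engine",  # TensorRT engine output
        "int8": True,        # enable INT8 post-training quantization
        "data": data_yaml,   # dataset YAML pointing at the calibration images
        "imgsz": imgsz,      # matches the 640x640 input used in the benchmarks
    }


def quantize(model_path, data_yaml, imgsz=640):
    """Export a .pt checkpoint to an INT8 .engine file and return its path."""
    # Imported lazily so the helper above can be inspected without a GPU stack.
    from ultralytics import YOLO

    model = YOLO(model_path)
    return model.export(**export_kwargs(data_yaml, imgsz))
```

Calling `quantize("yolov8n.pt", "data/calibration/dataset.yaml")` would typically write the engine next to the checkpoint, which is consistent with the `cp yolov8n.engine models/` step above.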

Step 2 — Build C++ inference

mkdir -p cpp/build && cd cpp/build
cmake .. -DTRT_ROOT=/usr/local/tensorrt
make -j$(nproc)

Step 3 — Run inference

# run from the repository root
./cpp/build/infer models/yolov8n.engine data/test output/

Step 4 — Benchmark Python FP32 vs INT8

uv run python/benchmark.py --fp32 yolov8n.pt --int8 models/yolov8n.engine --images data/test
