This example provides a minimal (<2k lines) and faithful implementation of the following papers:
- Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
- Feature Pyramid Networks for Object Detection
- Mask R-CNN
with the support of:
- Multi-GPU / distributed training
- Cross-GPU BatchNorm
- Group Normalization
Dependencies:
- Python 3; TensorFlow >= 1.6 (1.4 or 1.5 can run but may crash due to a TF bug).
- pycocotools, OpenCV.
- Pre-trained ImageNet ResNet model from the tensorpack model zoo. Use the models with "-AlignPadding" in their names.
- COCO data. It needs to have the following directory structure:
    COCO/DIR/
      annotations/
        instances_train2014.json
        instances_val2014.json
        instances_minival2014.json
        instances_valminusminival2014.json
      train2014/
        COCO_train2014_*.jpg
      val2014/
        COCO_val2014_*.jpg
minival and valminusminival are optional. You can download them here.
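If the layout is wrong, the error may only surface after a long startup, so a quick check can help. This is a minimal sketch, not part of train.py: the helper `check_coco_dir` and the `REQUIRED`/`OPTIONAL` lists are hypothetical. Following the note above, it treats every entry in the tree as required except the minival and valminusminival annotation files.

```python
import os

# Entries expected under COCO/DIR (hypothetical lists mirroring the tree above).
REQUIRED = [
    "annotations/instances_train2014.json",
    "annotations/instances_val2014.json",
    "train2014",
    "val2014",
]
# minival and valminusminival annotations are optional.
OPTIONAL = [
    "annotations/instances_minival2014.json",
    "annotations/instances_valminusminival2014.json",
]


def check_coco_dir(basedir):
    """Return (missing_required, missing_optional) as lists of relative paths."""
    def missing(paths):
        return [p for p in paths if not os.path.exists(os.path.join(basedir, p))]

    return missing(REQUIRED), missing(OPTIONAL)
```

For example, `check_coco_dir("/path/to/COCO/DIR")` returns two lists; an empty first list means the required entries exist (it does not inspect the image files themselves).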
To train on a single machine:
./train.py --config \
    MODE_MASK=True MODE_FPN=True \
    DATA.BASEDIR=/path/to/COCO/DIR \
    BACKBONE.WEIGHTS=/path/to/ImageNet-R50-Pad.npz
To run distributed training, set TRAINER=horovod and refer to the HorovodTrainer documentation.
Options can be changed either on the command line or in the config.py file.
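Command-line options take the form KEY=VALUE, with dotted keys for nested sections (for example `DATA.BASEDIR` or `TRAIN.LR_SCHEDULE`). The sketch below shows one way such overrides could be applied to a nested configuration. It is a hypothetical helper (`apply_overrides`) working on a plain dict, not the actual logic in config.py; it assumes values are parsed as Python literals where possible and kept as strings otherwise.

```python
import ast


def apply_overrides(cfg, overrides):
    """Apply KEY=VALUE strings to a nested dict in place (hypothetical helper)."""
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError("Expected KEY=VALUE, got %r" % item)
        try:
            # Literals such as True, 64 or [150000,230000,280000].
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Paths and function names stay plain strings.
            value = raw
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # KeyError for an unknown section
        if leaf not in node:
            raise KeyError("Unknown config key: " + key)
        node[leaf] = value
    return cfg
```

Rejecting unknown keys, rather than silently adding them, turns a typo such as `MODE_FNP=True` into an immediate error instead of an ignored option.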
Recommended configurations are listed in the table below.
The code is only valid for training with 1, 2, 4 or >= 8 GPUs. Training with a number of GPUs other than 8 may give results that differ from the table below.
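The GPU-count rule can be written as a small validation step. This is a hypothetical helper (`check_num_gpus`), not code from train.py: it rejects counts other than 1, 2, 4 or >= 8, and warns whenever the count differs from the 8 GPUs used to produce the table.

```python
import warnings


def check_num_gpus(num_gpus):
    """Validate a GPU count against the rule: 1, 2, 4 or >= 8 GPUs."""
    if not (num_gpus in (1, 2, 4) or num_gpus >= 8):
        raise ValueError(
            "Unsupported number of GPUs: %d (use 1, 2, 4 or >= 8)" % num_gpus)
    if num_gpus != 8:
        # Allowed, but the reference numbers were obtained on 8 GPUs.
        warnings.warn(
            "Training with %d GPUs; results may differ from the table." % num_gpus)
```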
To predict on an image (and show output in a window):
./train.py --predict input.jpg --load /path/to/model --config SAME-AS-TRAINING
To evaluate the performance of a model on COCO (several trained models can be downloaded from the model zoo):
./train.py --evaluate output.json --load /path/to/COCO-R50C4-MaskRCNN-Standard.npz \
    --config MODE_MASK=True DATA.BASEDIR=/path/to/COCO/DIR
Evaluation or prediction will need the same --config used during training.
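The file written by `--evaluate` is scored against the COCO ground truth. As a sketch, assuming output.json follows the standard COCO detection-results format (a list of entries with `image_id`, `category_id`, `bbox` as [x, y, width, height] and `score`; mask results would add a `segmentation` field, omitted here), a box predicted as [x1, y1, x2, y2] could be converted as follows. The helpers `to_coco_result` and `write_results` are hypothetical.

```python
import json


def to_coco_result(image_id, category_id, box_xyxy, score):
    """Convert one [x1, y1, x2, y2] box into a COCO detection-results entry."""
    x1, y1, x2, y2 = box_xyxy
    return {
        "image_id": int(image_id),
        "category_id": int(category_id),
        # COCO stores boxes as [x, y, width, height].
        "bbox": [float(x1), float(y1), float(x2 - x1), float(y2 - y1)],
        "score": float(score),
    }


def write_results(entries, path):
    """Dump a list of result entries as a JSON array."""
    with open(path, "w") as f:
        json.dump(entries, f)
```

A file in this format can be loaded with pycocotools' `COCO.loadRes` and scored with `COCOeval`.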
These models are trained with different configurations on trainval35k and evaluated on minival using mAP@IoU=0.50:0.95. Mask R-CNN results report both box and mask mAP.
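mAP@IoU=0.50:0.95 is the COCO metric: AP is computed at ten IoU thresholds (0.50, 0.55, ..., 0.95) and averaged. The sketch below shows box IoU and the threshold grid. The per-threshold AP values (which pycocotools computes from interpolated precision, averaged over categories) are taken as inputs rather than computed here, and the function names are illustrative.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind "IoU=0.50:0.95".
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def coco_style_map(ap_per_threshold):
    """Average AP values given one per entry of IOU_THRESHOLDS."""
    if len(ap_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("Expected %d AP values" % len(IOU_THRESHOLDS))
    return sum(ap_per_threshold) / len(ap_per_threshold)
```

A detection counts as a true positive at threshold t only if its IoU with an unmatched ground-truth box is at least t, so averaging over the grid rewards tighter localization than AP at 0.50 alone.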
| Backbone | mAP (box;mask) | Detectron mAP (box;mask) | Time | Configurations |
|---|---|---|---|---|
| R50-C4 | 33.1 | | 18h on 8 V100s | super quick: `MODE_MASK=False FRCNN.BATCH_PER_IM=64 PREPROC.SHORT_EDGE_SIZE=600 PREPROC.MAX_SIZE=1024 TRAIN.LR_SCHEDULE=[150000,230000,280000]` |
| R50-C4 | 36.6 | 36.5 | 44h on 8 V100s | standard: `MODE_MASK=False` |
| R50-FPN | 37.5 | 37.9<sup>1</sup> | 28h on 8 V100s | standard: `MODE_MASK=False MODE_FPN=True` |
| R50-C4 | 36.8;32.1 | | 39h on 8 P100s | quick: `MODE_MASK=True FRCNN.BATCH_PER_IM=256 TRAIN.LR_SCHEDULE=[150000,230000,280000]` |
| R50-C4 | 37.8;33.1 | 37.8;32.8 | 49h on 8 V100s | standard: `MODE_MASK=True` |
| R50-FPN | 38.2;34.9 | 38.6;34.5<sup>1</sup> | 32h on 8 V100s | standard: `MODE_MASK=True MODE_FPN=True` |
| R50-FPN | 38.5;34.8 | 38.6;34.2<sup>2</sup> | 34h on 8 V100s | standard+ConvHead: `MODE_MASK=True MODE_FPN=True FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_head` |
| R50-FPN | 39.5;35.2 | 39.5;34.4<sup>2</sup> | 34h on 8 V100s | standard+ConvGNHead: `MODE_MASK=True MODE_FPN=True FPN.FRCNN_HEAD_FUNC=fastrcnn_4conv1fc_gn_head` |
| R101-C4 | 40.8;35.1 | | 63h on 8 V100s | standard: `MODE_MASK=True BACKBONE.RESNET_NUM_BLOCK=[3,4,23,3]` |

1: Slightly different configurations.
2: Numbers taken from the Group Normalization paper.
Detectron's performance can be roughly reproduced: some numbers are better and some are worse, probably due to many small implementation details. Note that most of these numbers are better than those reported in the papers.
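Several configurations in the table set `TRAIN.LR_SCHEDULE=[150000,230000,280000]`, a step schedule. The sketch below assumes, as in Detectron-style schedules, that the last entry is the total number of iterations and each earlier entry is an iteration at which the learning rate is multiplied by 0.1. The function `step_lr` and its `base_lr=0.01` default are placeholders, and warmup and any rescaling for GPU counts other than 8 are left out.

```python
def step_lr(step, schedule, base_lr=0.01, gamma=0.1):
    """Piecewise-constant learning rate for a [decay1, decay2, ..., total] schedule.

    Hypothetical sketch: base_lr and gamma are placeholder values.
    """
    *boundaries, total = schedule
    if not 0 <= step < total:
        raise ValueError("step %d outside [0, %d)" % (step, total))
    # Count how many decay boundaries have already been passed.
    num_decays = sum(1 for b in boundaries if step >= b)
    return base_lr * gamma ** num_decays
```

With `[150000,230000,280000]`, this keeps the base rate for the first 150k iterations, divides it by 10 until iteration 230k, and by 100 for the final 50k.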