Commit 696e7be3 by Ting PAN

Initial repository

[flake8]
max-line-length = 120
ignore = E741, # ambiguous variable name
F403, # 'from module import *' used; unable to detect undefined names
F405, # name may be undefined, or defined from star imports: module
F811, # redefinition of unused name from line N
F821, # undefined name
W503, # line break before binary operator
W504 # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
# Compiled Object files
*.slo
*.lo
*.o
*.cuo
# Compiled Dynamic libraries
*.so
*.dll
*.dylib
# Compiled Static libraries
*.lai
*.la
*.a
*.lib
# Compiled python
*.pyc
__pycache__
# Compiled MATLAB
*.mex*
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swp
*~
# Sublime Text settings
*.sublime-workspace
*.sublime-project
# Eclipse Project settings
*.*project
.settings
# QtCreator files
*.user
# VSCode files
.vscode
# IDEA files
.idea
# OSX dir files
.DS_Store
# Android files
.gradle
*.iml
local.properties
Copyright (c) 2017, SeetaTech, Co.,Ltd. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Benchmark and Model Zoo
## Introduction
### Pretrained Models
Refer to [Pretrained Models](data/pretrained) for details.
## Baselines
### Faster R-CNN
Refer to [Faster R-CNN](configs/faster_rcnn) for details.
### Mask R-CNN
Refer to [Mask R-CNN](configs/mask_rcnn) for details.
### Pascal VOC
Refer to [Pascal VOC](configs/pascal_voc) for details.
# SeetaDet
SeetaDet is a platform implementing popular object detection algorithms.
This platform works with [**SeetaDragon**](https://dragon.seetatech.com), and uses the [**PyTorch**](https://dragon.seetatech.com/api/python/#pytorch) style.
<img src="https://dragon.seetatech.com/download/seetadet/assets/banner.png"/>
## Installation
Install from PyPI:
```bash
pip install seeta-det
```
Or, clone this repository to local disk and install:
```bash
cd seetadet && pip install .
```
You can also install from the remote repository:
```bash
pip install git+ssh://git@github.com/seetaresearch/seetadet.git
```
If you prefer to develop locally, build the package without installing it into ***site-packages***:
```bash
cd seetadet && python setup.py build
```
## Quick Start
### Train a detection model
```bash
cd tools
python train.py --cfg <MODEL_YAML>
```
We have provided default YAML examples in [configs](configs).
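For example, to launch the COCO Faster R-CNN 1x baseline (a sketch assuming the repository layout above, with the datasets and pretrained weights placed under ***data*** as the config expects):
```bash
cd tools
python train.py --cfg ../configs/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x.yml
```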
### Test a detection model
```bash
cd tools
python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Export a detection model to ONNX
```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Serve a detection model
```bash
cd tools
python serve.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
## License
[BSD 2-Clause license](LICENSE)
# Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
## Introduction
```latex
@article{Ren_2017,
title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
year={2017},
month={Jun},
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :-----: |
| [R50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 37.04 | 37.7 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_7abb52ab.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/logs.json) |
| [R50-FPN](coco_faster_rcnn_R_50_FPN_3x.yml) | 3x | 37.04 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_3x/model_04e548ca.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_3x/logs.json) |
NUM_GPUS: 8
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
# Mask R-CNN
## Introduction
```latex
@article{He_2017,
title={Mask R-CNN},
journal={2017 IEEE International Conference on Computer Vision (ICCV)},
publisher={IEEE},
author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
year={2017},
month={Oct}
}
```
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 30.30 | 38.3 | 34.9 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_b27317db.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/logs.json) |
| [R50-FPN](coco_mask_rcnn_R_50_FPN_3x.yml) | 3x | 30.30 | 40.7 | 36.8 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_3x/model_6f7e3878.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_3x/logs.json) |
NUM_GPUS: 8
MODEL:
TYPE: 'mask_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'mask_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
# Pascal VOC
## Introduction
```latex
@Article{Everingham10,
author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
title = "The Pascal Visual Object Classes (VOC) Challenge",
journal = "International Journal of Computer Vision",
volume = "88",
year = "2010",
number = "2",
month = jun,
pages = "303--338",
}
```
## Object Detection Baselines
### Faster R-CNN
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R50-FPN](voc_faster_rcnn_R_50_FPN_15e.yml) | 15e | 47.62 | 82.1 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_faster_rcnn_R_50_FPN_15e/model_3dcb03f9.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_faster_rcnn_R_50_FPN_15e/logs.json) |
### RetinaNet
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R50-FPN](voc_retinanet_R_50_FPN_120e.yml) | 120e | 58.82 | 82.4 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_retinanet_R_50_FPN_120e/model_1ae4cd3d.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_retinanet_R_50_FPN_120e/logs.json) |
### SSD
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [VGG16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 125 | 77.8 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_ssd300_VGG_16_120e/model_3417d961.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_ssd300_VGG_16_120e/logs.json) |
NUM_GPUS: 2
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
FAST_RCNN:
BBOX_REG_LOSS_TYPE: 'smooth_l1'
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
USE_DIFF: True
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1000
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1000
NMS_THRESH: 0.45
NUM_GPUS: 1
MODEL:
TYPE: 'retinanet'
PRECISION: 'float32'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
WARM_UP_STEPS: 3000
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
USE_DIFF: True
IMS_PER_BATCH: 16
SCALES: [512]
SCALES_RANGE: [0.1, 2.0]
MAX_SIZE: 512
CROP_SIZE: 512
COLOR_JITTER: 0.5
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
MAX_SIZE: 512
CROP_SIZE: 512
NMS_THRESH: 0.45
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'vgg16_fcn.ssd300'
NORM: ''
FREEZE_AT: 0
COARSEST_STRIDE: 300
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 100, 300]
SIZES: [[30, 60], [60, 110], [110, 162],
[162, 213], [213, 264], [264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
TRAIN:
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
LOADER: 'ssd_train'
USE_DIFF: True
IMS_PER_BATCH: 16
SCALES: [300]
SCALES_RANGE: [0.25, 1.0]
COLOR_JITTER: 0.5
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 8
SCALES: [300]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'vgg16_fcn.ssd512'
NORM: ''
FREEZE_AT: 0
COARSEST_STRIDE: 512
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [512]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
# Focal Loss for Dense Object Detection
## Introduction
```latex
@inproceedings{lin2017focal,
title={Focal loss for dense object detection},
author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
year={2017}
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | AP@0.5 | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512_120e.yml) | 120e | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
| [R-50-FPN-640](voc_retinanet_R-50-FPN_640_120e.yml) | 120e | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
NUM_GPUS: 8
MODEL:
TYPE: 'retinanet'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'retinanet'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#define DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class PasteMaskOp final : public Operator<Context> {
public:
PasteMaskOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
mask_threshold_(OP_SINGLE_ARG(float, "mask_threshold", 0.5f)) {
INITIALIZE_OP_REPEATED_ARG(int64_t, sizes);
}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float mask_threshold_;
DECLARE_OP_REPEATED_ARG(int64_t, sizes);
};
DEFINE_OP_REPEATED_ARG(int64_t, PasteMaskOp, sizes);
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#include <dragon/core/workspace.h>
#include "../operators/mask_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void PasteMaskOp<Context>::DoRunWithType() {
auto &X_masks = Input(0), &X_boxes = Input(1), *Y = Output(0);
vector<int64_t> Y_dims({X_masks.dim(0)});
int num_sizes;
sizes(0, &num_sizes);
for (int i = 0; i < num_sizes; ++i) {
Y_dims.push_back(sizes(i));
}
if (num_sizes == 2) {
detection::PasteMask(
Y_dims[0], // N
Y_dims[1], // H
Y_dims[2], // W
X_masks.dim(1), // mask_h
X_masks.dim(2), // mask_w
mask_threshold_,
X_masks.template data<T, Context>(),
X_boxes.template data<float, Context>(),
Y->Reshape(Y_dims)->template mutable_data<uint8_t, Context>(),
ctx());
} else {
LOG(FATAL) << "PasteMask" << num_sizes << "d is not supported.";
}
}
DEPLOY_CPU_OPERATOR(PasteMask);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(PasteMask);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(PasteMask, PasteMask);
#endif
OPERATOR_SCHEMA(PasteMask).NumInputs(2).NumOutputs(1);
NO_GRADIENT(PasteMask);
} // namespace dragon
#include "../operators/nms_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
auto &X = Input(0), *Y = Output(0);
CHECK(X.ndim() == 2 && X.dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
detection::ApplyNMS(
X.dim(0),
X.dim(0),
0,
iou_threshold_,
X.template mutable_data<T, Context>(),
out_indices_,
ctx());
Y->template CopyFrom<int64_t>(out_indices_);
}
DEPLOY_CPU_OPERATOR(NonMaxSuppression);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(NonMaxSuppression);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(NonMaxSuppression, NonMaxSuppression);
#endif
OPERATOR_SCHEMA(NonMaxSuppression).NumInputs(1).NumOutputs(1);
NO_GRADIENT(NonMaxSuppression);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class NonMaxSuppressionOp final : public Operator<Context> {
public:
NonMaxSuppressionOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float iou_threshold_;
vector<int64_t> out_indices_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "../operators/retinanet_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
auto N = Input(SCORES).dim(0);
auto AxK = Input(SCORES).dim(1);
auto C = Input(SCORES).dim(2);
auto AxKxC = AxK * C;
auto A = int64_t(ratios_.size() * scales_.size());
auto num_lvls = int64_t(strides_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), num_lvls);
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == A * 4) continue;
anchors.resize(A * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(AxK, A, grid_info, grid_args);
// Decode detections.
auto* scores = Input(SCORES).template data<T, Context>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output(0)->Reshape({N * num_lvls * pre_nms_topk_, 7});
auto* dets = Y->template mutable_data<float, CPUContext>();
int64_t size_dets = 0;
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
detection::SelectTopK(
grid_args[lvl_ind].size * C,
pre_nms_topk_,
score_thresh_,
scores + batch_ind * AxKxC + grid_args[lvl_ind].offset * C,
scores_,
indices_,
ctx());
auto* offset_dets = dets + size_dets * 7;
auto num_dets = int64_t(indices_.size());
size_dets += num_dets;
detection::GetAnchors(
num_dets,
A, // num_cell_anchors
C, // num_classes
grid_args[lvl_ind],
cell_anchors_[lvl_ind].data(),
indices_.data(),
offset_dets);
detection::DecodeDetections(
num_dets,
AxK, // num_anchors
C, // num_classes
im_args,
grid_args[lvl_ind],
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_dets);
}
}
// Shrink to the correct dimensions.
Y->Reshape({size_dets, 7});
}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
#ifdef USE_MPS
REGISTER_MPS_OPERATOR(RetinaNetDecoder, RetinaNetDecoderOp<CPUContext>);
#endif
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class RetinaNetDecoderOp final : public Operator<Context> {
public:
RetinaNetDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topk_(OP_SINGLE_ARG(int64_t, "pre_nms_topk", 1000)),
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t pre_nms_topk_;
vector<float> scores_;
vector<int64_t> indices_;
vector<vector<float>> cell_anchors_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "../operators/rpn_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
auto N = Input(SCORES).dim(0);
auto AxK = Input(SCORES).dim(1);
auto A = int64_t(ratios_.size() * scales_.size());
auto num_lvls = int64_t(strides_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), num_lvls);
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == A * 4) continue;
anchors.resize(A * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(AxK, A, grid_info, grid_args);
// Decode proposals.
auto* scores = Input(SCORES).template data<T, CPUContext>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output("Y")->Reshape({N * num_lvls * pre_nms_topk_, 5});
auto* dets = Y->template mutable_data<float, CPUContext>();
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
detection::SelectTopK(
grid_args[lvl_ind].size,
pre_nms_topk_,
0.f,
scores + batch_ind * AxK + grid_args[lvl_ind].offset,
scores_,
indices_,
(CPUContext*)nullptr); // Faster.
indices_.resize(pre_nms_topk_, indices_.back());
auto* offset_dets = dets + lvl_ind * pre_nms_topk_ * 5;
detection::GetAnchors(
pre_nms_topk_,
A, // num_cell_anchors
grid_args[lvl_ind],
cell_anchors_[lvl_ind].data(),
indices_.data(),
offset_dets);
detection::DecodeProposals(
pre_nms_topk_,
AxK, // num_anchors
im_args,
grid_args[lvl_ind],
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_dets);
detection::SortBoxes<T, detection::Box5d<T>>(pre_nms_topk_, offset_dets);
}
}
// Apply NMS.
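// NMS runs independently on each level's pre_nms_topk_ sorted proposals;
// survivors from all levels then compete in a per-image priority queue so
// that at most post_nms_topk_ highest-scoring boxes are kept per image.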
auto* dets_v2 = Y->template data<float, Context>();
int64_t size_rois = 0;
scores_.resize(N * post_nms_topk_);
indices_.resize(N * post_nms_topk_);
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
std::priority_queue<std::pair<float, int64_t>> pq;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
const auto offset = lvl_ind * pre_nms_topk_;
detection::ApplyNMS(
pre_nms_topk_, // N
pre_nms_topk_, // K
offset * 5, // boxes_offset
nms_thresh_,
dets_v2,
nms_indices_,
ctx());
for (size_t i = 0; i < nms_indices_.size(); ++i) {
const auto index = nms_indices_[i] + offset;
pq.push(std::make_pair(*(dets + index * 5 + 4), index));
}
}
for (int i = 0; i < post_nms_topk_ && !pq.empty(); ++i) {
scores_[size_rois] = batch_ind;
indices_[size_rois++] = pq.top().second;
pq.pop();
}
}
// Assign the kept RoIs to FPN levels via the scale histogram.
detection::ApplyHistogram(
size_rois,
min_level_,
max_level_,
canonical_level_,
canonical_scale_,
dets,
scores_.data(),
indices_.data(),
output_rois_);
// Copy to outputs.
for (int i = 0; i < OutputSize(); ++i) {
const auto& rois = output_rois_[i];
vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
auto* Yi = Output(i)->Reshape(dims);
std::memcpy(
Yi->template mutable_data<T, CPUContext>(),
rois.data(),
sizeof(T) * rois.size());
}
}
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(RPNDecoder, RPNDecoder);
#endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class RPNDecoderOp final : public Operator<Context> {
public:
RPNDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topk_(OP_SINGLE_ARG(int64_t, "pre_nms_topk", 1000)),
post_nms_topk_(OP_SINGLE_ARG(int64_t, "post_nms_topk", 1000)),
nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float nms_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t min_level_, max_level_;
int64_t pre_nms_topk_, post_nms_topk_;
int64_t canonical_level_, canonical_scale_;
vector<float> scores_;
vector<int64_t> indices_, nms_indices_;
vector<vector<float>> cell_anchors_;
vector<vector<float>> output_rois_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import glob
import dragon
from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
if (dragon.cuda.is_available() and
cpp_extension.CUDA_HOME is not None):
Extension = cpp_extension.CUDAExtension
elif dragon.mps.is_available():
Extension = cpp_extension.MPSExtension
def find_sources(*dirs):
ext_suffixes = ['.cc']
if Extension is cpp_extension.CUDAExtension:
ext_suffixes.append('.cu')
elif Extension is cpp_extension.MPSExtension:
ext_suffixes.append('.mm')
sources = []
for path in dirs:
for ext_suffix in ext_suffixes:
sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
return sources
ext_modules = [
Extension(
name='seetadet.ops._C',
sources=find_sources('**'),
),
]
setup(
name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension},
)
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/mask.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.size = num_cell_anchors * args.h * args.w;
args.offset = grid_offset;
grid_offset += args.size;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Excepted ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
T x = T(0.5) * T(stride), y = T(0.5) * T(stride);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::sqrt(T(1) / ratios[i]);
const T ratio_h = ratio_w * ratios[i];
for (int j = 0; j < num_scales; ++j) {
offset_anchors[0] = -x * ratio_w * scales[j];
offset_anchors[1] = -y * ratio_h * scales[j];
offset_anchors[2] = x * ratio_w * scales[j];
offset_anchors[3] = y * ratio_h * scales[j];
offset_anchors += 4;
}
}
}
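// Each cell anchor spans width stride*scale/sqrt(ratio) and height
// stride*scale*sqrt(ratio) around the origin, i.e. area (stride*scale)^2
// with h/w = ratio. E.g., stride=16, ratio=0.5, scale=8 gives a 181x91 box.
// GetAnchors() below shifts these cell anchors onto the feature grid.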
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i];
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i];
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1);
const T height = max(T(0), y2 - y1);
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1);
const T height = std::max(T(0), y2 - y1);
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0]) * (a[3] - a[1]);
const T Sb = (b[2] - b[0]) * (b[3] - b[1]);
return inter >= thresh * (Sa + Sb - inter);
}
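// Applies the R-CNN box parameterization: the predicted center is
// ctr + d*size and the predicted size is exp(d)*size; the box is then
// clipped to the padded image and rescaled back to the original resolution.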
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0];
const T h = bbox[3] - bbox[1];
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w)) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h)) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w)) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h)) / im_scale_h;
}
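// FPN level assignment, as in Eq. (1) of the FPN paper:
// lvl = lvl0 + log2(sqrt(w*h)/s0), clamped to [lvl_min, lvl_max];
// degenerate boxes with a non-positive extent return -1.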
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0];
const T h = bbox[3] - bbox[1];
if (w <= T(0) || h <= T(0)) return -1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include <dragon/core/context.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
template <typename IndexT>
inline bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
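// Pastes each mask into its box on the output canvas: every output pixel
// is mapped back into mask coordinates, bilinearly sampled from its four
// neighbors, and thresholded to a binary value.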
template <typename T>
void _PasteMask(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const T thresh,
const T* masks,
const float* boxes,
uint8_t* im) {
const auto HxW = H * W;
for (int n = 0; n < N; ++n) {
const float* box = boxes + n * 4;
const T* mask = masks + n * mask_h * mask_w;
uint8_t* offset_im = im + n * H * W;
const float box_w_half = (box[2] - box[0]) * 0.5f;
const float box_h_half = (box[3] - box[1]) * 0.5f;
const float mask_w_half = float(mask_w) * 0.5f;
const float mask_h_half = float(mask_h) * 0.5f;
for (int index = 0; index < HxW; ++index) {
const int w = index % W;
const int h = index / W;
const float gx = (float(w) + 0.5f - box[0]) / box_w_half;
const float gy = (float(h) + 0.5f - box[1]) / box_h_half;
const float ix = gx * mask_w_half - 0.5f;
const float iy = gy * mask_h_half - 0.5f;
const int ix_nw = floorf(ix);
const int iy_nw = floorf(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, mask_h, mask_w)) {
val += mask[iy_nw * mask_w + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, mask_h, mask_w)) {
val += mask[iy_ne * mask_w + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, mask_h, mask_w)) {
val += mask[iy_sw * mask_w + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, mask_h, mask_w)) {
val += mask[iy_se * mask_w + ix_se] * se;
}
*(offset_im++) = (val >= thresh ? uint8_t(1) : uint8_t(0));
}
}
}
} // namespace
template <>
void PasteMask<float, CPUContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
CPUContext* ctx) {
_PasteMask(N, H, W, mask_h, mask_w, thresh, masks, boxes, im);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context.h>
#include "../../../utils/detection/bbox.h"
#include "../../../utils/detection/nms.h"
namespace dragon {
namespace detection {
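// Greedy sequential NMS: boxes are expected in descending score order;
// each kept box suppresses later boxes whose IoU exceeds thresh, and at
// most K of the N candidates are returned.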
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
boxes = boxes + boxes_offset;
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context.h>
#include "../../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
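// Selects up to K candidates: optionally filters by score threshold, then
// uses std::nth_element so only the top-K partition boundary is established,
// avoiding a full sort of all N scores.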
template <>
void SelectTopK<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
template <typename IndexT>
inline __device__ bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
template <typename T>
__global__ void _PasteMask(
const int nthreads,
const int H,
const int W,
const int mask_h,
const int mask_w,
const T thresh,
const T* masks,
const float* boxes,
uint8_t* im) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
const int w = index % W;
const int h = index / W % H;
const int n = index / (H * W);
const float* box = boxes + n * 4;
const T* mask = masks + n * mask_h * mask_w;
const float gx = (float(w) + 0.5f - box[0]) / (box[2] - box[0]) * 2.f;
const float gy = (float(h) + 0.5f - box[1]) / (box[3] - box[1]) * 2.f;
const float ix = (gx * float(mask_w) - 1.f) * 0.5f;
const float iy = (gy * float(mask_h) - 1.f) * 0.5f;
const int ix_nw = floorf(ix);
const int iy_nw = floorf(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, mask_h, mask_w)) {
val += mask[iy_nw * mask_w + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, mask_h, mask_w)) {
val += mask[iy_ne * mask_w + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, mask_h, mask_w)) {
val += mask[iy_sw * mask_w + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, mask_h, mask_w)) {
val += mask[iy_se * mask_w + ix_se] * se;
}
im[index] = (val >= thresh ? uint8_t(1) : uint8_t(0));
}
}
} // namespace
template <>
void PasteMask<float, CUDAContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
CUDAContext* ctx) {
const auto NxHxW = N * H * W;
_PasteMask<<<CUDA_BLOCKS(NxHxW), CUDA_THREADS, 0, ctx->cuda_stream()>>>(
NxHxW, H, W, mask_h, mask_w, thresh, masks, boxes, im);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../../utils/detection/bbox.h"
#include "../../../utils/detection/nms.h"
#include "../../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
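// Blocked NMS: boxes are tiled into 64-box groups; each (row, col) thread
// block compares one row box against a column group and records the
// suppressed boxes as bits of a 64-bit mask, reduced sequentially on host.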
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
uint64_t val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= (uint64_t(1) << i);
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
boxes = boxes + boxes_offset;
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
auto* NMS_mask = ctx->workspace()->CreateTensor("NMS_mask");
NMS_mask->Reshape({N * num_blocks});
auto* mask = reinterpret_cast<uint64_t*>(
NMS_mask->template mutable_data<int64_t, CUDAContext>());
vector<uint64_t> mask_host(N * num_blocks);
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS, inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (uint64_t(1) << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../../utils/detection/iterator.h"
#include "../../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectTopK<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*
* Mask Functions.
*/
template <typename T, class Context>
void PasteMask(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const T* masks,
const float* boxes,
uint8_t* im,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#include <dragon/core/context_mps.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
const static string METAL_SHADERS = R"(
#include <metal_stdlib>
using namespace metal;
constant int int_arg1 [[function_constant(0)]]; // H
constant int int_arg2 [[function_constant(1)]]; // W
constant int int_arg3 [[function_constant(2)]]; // mask_h
constant int int_arg4 [[function_constant(3)]]; // mask_w
constant float float_arg1 [[function_constant(4)]]; // thresh
template <typename IndexT>
bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
template <typename T>
kernel void PasteMask(
device const T* masks,
device const float* boxes,
device uint8_t* im,
const uint index [[thread_position_in_grid]]) {
const int w = int(index) % int_arg2;
const int h = int(index) / int_arg2 % int_arg1;
const int n = int(index) / (int_arg2 * int_arg1);
device const float* box = boxes + n * 4;
device const T* mask = masks + n * int_arg3 * int_arg4;
const float gx = (float(w) + 0.5f - box[0]) / (box[2] - box[0]) * 2.f;
const float gy = (float(h) + 0.5f - box[1]) / (box[3] - box[1]) * 2.f;
const float ix = (gx * float(int_arg4) - 1.f) * 0.5f;
const float iy = (gy * float(int_arg3) - 1.f) * 0.5f;
const int ix_nw = floor(ix);
const int iy_nw = floor(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, int_arg3, int_arg4)) {
val += mask[iy_nw * int_arg4 + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, int_arg3, int_arg4)) {
val += mask[iy_ne * int_arg4 + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, int_arg3, int_arg4)) {
val += mask[iy_sw * int_arg4 + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, int_arg3, int_arg4)) {
val += mask[iy_se * int_arg4 + ix_se] * se;
}
im[index] = (val >= T(float_arg1) ? uint8_t(1) : uint8_t(0));
}
#define INSTANTIATE_KERNEL(T) \
template [[host_name("PasteMask_"#T)]] \
kernel void PasteMask( \
device const T*, device const float*, device uint8_t*, uint);
INSTANTIATE_KERNEL(float);
#undef INSTANTIATE_KERNEL
)";
} // namespace
template <>
void PasteMask<float, MPSContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
MPSContext* ctx) {
auto kernel = MPSKernel::TypedString<float>("PasteMask");
auto args = vector<MPSConstant>({
MPSConstant(&H, MTLDataTypeInt, 0),
MPSConstant(&W, MTLDataTypeInt, 1),
MPSConstant(&mask_h, MTLDataTypeInt, 2),
MPSConstant(&mask_w, MTLDataTypeInt, 3),
MPSConstant(&thresh, MTLDataTypeFloat, 4),
});
auto* command_buffer = ctx->mps_stream()->command_buffer();
auto* encoder = [command_buffer computeCommandEncoder];
auto* pso = MPSKernel(kernel, METAL_SHADERS).GetState(ctx, args);
[encoder setComputePipelineState:pso];
[encoder setBuffer:id<MTLBuffer>(masks) offset:0 atIndex:0];
[encoder setBuffer:id<MTLBuffer>(boxes) offset:0 atIndex:1];
[encoder setBuffer:id<MTLBuffer>(im) offset:0 atIndex:2];
MPSDispatchThreads((N * H * W), encoder, pso);
[encoder endEncoding];
[encoder release];
}
} // namespace detection
} // namespace dragon
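The Metal kernel above scales each ROI mask onto the image grid with bilinear sampling, zero-padding outside the mask as `WithinBounds2d` does, then thresholds to a binary map. A NumPy sketch of the same arithmetic for a single box (`paste_mask` is a hypothetical helper for illustration):

```python
import numpy as np

def paste_mask(mask, box, im_h, im_w, thresh=0.5):
    """mask: (mask_h, mask_w) floats; box: (x1, y1, x2, y2) in pixels."""
    mask_h, mask_w = mask.shape
    ys, xs = np.meshgrid(np.arange(im_h), np.arange(im_w), indexing='ij')
    # Pixel centers normalized to [0, 2] across the box, then mask coords.
    gx = (xs + 0.5 - box[0]) / (box[2] - box[0]) * 2.0
    gy = (ys + 0.5 - box[1]) / (box[3] - box[1]) * 2.0
    ix = (gx * mask_w - 1.0) * 0.5
    iy = (gy * mask_h - 1.0) * 0.5
    ix_nw, iy_nw = np.floor(ix).astype(int), np.floor(iy).astype(int)
    val = np.zeros((im_h, im_w), dtype=mask.dtype)
    # Accumulate the four bilinear taps (nw, ne, sw, se) with zero padding.
    for dy, dx, w in [(0, 0, (ix_nw + 1 - ix) * (iy_nw + 1 - iy)),
                      (0, 1, (ix - ix_nw) * (iy_nw + 1 - iy)),
                      (1, 0, (ix_nw + 1 - ix) * (iy - iy_nw)),
                      (1, 1, (ix - ix_nw) * (iy - iy_nw))]:
        yy, xx = iy_nw + dy, ix_nw + dx
        ok = (yy >= 0) & (yy < mask_h) & (xx >= 0) & (xx < mask_w)
        val[ok] += mask[yy[ok], xx[ok]] * w[ok]
    return (val >= thresh).astype(np.uint8)
```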
#include <dragon/core/context_mps.h>
#include <dragon/core/workspace.h>
#include "../../../utils/detection/nms.h"
#include "../../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
const static string METAL_SHADERS = R"(
#include <metal_stdlib>
using namespace metal;
constant uint uint_arg1 [[function_constant(0)]];
constant float float_arg1 [[function_constant(1)]];
template <typename T>
bool CheckIoU(const T thresh, device const T* a, threadgroup T* b) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1);
const T height = max(T(0), y2 - y1);
const T inter = width * height;
const T Sa = (a[2] - a[0]) * (a[3] - a[1]);
const T Sb = (b[2] - b[0]) * (b[3] - b[1]);
return inter >= thresh * (Sa + Sb - inter);
}
template <typename T>
kernel void NonMaxSuppression(
device const T* boxes,
device uint64_t* mask,
const uint2 gridDim [[threadgroups_per_grid]],
const uint2 blockIdx [[threadgroup_position_in_grid]],
const uint2 threadIdx [[thread_position_in_threadgroup]]) {
const uint row_start = blockIdx.y;
const uint col_start = blockIdx.x;
if (row_start > col_start) return;
const uint row_size = min(uint_arg1 - row_start * uint(64), uint(64));
const uint col_size = min(uint_arg1 - col_start * uint(64), uint(64));
threadgroup T block_boxes[256];
if (threadIdx.x < col_size) {
threadgroup T* offset_block_boxes = block_boxes + threadIdx.x * 4;
device const T* offset_boxes = boxes + (col_start * uint(64) + threadIdx.x) * 5;
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
threadgroup_barrier(mem_flags::mem_threadgroup);
if (threadIdx.x < row_size) {
const uint index = row_start * uint(64) + threadIdx.x;
device const T* offset_boxes = boxes + index * 5;
uint64_t val = 0;
const uint start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (uint i = start; i < col_size; ++i) {
if (CheckIoU(T(float_arg1), offset_boxes, block_boxes + i * 4)) {
val |= (uint64_t(1) << i);
}
}
mask[index * gridDim.x + col_start] = val;
}
}
#define INSTANTIATE_KERNEL(T) \
template [[host_name("NonMaxSuppression_"#T)]] \
kernel void NonMaxSuppression( \
device const T*, device uint64_t*, uint2, uint2, uint2);
INSTANTIATE_KERNEL(float);
#undef INSTANTIATE_KERNEL
)";
} // namespace
template <>
void ApplyNMS<float, MPSContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
MPSContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
auto* NMS_mask = ctx->workspace()->CreateTensor("NMS_mask");
NMS_mask->Reshape({N * num_blocks});
auto* mask = reinterpret_cast<uint64_t*>(
NMS_mask->template mutable_data<int64_t, MPSContext>());
auto kernel = MPSKernel::TypedString<float>("NonMaxSuppression");
const uint arg1 = N;
auto args = vector<MPSConstant>({
MPSConstant(&arg1, MTLDataTypeUInt, 0),
MPSConstant(&thresh, MTLDataTypeFloat, 1),
});
auto* command_buffer = ctx->mps_stream()->command_buffer();
auto* encoder = [command_buffer computeCommandEncoder];
auto* pso = MPSKernel(kernel, METAL_SHADERS).GetState(ctx, args);
[encoder setComputePipelineState:pso];
[encoder setBuffer:id<MTLBuffer>(boxes) offset:boxes_offset * 4 atIndex:0];
[encoder setBuffer:id<MTLBuffer>(mask) offset:0 atIndex:1];
[encoder dispatchThreadgroups:MTLSizeMake(num_blocks, num_blocks, 1)
threadsPerThreadgroup:MTLSizeMake(NUM_THREADS, 1, 1)];
[encoder endEncoding];
[encoder release];
ctx->FinishDeviceComputation();
mask = reinterpret_cast<uint64_t*>(
const_cast<int64_t*>(NMS_mask->template data<int64_t, CPUContext>()));
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS, inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (uint64_t(1) << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = mask + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const int boxes_offset,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectTopK(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const GridArgs<int64_t>& grid_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const int64_t index_min = grid_args.offset;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i] + index_min;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const GridArgs<int64_t>& grid_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const int64_t index_min = num_classes * grid_args.offset;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = (indices[i] + index_min) / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T((indices[i] + index_min) % num_classes + 1);
offset_dets += 7;
}
}
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
int K = 0;
vector<int> keep_indices(N), bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
if (lvl < 0) continue; // Empty.
keep_indices[K++] = i;
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
keep_indices.resize(K);
output_rois.resize(lvl_max - lvl_min + 1);
for (int i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5, T(0));
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (auto i : keep_indices) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, size, offset;
};
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False)
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
with nogil:
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0]) *
(query_boxes[k, 3] - query_boxes[k, 1])
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0])
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1])
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0]) *
(boxes[n, 3] - boxes[n, 1]) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
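A vectorized NumPy equivalent, useful as a cross-check against the Cython loop above (illustration only; the loop avoids materializing the (N, K) temporaries):

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """boxes: (N, 4); query_boxes: (K, 4) -> (N, K) IoU matrix."""
    lt = np.maximum(boxes[:, None, :2], query_boxes[None, :, :2])
    rb = np.minimum(boxes[:, None, 2:4], query_boxes[None, :, 2:4])
    wh = np.clip(rb - lt, 0.0, None)          # intersection width/height
    inter = wh[..., 0] * wh[..., 1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    query_areas = (query_boxes[:, 2] - query_boxes[:, 0]) * \
                  (query_boxes[:, 3] - query_boxes[:, 1])
    union = areas[:, None] + query_areas[None, :] - inter
    overlaps = np.zeros_like(inter)
    pos = inter > 0
    overlaps[pos] = inter[pos] / union[pos]
    return overlaps
```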
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
return a if a <= b else b
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, float thresh):
cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1) * (y2 - y1)
cdef np.ndarray[np.intp_t, ndim=1] order = scores.argsort()[::-1]
cdef int ndets = dets.shape[0]
    cdef np.ndarray[np.intp_t, ndim=1] suppressed = \
        np.zeros((ndets), dtype=np.intp)
# nominal indices
cdef int _i, _j
# sorted indices
cdef int i, j
# temp variables for box i's (the box currently under consideration)
cdef np.float32_t ix1, iy1, ix2, iy2, iarea
# variables for computing overlap with box j (lower scoring box)
cdef np.float32_t xx1, yy1, xx2, yy2
cdef np.float32_t w, h
cdef np.float32_t inter, ovr
keep = []
for _i in range(ndets):
i = order[_i]
if suppressed[i] == 1:
continue
keep.append(i)
ix1 = x1[i]
iy1 = y1[i]
ix2 = x2[i]
iy2 = y2[i]
iarea = areas[i]
for _j in range(_i + 1, ndets):
j = order[_j]
if suppressed[j] == 1:
continue
xx1 = max(ix1, x1[j])
yy1 = max(iy1, y1[j])
xx2 = min(ix2, x2[j])
yy2 = min(iy2, y2[j])
w = max(0.0, xx2 - xx1)
h = max(0.0, yy2 - yy1)
inter = w * h
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
return keep
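Once the extensions are built as `seetadet.utils.nms.cython_nms` (see the `setup.py` further below), usage is a one-liner; `dets` packs `(x1, y1, x2, y2, score)` rows. A minimal check, assuming the built module is importable:

```python
import numpy as np
from seetadet.utils.nms.cython_nms import cpu_nms

dets = np.array([[10, 10, 50, 50, 0.9],
                 [12, 12, 52, 52, 0.8],     # IoU ~0.82 with the first box
                 [100, 100, 150, 150, 0.7]], dtype=np.float32)
keep = cpu_nms(dets, 0.5)  # -> [0, 2]; the overlapping box is suppressed
```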
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float thresh,
unsigned int method=0, float sigma=0.5, float score_thresh=0.001):
cdef unsigned int N = boxes.shape[0]
cdef float iw, ih, box_area
cdef float ua
cdef int pos = 0
cdef float maxscore = 0
cdef int maxpos = 0
    cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, s, area, weight, ov
for i in range(N):
maxscore = boxes[i, 4]
maxpos = i
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
pos = i + 1
# get max box
while pos < N:
if maxscore < boxes[pos, 4]:
maxscore = boxes[pos, 4]
maxpos = pos
pos = pos + 1
# add max box as a detection
boxes[i,0] = boxes[maxpos,0]
boxes[i,1] = boxes[maxpos,1]
boxes[i,2] = boxes[maxpos,2]
boxes[i,3] = boxes[maxpos,3]
boxes[i,4] = boxes[maxpos,4]
# swap ith box with position of max box
boxes[maxpos,0] = tx1
boxes[maxpos,1] = ty1
boxes[maxpos,2] = tx2
boxes[maxpos,3] = ty2
boxes[maxpos,4] = ts
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
pos = i + 1
# NMS iterations, note that N changes if detection boxes fall below threshold
while pos < N:
x1 = boxes[pos, 0]
y1 = boxes[pos, 1]
x2 = boxes[pos, 2]
y2 = boxes[pos, 3]
s = boxes[pos, 4]
area = (x2 - x1) * (y2 - y1)
iw = min(tx2, x2) - max(tx1, x1)
if iw > 0:
ih = min(ty2, y2) - max(ty1, y1)
if ih > 0:
ua = float((tx2 - tx1) * (ty2 - ty1) + area - iw * ih)
ov = iw * ih / ua #iou between max box and detection box
if method == 1: # linear
if ov > thresh:
weight = 1 - ov
else:
weight = 1
elif method == 2: # gaussian
weight = np.exp(-(ov * ov) / sigma)
else: # original NMS
if ov > thresh:
weight = 0
else:
weight = 1
boxes[pos, 4] = weight * boxes[pos, 4]
# if box score falls below threshold, discard the box by swapping with last box
# update N
if boxes[pos, 4] < score_thresh:
boxes[pos,0] = boxes[N-1, 0]
boxes[pos,1] = boxes[N-1, 1]
boxes[pos,2] = boxes[N-1, 2]
boxes[pos,3] = boxes[N-1, 3]
boxes[pos,4] = boxes[N-1, 4]
N = N - 1
pos = pos - 1
pos = pos + 1
keep = [i for i in range(N)]
return keep
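The three `method` branches differ only in how a box's score is decayed by its IoU `ov` against the current max box: classical NMS zeroes it past the threshold, `linear` scales by `1 - ov`, and `gaussian` scales by `exp(-ov^2 / sigma)`. The decay rule in isolation (a sketch; `soft_nms_weight` is a hypothetical name):

```python
import math

def soft_nms_weight(ov, method=0, thresh=0.3, sigma=0.5):
    """Score multiplier applied to a box with IoU `ov` vs. the max box."""
    if method == 1:                          # linear decay
        return 1.0 - ov if ov > thresh else 1.0
    if method == 2:                          # gaussian decay
        return math.exp(-(ov * ov) / sigma)
    return 0.0 if ov > thresh else 1.0       # original hard NMS
```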
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cython extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.core import setup
from distutils.extension import Extension
import os
from Cython.Build import cythonize
from Cython.Distutils import build_ext
import numpy as np
def clean_builds():
"""Clean the builds."""
for file in os.listdir('./'):
if file.endswith('.c'):
os.remove(file)
ext_modules = [
Extension(
'seetadet.utils.bbox.cython_bbox',
['cython_bbox.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()],
),
Extension(
'seetadet.utils.nms.cython_nms',
['cython_nms.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()],
),
]
setup(
name='seetadet',
ext_modules=cythonize(
ext_modules, compiler_directives={'language_level': '3'}),
cmdclass={'build_ext': build_ext},
)
clean_builds()
# Datasets
## Introduction
This folder is kept for the record and json datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
| [R50-A](https://dragon.seetatech.com/download/seetadet/pretrained/R-50-A_in1k_cls120e.pkl) | 120e | 75.30 | 92.20 | MSRA |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# Python dependencies required for development.
opencv-python
Pillow
pyyaml
prettytable
matplotlib
codewithgpu
shapely
Cython
pycocotools
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create the record and JSON datasets by:
```bash
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create the record dataset by:
```bash
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
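Both scripts write a `codewithgpu` record dataset that can be iterated directly, as the `json_dataset.py` script below does; a quick sanity check after creation (paths are placeholders):

```python
import codewithgpu

dataset = codewithgpu.RecordDataset('/path/to/datasets/coco_train2017')
for i, example in enumerate(dataset):
    # Each example carries id/height/width/depth/content plus objects
    # with name, xmin/ymin/xmax/ymax (and mask/polygons for COCO).
    print(example['id'], example['height'], example['width'],
          len(example['object']))
    if i == 4:
        break
```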
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import codewithgpu
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width, x1 + max(0, ann['bbox'][2])))
y2 = float(min(height, y1 + max(0, ann['bbox'][3])))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
for p in ann['segmentation']:
if len(p) < 6:
                    print('Removing invalid polygon (fewer than 3 points).')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in ann['segmentation'] if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them or decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert type(rle) == dict
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
        raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = codewithgpu.RecordWriter(
path=args.rec,
features={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/00000.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import codewithgpu
def parse_args():
"""Parse arguments."""
    parser = argparse.ArgumentParser(
        description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
record_dataset = codewithgpu.RecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for example in record_dataset:
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
w, h = x2 - x1, y2 - y1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare PASCAL VOC datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import codewithgpu
import cv2
import numpy as np
import xml.etree.ElementTree
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare PASCAL VOC datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--gt',
default=None,
help='path to write json dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_file, xml_file):
"""Return the record example."""
tree = xml.etree.ElementTree.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objects = tree.findall('object')
size = tree.find('size')
example = {'id': filename.split('.')[0], 'object': []}
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
if size is not None:
example['height'] = int(size.find('height').text)
example['width'] = int(size.find('width').text)
example['depth'] = int(size.find('depth').text)
else:
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for obj in objects:
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'xmin': float(bbox.find('xmin').text),
'ymin': float(bbox.find('ymin').text),
'xmax': float(bbox.find('xmax').text),
'ymax': float(bbox.find('ymax').text),
'difficult': is_diff})
return example
def write_dataset(args):
"""Write the record dataset."""
assert len(args.splits) == len(args.images)
assert len(args.splits) == len(args.annotations)
if os.path.exists(args.rec):
        raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = codewithgpu.RecordWriter(
path=args.rec,
features={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries = []
for i, split in enumerate(args.splits):
with open(split, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(args.images[i], filename + '.jpg')
ann_file = os.path.join(args.annotations[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file.
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, xml_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, xml_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/00000.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
def write_json_dataset(args):
"""Write the json dataset."""
categories = ['aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
import subprocess
    script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
    cmd = '{} {} '.format(sys.executable, script)
cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
cmd += '--categories {} '.format(' '.join(categories))
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
if args.gt is not None:
write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""A platform implementing popular object detection algorithms."""
from __future__ import absolute_import as _absolute_import
from __future__ import division as _division
from __future__ import print_function as _print_function
# Version
from seetadet.version import version as __version__
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Platform configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Variables
from seetadet.core.config.defaults import cfg # noqa
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Default configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config.yacs import CfgNode
_C = cfg = CfgNode()
# ------------------------------------------------------------
# Training options
# ------------------------------------------------------------
_C.TRAIN = CfgNode()
# Initialize network with weights from this file
_C.TRAIN.WEIGHTS = ''
# The train dataset
_C.TRAIN.DATASET = ''
# The loader type for training
_C.TRAIN.LOADER = 'det_train'
# The number of workers to load train data
_C.TRAIN.NUM_WORKERS = 3
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image shortest side
_C.TRAIN.SCALES = (640,)
# Range to jitter the image scales randomly
_C.TRAIN.SCALES_RANGE = (1.0, 1.0)
# Longest side to resize the input image
_C.TRAIN.MAX_SIZE = 1000
# Size to crop the input image
_C.TRAIN.CROP_SIZE = 0
# Images to use per mini-batch
_C.TRAIN.IMS_PER_BATCH = 1
# Use the difficult (occluded/crowd) objects
_C.TRAIN.USE_DIFF = False
# The probability to distort the color
_C.TRAIN.COLOR_JITTER = 0.0
# ------------------------------------------------------------
# Testing options
# ------------------------------------------------------------
_C.TEST = CfgNode()
# The test dataset
_C.TEST.DATASET = ''
# The JSON dataset with annotations for evaluation
_C.TEST.JSON_DATASET = ''
# The loader type for testing
_C.TEST.LOADER = 'det_test'
# The evaluator type for dataset
_C.TEST.EVALUATOR = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
_C.TEST.MAX_SIZE = 1000
# Size to crop the input image
_C.TEST.CROP_SIZE = 0
# Images to use per mini-batch
_C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
_C.TEST.SCORE_THRESH = 0.05
# Overlap threshold used for NMS
_C.TEST.NMS_THRESH = 0.5
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
_C.TEST.DETECTIONS_PER_IM = 100
# ------------------------------------------------------------
# Model options
# ------------------------------------------------------------
_C.MODEL = CfgNode()
# The model type
_C.MODEL.TYPE = ''
# The compute precision
_C.MODEL.PRECISION = 'float32'
# The name for each object class
_C.MODEL.CLASSES = ['__background__']
# Pixel mean and stddev values for image normalization (BGR order)
_C.MODEL.PIXEL_MEAN = [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_STD = [57.375, 57.12, 58.395]
# Focal loss parameters
_C.MODEL.FOCAL_LOSS_ALPHA = 0.25
_C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# ------------------------------------------------------------
# Backbone options
# ------------------------------------------------------------
_C.BACKBONE = CfgNode()
# The backbone type
_C.BACKBONE.TYPE = ''
# The normalization in backbone modules
_C.BACKBONE.NORM = 'FrozenBN'
# The drop path rate in backbone
_C.BACKBONE.DROP_PATH_RATE = 0.0
# Freeze the first stages/blocks of backbone
_C.BACKBONE.FREEZE_AT = 2
# Stride of the coarsest feature
# This is needed so the input can be padded properly
_C.BACKBONE.COARSEST_STRIDE = 32
# ------------------------------------------------------------
# FPN options
# ------------------------------------------------------------
_C.FPN = CfgNode()
# Finest level of the FPN pyramid
_C.FPN.MIN_LEVEL = 3
# Coarsest level of the FPN pyramid
_C.FPN.MAX_LEVEL = 7
# Starting level of the top-down fusing
_C.FPN.FUSE_LEVEL = 5
# Number of blocks to stack in the FPN
_C.FPN.NUM_BLOCKS = 1
# Channel dimension of the FPN feature levels
_C.FPN.DIM = 256
# The FPN conv module
_C.FPN.CONV = 'Conv2d'
# The fpn normalization module
_C.FPN.NORM = ''
# The fpn activation module
_C.FPN.ACTIVATION = ''
# The feature fusion method
_C.FPN.FUSE_TYPE = 'sum'
# ------------------------------------------------------------
# Anchor generator options
# ------------------------------------------------------------
_C.ANCHOR_GENERATOR = CfgNode()
# The stride of each level
_C.ANCHOR_GENERATOR.STRIDES = [8, 16, 32, 64, 128]
# The anchor size of each stride
_C.ANCHOR_GENERATOR.SIZES = [[32], [64], [128], [256], [512]]
# The aspect ratios of each stride
_C.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
# ------------------------------------------------------------
# RPN options
# ------------------------------------------------------------
_C.RPN = CfgNode()
# Total number of rpn training anchors per image
_C.RPN.BATCH_SIZE = 256
# Fraction of foreground anchors per training batch
_C.RPN.POSITIVE_FRACTION = 0.5
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RPN.NEGATIVE_OVERLAP = 0.3
# NMS threshold used on RPN proposals
_C.RPN.NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
_C.RPN.PRE_NMS_TOPK_TRAIN = 2000
_C.RPN.PRE_NMS_TOPK_TEST = 1000
# Number of top scoring boxes to keep after NMS to RPN proposals
_C.RPN.POST_NMS_TOPK_TRAIN = 1000
_C.RPN.POST_NMS_TOPK_TEST = 1000
# The number of conv layers to stack in the head
_C.RPN.NUM_CONV = 1
# The optional loss for bbox regression
_C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# RetinaNet options
# ------------------------------------------------------------
_C.RETINANET = CfgNode()
# Number of conv layers to stack in the head
_C.RETINANET.NUM_CONV = 4
# The head conv module
_C.RETINANET.CONV = 'Conv2d'
# The head normalization module
_C.RETINANET.NORM = ''
# The head activation module
_C.RETINANET.ACTIVATION = 'ReLU'
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RETINANET.NEGATIVE_OVERLAP = 0.4
# Number of top scoring boxes to keep before NMS
_C.RETINANET.PRE_NMS_TOPK = 1000
# The bbox regression loss type
_C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# FastRCNN options
# ------------------------------------------------------------
_C.FAST_RCNN = CfgNode()
# Total number of training RoIs per image
_C.FAST_RCNN.BATCH_SIZE = 512
# The finest level of RoI feature
_C.FAST_RCNN.MIN_LEVEL = 2
# The coarsest level of RoI feature
_C.FAST_RCNN.MAX_LEVEL = 5
# Fraction of foreground RoIs per training batch
_C.FAST_RCNN.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.FAST_RCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with < iou overlap are labeled negative
_C.FAST_RCNN.NEGATIVE_OVERLAP = 0.5
# RoI pooler type
_C.FAST_RCNN.POOLER_TYPE = 'RoIAlignV2'
# The output size of the RoI pooler
_C.FAST_RCNN.POOLER_RESOLUTION = 7
# The resampling window size of RoI pooler
_C.FAST_RCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.FAST_RCNN.NUM_CONV = 0
# The number of fc layers to stack in the head
_C.FAST_RCNN.NUM_FC = 2
# The hidden dimension of conv head
_C.FAST_RCNN.CONV_HEAD_DIM = 256
# The hidden dimension of fc head
_C.FAST_RCNN.FC_HEAD_DIM = 1024
# The head normalization module
_C.FAST_RCNN.NORM = ''
# Use class agnostic for bbox regression or not
_C.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC = False
# The bbox regression loss type
_C.FAST_RCNN.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.FAST_RCNN.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.FAST_RCNN.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# MaskRCNN options
# ------------------------------------------------------------
_C.MASK_RCNN = CfgNode()
# RoI pooler type
_C.MASK_RCNN.POOLER_TYPE = 'RoIAlignV2'
# The output size of the RoI pooler
_C.MASK_RCNN.POOLER_RESOLUTION = 14
# The resampling window size of RoI pooler
_C.MASK_RCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.MASK_RCNN.NUM_CONV = 4
# The hidden dimension of conv head
_C.MASK_RCNN.CONV_HEAD_DIM = 256
# The head normalization module
_C.MASK_RCNN.NORM = ''
# ------------------------------------------------------------
# CascadeRCNN options
# ------------------------------------------------------------
_C.CASCADE_RCNN = CfgNode()
# Make mask predictions or not
_C.CASCADE_RCNN.MASK_ON = False
# IoU overlap ratios for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.CASCADE_RCNN.POSITIVE_OVERLAP = (0.5, 0.6, 0.7)
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.CASCADE_RCNN.BBOX_REG_WEIGHTS = (
(10.0, 10.0, 5.0, 5.0),
(20.0, 20.0, 10.0, 10.0),
(30.0, 30.0, 15.0, 15.0),
)
# ------------------------------------------------------------
# SSD options
# ------------------------------------------------------------
_C.SSD = CfgNode()
# Fraction of foreground anchors per training batch
_C.SSD.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.SSD.NEGATIVE_OVERLAP = 0.5
# Number of top scoring boxes to keep before NMS
_C.SSD.PRE_NMS_TOPK = 300
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
_C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.SSD.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# Solver options
# ------------------------------------------------------------
_C.SOLVER = CfgNode()
# The interval to display logs
_C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
_C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
_C.SOLVER.SNAPSHOT_PREFIX = ''
# Loss scaling factor for mixed precision training
_C.SOLVER.LOSS_SCALE = 1024.0
# Maximum number of SGD iterations
_C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified scheduler
_C.SOLVER.BASE_LR = 0.001
# Minimal learning rate for the specified scheduler
_C.SOLVER.MIN_LR = 0.0
# The decay intervals for LRScheduler
_C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
_C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
_C.SOLVER.WARM_UP_STEPS = 1000
# Start the warm up from ``BASE_LR`` * ``FACTOR``
_C.SOLVER.WARM_UP_FACTOR = 1.0 / 1000
# The type of optimizer
_C.SOLVER.OPTIMIZER = 'SGD'
# The type of lr scheduler
_C.SOLVER.LR_POLICY = 'steps_with_decay'
# The layer-wise lr decay
_C.SOLVER.LAYER_LR_DECAY = 1.0
# Momentum to use with SGD
_C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
_C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 norm factor for clipping gradients
_C.SOLVER.CLIP_NORM = 0.0
# ------------------------------------------------------------
# Misc options
# ------------------------------------------------------------
# Number of GPUs for distributed training
_C.NUM_GPUS = 1
# Random seed for reproducibility
_C.RNG_SEED = 3
# Default GPU device index
_C.GPU_ID = 0
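These defaults are plain `CfgNode` entries, so a model YAML (or a flat key/value list on the command line) only needs to name what it overrides. A minimal sketch using the merge helpers defined in the YACS module below (the YAML filename is a placeholder):

```python
from seetadet.core.config import cfg

cfg.merge_from_file('faster_rcnn.yml')      # override defaults from YAML
cfg.merge_from_list(['SOLVER.BASE_LR', '0.02',
                     'TRAIN.IMS_PER_BATCH', '2'])
assert cfg.SOLVER.BASE_LR == 0.02
cfg.freeze()                                # lock the config for training
```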
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/yacs/blob/master/yacs/config.py>
#
# ------------------------------------------------------------
"""Yet Another Configuration System (YACS)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import numpy as np
import yaml
class CfgNode(dict):
"""Node for configuration options."""
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(CfgNode, self).__init__(*args, **kwargs)
self.__dict__[CfgNode.IMMUTABLE] = False
def clone(self):
"""Recursively copy this CfgNode."""
return copy.deepcopy(self)
def freeze(self):
"""Make this CfgNode and all of its children immutable."""
self._immutable(True)
def is_frozen(self):
"""Return mutability."""
return self.__dict__[CfgNode.IMMUTABLE]
def merge_from_file(self, cfg_filename):
"""Load a yaml config file and merge it into this CfgNode."""
with open(cfg_filename, 'r') as f:
other_cfg = CfgNode(yaml.safe_load(f))
self.merge_from_other_cfg(other_cfg)
def merge_from_list(self, cfg_list):
"""Merge config (keys, values) in a list into this CfgNode."""
assert len(cfg_list) % 2 == 0
from ast import literal_eval
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = self
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def merge_from_other_cfg(self, other_cfg):
"""Merge ``other_cfg`` into this CfgNode."""
_merge_a_into_b(other_cfg, self)
def _immutable(self, is_immutable):
"""Set immutability recursively to all nested CfgNode."""
self.__dict__[CfgNode.IMMUTABLE] = is_immutable
for v in self.__dict__.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
for v in self.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __repr__(self):
return "{}({})".format(self.__class__.__name__,
super(CfgNode, self).__repr__())
def __setattr__(self, name, value):
if not self.__dict__[CfgNode.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but CfgNode is immutable'
.format(name, value))
def __str__(self):
def _indent(s_, num_spaces):
s = s_.split("\n")
if len(s) == 1:
return s_
first = s.pop(0)
s = [(num_spaces * " ") + line for line in s]
s = "\n".join(s)
s = first + "\n" + s
return s
r = ""
s = []
for k, v in sorted(self.items()):
            separator = "\n" if isinstance(v, CfgNode) else " "
            attr_str = "{}:{}{}".format(str(k), separator, str(v))
attr_str = _indent(attr_str, 2)
s.append(attr_str)
r += "\n".join(s)
return r
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is CfgNode:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, CfgNode):
value_a = CfgNode(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
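The coercion rules above let YAML lists stand in for tuple defaults and ints for floats, while any other mismatch raises a `ValueError`; for example:

```python
from seetadet.core.config.yacs import CfgNode

base = CfgNode({'SCALES': (640,), 'BASE_LR': 0.001})
base.merge_from_other_cfg(CfgNode({'SCALES': [800], 'BASE_LR': 1}))
assert base.SCALES == (800,)   # list coerced back to the tuple type
assert base.BASE_LR == 1.0     # int promoted to float
```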
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Experiment coordinator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import time
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import logging
class Coordinator(object):
"""Manage the unique experiments."""
def __init__(self, cfg_file, exp_dir=None):
cfg.merge_from_file(cfg_file)
if exp_dir is None:
name = time.strftime('%Y%m%d_%H%M%S',
time.localtime(time.time()))
exp_dir = '../experiments/{}'.format(name)
if not osp.exists(exp_dir):
os.makedirs(exp_dir)
else:
if not osp.exists(exp_dir):
raise ValueError('Invalid experiment dir: ' + exp_dir)
self.exp_dir = exp_dir
def path_at(self, file, auto_create=True):
try:
path = osp.abspath(osp.join(self.exp_dir, file))
if auto_create and not osp.exists(path):
os.makedirs(path)
except OSError:
path = osp.abspath(osp.join('/tmp', file))
if auto_create and not osp.exists(path):
os.makedirs(path)
return path
def get_checkpoint(self, step=None, last_idx=1, wait=False):
path = self.path_at('checkpoints')
def locate(last_idx=None):
files = os.listdir(path)
files = list(filter(lambda x: '_iter_' in x and
x.endswith('.pkl'), files))
file_steps = []
for i, file in enumerate(files):
file_step = int(file.split('_iter_')[-1].split('.')[0])
if step == file_step:
return osp.join(path, files[i]), file_step
file_steps.append(file_step)
if step is None:
if len(files) == 0:
return None, 0
if last_idx > len(files):
return None, 0
file = files[np.argsort(file_steps)[-last_idx]]
file_step = file_steps[np.argsort(file_steps)[-last_idx]]
return osp.join(path, file), file_step
return None, 0
file, file_step = locate(last_idx)
while file is None and wait:
            logging.info('Waiting for the checkpoint at step {}.'.format(step))
time.sleep(10)
file, file_step = locate(last_idx)
return file, file_step
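if __name__ == '__main__':
    # A minimal usage sketch; 'config.yml' and the experiment directory
    # are hypothetical paths that should exist on disk.
    coordinator = Coordinator('config.yml', exp_dir='../experiments/example')
    file, file_step = coordinator.get_checkpoint(last_idx=1)
    print('Latest checkpoint: {} (step {})'.format(file, file_step))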
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for training library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.engine import lr_scheduler
def build_optimizer(params, **kwargs):
"""Build the optimizer."""
args = {'lr': cfg.SOLVER.BASE_LR,
'weight_decay': cfg.SOLVER.WEIGHT_DECAY,
'clip_norm': cfg.SOLVER.CLIP_NORM,
'grad_scale': 1.0 / cfg.SOLVER.LOSS_SCALE}
optimizer = kwargs.pop('optimizer', cfg.SOLVER.OPTIMIZER)
if optimizer == 'SGD':
args['momentum'] = cfg.SOLVER.MOMENTUM
args.update(kwargs)
return getattr(torch.optim, optimizer)(params, **args)
def build_lr_scheduler(**kwargs):
"""Build the LR scheduler."""
args = {'lr_max': cfg.SOLVER.BASE_LR,
'lr_min': cfg.SOLVER.MIN_LR,
'warmup_steps': cfg.SOLVER.WARM_UP_STEPS,
'warmup_factor': cfg.SOLVER.WARM_UP_FACTOR}
policy = kwargs.pop('policy', cfg.SOLVER.LR_POLICY)
args.update(kwargs)
if policy == 'steps_with_decay':
return lr_scheduler.MultiStepLR(
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA, **args)
elif policy == 'linear_decay':
return lr_scheduler.LinearLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
elif policy == 'cosine_decay':
return lr_scheduler.CosineLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
return lr_scheduler.ConstantLR(**args)
def build_tensorboard(log_dir):
"""Build the tensorboard."""
try:
from dragon.utils.tensorboard import tf
from dragon.utils.tensorboard import TensorBoard
        # Avoid GPU allocation by the TF API.
if tf is not None:
tf.config.set_visible_devices([], 'GPU')
return TensorBoard(log_dir)
except ImportError:
return None
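if __name__ == '__main__':
    # A minimal usage sketch, assuming the default solver configuration and
    # that ``torch.nn.Linear`` is available in the dragon PyTorch-style API.
    model = torch.nn.Linear(8, 2)
    optimizer = build_optimizer(model.parameters(), optimizer='SGD')
    scheduler = build_lr_scheduler(policy='steps_with_decay')
    print(type(optimizer).__name__, scheduler.get_lr())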
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Learning rate schedulers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
class ConstantLR(object):
"""Constant LR scheduler."""
def __init__(self, **kwargs):
self._lr_max = kwargs.pop('lr_max')
self._lr_min = kwargs.pop('lr_min', 0)
self._warmup_steps = kwargs.pop('warmup_steps', 0)
self._warmup_factor = kwargs.pop('warmup_factor', 0)
if kwargs:
raise ValueError('Unexpected arguments: ' + ','.join(v for v in kwargs))
self._step_count = 0
self._last_decay = 1.
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
return self._lr_max * (alpha + (1. - alpha) * self._warmup_factor)
return self._lr_min + (self._lr_max - self._lr_min) * self.get_decay()
def get_decay(self):
return self._last_decay
class CosineLR(ConstantLR):
"""LR scheduler with cosine decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(CosineLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = .5 * (1. + math.cos(math.pi * t / t_max))
return self._last_decay
class MultiStepLR(ConstantLR):
"""LR scheduler with multi-steps decay."""
def __init__(self, lr_max, decay_steps, decay_gamma, **kwargs):
super(MultiStepLR, self).__init__(lr_max=lr_max, **kwargs)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(decay_steps)
def get_decay(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_decay = self._decay_gamma ** self._stage_count
return self._last_decay
class LinearLR(ConstantLR):
"""LR scheduler with linear decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(LinearLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = 1. - float(t) / t_max
return self._last_decay
if __name__ == '__main__':
def extract_label(scheduler):
class_name = scheduler.__class__.__name__
label = class_name + '('
        if class_name in ('CosineLR', 'LinearLR'):
            label += 'α=' + str(scheduler._decay_step)
        elif class_name == 'MultiStepLR':
            label += 'α=' + str(scheduler._decay_steps) + ', '
            label += 'γ=' + str(scheduler._decay_gamma)
label += ')'
return label
vis = True
max_steps = 120
shared_args = {
'lr_max': 0.0004,
'warmup_steps': 0,
'warmup_factor': 0.,
}
schedulers = [
# CosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
CosineLR(lr_min=1e-6, decay_step=1, max_steps=140, **shared_args),
]
for i in range(max_steps):
info = 'Step = %d\n' % i
for scheduler in schedulers:
if i == 0:
scheduler.lr_seq = []
info += ' * {}: {}\n'.format(
extract_label(scheduler),
scheduler.get_lr())
scheduler.lr_seq.append(scheduler.get_lr())
scheduler.step()
if not vis:
print(info)
if vis:
import matplotlib.pyplot as plt
plt.figure(1)
plt.title('Visualization of different LR Schedulers')
plt.xlabel('Step')
plt.ylabel('Learning Rate')
line = '-'
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
for i, scheduler in enumerate(schedulers):
plt.plot(
range(max_steps),
scheduler.lr_seq,
colors[i] + line,
linewidth=1.,
label=extract_label(scheduler),
)
plt.legend()
plt.grid(linestyle='--')
        plt.savefig('x.png')
        plt.show()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import multiprocessing as mp
import codewithgpu
from dragon.vm import torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import build_evaluator
from seetadet.models.build import build_detector
from seetadet.modules.build import build_inference
from seetadet.utils import logging
from seetadet.utils import profiler
from seetadet.utils import vis
class InferenceCommand(codewithgpu.InferenceCommand):
"""Command to run inference."""
def __init__(self, input_queue, output_queue, kwargs):
super(InferenceCommand, self).__init__(input_queue, output_queue)
self.kwargs = kwargs
def build_env(self):
"""Build the environment."""
cfg.merge_from_other_cfg(self.kwargs['cfg'])
cfg.GPU_ID = self.kwargs['device']
cfg.freeze()
logging.set_root(self.kwargs.get('verbose', True))
self.batch_size = cfg.TEST.IMS_PER_BATCH
self.batch_timeout = self.kwargs.get('batch_timeout', None)
if self.kwargs.get('deterministic', False):
torch.backends.cudnn.deterministic = True
def build_model(self):
"""Build and return the model."""
return build_detector(self.kwargs['device'], self.kwargs['weights'])
def build_module(self, model):
"""Build and return the inference module."""
return build_inference(model)
def send_results(self, module, indices, imgs):
"""Send the batch results."""
results = module.get_results(imgs)
time_diffs = module.get_time_diffs()
time_diffs['im_detect'] += time_diffs.pop('im_detect_mask', 0.)
for i, outputs in enumerate(results):
outputs['im_shape'] = imgs[i].shape
self.output_queue.put((indices[i], time_diffs, outputs))
def filter_outputs(outputs, max_dets=100):
"""Limit the max number of detections."""
if max_dets <= 0:
return outputs
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
scores, num_classes = [], len(boxes)
for i in range(num_classes):
if len(boxes[i]) > 0:
scores.append(boxes[i][:, -1])
scores = np.hstack(scores) if len(scores) > 0 else []
if len(scores) > max_dets:
thr = np.sort(scores)[-max_dets]
for i in range(num_classes):
if len(boxes[i]) < 1:
continue
keep = np.where(boxes[i][:, -1] >= thr)[0]
boxes[i] = boxes[i][keep]
if masks is not None:
masks[i] = masks[i][keep]
outputs['boxes'] = boxes
outputs['masks'] = masks
return outputs
def extend_results(index, collection, results):
"""Add image results to the collection."""
if results is None:
return
for _ in range(len(results) - len(collection)):
collection.append([])
for i in range(1, len(results)):
for _ in range(index - len(collection[i]) + 1):
collection[i].append([])
collection[i][index] = results[i]
def run_test(
test_cfg,
weights,
output_dir,
devices,
deterministic=False,
read_every=100,
vis_thresh=0,
vis_output_dir=None,
):
"""Run a model testing.
Parameters
----------
test_cfg : CfgNode
The cfg for testing.
weights : str
The path of model weights to load.
output_dir : str
The path to save results.
devices : Sequence[int]
        The indices of the computing devices.
deterministic : bool, optional, default=False
Set cudnn deterministic or not.
read_every : int, optional, default=100
Read every N images to distribute to devices.
vis_thresh : float, optional, default=0
The score threshold for visualization.
vis_output_dir : str, optional
The path to save visualizations.
"""
cfg.merge_from_other_cfg(test_cfg)
evaluator = build_evaluator(output_dir)
devices = devices if devices else [cfg.GPU_ID]
num_devices = len(devices)
num_images = evaluator.num_images
max_dets = cfg.TEST.DETECTIONS_PER_IM
read_stride = float(num_devices * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
visualizer = vis.Visualizer(cfg.MODEL.CLASSES, vis_thresh)
queues = [mp.Queue() for _ in range(num_devices + 1)]
commands = [InferenceCommand(
queues[i], queues[-1], kwargs={
'cfg': test_cfg,
'weights': weights,
'device': devices[i],
'deterministic': deterministic,
'verbose': i == 0,
}) for i in range(num_devices)]
actors = [mp.Process(target=command.run) for command in commands]
for actor in actors:
actor.start()
timers = collections.defaultdict(profiler.Timer)
all_boxes, all_masks, vis_images = [], [], {}
for count in range(1, num_images + 1):
img_id, img = evaluator.get_image()
queues[count % num_devices].put((count - 1, img))
if vis_thresh > 0 and vis_output_dir:
filename = vis_output_dir + '/%s.png' % img_id
vis_images[count - 1] = (filename, img)
if count % read_every > 0 and count < num_images:
continue
if count == num_images:
for i in range(num_devices):
queues[i].put((-1, None))
for _ in range(((count - 1) % read_every + 1)):
index, time_diffs, outputs = queues[-1].get()
outputs = filter_outputs(outputs, max_dets)
extend_results(index, all_boxes, outputs['boxes'])
extend_results(index, all_masks, outputs.get('masks', None))
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
if vis_thresh > 0 and vis_output_dir:
filename, img = vis_images[index]
visualizer.draw_instances(
img=img,
boxes=outputs['boxes'],
masks=outputs.get('masks', None)).save(filename)
del vis_images[index]
avg_time = sum([t.average_time for t in timers.values()])
eta_seconds = avg_time * (num_images - count)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\nEvaluating detections...')
evaluator.eval_bbox(all_boxes)
if len(all_masks) > 0:
print('Evaluating segmentations...')
evaluator.eval_segm(all_boxes, all_masks)
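if __name__ == '__main__':
    # A minimal sketch of ``filter_outputs`` on toy detections, where the
    # per-class boxes are laid out as [x1, y1, x2, y2, score].
    toy_outputs = {'boxes': [np.zeros((0, 5), 'float32'),
                             np.array([[0., 0., 10., 10., 0.9],
                                       [1., 1., 12., 12., 0.3]], 'float32')]}
    toy_outputs = filter_outputs(toy_outputs, max_dets=1)
    assert len(toy_outputs['boxes'][1]) == 1  # keep the highest score only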
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.engine.build import build_lr_scheduler
from seetadet.core.engine.build import build_optimizer
from seetadet.core.engine.build import build_tensorboard
from seetadet.core.engine.utils import count_params
from seetadet.core.engine.utils import get_device
from seetadet.core.engine.utils import get_param_groups
from seetadet.data.build import build_loader_train
from seetadet.models.build import build_detector
from seetadet.utils import logging
from seetadet.utils import profiler
class Trainer(object):
"""Schedule the iterative model training."""
def __init__(self, coordinator, start_iter=0):
# Build loader.
self.loader = build_loader_train()
# Build model.
self.model = build_detector(training=True)
self.model.load_weights(cfg.TRAIN.WEIGHTS, strict=start_iter > 0)
self.model.to(device=get_device(cfg.GPU_ID))
if cfg.MODEL.PRECISION.lower() == 'float16':
self.model.half()
# Build optimizer.
self.loss_scale = cfg.SOLVER.LOSS_SCALE
param_groups_getter = get_param_groups
if cfg.SOLVER.LAYER_LR_DECAY < 1.0:
lr_scale_getter = functools.partial(
self.model.backbone.get_lr_scale,
decay=cfg.SOLVER.LAYER_LR_DECAY)
param_groups_getter = functools.partial(
param_groups_getter, lr_scale_getter=lr_scale_getter)
self.optimizer = build_optimizer(param_groups_getter(self.model))
self.scheduler = build_lr_scheduler()
# Build monitor.
self.coordinator = coordinator
self.metrics = collections.OrderedDict()
self.board = None
@property
def iter(self):
return self.scheduler._step_count
def snapshot(self):
"""Save the checkpoint of current iterative step."""
f = cfg.SOLVER.SNAPSHOT_PREFIX
f += '_iter_{}.pkl'.format(self.iter)
f = os.path.join(self.coordinator.path_at('checkpoints'), f)
if logging.is_root() and not os.path.exists(f):
torch.save(self.model.state_dict(), f, pickle_protocol=4)
logging.info('Wrote snapshot to: {:s}'.format(f))
def add_metrics(self, stats):
"""Add or update the metrics."""
for k, v in stats['metrics'].items():
if k not in self.metrics:
self.metrics[k] = profiler.SmoothedValue()
self.metrics[k].update(v)
def display_metrics(self, stats):
"""Send metrics to the monitor."""
logging.info('Iteration %d, lr = %.8f, time = %.2fs'
% (stats['iter'], stats['lr'], stats['time']))
for k, v in self.metrics.items():
logging.info(' ' * 4 + 'Train net output({}): {:.4f} ({:.4f})'
.format(k, stats['metrics'][k], v.average()))
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
self.board.scalar_summary(k, v.average(), stats['iter'])
def step(self):
stats = {'iter': self.iter}
metrics = collections.defaultdict(float)
# Run forward.
timer = profiler.Timer().tic()
inputs = self.loader()
outputs, losses = self.model(inputs), []
for k, v in outputs.items():
if 'loss' in k:
if isinstance(v, (tuple, list)):
losses.append(sum(v[1:], v[0]).mul_(1. / len(v)))
metrics.update(dict(('stage%d_' % (i + 1) + k, float(x))
for i, x in enumerate(v)))
else:
losses.append(v)
metrics[k] += float(v)
# Run backward.
losses = sum(losses[1:], losses[0])
if self.loss_scale != 1.0:
losses *= self.loss_scale
losses.backward()
# Apply update.
stats['lr'] = self.scheduler.get_lr()
for group in self.optimizer.param_groups:
group['lr'] = stats['lr'] * group.get('lr_scale', 1.0)
self.optimizer.step()
self.scheduler.step()
stats['time'] = timer.toc()
stats['metrics'] = collections.OrderedDict(sorted(metrics.items()))
return stats
def train_model(self, start_iter=0):
"""Network training loop."""
timer = profiler.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
display_every = cfg.SOLVER.DISPLAY
progress_every = 10 * display_every
snapshot_every = cfg.SOLVER.SNAPSHOT_EVERY
self.scheduler._step_count = start_iter
while self.iter < max_steps:
with timer.tic_and_toc():
stats = self.step()
self.add_metrics(stats)
if stats['iter'] % display_every == 0:
self.display_metrics(stats)
if self.iter % progress_every == 0:
logging.info(profiler.get_progress(timer, self.iter, max_steps))
if self.iter % snapshot_every == 0:
self.snapshot()
self.metrics.clear()
def run_train(coordinator, start_iter=0, enable_tensorboard=False):
"""Start a network training task."""
trainer = Trainer(coordinator, start_iter=start_iter)
if enable_tensorboard and logging.is_root():
trainer.board = build_tensorboard(coordinator.path_at('logs'))
logging.info('#Params: %.2fM' % count_params(trainer.model))
logging.info('Start training...')
trainer.train_model(start_iter)
trainer.snapshot()
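if __name__ == '__main__':
    # A minimal usage sketch; 'config.yml' is a hypothetical path and the
    # ``Coordinator`` import assumes the module layout of this package.
    from seetadet.core.coordinator import Coordinator
    coordinator = Coordinator('config.yml')
    run_train(coordinator, enable_tensorboard=True)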
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Engine utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import importlib.machinery
import os
import dragon
from dragon.core.framework import backend
from dragon.vm import torch
def count_params(module):
"""Return the number of parameters in MB."""
return sum([v.size().numel() for v in module.parameters()]) / 1e6
def freeze_module(module):
"""Freeze parameters of given module."""
module.eval()
for param in module.parameters():
param.requires_grad = False
def get_device(index):
"""Create the available device object."""
if torch.cuda.is_available():
return torch.device('cuda', index)
try:
if torch.backends.mps.is_available():
return torch.device('mps', index)
except AttributeError:
pass
return torch.device('cpu')
def get_param_groups(module, lr_scale_getter=None):
"""Separate parameters into groups."""
memo, groups = {}, collections.OrderedDict()
for name, param in module.named_parameters():
if not param.requires_grad:
continue
attrs = collections.OrderedDict()
if lr_scale_getter:
attrs['lr_scale'] = lr_scale_getter(name)
memo[name] = param.shape
no_weight_decay = not (name.endswith('weight') and param.dim() > 1)
no_weight_decay = getattr(param, 'no_weight_decay', no_weight_decay)
if no_weight_decay:
attrs['weight_decay'] = 0
group_name = '/'.join(['%s:%s' % (v[0], v[1]) for v in list(attrs.items())])
if group_name not in groups:
groups[group_name] = {'params': []}
groups[group_name].update(attrs)
groups[group_name]['params'].append(param)
return list(groups.values())
def load_library(library_prefix):
"""Load a shared library."""
loader_details = (importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError('Could not find the pre-built library '
'for <%s>.' % library_prefix)
backend.load_library(ext_specs.origin)
def synchronize_device(device):
"""Synchronize the computation of device."""
if device.type == 'cuda':
torch.cuda.synchronize(device)
elif device.type == 'mps':
dragon.mps.synchronize(device.index)
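if __name__ == '__main__':
    # A minimal usage sketch: biases fall into a zero weight-decay group,
    # while matrix weights keep the default decay (toy linear module).
    model = torch.nn.Linear(16, 4)
    groups = get_param_groups(model)
    print('%d param groups, %.6fM params' % (len(groups), count_params(model)))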
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Registry class."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
class Registry(object):
"""Registry class."""
def __init__(self, name):
self.name = name
self.registry = collections.OrderedDict()
def has(self, key):
return key in self.registry
def register(self, name, func=None, **kwargs):
def decorated(inner_function):
for key in (name if isinstance(
name, (tuple, list)) else [name]):
self.registry[key] = \
functools.partial(inner_function, **kwargs)
return inner_function
if func is not None:
return decorated(func)
return decorated
def get(self, name, default=None):
if name is None:
return None
if not self.has(name):
if default is not None:
return default
raise KeyError("`%s` is not registered in <%s>."
% (name, self.name))
return self.registry[name]
def try_get(self, name):
if self.has(name):
return self.get(name)
return None
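if __name__ == '__main__':
    # A minimal usage sketch with a throwaway registry: keyword arguments
    # given at registration are bound into the stored factory.
    ACTIVATIONS = Registry('activations')

    @ACTIVATIONS.register('relu6', max_value=6)
    def relu(x, max_value=None):
        return min(max(x, 0), max_value) if max_value else max(x, 0)

    assert ACTIVATIONS.get('relu6')(7) == 6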
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data import datasets
from seetadet.data import evaluators
from seetadet.data import pipelines
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios,
scales_per_octave=1):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
for i in range(len(self.sizes)):
octave_sizes = []
for j in range(1, scales_per_octave):
scale = 2 ** (float(j) / scales_per_octave)
octave_sizes += [x * scale for x in self.sizes[i]]
self.sizes[i] += octave_sizes
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
strides[i], self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
self.grid_coords = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_coords = self.get_coords(self.grid_shapes)
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def num_anchors(self, shapes):
"""Return the number of grid anchors."""
return sum(self.cell_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(shapes)))
def get_slices(self, shapes):
slices, offset = [], 0
for i, shape in enumerate(shapes):
num = self.cell_anchors[i].shape[0] * np.prod(shape)
slices.append(slice(offset, offset + num))
offset = offset + num
return slices
def get_coords(self, shapes):
"""Return the x-y coordinates of grid anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
            # Tile the K cell coords once per each of the A anchors
            # to get shift coords of length A * K.
xs.append(np.tile(x.flatten(), self.cell_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), self.cell_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = np.arange(0, w) * self.strides[i]
shift_y = np.arange(0, h) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (A, 1, 4) to K cell shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a, k = self.num_cell_anchors(i), shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
grid_anchors.append(anchors.reshape((a * k, 4)))
return np.vstack(grid_anchors)
def narrow_anchors(self, shapes, inds, return_anchors=False):
"""Return the valid anchors on given shapes."""
max_shapes = self.grid_shapes
anchors = self.grid_anchors
x_coords, y_coords = self.grid_coords
offset1 = offset2 = num1 = num2 = 0
out_inds, out_anchors = [], []
for i in range(len(max_shapes)):
num1 += self.num_cell_anchors(i) * np.prod(max_shapes[i])
num2 += self.num_cell_anchors(i) * np.prod(shapes[i])
inds_keep = inds[np.where((inds >= offset1) & (inds < num1))[0]]
anchors_keep = anchors[inds_keep] if return_anchors else None
x, y = x_coords[inds_keep], y_coords[inds_keep]
z = ((inds_keep - offset1) // max_shapes[i][1]) // max_shapes[i][0]
keep = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_keep = (z * shapes[i][0] + y) * shapes[i][1] + x + offset2
out_inds.append(inds_keep[keep])
out_anchors.append(anchors_keep[keep] if return_anchors else None)
offset1, offset2 = num1, num2
outputs = [np.concatenate(out_inds)]
if return_anchors:
outputs += [np.concatenate(out_anchors)]
return outputs[0] if len(outputs) == 1 else outputs
def generate_anchors(stride=16, ratios=(0.5, 1, 2), sizes=(32, 64, 128, 256, 512)):
"""Generate anchors by enumerating aspect ratios and sizes."""
scales = np.array(sizes) / stride
base_anchor = np.array([-stride / 2., -stride / 2., stride / 2., stride / 2.])
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0]
h = anchor[3] - anchor[1]
x_ctr = anchor[0] + 0.5 * w
y_ctr = anchor[1] + 0.5 * h
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = np.sqrt(w * h / ratios)
hs = ws * ratios
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors by scales."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws, hs = w * scales, h * scales
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
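if __name__ == '__main__':
    # A minimal usage sketch: one size and three aspect ratios per stride
    # yield three cell anchors on every level.
    anchor_generator = AnchorGenerator(
        strides=(8, 16, 32),
        sizes=((32,), (64,), (128,)),
        aspect_ratios=((0.5, 1., 2.),))
    anchor_generator.reset_grid(max_size=64)
    assert anchor_generator.num_cell_anchors() == 3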
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = (np.arange(0, w) + 0.5) * self.strides[i]
shift_y = (np.arange(0, h) + 0.5) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (1, A, 4) to K cell shifts (K, 1, 4)
# to get shift anchors (K, A, 4) and reshape to (K * A, 4)
a = self.cell_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
grid_anchors.append(anchors.reshape((k * a, 4)))
return np.vstack(grid_anchors)
def generate_anchors(ratios, sizes):
"""Generate anchors by enumerating aspect ratios and sizes."""
min_size, max_size = sizes
base_anchor = np.array([-min_size / 2., -min_size / 2.,
min_size / 2., min_size / 2.])
ratio_anchors = _ratio_enum(base_anchor, ratios)
size_anchors = _size_enum(base_anchor, min_size, max_size)
anchors = np.vstack([ratio_anchors[:1], size_anchors, ratio_anchors[1:]])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0]
h = anchor[3] - anchor[1]
x_ctr = anchor[0] + 0.5 * w
y_ctr = anchor[1] + 0.5 * h
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
hs = np.sqrt(w * h / ratios)
ws = hs * ratios
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _size_enum(anchor, min_size, max_size):
"""Enumerate a anchor for size wrt base_anchor."""
_, _, x_ctr, y_ctr = _whctrs(anchor)
ws = hs = np.sqrt([min_size * max_size])
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
if __name__ == '__main__':
anchor_generator = AnchorGenerator(
strides=(8, 16, 32, 64, 100, 300),
sizes=((30, 60), (60, 110), (110, 162),
(162, 213), (213, 264), (264, 315)),
aspect_ratios=((1, 2, 0.5),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5),
(1, 2, 0.5)))
anchor_generator.reset_grid(max_size=300)
assert anchor_generator.grid_anchors.shape == (8732, 4)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Ground-truth assigners."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.bbox import bbox_overlaps
class MaxIoUAssigner(object):
"""Assign ground-truth to boxes according to the IoU."""
def __init__(
self,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.0,
match_low_quality=True,
gt_max_assign_all=True,
):
"""Create a ``MaxIoUAssigner``.
Parameters
----------
pos_iou_thr : float, optional, default=0.5
The minimum IoU overlap to label positives.
neg_iou_thr : float, optional, default=0.5
The maximum IoU overlap to label negatives.
min_pos_iou : float, optional, default=0.0
The minimum IoU overlap to match low quality.
        match_low_quality : bool, optional, default=True
            Whether to match a box for each gt even below ``pos_iou_thr``.
        gt_max_assign_all : bool, optional, default=True
            Whether to assign all boxes sharing the max overlap of a gt.
"""
self.pos_iou_thr = pos_iou_thr
self.neg_iou_thr = neg_iou_thr
self.min_pos_iou = min_pos_iou
self.match_low_quality = match_low_quality
self.gt_max_assign_all = gt_max_assign_all
def assign(self, boxes, gt_boxes):
# Initialize assigns with ignored index "-1".
num_boxes = len(boxes)
labels = np.empty((num_boxes,), 'int8')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = bbox_overlaps(boxes, gt_boxes)
max_overlaps = overlaps.max(axis=1)
# Background: below threshold IoU.
labels[max_overlaps < self.neg_iou_thr] = 0
# Foreground: above threshold IoU.
labels[max_overlaps >= self.pos_iou_thr] = 1
# Foreground: for each gt, assign anchor(s) with highest overlap.
if self.match_low_quality:
if self.gt_max_assign_all:
gt_max_overlaps = overlaps.max(axis=0)
if self.min_pos_iou > 0:
for i in np.where(gt_max_overlaps >= self.min_pos_iou)[0]:
labels[overlaps[:, i] == gt_max_overlaps[i]] = 1
else:
labels[np.where(overlaps == gt_max_overlaps)[0]] = 1
else:
labels[overlaps.argmax(axis=0)] = 1
        # Return the assigned labels.
return labels
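if __name__ == '__main__':
    # A minimal usage sketch, assuming ``bbox_overlaps`` accepts plain
    # (N, 4) float arrays: one matching box and one far-away box.
    assigner = MaxIoUAssigner(pos_iou_thr=0.5, neg_iou_thr=0.3)
    boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], 'float32')
    gt_boxes = np.array([[0, 0, 10, 10]], 'float32')
    print(assigner.assign(boxes, gt_boxes))  # expected labels: [1, 0]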
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
LOADERS = Registry('loaders')
DATASETS = Registry('datasets')
EVALUATORS = Registry('evaluators')
ANCHOR_SAMPLERS = Registry('anchor_samplers')
def build_anchor_sampler():
"""Build the anchor sampler."""
    sampler = ANCHOR_SAMPLERS.try_get(cfg.MODEL.TYPE)
    return sampler() if sampler else None
def build_dataset(path):
"""Build the dataset."""
keys = path.split('://')
if len(keys) >= 2:
return DATASETS.get(keys[0])(keys[1])
return DATASETS.get('default')(path)
def build_loader_train(**kwargs):
"""Build the train loader."""
args = {'dataset': cfg.TRAIN.DATASET,
'batch_size': cfg.TRAIN.IMS_PER_BATCH,
'num_workers': cfg.TRAIN.NUM_WORKERS,
'shuffle': True, 'contiguous': True}
args.update(kwargs)
return LOADERS.get(cfg.TRAIN.LOADER)(**args)
def build_loader_test(**kwargs):
"""Build the test loader."""
args = {'dataset': cfg.TEST.DATASET,
'batch_size': cfg.TEST.IMS_PER_BATCH,
'shuffle': False, 'contiguous': False}
args.update(kwargs)
return LOADERS.get(cfg.TEST.LOADER)(**args)
def build_evaluator(output_dir, **kwargs):
"""Build the evaluator."""
evaluator_type = cfg.TEST.EVALUATOR
if not evaluator_type:
return None
args = {'output_dir': output_dir,
'classes': cfg.MODEL.CLASSES}
if evaluator_type == 'voc2007':
args['use_07_metric'] = True
args.update(kwargs)
evaluator = EVALUATORS.get(evaluator_type)(**args)
ann_file = cfg.TEST.JSON_DATASET
if ann_file:
evaluator.load_annotations(ann_file)
return evaluator
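if __name__ == '__main__':
    # A minimal usage sketch; '/data/train' is a hypothetical path. A plain
    # path routes to the 'default' dataset type, while a 'scheme://path'
    # string routes to the type registered under 'scheme'.
    from seetadet.data import build  # trigger dataset registrations
    dataset = build.build_dataset('/data/train')
    print(type(dataset).__name__, dataset.source)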
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.datasets import dataset
from seetadet.data.datasets.datum import AnnotatedDatum
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import codewithgpu
from seetadet.core.config import cfg
from seetadet.data.build import DATASETS
class Dataset(object):
"""Base dataset class."""
def __init__(self, source):
self.source = source
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_to_ind = dict(zip(self.classes, range(self.num_classes)))
@property
def getter(self):
"""Return the dataset getter."""
return type(self)
@property
def size(self):
"""Return the dataset size."""
return 0
@DATASETS.register('default')
class RecordDataset(Dataset):
def __init__(self, source):
super(RecordDataset, self).__init__(source)
@property
def getter(self):
"""Return the dataset getter."""
return codewithgpu.RecordDataset
@property
def size(self):
"""Return the dataset size."""
return self.getter(self.source).size
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Annotated datum."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
class AnnotatedDatum(object):
"""Wrapper for annotated datum."""
def __init__(self, example):
self._example = example
self._img = None
@property
def id(self):
"""Return the example id."""
return self._example['id']
@property
def height(self):
"""Return the image height."""
return self._example['height']
@property
def width(self):
"""Return the image width."""
return self._example['width']
@property
def img(self):
"""Return the image array."""
if self._img is None:
img_bytes = np.frombuffer(self._example['content'], 'uint8')
self._img = cv2.imdecode(img_bytes, cv2.IMREAD_COLOR)
return self._img
@property
def objects(self):
"""Return the annotated objects."""
objects = []
for obj in self._example['object']:
mask = obj.get('mask', None)
polygons = obj.get('polygons', None)
if 'x3' in obj:
poly = np.array([obj['x1'], obj['y1'],
obj['x2'], obj['y2'],
obj['x3'], obj['y3'],
obj['x4'], obj['y4']], 'float32')
x, y, w, h = cv2.boundingRect(poly.reshape((-1, 2)))
bbox = [x, y, x + w, y + h]
polygons = [poly]
elif 'x2' in obj:
bbox = [obj['x1'], obj['y1'], obj['x2'], obj['y2']]
elif 'xmin' in obj:
bbox = [obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']]
else:
bbox = obj['bbox']
objects.append({'name': obj['name'],
'bbox': bbox,
'difficult': obj.get('difficult', 0)})
if mask is not None and len(mask) > 0:
objects[-1]['mask'] = mask
elif polygons is not None and len(polygons) > 0:
objects[-1]['polygons'] = [np.array(p) for p in polygons]
return objects
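if __name__ == '__main__':
    # A minimal usage sketch with a hand-made example; the image content
    # stays undecoded as long as ``datum.img`` is not accessed.
    example = {'id': '000001', 'height': 4, 'width': 4, 'content': b'',
               'object': [{'name': 'person', 'xmin': 0, 'ymin': 0,
                           'xmax': 2, 'ymax': 2}]}
    datum = AnnotatedDatum(example)
    print(datum.id, datum.objects[0]['bbox'])  # 000001 [0, 0, 2, 2]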
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.evaluators import coco_evaluator
from seetadet.data.evaluators import voc_evaluator
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""COCO dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
from pycocotools.cocoeval import COCOeval
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
@EVALUATORS.register('coco')
class COCOEvaluator(Evaluator):
"""Evaluator for MS COCO dataset."""
def __init__(self, output_dir, classes):
super(COCOEvaluator, self).__init__(output_dir, classes, COCOeval)
def print_eval_results(self, coco_eval):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, 0.5)
ind_hi = get_thr_ind(coco_eval, 0.95)
# Precision: (iou, recall, cls, area range, max dets)
# Recall: (iou, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
all_prec = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
all_recall = coco_eval.eval['recall'][ind_lo:(ind_hi + 1), :, 0, 2]
metrics = collections.OrderedDict([
('AP@[IoU=0.5:0.95]', []), ('AR@[IoU=0.5:0.95]', [])])
class_table = prettytable.PrettyTable()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
ap = np.mean(all_prec[:, :, cls_ind - 1]) # (iou, recall, cls)
recall = np.mean(all_recall[:, cls_ind - 1]) # (iou, cls)
metrics['AP@[IoU=0.5:0.95]'].append(ap)
metrics['AR@[IoU=0.5:0.95]'].append(recall)
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:')
coco_eval.summarize()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import json
import os
import numpy as np
from pycocotools.coco import COCO
from seetadet.data.build import build_loader_test
from seetadet.utils import logging
from seetadet.utils.mask import encode_masks
from seetadet.utils.mask import paste_masks
class Evaluator(object):
"""Evaluator using COCO json dataset format."""
def __init__(self, output_dir, classes, eval_type=None):
self.output_dir = output_dir
self.classes = classes
self.num_classes = len(self.classes)
self.class_to_cat_id = dict(zip(self.classes, range(self.num_classes)))
self.eval_type = eval_type
self.cocoGt = None
self.loader = build_loader_test()
self.num_images = self.loader.dataset_size
self.cached_inputs = []
self.records = collections.OrderedDict()
def eval_bbox(self, boxes):
"""Evaluate bbox results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
self.verify_records()
res_file = self.write_bbox_results(boxes)
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def eval_segm(self, boxes, masks):
"""Evaluate segmentation results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
self.verify_records()
res_file = self.write_segm_results(boxes, masks)
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def get_image(self):
"""Return the next image for evaluation."""
if len(self.cached_inputs) == 0:
inputs = self.loader()
for i, img_meta in enumerate(inputs['img_meta']):
self.cached_inputs.append({
'img': inputs['img'][i],
'objects': inputs['objects'][i],
'id': img_meta['id'],
'height': img_meta['height'],
'width': img_meta['width']})
inputs = self.cached_inputs.pop(0)
img_id, img = inputs.pop('id'), inputs.pop('img')
self.records[img_id] = inputs
return img_id, img
def load_annotations(self, ann_file=None):
"""Load annotations."""
self.cocoGt = COCO(ann_file)
if len(self.cocoGt.dataset) > 0:
self.class_to_cat_id = dict((v['name'], v['id'])
for v in self.cocoGt.cats.values())
def verify_records(self):
"""Verify loaded records."""
if len(self.records) != self.num_images:
raise RuntimeError(
'Mismatched number of records and images. ({} vs. {}).'
                '\nCheck whether duplicate image ids exist.'
.format(len(self.records), self.num_images))
if self.cocoGt is None:
            ann_file = self.write_annotations()
self.load_annotations(ann_file)
def print_eval_results(self, coco_eval):
"""Print the evaluation results."""
def bbox_results_one_category(self, boxes, cat_id):
"""Write bbox results of a specific category."""
results = []
for i, img_id in enumerate(self.records.keys()):
dets = boxes[i].astype('float64')
if len(dets) == 0:
continue
xs, ys = dets[:, 0], dets[:, 1]
ws, hs = dets[:, 2] - xs, dets[:, 3] - ys
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'bbox': [xs[j], ys[j], ws[j], hs[j]],
'score': scores[j],
} for j in range(dets.shape[0])])
return results
def segm_results_one_category(self, boxes, masks, cat_id):
"""Write segm results of a specific category."""
results = []
for i, (img_id, rec) in enumerate(self.records.items()):
dets = boxes[i]
if len(dets) == 0:
continue
scores = dets[:, -1]
rles = encode_masks(paste_masks(
masks[i], dets, (rec['height'], rec['width'])))
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'segmentation': rles[j],
'score': float(scores[j]),
} for j in range(dets.shape[0])])
return results
def write_bbox_results(self, boxes):
"""Write bbox results."""
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.bbox_results_one_category(
boxes[cls_ind], self.class_to_cat_id[cls]))
res_file = self.get_res_file(type='bbox')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as f:
json.dump(results, f)
return res_file
def write_segm_results(self, boxes, masks):
"""Write segm results."""
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.segm_results_one_category(
boxes[cls_ind], masks[cls_ind], self.class_to_cat_id[cls]))
res_file = self.get_res_file(type='segm')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as fid:
json.dump(results, fid)
return res_file
def write_annotations(self):
"""Write annotations."""
dataset = {'images': [], 'categories': [], 'annotations': []}
for img_id, rec in self.records.items():
dataset['images'].append({
'id': self.get_image_id(img_id),
'height': rec['height'], 'width': rec['width']})
for cls in self.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls, 'id': self.class_to_cat_id[cls]})
for img_id, rec in self.records.items():
img_size = (rec['height'], rec['width'])
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x, obj['bbox'][3] - y
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(img_id),
'category_id': self.class_to_cat_id[obj['name']]})
if 'mask' in obj:
segm = {'size': img_size, 'counts': obj['mask']}
dataset['annotations'][-1]['segmentation'] = segm
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
poly = poly.tolist()
segm.append(poly)
dataset['annotations'][-1]['segmentation'] = segm
ann_file = self.get_ann_file()
print('Writing annotations json to {}'.format(ann_file))
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def get_ann_file(self):
"""Return the ann filename."""
filename = 'annotations.json'
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
return os.path.join(self.output_dir, filename)
def get_res_file(self, type='bbox'):
"""Return the result filename."""
prefix = ''
if type == 'bbox':
prefix = 'detections'
elif type == 'segm':
prefix = 'segmentations'
elif type == 'kpt':
prefix = 'keypoints'
filename = prefix + '.json'
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
return os.path.join(self.output_dir, filename)
@staticmethod
def get_image_id(image_name):
"""Return the image name from the id."""
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluation API on the Pascal VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import itertools
import time
import numpy as np
from pycocotools import mask as maskUtils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric.
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation.
# First append sentinel values at the end.
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope.
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# To calculate area under PR curve, look for points.
# where X axis (recall) changes value.
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
class VOCeval(object):
"""Interface for evaluating detection via COCO object."""
def __init__(self, cocoGt=None, cocoDt=None, iouType='bbox',
iouThrs=[0.5, 0.7], use_07_metric=False):
self.cocoGt = cocoGt
self.cocoDt = cocoDt
self.params = Params(iouType)
self.params.iouThrs = iouThrs
self.params.use_07_metric = use_07_metric
if cocoGt is not None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
self.ious = {}
def _prepare(self):
p = self.params
gts = self.cocoGt.loadAnns(
self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts = self.cocoDt.loadAnns(
self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
for gt in gts:
            # Respect an explicit 'ignore' flag and also ignore crowd boxes.
            gt['ignore'] = gt.get('ignore', 0) or ('iscrowd' in gt and gt['iscrowd'])
self._gts = collections.defaultdict(list)
self._dts = collections.defaultdict(list)
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.eval = {}
def evaluate(self):
tic = time.time()
print('Running per image evaluation...')
p = self.params
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
p.catIds = list(np.unique(p.catIds))
self._prepare()
self.ious = {(imgId, catId): self.computeIoU(imgId, catId)
for imgId in p.imgIds for catId in p.catIds}
self.evalImgs = [self.evaluateImg(imgId, catId)
for catId in p.catIds for imgId in p.imgIds]
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def accumulate(self, p=None):
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
if p is None:
p = self.params
print('VOC07 metric? ' + ('Yes' if p.use_07_metric else 'No'))
T, K, I = len(p.iouThrs), len(p.catIds), len(p.imgIds)
recall, ap = np.zeros((T, K)), np.zeros((T, K))
for k in range(K):
E = [self.evalImgs[k * I + i] for i in range(I)]
E = [e for e in E if e is not None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'] for e in E])
inds = np.argsort(-dtScores)
dtm = np.concatenate([e['dtMatches'] for e in E], axis=1)[:, inds]
dtIg = np.concatenate([e['dtIgnore'] for e in E], axis=1)[:, inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg == 0)
if npig == 0:
continue
tps = np.logical_and(dtm, np.logical_not(dtIg))
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
            # Use an explicit dtype; the ``np.float`` alias is removed in NumPy 1.24+.
            tp_sum = np.cumsum(tps, axis=1).astype(np.float64)
            fp_sum = np.cumsum(fps, axis=1).astype(np.float64)
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
nd = len(tp)
rc = tp / npig
pr = tp / np.maximum(tp + fp, np.spacing(1))
recall[t, k] = rc[-1] if nd else 0
ap[t, k] = voc_ap(rc, pr, use_07_metric=p.use_07_metric)
self.eval = {'counts': [T, K],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'ap': ap, 'recall': recall}
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def computeIoU(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
iscrowd = [int(o['iscrowd']) for o in gt]
return maskUtils.iou(d, g, iscrowd)
def evaluateImg(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return None
for g in gt:
g['_ignore'] = g['ignore']
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind]
iscrowd = [int(o['iscrowd']) for o in gt]
ious = (self.ious[imgId, catId][:, gtind]
if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId])
T, G, D = len(p.iouThrs), len(gt), len(dt)
gtm, dtm = np.zeros((T, G)), np.zeros((T, D))
gtIg, dtIg = np.array([g['_ignore'] for g in gt]), np.zeros((T, D))
for (tind, iou), (dind, d) in itertools.product(
enumerate(p.iouThrs), enumerate(dt)):
m = -1
for gind, g in enumerate(gt):
if gtm[tind, gind] > 0 and not iscrowd[gind]:
continue
if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
break
if ious[dind, gind] <= iou:
continue
m = gind
if m == -1:
continue
dtIg[tind, dind] = gtIg[m]
dtm[tind, dind] = gt[m]['id']
gtm[tind, m] = d['id']
return {'image_id': imgId,
'category_id': catId,
'dtMatches': dtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg}
class Params(object):
"""Params for evaluation API."""
def setDetParams(self):
self.imgIds = []
self.catIds = []
self.iouThrs = [0.5]
self.use_07_metric = False
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
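# A minimal driving sketch (assuming ``cocoGt`` and ``cocoDt`` are pycocotools
# COCO objects holding the ground-truths and detections, as the constructor
# expects):
#
#   voc_eval = VOCeval(cocoGt, cocoDt, iouType='bbox',
#                      iouThrs=[0.5], use_07_metric=True)
#   voc_eval.evaluate()
#   voc_eval.accumulate()
#   voc_eval.eval['ap']      # (T, K) array: AP per IoU threshold and class
#   voc_eval.eval['recall']  # (T, K) array: final recall per threshold/class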
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VOC dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import numpy as np
import prettytable
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
from seetadet.data.evaluators.voc_eval import VOCeval
@EVALUATORS.register(['voc', 'voc2007', 'voc2010', 'voc2012'])
class VOCEvaluator(Evaluator):
"""Evaluator for Pascal VOC dataset."""
def __init__(self, output_dir, classes, use_07_metric=False):
eval_type = functools.partial(
VOCeval, iouThrs=[0.5], use_07_metric=use_07_metric)
super(VOCEvaluator, self).__init__(output_dir, classes, eval_type)
def print_eval_results(self, coco_eval):
metrics = collections.OrderedDict()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
            for k, name in zip(('ap', 'recall'), ('AP', 'AR')):
                for i, iou in enumerate(coco_eval.params.iouThrs):
                    # Use a fresh key so ``name`` is not clobbered when
                    # iterating over multiple IoU thresholds.
                    metric_name = '%s@[IoU=%s]' % (name, str(iou))
                    v = coco_eval.eval[k][i, cls_ind - 1]
                    if metric_name not in metrics:
                        metrics[metric_name] = []
                    metrics[metric_name].append(v)
class_table = prettytable.PrettyTable()
summary_list = []
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
titleStr, typeStr = 'Average Precision', '(AP)'
if k.startswith('AR'):
titleStr, typeStr = 'Average Recall', '(AR)'
iouStr = '{:0.2f}'.format(float(k.split('IoU=')[-1][:-1]))
summary_list.append(iStr.format(titleStr, typeStr, iouStr, 'all', -1, np.mean(v)))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:\n' + '\n'.join(summary_list))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loader."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import codewithgpu
import dragon
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.utils import logging
from seetadet.utils.blob import blob_vstack
class BalancedQueues(object):
"""Balanced queues."""
def __init__(self, base_queue, num=1):
self.queues = [base_queue]
self.queues += [mp.Queue(base_queue._maxsize) for _ in range(num - 1)]
self.index = 0
def put(self, obj, block=True, timeout=None):
q = self.queues[self.index]
q.put(obj, block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
def get(self, block=True, timeout=None):
q = self.queues[self.index]
obj = q.get(block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
return obj
def get_n(self, num=1):
outputs = []
while len(outputs) < num:
obj = self.get()
if obj is not None:
outputs.append(obj)
return outputs
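# A minimal sketch of the round-robin balancing (hypothetical standalone
# usage; in the loader below, producers and consumers each hold their own
# rotation index):
#
#   base = mp.Queue(8)
#   queues = BalancedQueues(base, num=3)
#   for obj in ('a', 'b', 'c', 'd'):
#       queues.put(obj)   # 'a'->q0, 'b'->q1, 'c'->q2, 'd'->q0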
class DataLoaderBase(threading.Thread):
"""Base class of data loader."""
def __init__(self, worker, **kwargs):
super(DataLoaderBase, self).__init__(daemon=True)
self.batch_size = kwargs.get('batch_size', 2)
self.num_readers = kwargs.get('num_readers', 1)
self.num_workers = kwargs.get('num_workers', 3)
self.queue_depth = kwargs.get('queue_depth', 2)
# Initialize distributed group.
rank, group_size = 0, 1
dist_group = dragon.distributed.get_group()
if dist_group is not None:
group_size = dist_group.size
rank = dragon.distributed.get_rank(dist_group)
# Build queues.
self.reader_queue = mp.Queue(self.queue_depth * self.batch_size)
self.worker_queue = mp.Queue(self.queue_depth * self.batch_size)
self.batch_queue = queue.Queue(self.queue_depth)
self.reader_queue = BalancedQueues(self.reader_queue, self.num_workers)
self.worker_queue = BalancedQueues(self.worker_queue, self.num_workers)
# Build readers.
self.readers = []
for i in range(self.num_readers):
partition_id = i
num_partitions = self.num_readers
num_partitions *= group_size
partition_id += rank * self.num_readers
self.readers.append(codewithgpu.DatasetReader(
output_queue=self.reader_queue,
partition_id=partition_id,
num_partitions=num_partitions,
seed=cfg.RNG_SEED + partition_id, **kwargs))
self.readers[i].start()
time.sleep(0.1)
# Build workers.
self.workers = []
for i in range(self.num_workers):
p = worker(**kwargs)
p.seed += (i + rank * self.num_workers)
p.reader_queue = self.reader_queue.queues[i]
p.worker_queue = self.worker_queue.queues[i]
p.start()
self.workers.append(p)
time.sleep(0.1)
# Register cleanup callbacks.
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self.workers)
terminate(self.readers)
import atexit
atexit.register(cleanup)
# Start batch prefetching.
self.start()
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
def __call__(self):
return self.next()
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self.batch_queue.get()
class DataLoader(DataLoaderBase):
"""Loader to return the batch of data."""
def __init__(self, dataset, worker, **kwargs):
dataset = build_dataset(dataset)
self.dataset_size = dataset.size
self.contiguous = kwargs.get('contiguous', True)
self.prefetch_count = kwargs.get('prefetch_count', 50)
self.img_mean = cfg.MODEL.PIXEL_MEAN
self.img_align = (cfg.BACKBONE.COARSEST_STRIDE,) * 2
args = {'path': dataset.source,
'dataset_getter': dataset.getter,
'classes': dataset.classes,
'shuffle': kwargs.get('shuffle', True),
'batch_size': kwargs.get('batch_size', 1),
'num_workers': kwargs.get('num_workers', 1)}
super(DataLoader, self).__init__(worker, **args)
def run(self):
"""Main loop."""
logging.info('Prefetch batches...')
prev_inputs = self.worker_queue.get_n(
self.prefetch_count * self.batch_size)
next_inputs = []
while True:
# Use cached buffer for next N inputs.
if len(next_inputs) == 0:
next_inputs = prev_inputs
if 'aspect_ratio' in next_inputs[0]:
# Inputs are sorted for aspect ratio grouping.
next_inputs.sort(key=lambda d: d['aspect_ratio'][0] > 1)
prev_inputs = []
# Collect the next batch.
outputs = collections.defaultdict(list)
for _ in range(self.batch_size):
inputs = next_inputs.pop(0)
for k, v in inputs.items():
outputs[k].extend(v)
prev_inputs += self.worker_queue.get_n(1)
# Stack batch data.
if self.contiguous:
outputs['img'] = blob_vstack(
outputs['img'], fill_value=self.img_mean,
align=self.img_align)
# Send batch data to consumer.
self.batch_queue.put(outputs)
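# A minimal consumption sketch (hedged: ``dataset`` is whatever
# ``build_dataset`` accepts, and ``SomeTrainWorker`` stands in for one of the
# worker classes registered in the pipelines module below):
#
#   loader = DataLoader(dataset, worker=SomeTrainWorker,
#                       batch_size=2, shuffle=True)
#   batch = next(loader)   # dict with 'img', 'gt_boxes', 'im_info', ...
#   batch['img']           # stacked into one array when contiguous=True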
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loading pipelines."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
from seetadet.core.config import cfg
from seetadet.data import transforms
from seetadet.data.build import LOADERS
from seetadet.data.build import build_anchor_sampler
from seetadet.data.datasets import AnnotatedDatum
from seetadet.data.loader import DataLoader
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import filter_empty_boxes
class WorkerBase(multiprocessing.Process):
"""Base class of data worker."""
def __init__(self):
super(WorkerBase, self).__init__(daemon=True)
self.seed = cfg.RNG_SEED
self.reader_queue = None
self.worker_queue = None
def get_outputs(self, inputs):
"""Return the processed outputs."""
return inputs
def run(self):
"""Main prefetch loop."""
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self.seed)
inputs = []
while True:
# Use cached buffer for next 4 inputs.
while len(inputs) < 4:
inputs.append(self.reader_queue.get())
outputs = self.get_outputs(inputs)
self.worker_queue.put(outputs)
class DetTrainWorker(WorkerBase):
"""Generic train pipeline for detection."""
def __init__(self, **kwargs):
super(DetTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(cfg.TRAIN.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
img, boxes = self.resize(img, boxes)
img, boxes = self.flip(img, boxes)
img, boxes = self.crop(img, boxes)
boxes = clip_boxes(boxes, img.shape)
boxes = boxes[filter_empty_boxes(boxes)]
if len(boxes) == 0:
return None
img = self.distort(img)
im_scale = self.resize.im_scale
aspect_ratio = float(img.shape[0]) / float(img.shape[1])
outputs = {'img': [img],
'gt_boxes': [boxes],
'im_info': [img.shape[:2] + (im_scale,)],
'aspect_ratio': [aspect_ratio]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class MaskTrainWorker(WorkerBase):
"""Generic train pipeline for instance segmentation."""
def __init__(self, **kwargs):
super(MaskTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.parse_segms = transforms.ParseSegms()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(cfg.TRAIN.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.recompute_boxes = cfg.TRAIN.CROP_SIZE > 0
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
segms = self.parse_segms(datum)
img, boxes, segms = self.resize(img, boxes, segms)
img, boxes, segms = self.flip(img, boxes, segms)
img, boxes, segms = self.crop(img, boxes, segms)
if self.recompute_boxes:
boxes[:, :4] = segms.get_boxes()
else:
boxes = clip_boxes(boxes, img.shape)
keep = filter_empty_boxes(boxes)
boxes, segms = boxes[keep], segms[keep]
if len(boxes) == 0:
return None
img = self.distort(img)
im_scale = self.resize.im_scale
aspect_ratio = float(img.shape[0]) / float(img.shape[1])
outputs = {'img': [img],
'gt_boxes': [boxes],
'gt_segms': [segms],
'im_info': [img.shape[:2] + (im_scale,)],
'scale_jitter': [self.resize.scale_jitter],
'aspect_ratio': [aspect_ratio]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class SSDTrainWorker(WorkerBase):
"""Generic train pipeline for SSD detection."""
def __init__(self, **kwargs):
super(SSDTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.paste = transforms.RandomPaste()
self.crop = transforms.RandomBBoxCrop()
self.resize = transforms.ResizeWarp(cfg.TRAIN.SCALES[0])
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
boxes /= [(img.shape[1], img.shape[0]) * 2 + (1,)]
img, boxes = self.paste(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img = self.resize(img)
boxes[:, :4] *= img.shape[0]
img, boxes = self.flip(img, boxes)
img = self.distort(img)
outputs = {'img': [img],
'gt_boxes': [boxes],
'im_info': [img.shape[:2]]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class DetTestWorker(WorkerBase):
"""Generic test pipeline for detection."""
def __init__(self, **kwargs):
super(DetTestWorker, self).__init__()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, objects = datum.img, datum.objects
outputs = {'img': [img], 'objects': [objects],
'img_meta': [{'id': datum.id,
'height': datum.height,
'width': datum.width}]}
return outputs
LOADERS.register('det_train', DataLoader, worker=DetTrainWorker)
LOADERS.register('mask_train', DataLoader, worker=MaskTrainWorker)
LOADERS.register('ssd_train', DataLoader, worker=SSDTrainWorker)
LOADERS.register('det_test', DataLoader, worker=DetTestWorker)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Structures."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.structures.mask import PolygonMasks
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask structure."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from copy import deepcopy
import numpy as np
from seetadet.utils.polygon import crop_polygons
from seetadet.utils.polygon import flip_polygons
from seetadet.utils.mask import mask_from
class PolygonMasks(object):
"""Polygon masks."""
    def __init__(self, shape=None):
        self.data = []
        self.shape = list(shape) if shape is not None else None
def new_masks(self, data, copy=False):
"""Return a new masks object."""
ret = PolygonMasks(self.shape)
ret.data = deepcopy(data) if copy else data
return ret
def apply_flip(self):
"""Apply flip transform."""
for i, mask in enumerate(self.data):
if mask is None:
continue
self.data[i] = flip_polygons(mask, self.shape[1])
return self
def apply_resize(self, size=None, scale=None):
"""Apply resize transform."""
if size is None:
if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
self.shape[0] = int(self.shape[0] * scale[0] + .5)
self.shape[1] = int(self.shape[1] * scale[1] + .5)
else:
if not isinstance(size, (tuple, list)):
size = (size, size)
scale = (size[0] * 1. / self.shape[0],
size[1] * 1. / self.shape[1])
self.shape = list(size)
for mask in self.data:
if mask is None:
continue
for p in mask:
p[0::2] *= scale[1]
p[1::2] *= scale[0]
return self
def apply_crop(self, crop_box):
"""Apply crop transform."""
self.shape = [crop_box[3] - crop_box[1],
crop_box[2] - crop_box[0]]
for i, mask in enumerate(self.data):
if mask is None:
continue
self.data[i] = crop_polygons(mask, crop_box)
def crop_and_resize(self, boxes, mask_size):
"""Return the resized ROI masks."""
return [mask_from(self.data[i], mask_size, boxes[i])
for i in range(len(self.data))]
def get_boxes(self):
"""Return the bounding boxes of masks."""
boxes = np.zeros((len(self.data), 4), 'float32')
for i, mask in enumerate(self.data):
            if mask is None or len(mask) == 0:
continue
xymin = np.array([float('inf'), float('inf')], 'float32')
xymax = np.zeros((2,), 'float32')
for p in mask:
coords = p.reshape((-1, 2)).astype('float32')
xymin = np.minimum(xymin, coords.min(0))
xymax = np.maximum(xymax, coords.max(0))
boxes[i, :2], boxes[i, 2:] = xymin, xymax
return boxes
def append(self, mask):
"""Append a mask."""
        # ``None`` marks an object without polygons; the transforms skip it.
        assert mask is None or isinstance(mask, list)
self.data.append(mask)
return self
def extend(self, masks):
"""Append a set of masks."""
for mask in masks:
self.append(mask)
return self
def __getitem__(self, item):
if isinstance(item, slice):
return self.new_masks(self.data[item])
elif isinstance(item, np.ndarray):
return self.new_masks([self.data[i] for i in item.tolist()])
return self.new_masks([self.data[item]])
def __iadd__(self, masks):
if isinstance(masks, PolygonMasks):
self.data += masks.data
return self
return self.extend(masks)
def __len__(self):
return len(self.data)
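# A minimal usage sketch (hypothetical polygon coordinates; each mask is a
# list of flattened x, y polygons):
#
#   masks = PolygonMasks(shape=(480, 640))  # (height, width)
#   masks.append([np.array([10., 10., 100., 10., 100., 80.])])
#   masks.apply_flip()             # mirror x-coordinates about the width
#   masks.apply_resize(scale=0.5)  # shape becomes (240, 320)
#   masks.get_boxes()              # (N, 4) tight boxes around the polygons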
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.bbox import filter_empty_boxes
class ProposalTargets(object):
"""Generate ground-truth targets for proposals."""
def __init__(self):
super(ProposalTargets, self).__init__()
self.num_classes = len(cfg.MODEL.CLASSES)
self.num_rois = cfg.FAST_RCNN.BATCH_SIZE
self.num_fg_rois = round(cfg.FAST_RCNN.POSITIVE_FRACTION * self.num_rois)
self.bbox_reg_weights = cfg.FAST_RCNN.BBOX_REG_WEIGHTS
self.bbox_reg_cls_agnostic = cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC
self.mask_size = (cfg.MASK_RCNN.POOLER_RESOLUTION * 2,) * 2
self.lvl_min, self.lvl_max = cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL
self.assigner = MaxIoUAssigner(pos_iou_thr=cfg.FAST_RCNN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.FAST_RCNN.NEGATIVE_OVERLAP,
match_low_quality=False)
self.defaults = {'rois': np.array([[-1, 0, 0, 1, 1]], 'float32'),
'labels': np.array([-1], 'int64'),
'bbox_targets': np.zeros((1, 4), 'float32'),
'mask_targets': np.full((1,) + self.mask_size, -1, 'float32')}
def sample_rois(self, rois, gt_boxes):
"""Match and sample positive and negative RoIs."""
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(rois[:, 1:5], gt_boxes)
fg_inds = np.where(labels > 0)[0]
bg_inds = np.where(labels == 0)[0]
# Include ground-truth boxes as foreground regions.
batch_inds = np.full((gt_boxes.shape[0], 1), rois[0, 0])
gt_inds = np.arange(len(rois), len(rois) + len(batch_inds))
fg_inds = np.concatenate((fg_inds, gt_inds))
rois = np.vstack((rois, np.hstack((batch_inds, gt_boxes[:, :4]))))
# Sample foreground regions without replacement.
num_fg_rois = int(min(self.num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, num_fg_rois, False)
# Sample background regions without replacement.
num_bg_rois = self.num_rois - num_fg_rois
num_bg_rois = min(num_bg_rois, bg_inds.size)
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, num_bg_rois, False)
# Take values via sampled indices.
keep_inds = np.append(fg_inds, bg_inds)
rois = rois[keep_inds]
overlaps = bbox_overlaps(rois[:, 1:5], gt_boxes[:, :4])
gt_assignments = overlaps.argmax(axis=1)
labels = gt_boxes[gt_assignments, 4].astype('int64')
# Reassign background regions.
labels[num_fg_rois:] = 0
return rois, labels, gt_assignments
def distribute_blobs(self, blobs, lvls):
"""Distribute blobs on given levels."""
outputs = collections.defaultdict(list)
lvl_inds = [np.where(lvls == (i + self.lvl_min))[0]
for i in range(self.lvl_max - self.lvl_min + 1)]
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(blob[inds] if len(inds) > 0
else self.defaults[key])
return outputs
def get_bbox_targets(self, rois, boxes):
return bbox_transform(rois, boxes, weights=self.bbox_reg_weights)
def get_mask_targets(self, rois, segms, inds):
targets = np.full((len(rois),) + self.mask_size, -1, 'float32')
masks = segms[inds].crop_and_resize(rois[inds], self.mask_size)
for i, j in enumerate(inds):
if masks[i] is not None:
targets[j] = masks[i]
return targets
def compute(self, **inputs):
"""Compute proposal targets."""
blobs = collections.defaultdict(list)
all_rois = inputs['rois']
batch_inds = all_rois[:, 0].astype('int32')
# Compute targets per image.
for i, gt_boxes in enumerate(inputs['gt_boxes']):
# Select proposals of this image.
rois = all_rois[np.where(batch_inds == i)[0]]
# Filter empty RoIs.
rois[:, 1:5] = clip_boxes(rois[:, 1:5], inputs['im_info'][i][:2])
rois = rois[filter_empty_boxes(rois[:, 1:5])]
# Sample a batch of RoIs for training.
rois, labels, gt_assignments = self.sample_rois(rois, gt_boxes)
# Fill blobs.
blobs['rois'].append(rois)
blobs['labels'].append(labels)
blobs['bbox_targets'].append(self.get_bbox_targets(
rois[:, 1:5], gt_boxes[gt_assignments, :4]))
if 'gt_segms' in inputs:
fg_inds = np.where(labels > 0)[0]
segms = inputs['gt_segms'][i][gt_assignments]
targets = self.get_mask_targets(rois[:, 1:5], segms, fg_inds)
blobs['mask_targets'].append(targets)
# Concat to get the contiguous blobs.
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute blobs by the level of all ROIs.
lvls = distribute_boxes(blobs['rois'][:, 1:], self.lvl_min, self.lvl_max)
blobs = self.distribute_blobs(blobs, lvls)
# Add the targets using foreground ROIs only.
for lvl in range(self.lvl_max - self.lvl_min + 1):
inds = np.where(blobs['labels'][lvl] > 0)[0]
if len(inds) > 0:
blobs['fg_rois'].append(blobs['rois'][lvl][inds])
blobs['mask_labels'].append(blobs['labels'][lvl][inds] - 1)
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = blobs['mask_targets'][lvl][inds]
else:
blobs['fg_rois'].append(self.defaults['rois'])
blobs['mask_labels'].append(np.array([0], 'int64'))
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = self.defaults['mask_targets']
# Concat to get contiguous blobs along the levels.
rois, fg_rois = blobs['rois'], blobs['fg_rois']
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Compute class-specific strides.
bbox_strides = np.arange(len(blobs['rois'])) * (self.num_classes - 1)
mask_strides = np.arange(len(blobs['fg_rois'])) * (self.num_classes - 1)
# Select the foreground RoIs for bbox targets.
fg_inds = np.where(blobs['labels'] > 0)[0]
if len(fg_inds) == 0:
            # Sample one proposal at random so the index tensors below are never empty.
fg_inds = npr.randint(len(blobs['labels']), size=[1])
outputs = {
'rois': [to_tensor(rois[i]) for i in range(len(rois))],
'fg_rois': [to_tensor(fg_rois[i]) for i in range(len(fg_rois))],
'labels': to_tensor(blobs['labels']), 'proposals': np.concatenate(rois),
'bbox_inds': to_tensor(fg_inds if self.bbox_reg_cls_agnostic else
(bbox_strides[fg_inds] + (blobs['labels'][fg_inds] - 1))),
'mask_inds': to_tensor(mask_strides + blobs['mask_labels']),
'bbox_targets': to_tensor(blobs['bbox_targets'][fg_inds]),
'bbox_anchors': to_tensor(blobs['rois'][fg_inds, 1:]),
}
if 'mask_targets' in blobs:
outputs['mask_targets'] = to_tensor(blobs['mask_targets'])
return outputs
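# For reference, the sampling budget above (hedged: illustrative config
# values, e.g. FAST_RCNN.BATCH_SIZE=128 and POSITIVE_FRACTION=0.25):
# sample_rois() keeps at most round(0.25 * 128) = 32 foreground RoIs per
# image and fills the remaining 96 slots with sampled backgrounds, which
# are then relabeled 0.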
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('retinanet')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RETINANET.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RETINANET.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
        # Return only the foreground and ignored indices rather than dense
        # labels; everything else is implicitly background.
        # (~100x faster when there are ~200k background indices)
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels < 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care.
labels = np.zeros((num_images, num_anchors), 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
ignore_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
ignore_inds = self.generator.narrow_anchors(shapes, ignore_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, ignore_inds] = -1
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register(['faster_rcnn', 'mask_rcnn', 'cascade_rcnn'])
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RPN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RPN.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
fg_inds = np.where(labels > 0)[0]
bg_inds = np.where(labels == 0)[0]
        # Cap negatives at 8x the batch size to keep later sampling cheap.
num_bg = cfg.RPN.BATCH_SIZE * 8
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Select foreground and background indices.
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
bg_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
bg_inds = self.generator.narrow_anchors(shapes, bg_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
num_fg = int(cfg.RPN.POSITIVE_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
keep = npr.choice(np.arange(len(fg_inds)), num_fg, False)
fg_inds, anchors = fg_inds[keep], anchors[keep]
# Sample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute sparse indices.
fg_inds += i * num_anchors
bg_inds += i * num_anchors
blobs['cls_inds'] += [fg_inds, bg_inds]
blobs['bbox_inds'] += [fg_inds]
blobs['labels'] += [np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')]
return {
'labels': to_tensor(np.hstack(blobs['labels'])),
'cls_inds': to_tensor(np.hstack(blobs['cls_inds'])),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
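# For reference, the RPN budget above (hedged: illustrative config values,
# e.g. RPN.BATCH_SIZE=256 and POSITIVE_FRACTION=0.5): sample() pre-caps
# negatives at 8 * 256 = 2048 per image, then compute() keeps at most
# int(0.5 * 256) = 128 foregrounds and tops the batch up to 256 with
# sampled backgrounds.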
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.ssd import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('ssd')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.SSD.POSITIVE_OVERLAP,
neg_iou_thr=cfg.SSD.NEGATIVE_OVERLAP,
gt_max_assign_all=False)
self.neg_pos_ratio = (1.0 / cfg.SSD.POSITIVE_FRACTION) - 1.0
max_size = cfg.ANCHOR_GENERATOR.STRIDES[-1]
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Select positive and non-positive indices.
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels <= 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.grid_anchors.shape[0]
cls_score = inputs['cls_score'].numpy().astype('float32')
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care
labels = np.full((num_images, num_anchors,), -1, 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = pos_inds = inputs['fg_inds'][i]
neg_inds = inputs['bg_inds'][i]
            # Mine hard negatives as background.
num_pos, num_neg = len(pos_inds), len(neg_inds)
num_bg = min(int(num_pos * self.neg_pos_ratio), num_neg)
neg_score = cls_score[i, neg_inds, 0]
bg_inds = neg_inds[np.argsort(neg_score)][:num_bg]
# Compute bbox targets.
anchors = self.generator.grid_anchors[fg_inds]
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, bg_inds] = 0
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
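# For reference, the hard-negative mining above (hedged: an illustrative
# SSD.POSITIVE_FRACTION of 0.25): neg_pos_ratio = 1 / 0.25 - 1 = 3, so up
# to three negatives are kept per positive, chosen as the anchors with the
# lowest background scores (the hardest negatives).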
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.structures import PolygonMasks
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import flip_boxes
from seetadet.utils.image import im_resize
from seetadet.utils.image import color_jitter
class Transform(object):
"""Base transform type."""
def init_params(self, params=None):
for k, v in (params or {}).items():
if k != 'self' and not k.startswith('_'):
setattr(self, k, v)
def filter_outputs(self, *outputs):
outputs = [x for x in outputs if x is not None]
return outputs if len(outputs) > 1 else outputs[0]
class ParseBoxes(Transform):
"""Parse the ground-truth boxes."""
def __init__(self):
super(ParseBoxes, self).__init__()
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_indices = dict(zip(self.classes, range(self.num_classes)))
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
height, width = datum.height, datum.width
objects = list(filter(lambda obj: self.use_diff or
not obj.get('difficult', 0), datum.objects))
boxes = np.empty((len(objects), 5), 'float32')
for i, obj in enumerate(objects):
boxes[i, :] = [max(0, obj['bbox'][0]),
max(0, obj['bbox'][1]),
min(obj['bbox'][2], width),
min(obj['bbox'][3], height),
self.class_indices[obj['name']]]
return boxes
class ParseSegms(Transform):
"""Parse the ground-truth segmentations."""
def __init__(self):
super(ParseSegms, self).__init__()
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
masks = PolygonMasks((datum.height, datum.width))
objects = filter(lambda obj: self.use_diff or
not obj.get('difficult', 0), datum.objects)
masks += [obj.get('polygons', None) for obj in objects]
return masks
class RandomFlip(Transform):
"""Flip the image randomly."""
def __init__(self, prob=0.5):
super(RandomFlip, self).__init__()
self.prob = prob
self.is_flipped = False
def __call__(self, img, boxes=None, segms=None):
self.is_flipped = npr.rand() < self.prob
img = img[:, ::-1] if self.is_flipped else img
if self.is_flipped and boxes is not None:
boxes = flip_boxes(boxes, img.shape[1])
if self.is_flipped and segms is not None:
segms = segms.apply_flip()
return self.filter_outputs(img, boxes, segms)
class ResizeWarp(Transform):
"""Resize the image to a square size."""
def __init__(self, size):
super(ResizeWarp, self).__init__()
self.size = size
self.im_scale = (1.0, 1.0)
def __call__(self, img, boxes=None):
self.im_scale = (float(self.size) / float(img.shape[0]),
float(self.size) / float(img.shape[1]))
img = im_resize(img, size=self.size)
if boxes is not None:
boxes[:, (0, 2)] = boxes[:, (0, 2)] * self.im_scale[1]
boxes[:, (1, 3)] = boxes[:, (1, 3)] * self.im_scale[0]
return self.filter_outputs(img, boxes)
class RandomResize(Transform):
"""Resize the image randomly."""
def __init__(self, scales=(640,), scales_range=(1.0, 1.0), max_size=1066):
super(RandomResize, self).__init__()
self.scales = scales
self.scales_range = scales_range
self.max_size = max_size
self.im_scale = 1.0
self.scale_jitter = 1.0
def __call__(self, img, boxes=None, segms=None):
im_shape = img.shape
target_size = npr.choice(self.scales)
# Scale along the shortest side.
max_size = max(self.max_size, target_size)
im_size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2])
self.im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than *MAX_SIZE*.
if np.round(self.im_scale * im_size_max) > max_size:
self.im_scale = float(max_size) / float(im_size_max)
# Apply random scaling to get a range of dynamic scales.
self.scale_jitter = npr.uniform(*self.scales_range)
self.im_scale *= self.scale_jitter
img = im_resize(img, scale=self.im_scale)
if boxes is not None:
boxes[:, :4] *= self.im_scale
if segms is not None:
segms.apply_resize(scale=self.im_scale)
return self.filter_outputs(img, boxes, segms)
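# A worked example of the scaling rule above (hypothetical inputs, with
# scales=(640,), scales_range=(1.0, 1.0) and max_size=1066):
#   - a 500x800 image: im_scale = 640 / 500 = 1.28, and 1.28 * 800 = 1024
#     <= 1066, so the output is about 640x1024;
#   - a 480x1280 image: 640 / 480 would give ~1707 on the long side, so the
#     scale is clamped to 1066 / 1280 ~= 0.83, yielding about 400x1066.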
class RandomPaste(Transform):
    """Randomly paste the image onto a larger, mean-filled canvas."""
def __init__(self, prob=0.5):
self.ratio = 1. / cfg.TRAIN.SCALES_RANGE[0]
self.prob = prob if self.ratio > 1 else 0
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if npr.rand() > self.prob:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
ratio = npr.uniform(1., self.ratio)
out_h, out_w = int(h * ratio), int(w * ratio)
y1 = int(np.floor(npr.uniform(0., out_h - h)))
x1 = int(np.floor(npr.uniform(0., out_w - w)))
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, img.dtype)
out_img[:] = self.pixel_mean
out_img[y1:y1 + h, x1:x1 + w, :] = img
out_boxes = boxes.astype(boxes.dtype, copy=True)
out_boxes[:, (0, 2)] = (boxes[:, (0, 2)] * w + x1) / out_w
out_boxes[:, (1, 3)] = (boxes[:, (1, 3)] * h + y1) / out_h
return out_img, out_boxes
class RandomCrop(Transform):
"""Crop the image randomly."""
def __init__(self, crop_size=512):
super(RandomCrop, self).__init__()
self.crop_size = crop_size
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes=None, segms=None):
if self.crop_size <= 0:
return self.filter_outputs(img, boxes, segms)
im_shape = list(img.shape)
h, w = im_shape[:2]
out_h, out_w = (self.crop_size,) * 2
y = npr.randint(max(h - out_h, 0) + 1)
x = npr.randint(max(w - out_w, 0) + 1)
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, img.dtype)
out_img[:] = self.pixel_mean
out_img[:h, :w] = img[y:y + out_h, x:x + out_w]
img = out_img
if boxes is not None:
boxes[:, (0, 2)] -= x
boxes[:, (1, 3)] -= y
if segms is not None:
segms.apply_crop((x, y, x + out_w, y + out_h))
return self.filter_outputs(img, boxes, segms)
class ColorJitter(Transform):
    """Distort the brightness, contrast and saturation of the image."""
def __init__(self, prob=0.5):
super(ColorJitter, self).__init__()
self.prob = prob
self.brightness_range = (0.875, 1.125)
self.contrast_range = (0.5, 1.5)
self.saturation_range = (0.5, 1.5)
def __call__(self, img):
brightness = contrast = saturation = None
if npr.rand() < self.prob:
brightness = self.brightness_range
if npr.rand() < self.prob:
contrast = self.contrast_range
if npr.rand() < self.prob:
saturation = self.saturation_range
return color_jitter(img, brightness=brightness,
contrast=contrast, saturation=saturation)
class RandomBBoxCrop(Transform):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self, scales_range=(0.3, 1.0), aspect_ratios_range=(0.5, 2.0),
overlaps=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9)):
self.samplers = [{}]
for ov in overlaps:
self.samplers.append({
'scales_range': scales_range,
'aspect_ratios_range': aspect_ratios_range,
'overlaps_range': (ov, 1.0), 'max_trials': 10})
@staticmethod
def generate_sample(param):
scales_range = param.get('scales_range', (1.0, 1.0))
aspect_ratios_range = param.get('aspect_ratios_range', (1.0, 1.0))
scale = npr.uniform(scales_range[0], scales_range[1])
min_aspect_ratio = max(aspect_ratios_range[0], scale**2)
max_aspect_ratio = min(aspect_ratios_range[1], 1. / (scale**2))
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def check_center(sample_box, boxes):
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= sample_box[0]) & (x_ctr <= sample_box[2]) &
(y_ctr >= sample_box[1]) & (y_ctr <= sample_box[3]))[0]
return len(keep) > 0
@staticmethod
def check_overlap(sample_box, boxes, param):
ov_range = param.get('overlaps_range', (0.0, 1.0))
if ov_range[0] == 0.0 and ov_range[1] == 1.0:
return True
ovmax = bbox_overlaps(sample_box[None, :], boxes[:, :4]).max()
if ovmax < ov_range[0] or ovmax > ov_range[1]:
return False
return True
def generate_samples(self, boxes):
crop_boxes = []
for sampler in self.samplers:
for _ in range(sampler.get('max_trials', 1)):
crop_box = self.generate_sample(sampler)
if not self.check_overlap(crop_box, boxes, sampler):
continue
if not self.check_center(crop_box, boxes):
continue
crop_boxes.append(crop_box)
break
return crop_boxes
@classmethod
def crop(cls, img, crop_box, boxes=None):
h, w = img.shape[:2]
w_offset = int(crop_box[0] * w)
h_offset = int(crop_box[1] * h)
crop_w = int((crop_box[2] - crop_box[0]) * w)
crop_h = int((crop_box[3] - crop_box[1]) * h)
img = img[h_offset:h_offset + crop_h, w_offset:w_offset + crop_w]
if boxes is not None:
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= crop_box[0]) & (x_ctr <= crop_box[2]) &
(y_ctr >= crop_box[1]) & (y_ctr <= crop_box[3]))[0]
boxes = boxes[keep]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * w - w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * h - h_offset
boxes = clip_boxes(boxes, (crop_h, crop_w))
boxes[:, (0, 2)] /= crop_w
boxes[:, (1, 3)] /= crop_h
return img, boxes
def __call__(self, img, boxes):
crop_boxes = self.generate_samples(boxes)
if len(crop_boxes) > 0:
crop_box = crop_boxes[npr.randint(len(crop_boxes))]
img, boxes = self.crop(img, crop_box, boxes)
return img, boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models import backbones
from seetadet.models import decoders
from seetadet.models import dense_heads
from seetadet.models import detectors
from seetadet.models import necks
from seetadet.models import roi_heads
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Backbones."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.backbones import mobilenet_v2
from seetadet.models.backbones import mobilenet_v3
from seetadet.models.backbones import resnet
from seetadet.models.backbones import vgg
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
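# A few worked values (divisor=8): make_divisible(24) == 24,
# make_divisible(12) == 16 and make_divisible(20) == 24 (ties round up);
# the 0.9 guard bumps the result up one step whenever rounding down would
# lose more than 10% of the requested width.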
class InvertedResidual(nn.Module):
    """Inverted residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, stride=1, expand_ratio=6):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV2(nn.Module):
"""MobileNetV2 class."""
def __init__(self, depths, dims, strides, expand_ratios, width_mult=1.0):
super(MobileNetV2, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks = dims[0], []
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:-1])):
for j in range(depth):
stride = strides[i] if j == 0 else 1
blocks.append(InvertedResidual(
dim_in, dim, stride=stride,
expand_ratio=expand_ratios[i]))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, dims[-1], 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(dims[-1])
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v2', MobileNetV2,
dims=(32,) + (16, 24, 32, 64, 96, 160, 320) + (1280,),
depths=(1, 2, 3, 4, 3, 3, 1),
strides=(1, 2, 2, 2, 1, 2, 1),
expand_ratios=(1, 6, 6, 6, 6, 6, 6))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class SqueezeExcite(nn.Module):
"""Squeeze-and-Excitation block."""
def __init__(self, dim_in, dim):
super(SqueezeExcite, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 1)
self.conv2 = nn.Conv2d(dim, dim_in, 1)
self.activation1 = nn.ReLU(True)
self.activation2 = nn.Hardsigmoid(True)
def forward(self, x):
scale = x.mean((2, 3), keepdim=True)
scale = self.activation1(self.conv1(scale))
scale = self.activation2(self.conv2(scale))
return x * scale
class InvertedResidual(nn.Module):
    """Inverted residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
stride=1,
expand_ratio=3,
squeeze_ratio=1,
activation_type='ReLU',
):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type=activation_type)
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.se = (SqueezeExcite(dim, make_divisible(dim * squeeze_ratio))
if squeeze_ratio < 1 else nn.Identity())
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.se(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV3(nn.Module):
"""MobileNetV3 class."""
def __init__(self, depths, dims, kernel_sizes, strides,
expand_ratios, squeeze_ratios, width_mult=1.0):
super(MobileNetV3, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='Hardswish')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks, coarsest_stride = dims[0], [], 2
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:])):
coarsest_stride *= strides[i]
layer_expand_ratios = expand_ratios[i]
if not isinstance(layer_expand_ratios, (tuple, list)):
layer_expand_ratios = [layer_expand_ratios]
layer_expand_ratios = list(layer_expand_ratios)
layer_expand_ratios += ([layer_expand_ratios[-1]] *
(depth - len(layer_expand_ratios)))
for j in range(depth):
blocks.append(InvertedResidual(
dim_in, dim,
kernel_size=kernel_sizes[i],
stride=strides[i] if j == 0 else 1,
expand_ratio=layer_expand_ratios[j],
squeeze_ratio=squeeze_ratios[i],
activation_type='Hardswish'
if coarsest_stride >= 16 else 'ReLU'))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, blocks[-1].dim, 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(blocks[-1].dim)
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v3_large', MobileNetV3,
dims=(16,) + (16, 24, 40, 80, 112, 160),
depths=(1, 2, 3, 4, 2, 3),
kernel_sizes=(3, 3, 5, 3, 3, 5),
strides=(1, 2, 2, 2, 1, 2),
expand_ratios=(1, (4, 3), 3, (6, 2.5, 2.3, 2.3), 6, 6),
squeeze_ratios=(1, 1, 0.25, 1, 0.25, 0.25))
BACKBONES.register(
'mobilenet_v3_small', MobileNetV3,
dims=(16,) + (16, 24, 40, 48, 96),
depths=(1, 2, 3, 2, 3),
kernel_sizes=(3, 3, 5, 5, 5),
strides=(2, 2, 2, 1, 2),
expand_ratios=(1, (4.5, 88. / 24), (4, 6, 6), 3, 6),
squeeze_ratios=(0.25, 1, 0.25, 0.25, 0.25))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.engine.utils import freeze_module
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 3, stride, padding=1, bias=False)
self.bn1 = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, bias=False)
self.bn2 = build_norm(dim, cfg.BACKBONE.NORM)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.bn2(self.conv2(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
groups, width_per_group = 1, 64
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
width = int(dim * (self.width_per_group / 64.)) * self.groups
self.conv1 = nn.Conv2d(dim_in, width, 1, bias=False)
self.bn1 = build_norm(width, cfg.BACKBONE.NORM)
self.conv2 = nn.Conv2d(width, width, 3, stride, padding=1, bias=False)
self.bn2 = build_norm(width, cfg.BACKBONE.NORM)
self.conv3 = nn.Conv2d(width, dim * self.expansion, 1, bias=False)
self.bn3 = build_norm(dim * self.expansion, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.relu(self.bn2(self.conv2(x)))
x = self.bn3(self.conv3(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class ResNet(nn.Module):
"""ResNet class."""
def __init__(self, block, depths, stride_in_1x1=False):
super(ResNet, self).__init__()
dim_in, dims, blocks = 64, [64, 128, 256, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)]
self.out_dims = [dim_in] + [v * block.expansion for v in dims]
self.conv1 = nn.Conv2d(3, dim_in, 7, 2, padding=3, bias=False)
self.bn1 = build_norm(dim_in, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.maxpool = nn.MaxPool2d(3, 2, padding=1)
for i, depth, dim in zip(range(4), depths, dims):
downsample, stride = None, 1 if i == 0 else 2
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(dim_in, dim * block.expansion, 1, stride, bias=False),
build_norm(dim * block.expansion, cfg.BACKBONE.NORM))
blocks.append(block(dim_in, dim, stride, downsample))
if isinstance(blocks[-1], Bottleneck) and stride_in_1x1:
blocks[-1].conv1.stride = (stride, stride)
blocks[-1].conv2.stride = (1, 1)
dim_in = dim * block.expansion
for _ in range(depth - 1):
blocks.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
num_freeze_stages = cfg.BACKBONE.FREEZE_AT
if num_freeze_stages > 0:
self.conv1.apply(freeze_module)
self.bn1.apply(freeze_module)
for i in range(num_freeze_stages - 1, 0, -1):
getattr(self, 'layer%d' % i).apply(freeze_module)
def forward(self, x):
x = self.relu(self.bn1(self.conv1(x)))
x = self.maxpool(x)
outputs = [None]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class ResNetV1a(ResNet):
"""ResNet with stride in bottleneck 1x1 convolution."""
def __init__(self, block, depths):
super(ResNetV1a, self).__init__(block, depths, stride_in_1x1=True)
BACKBONES.register('resnet18', ResNet, block=BasicBlock, depths=[2, 2, 2, 2])
BACKBONES.register('resnet34', ResNet, block=BasicBlock, depths=[3, 4, 6, 3])
BACKBONES.register('resnet50', ResNet, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101', ResNet, block=Bottleneck, depths=[3, 4, 23, 3])
BACKBONES.register('resnet50_v1a', ResNetV1a, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101_v1a', ResNetV1a, block=Bottleneck, depths=[3, 4, 23, 3])
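The endpoint bookkeeping above is worth spelling out: `out_indices` marks the last block of each stage via a running sum of the depths, and `forward` prepends a `None` placeholder so the outputs list stays aligned with `out_dims`, whose first entry is the unexposed stem width. A standalone check for resnet50:

```python
# Pure-Python check of ResNet's endpoint bookkeeping (resnet50: depths 3-4-6-3).
import itertools

depths = [3, 4, 6, 3]
out_indices = [v - 1 for v in itertools.accumulate(depths)]
print(out_indices)  # [2, 6, 12, 15]: the last block of each of the 4 stages

expansion, dims = 4, [64, 128, 256, 512]  # Bottleneck.expansion == 4
print([64] + [d * expansion for d in dims])  # out_dims: [64, 256, 512, 1024, 2048]
```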
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
from seetadet.ops.normalization import L2Norm
class VGGBlock(nn.Module):
"""The VGG block."""
def __init__(self, dim_in, dim, downsample=None):
super(VGGBlock, self).__init__()
self.conv = nn.Conv2d(dim_in, dim, 3, padding=1,
bias=not cfg.BACKBONE.NORM)
self.bn = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
if self.downsample is not None:
x = self.downsample(x)
return self.relu(self.bn(self.conv(x)))
class VGG(nn.Module):
"""VGGNet."""
def __init__(self, depths):
super(VGG, self).__init__()
dim_in, dims, blocks = 3, [64, 128, 256, 512, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
self.out_dims = dims[1:]
for i, (depth, dim) in enumerate(zip(depths, dims)):
downsample = nn.MaxPool2d(2, 2, ceil_mode=True) if i > 0 else None
blocks.append(VGGBlock(dim_in, dim, downsample))
for _ in range(depth - 1):
blocks.append(VGGBlock(dim, dim))
setattr(self, 'layer%d' % i, nn.Sequential(*blocks[-depth:]))
dim_in = dim
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
def forward(self, x):
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class VGGFCN(VGG):
"""Fully convolutional VGGNet in SSD."""
def __init__(self, depths):
super(VGGFCN, self).__init__(depths)
dim_in, out_index = self.out_dims[-1], self.out_indices[-1]
self.blocks.append(nn.Sequential(
nn.MaxPool2d(3, 1, padding=1),  # pool5: 3x3, stride 1 per the SSD design
nn.Conv2d(dim_in, 1024, 3, padding=6, dilation=6),
nn.ReLU(True)))
self.blocks.append(nn.Sequential(nn.Conv2d(1024, 1024, 1), nn.ReLU(True)))
self.layer4.add_module(str(len(self.layer4)), self.blocks[-2])
self.layer4.add_module(str(len(self.layer4)), self.blocks[-1])
self.out_dims = [self.out_dims[-2], 1024] # conv4_3, fc7
self.out_indices = [self.out_indices[-2], out_index + 2] # 9, 14
self.norm = L2Norm(dim_in, init=20.0)
def forward(self, x):
outputs = super(VGGFCN, self).forward(x)
outputs[0] = self.norm(outputs[0])
return outputs
BACKBONES.register('vgg16', VGG, depths=(2, 2, 3, 3, 3))
BACKBONES.register('vgg16_fcn', VGGFCN, depths=(2, 2, 3, 3, 3))
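The `# 9, 14` comment in `VGGFCN` follows from the same accumulation: with depths `(2, 2, 3, 3, 3)` the base VGG exposes blocks 3, 6, 9 and 12 (conv2_2 through conv5_3), and the two appended fc6/fc7 blocks land at indices 13 and 14. A standalone check:

```python
# Pure-Python check of the VGG endpoint indices (matches the "# 9, 14" comment).
import itertools

depths = (2, 2, 3, 3, 3)  # vgg16
out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
print(out_indices)  # [3, 6, 9, 12]: conv2_2, conv3_3, conv4_3, conv5_3

# VGGFCN keeps conv4_3 plus the appended fc7 block two slots after conv5_3:
print([out_indices[-2], out_indices[-1] + 2])  # [9, 14]
```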
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.core.engine.utils import get_device
BACKBONES = Registry('backbones')
NECKS = Registry('necks')
DETECTORS = Registry('detectors')
def build_backbone():
"""Build the backbone."""
backbone_types = cfg.BACKBONE.TYPE.split('.')
backbone = BACKBONES.get(backbone_types[0])()
backbone_dims = backbone.out_dims
neck = nn.Identity()
if len(backbone_types) > 1:
neck = NECKS.get(backbone_types[1])(backbone_dims)
else:
neck.out_dims = backbone_dims
return backbone, neck
def build_detector(device=None, weights=None, training=False):
"""Create a detector instance.
Parameters
----------
device : int, optional
The index of compute device.
weights : str, optional
The path of weight file.
training : bool, optional, default=False
Whether to return a detector for training.
"""
# Validate the registry entry before instantiating it.
detector_cls = DETECTORS.get(cfg.MODEL.TYPE)
if detector_cls is None:
raise ValueError('Unknown detector: ' + cfg.MODEL.TYPE)
model = detector_cls()
if weights is not None:
model.load_weights(weights, strict=True)
if device is not None:
model.to(device=get_device(device))
if not training:
model.eval()
model.optimize_for_inference()
return model
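A hedged end-to-end sketch of this builder; the YAML path and weights path are placeholders, and a yacs-style `cfg.merge_from_file` is assumed for loading the config:

```python
# A hedged usage sketch; paths are placeholders, merge_from_file is assumed.
from seetadet.core.config import cfg
from seetadet.models.build import build_detector

cfg.merge_from_file('configs/faster_rcnn/<MODEL_YAML>')  # hypothetical path
model = build_detector(device=0, weights='/path/to/model.pkl', training=False)
# Inference path: weights loaded strictly, module moved to the device,
# eval() applied, then optimize_for_inference() fuses conv/norm pairs.
```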
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.pre_nms_topk = cfg.RETINANET.PRE_NMS_TOPK
self.score_thresh = float(cfg.TEST.SCORE_THRESH)
def forward(self, inputs):
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RetinaNetDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
pre_nms_topk=self.pre_nms_topk,
score_thresh=self.score_thresh,
)
autograd.Function.register(
'RetinaNetDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_topk': kwargs.get('pre_nms_topk', 1000),
'score_thresh': kwargs.get('score_thresh', 0.05),
'check_device': False,
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import filter_empty_boxes
from seetadet.utils.nms import gpu_nms
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.min_level = cfg.FAST_RCNN.MIN_LEVEL
self.max_level = cfg.FAST_RCNN.MAX_LEVEL
self.pre_nms_topk = {True: cfg.RPN.PRE_NMS_TOPK_TRAIN,
False: cfg.RPN.PRE_NMS_TOPK_TEST}
self.post_nms_topk = {True: cfg.RPN.POST_NMS_TOPK_TRAIN,
False: cfg.RPN.POST_NMS_TOPK_TEST}
self.nms_thresh = float(cfg.RPN.NMS_THRESH)
def decode_proposals(self, scores, deltas, anchors, im_info):
# Select top-K anchors.
pre_nms_topk = self.pre_nms_topk[self.training]
if pre_nms_topk <= 0 or pre_nms_topk >= len(scores):
order = np.argsort(-scores.squeeze())
else:
inds = np.argpartition(-scores.squeeze(), pre_nms_topk)[:pre_nms_topk]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
scores, deltas, anchors = scores[order], deltas[order], anchors[order]
# Convert anchors into proposals.
proposals = bbox_transform_inv(anchors, deltas)
proposals = clip_boxes(proposals, im_info[:2])
keep = filter_empty_boxes(proposals)
if len(proposals) != len(keep):
proposals, scores = proposals[keep], scores[keep]
# Apply NMS.
proposals = np.hstack((proposals, scores))
keep = gpu_nms(proposals, self.nms_thresh)
return proposals[keep, :].astype('float32', copy=False)
def forward_train(self, inputs):
shapes = [x[:2] for x in inputs['grid_info']]
anchors = self.anchor_generator.get_anchors(shapes)
cls_score = inputs['cls_score'].numpy()
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1).numpy()
all_rois, batch_size = [], cls_score.shape[0]
lvl_slices, lvl_start = [], 0
post_nms_topk = self.post_nms_topk[self.training]
for shape in shapes:
num_anchors = self.anchor_generator.num_anchors([shape])
lvl_slices.append(slice(lvl_start, lvl_start + num_anchors))
lvl_start = lvl_start + num_anchors
for batch_ind in range(batch_size):
scores = cls_score[batch_ind].reshape((-1, 1))
deltas = bbox_pred[batch_ind]
im_info = inputs['im_info'][batch_ind]
all_proposals = []
for lvl_slice in lvl_slices:
all_proposals.append(self.decode_proposals(
scores[lvl_slice], deltas[lvl_slice],
anchors[lvl_slice], im_info))
proposals = np.concatenate(all_proposals)
proposals, scores = proposals[:, :4], proposals[:, -1]
if post_nms_topk > 0:
keep = np.argsort(-scores)[:post_nms_topk]
proposals = proposals[keep, :]
batch_inds = np.full((proposals.shape[0], 1), batch_ind, 'float32')
all_rois.append(np.hstack((batch_inds, proposals)))
return np.concatenate(all_rois)
def forward(self, inputs):
if self.training:
return self.forward_train(inputs)
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RPNDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
outputs=[None] * (self.max_level - self.min_level + 1),
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
min_level=self.min_level,
max_level=self.max_level,
pre_nms_topk=self.pre_nms_topk[False],
post_nms_topk=self.post_nms_topk[False],
nms_thresh=self.nms_thresh,
)
autograd.Function.register(
'RPNDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_topk': kwargs.get('pre_nms_topk', 1000),
'post_nms_topk': kwargs.get('post_nms_topk', 1000),
'nms_thresh': kwargs.get('nms_thresh', 0.7),
'min_level': kwargs.get('min_level', 2),
'max_level': kwargs.get('max_level', 5),
'check_device': False,
})
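Two numeric pieces of `decode_proposals` are worth isolating. The pre-NMS selection uses `np.argpartition` to take the top-K scores in O(N) and fully sorts only the K survivors, and `bbox_transform_inv` applies `(dx, dy, dw, dh)` deltas to anchors. A self-contained numpy sketch of both (the standard formulation, not the seetadet utilities; `+1`/offset conventions vary between codebases):

```python
# Standalone numpy sketch: pre-NMS top-K selection and box-delta decoding.
import numpy as np

scores, topk = np.random.rand(10000).astype('float32'), 1000
inds = np.argpartition(-scores, topk)[:topk]  # O(N): unordered top-K indices
order = inds[np.argsort(-scores[inds])]       # sort only the K survivors
assert np.allclose(np.sort(scores[order]), np.sort(scores)[-topk:])

def decode_deltas(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) anchors."""
    w = anchors[:, 2] - anchors[:, 0] + 1
    h = anchors[:, 3] - anchors[:, 1] + 1
    cx, cy = anchors[:, 0] + 0.5 * w, anchors[:, 1] + 0.5 * h
    px, py = deltas[:, 0] * w + cx, deltas[:, 1] * h + cy
    pw, ph = np.exp(deltas[:, 2]) * w, np.exp(deltas[:, 3]) * h
    return np.stack([px - 0.5 * pw, py - 0.5 * ph,
                     px + 0.5 * pw, py + 0.5 * ph], axis=1)
```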
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.retinanet import AnchorTargets
from seetadet.ops.build import build_activation
from seetadet.ops.build import build_loss
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.fusion import fuse_conv_bn
class RetinaNetHead(nn.Module):
"""RetinaNet head."""
def __init__(self, in_dims):
super(RetinaNetHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, dim_in=in_dims[0], dim_out=in_dims[0],
kernel_size=3, conv_type=cfg.RETINANET.CONV)
norm_module = functools.partial(build_norm, norm_type=cfg.RETINANET.NORM)
self.conv_module = conv_module
self.dim_cls = len(cfg.MODEL.CLASSES) - 1
self.cls_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.bbox_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.cls_norm = nn.ModuleList()
self.bbox_norm = nn.ModuleList()
for _ in range(len(self.cls_conv)):
self.cls_norm.append(nn.ModuleList())
self.bbox_norm.append(nn.ModuleList())
for _ in range(len(in_dims)):
self.cls_norm[-1].append(norm_module(in_dims[0]))
self.bbox_norm[-1].append(norm_module(in_dims[0]))
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.cls_score = conv_module(dim_out=self.dim_cls * num_anchors)
self.bbox_pred = conv_module(dim_out=4 * num_anchors)
self.activation = build_activation(cfg.RETINANET.ACTIVATION, inplace=True)
self.cls_loss = build_loss('sigmoid_focal')
self.bbox_loss = build_loss(cfg.RETINANET.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
# Bias prior initialization for focal loss.
for name, param in self.cls_score.named_parameters():
if name.endswith('bias'):
nn.init.constant_(param, -math.log((1 - 0.01) / 0.01))
def optimize_for_inference(self):
"""Optimize modules for inference."""
if hasattr(self.cls_norm[0][0], 'momentum'):
cls_conv = nn.ModuleList()
bbox_conv = nn.ModuleList()
for i in range(len(self.cls_norm)):
cls_conv.append(nn.ModuleList())
bbox_conv.append(nn.ModuleList())
cls_state = self.cls_conv[i].state_dict()
bbox_state = self.bbox_conv[i].state_dict()
for j in range(len(self.cls_norm[i])):
cls_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.cls_norm[i][j].weight.device)))
bbox_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.bbox_norm[i][j].weight.device)))
cls_conv[i][j].load_state_dict(cls_state)
bbox_conv[i][j].load_state_dict(bbox_state)
fuse_conv_bn(cls_conv[i][j][-1], self.cls_norm[i][j])
fuse_conv_bn(bbox_conv[i][j][-1], self.bbox_norm[i][j])
self._modules['cls_conv'] = cls_conv
self._modules['bbox_conv'] = bbox_conv
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for j, feature in enumerate(features):
cls_input, box_input = feature, feature
for i in range(len(self.cls_conv)):
if isinstance(self.cls_conv[i], nn.ModuleList):
cls_input = self.cls_conv[i][j](cls_input)
box_input = self.bbox_conv[i][j](box_input)
else:
cls_input = self.cls_conv[i](cls_input)
box_input = self.bbox_conv[i](box_input)
cls_input = self.activation(self.cls_norm[i][j](cls_input))
box_input = self.activation(self.bbox_norm[i][j](box_input))
cls_score.append(self.cls_score(cls_input).reshape_((0, self.dim_cls, -1)))
bbox_pred.append(self.bbox_pred(box_input).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 2) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.RETINANET.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = outputs['cls_score'].permute(0, 2, 1)
cls_score = nn.functional.sigmoid(cls_score, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
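The constant in `reset_parameters` is the focal-loss prior from the RetinaNet paper: initializing the classification bias to `-log((1 - π)/π)` with π = 0.01 makes every anchor start with roughly a 1% foreground probability, so the initial loss is not swamped by the vast majority of easy negatives. A quick numeric check:

```python
# Verify the focal-loss bias prior: sigmoid(-log((1 - pi) / pi)) == pi.
import math

pi = 0.01
bias = -math.log((1 - pi) / pi)
print(bias)                       # ~-4.595, the value written into cls_score bias
print(1 / (1 + math.exp(-bias)))  # ~0.01, the initial foreground probability
```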
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.rpn import AnchorTargets
from seetadet.ops.build import build_loss
class RPNHead(nn.Module):
"""RPN head."""
def __init__(self, in_dims):
super(RPNHead, self).__init__()
self.targets = AnchorTargets()
dim, num_anchors = in_dims[0], self.targets.generator.num_cell_anchors(0)
self.output_conv = nn.ModuleList(nn.Conv2d(
dim, dim, 3, padding=1) for _ in range(cfg.RPN.NUM_CONV))
self.cls_score = nn.Conv2d(dim, num_anchors, 1)
self.bbox_pred = nn.Conv2d(dim, num_anchors * 4, 1)
self.activation = nn.ReLU(inplace=True)
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.bbox_loss = build_loss(cfg.RPN.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for x in features:
for conv in self.output_conv:
x = self.activation(conv(x))
cls_score.append(self.cls_score(x).reshape_((0, -1)))
bbox_pred.append(self.bbox_pred(x).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'rpn_cls_score': cls_score, 'rpn_bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_score = inputs['cls_score'].flatten(0, 1)[targets['cls_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'rpn_cls_loss': cls_loss, 'rpn_bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
outputs['rpn_bbox_pred'] = outputs['rpn_bbox_pred'].float()
outputs['rpn_cls_score'] = outputs['rpn_cls_score'].float()
if self.training:
targets = self.targets.compute(**inputs)
rpn_cls_score = outputs.pop('rpn_cls_score')
outputs['rpn_cls_score'] = rpn_cls_score.data
logits = {'cls_score': rpn_cls_score,
'bbox_pred': outputs['rpn_bbox_pred']}
outputs.update(self.get_losses(logits, targets))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.ssd import AnchorTargets
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
class SSDHead(nn.Module):
"""SSD head."""
def __init__(self, in_dims):
super(SSDHead, self).__init__()
self.targets = AnchorTargets()
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.num_classes = len(cfg.MODEL.CLASSES)
conv_module = nn.Conv2d
if cfg.FPN.CONV == 'SepConv2d':
conv_module = functools.partial(ConvNorm2d, conv_type='SepConv2d')
conv_module = functools.partial(conv_module, kernel_size=3, padding=1)
for i, dim in enumerate(in_dims):
num_anchors = self.targets.generator.num_cell_anchors(i)
self.cls_score.append(conv_module(dim, num_anchors * self.num_classes))
self.bbox_pred.append(conv_module(dim, num_anchors * 4))
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum')
self.bbox_loss = build_loss(cfg.SSD.BBOX_REG_LOSS_TYPE)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for i, x in enumerate(features):
cls_score.append(self.cls_score[i](x).permute(0, 2, 3, 1).flatten_(1))
bbox_pred.append(self.bbox_pred[i](x).permute(0, 2, 3, 1).flatten_(1))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 1) if len(features) > 1 else bbox_pred[0]
cls_score = cls_score.reshape_((0, -1, self.num_classes))
bbox_pred = bbox_pred.reshape_((0, -1, 4))
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
cls_score = inputs['cls_score'].flatten_(0, 1)
bbox_pred = inputs['bbox_pred'].flatten_(0, 1)
bbox_pred = bbox_pred[targets['bbox_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.SSD.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
cls_score = outputs['cls_score']
if self.training:
cls_score_data = nn.functional.softmax(cls_score.data, dim=2)
targets = self.targets.compute(cls_score=cls_score_data, **inputs)
logits = {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = nn.functional.softmax(cls_score, dim=2, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Detectors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.detectors.detector import Detector
from seetadet.models.detectors.rcnn import CascadeRCNN
from seetadet.models.detectors.rcnn import FasterRCNN
from seetadet.models.detectors.rcnn import MaskRCNN
from seetadet.models.detectors.retinanet import RetinaNet
from seetadet.models.detectors.ssd import SSD
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import build_backbone
from seetadet.ops.fusion import get_fusion
from seetadet.ops.normalization import ToTensor
from seetadet.utils import logging
class Detector(nn.Module):
"""Class to build and compute the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
self.to_tensor = ToTensor()
self.backbone, self.neck = build_backbone()
self.backbone_dims = self.neck.out_dims
def get_inputs(self, inputs):
"""Return the detection inputs.
Parameters
----------
inputs : dict
The inputs, containing at least the 'img' blob.
"""
inputs['img'] = self.to_tensor(inputs['img'], normalize=True)
return inputs
def get_features(self, inputs):
"""Return the detection features.
Parameters
----------
inputs : dict
The inputs.
"""
return self.neck(self.backbone(inputs['img']))
def get_outputs(self, inputs):
"""Return the detection outputs.
Parameters
----------
inputs : dict
The inputs.
"""
return inputs
def forward(self, inputs):
"""Define the computation performed at every call.
Parameters
----------
inputs : dict
The inputs.
"""
return self.get_outputs(inputs)
def load_weights(self, weights, strict=False):
"""Load the state dict of this detector.
Parameters
----------
weights : str
The path of the weights file.
"""
return self.load_state_dict(torch.load(weights), strict=strict)
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Set precision.
precision = cfg.MODEL.PRECISION.lower()
self.half() if precision == 'float16' else self.float()
logging.info('Set precision: ' + precision)
# Fuse modules.
fusion_memo, last_module = set(), None
for module in self.modules():
if module is self:
continue
if hasattr(module, 'optimize_for_inference'):
module.optimize_for_inference()
fusion_memo.add(module.__class__.__name__)
continue
key, fn = get_fusion(last_module, module)
if fn is not None:
fusion_memo.add(key)
fn(last_module, module)
last_module = module
for key in fusion_memo:
logging.info('Fuse modules: ' + key)
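The conv/norm fusion performed here is the standard BatchNorm folding: the affine normalization is absorbed into the preceding convolution's weights and bias, removing the norm layer at inference time. A minimal numpy sketch of the algebra (not the `seetadet.ops.fusion` implementation):

```python
# Minimal numpy sketch of BatchNorm folding: bn(conv(x)) == conv'(x).
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') with conv(x, w') + b' == gamma*(conv(x, w)+b-mean)/std + beta."""
    scale = gamma / np.sqrt(var + eps)        # per output channel
    w_fused = w * scale.reshape(-1, 1, 1, 1)  # rescale each output filter
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

w, b = np.random.randn(8, 3, 3, 3), np.zeros(8)  # (out, in, kh, kw) weights
gamma, beta = np.ones(8), np.zeros(8)
mean, var = np.random.randn(8), np.abs(np.random.randn(8)) + 0.1
w_fused, b_fused = fold_bn(w, b, gamma, beta, mean, var)
```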
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""R-CNN detectors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
from seetadet.models.roi_heads.mask_rcnn import MaskRCNNHead
from seetadet.utils.bbox import bbox_transform_inv
@DETECTORS.register('faster_rcnn')
class FasterRCNN(Detector):
"""Faster R-CNN detector."""
def __init__(self):
super(FasterRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets['rois']
outputs.update(self.bbox_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
return outputs
@DETECTORS.register('mask_rcnn')
class MaskRCNN(Detector):
"""Mask R-CNN detector."""
def __init__(self):
super(MaskRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.mask_head = MaskRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs, outputs = self.get_inputs(inputs), {}
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs.update(self.rpn_head(inputs))
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets.pop('rois')
outputs.update(self.bbox_head(inputs, targets))
inputs['rois'] = targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
self.outputs = {'features': inputs['features']}
return outputs
@DETECTORS.register('cascade_rcnn')
class CascadeRCNN(Detector):
"""Cascade R-CNN detector."""
def __init__(self):
super(CascadeRCNN, self).__init__()
self.cascade_ious = cfg.CASCADE_RCNN.POSITIVE_OVERLAP
self.bbox_reg_weights = cfg.CASCADE_RCNN.BBOX_REG_WEIGHTS
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_heads = nn.ModuleList(FastRCNNHead(self.backbone_dims)
for _ in range(len(self.cascade_ious)))
if cfg.CASCADE_RCNN.MASK_ON:
self.mask_head = MaskRCNNHead(self.backbone_dims)
else:
self.mask_head = None
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
assigner = self.proposal_targets.assigner
outputs['cls_loss'], outputs['bbox_loss'] = [], []
mask_targets = {}
for i, bbox_head in enumerate(self.bbox_heads):
assigner.pos_iou_thr = assigner.neg_iou_thr = self.cascade_ious[i]
self.proposal_targets.bbox_reg_weights = self.bbox_reg_weights[i]
targets = self.proposal_targets.compute(**inputs)
if self.mask_head is not None and 'gt_segms' in inputs:
inputs.pop('gt_segms')
for k in ('fg_rois', 'mask_inds', 'mask_targets'):
mask_targets[k] = targets.pop(k)
proposals, inputs['rois'] = targets['proposals'], targets['rois']
outputs_i = bbox_head(inputs, targets)
outputs['cls_loss'].append(outputs_i['cls_loss'])
outputs['bbox_loss'].append(outputs_i['bbox_loss'])
if i < len(self.bbox_heads) - 1:
boxes = bbox_transform_inv(
proposals[:, 1:5], outputs_i['bbox_pred'].numpy(),
weights=self.bbox_reg_weights[i])
inputs['rois'] = np.hstack((proposals[:, :1], boxes))
if self.mask_head is not None:
inputs['rois'] = mask_targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, mask_targets))
else:
outputs.update(self.bbox_heads[0](inputs))
self.outputs = {'features': inputs['features'], 'rois': inputs['rois']}
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.retinanet import RetinaNetDecoder
from seetadet.models.dense_heads.retinanet import RetinaNetHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('retinanet')
class RetinaNet(Detector):
"""RetinaNet detector."""
def __init__(self):
super(RetinaNet, self).__init__()
self.bbox_head = RetinaNetHead(self.backbone_dims)
self.bbox_decoder = RetinaNetDecoder()
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.bbox_head(inputs)
if not self.training:
outputs['dets'] = self.bbox_decoder({
'cls_score': outputs.pop('cls_score'),
'bbox_pred': outputs.pop('bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.dense_heads.ssd import SSDHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('ssd')
class SSD(Detector):
"""SSD detector."""
def __init__(self):
super(SSD, self).__init__()
self.bbox_head = SSDHead(self.backbone_dims)
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
outputs = self.bbox_head(inputs)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Necks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.necks import bifpn
from seetadet.models.necks import fpn
from seetadet.models.necks import ssd
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""BiFPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.build import build_activation
from seetadet.ops.conv import ConvNorm2d
class FuseOp(nn.Module):
"""Operator to fuse input features."""
def __init__(self, num_inputs):
super(FuseOp, self).__init__()
self.fuse_type = cfg.FPN.FUSE_TYPE
if self.fuse_type == 'weighted':
self.weight = nn.Parameter(torch.ones(num_inputs))
def forward(self, *inputs):
if self.fuse_type == 'weighted':
weights = nn.functional.softmax(self.weight, dim=0).split(1)
outputs = inputs[0] * weights[0]
for x, w in zip(inputs[1:], weights[1:]):
outputs += x * w
else:
outputs = inputs[0]
for x in inputs[1:]:
outputs += x
return outputs
class Block(nn.Module):
"""BiFPN block."""
def __init__(self, in_dims=None):
super(Block, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.output_conv1, self.output_fuse1 = nn.ModuleList(), nn.ModuleList()
self.output_conv2, self.output_fuse2 = nn.ModuleList(), nn.ModuleList()
for lvl in range(self.min_lvl, self.max_lvl):
self.output_conv1 += [conv_module(self.dim, self.dim, 3)]
self.output_conv2 += [conv_module(self.dim, self.dim, 3)]
self.output_fuse1 += [FuseOp(2)]
self.output_fuse2 += [FuseOp(3 if lvl < self.max_lvl - 1 else 2)]
self.activation = build_activation(cfg.FPN.ACTIVATION, inplace=True)
def forward(self, laterals1, laterals2=None):
outputs = [laterals1[-1]]
for i in range(len(laterals1) - 1, 0, -1):
x1, x2 = outputs[0], laterals1[i - 1]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else x2.shape[2:]
x1 = nn.functional.interpolate(x1, size, scale)
y = self.output_fuse1[i - 1](x1, x2)
outputs.insert(0, self.output_conv1[i - 1](self.activation(y)))
if laterals2 is None:
laterals2 = laterals1[1:]
else:
laterals2 += laterals1[self.highest_lvl - self.min_lvl + 1:]
for i in range(1, len(outputs)):
x1, x2 = outputs[i - 1], laterals2[i - 1]
x1 = nn.functional.max_pool2d(x1, 3, 2, padding=1)
if i < len(outputs) - 1:
y = self.output_fuse2[i - 1](x1, x2, outputs[i])
else:
y = self.output_fuse2[i - 1](x1, x2)
outputs[i] = self.output_conv2[i - 1](self.activation(y))
return outputs
@NECKS.register('bifpn')
class BiFPN(nn.Module):
"""BiFPN to enhance input features."""
def __init__(self, in_dims=None):
super(BiFPN, self).__init__()
conv_module = functools.partial(ConvNorm2d, norm_type=cfg.FPN.NORM)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv1 = nn.ModuleList()
self.lateral_conv2 = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl]:
self.lateral_conv1 += [conv_module(dim, self.dim, 1)]
for dim in in_dims[self.min_lvl:self.highest_lvl]:
self.lateral_conv2 += [conv_module(dim, self.dim, 1)]
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.lateral_conv1 += [conv_module(dim, self.dim, 1)
if lvl == self.highest_lvl + 1 else nn.Identity()]
self.blocks = nn.ModuleList(Block(in_dims) for _ in range(cfg.FPN.NUM_BLOCKS))
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl]
laterals1 = [conv(x) for conv, x in zip(self.lateral_conv1, features)]
laterals2 = [conv(x) for conv, x in zip(self.lateral_conv2, features[1:])]
x = features[-1]
for i in range(len(laterals1), len(self.out_dims)):
x = self.lateral_conv1[i](x)
x = nn.functional.max_pool2d(x, 3, 2, padding=1)
laterals1.append(x)
for i, blk in enumerate(self.blocks):
laterals1 = blk(laterals1, laterals2 if i == 0 else None)
return laterals1
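`FuseOp`'s `'weighted'` branch is the softmax-normalized feature fusion from the BiFPN (EfficientDet) design: each input map receives a learned weight, normalized to be positive and sum to one. A standalone numpy equivalent:

```python
# Standalone numpy equivalent of FuseOp's 'weighted' fusion branch.
import numpy as np

def weighted_fuse(inputs, logits):
    """Fuse equally-shaped maps with softmax-normalized learned weights."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(x * wi for x, wi in zip(inputs, w))

maps = [np.random.randn(1, 64, 32, 32) for _ in range(3)]
fused = weighted_fuse(maps, np.zeros(3))  # zero logits -> a plain average
```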
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.conv import ConvNorm2d
@NECKS.register('fpn')
class FPN(nn.Module):
"""FPN to enhance input features."""
def __init__(self, in_dims):
super(FPN, self).__init__()
lateral_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM)
output_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.fuse_lvl = cfg.FPN.FUSE_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv = nn.ModuleList()
self.output_conv = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl]:
self.lateral_conv += [lateral_conv_module(dim, self.dim, 1)]
self.output_conv += [output_conv_module(self.dim, self.dim, 3)]
if 'rcnn' not in cfg.MODEL.TYPE:
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.output_conv += [output_conv_module(dim, self.dim, 3, stride=2)]
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl]
laterals = [conv(x) for conv, x in zip(self.lateral_conv, features)]
for i in range(self.fuse_lvl - self.min_lvl, 0, -1):
y, x = laterals[i - 1], laterals[i]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else y.shape[2:]
y += nn.functional.interpolate(x, size, scale)
outputs = [conv(x) for conv, x in zip(self.output_conv, laterals)]
if len(self.output_conv) <= len(self.lateral_conv):
for _ in range(len(outputs), len(self.out_dims)):
outputs.append(nn.functional.max_pool2d(outputs[-1], 1, stride=2))
else:
outputs.append(self.output_conv[len(outputs)](features[-1]))
for i in range(len(outputs), len(self.out_dims)):
outputs.append(self.output_conv[i](nn.functional.relu(outputs[-1])))
return outputs
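One top-down merge step in `forward` upsamples the coarser lateral by 2x (or to an explicit size when strides are irregular) and adds it into the finer one. A hedged numpy sketch of a single step with nearest-neighbor upsampling:

```python
# Numpy sketch of one FPN top-down merge step (nearest 2x upsample + add).
import numpy as np

def upsample2x_nearest(x):
    """Nearest-neighbor 2x upsampling over the last two axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

p5 = np.random.randn(1, 256, 16, 16)
c4_lateral = np.random.randn(1, 256, 32, 32)
p4 = c4_lateral + upsample2x_nearest(p5)  # finer lateral += upsampled coarser map
```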
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.conv import ConvNorm2d
class SSDNeck(nn.Module):
"""Feature Pyramid Network."""
def __init__(self, in_dims, out_dims, kernel_sizes, strides, paddings):
super(SSDNeck, self).__init__()
self.out_dims = list(in_dims[-2:]) + list(out_dims)
dim_in, self.blocks = in_dims[-1], nn.ModuleList()
conv_module = functools.partial(
ConvNorm2d, conv_type=cfg.FPN.CONV,
norm_type=cfg.FPN.NORM, activation_type=cfg.FPN.ACTIVATION)
for dim, kernel_size, stride, padding in zip(
out_dims, kernel_sizes, strides, paddings):
self.blocks.append(conv_module(dim_in, dim // 2, 1))
self.blocks.append(conv_module(dim // 2, dim, kernel_size, stride, padding))
dim_in = dim
def forward(self, features):
x, outputs = features[-1], features[-2:]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i % 2 > 0:
outputs.append(x)
return outputs
NECKS.register(
'ssd300', SSDNeck,
out_dims=(512, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 1, 1),
paddings=(1, 1, 0, 0))
NECKS.register(
'ssd512', SSDNeck,
out_dims=(512, 256, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3, 4),
strides=(2, 2, 2, 2, 1),
paddings=(1, 1, 1, 1, 1))
NECKS.register(
'ssdlite', SSDNeck,
out_dims=(512, 256, 256, 128),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 2, 2),
paddings=(1, 1, 1, 1))
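Since `out_dims` prepends the last two backbone endpoints to the extra-layer widths, the `ssd300` neck on top of `vgg16_fcn` (conv4_3 at 512 channels, fc7 at 1024) exposes six prediction sources:

```python
# out_dims bookkeeping for ssd300 over vgg16_fcn (conv4_3=512, fc7=1024).
in_dims = [512, 1024]
out_dims = list(in_dims[-2:]) + [512, 256, 256, 256]
print(out_dims)  # [512, 1024, 512, 256, 256, 256]: six prediction sources
```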
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Fast R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class FastRCNNHead(nn.Module):
"""Fast R-CNN head."""
def __init__(self, in_dims):
super(FastRCNNHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FAST_RCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
self.output_fc = nn.ModuleList()
for i in range(cfg.FAST_RCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else cfg.FAST_RCNN.CONV_HEAD_DIM
self.output_conv += [conv_module(dim, cfg.FAST_RCNN.CONV_HEAD_DIM)]
for i in range(cfg.FAST_RCNN.NUM_FC):
dim = in_dims[0] * cfg.FAST_RCNN.POOLER_RESOLUTION ** 2
dim = dim if i == 0 else cfg.FAST_RCNN.FC_HEAD_DIM
self.output_fc += [nn.Sequential(nn.Linear(dim, cfg.FAST_RCNN.FC_HEAD_DIM),
nn.ReLU(inplace=True))]
self.cls_score = nn.Linear(cfg.FAST_RCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES))
num_classes = 1 if cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC else len(cfg.MODEL.CLASSES) - 1
self.bbox_pred = nn.Linear(cfg.FAST_RCNN.FC_HEAD_DIM, num_classes * 4)
self.pooler = RoIPooler(
pooler_type=cfg.FAST_RCNN.POOLER_TYPE,
resolution=cfg.FAST_RCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.FAST_RCNN.POOLER_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
self.bbox_loss = build_loss(cfg.FAST_RCNN.BBOX_REG_LOSS_TYPE)
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.cls_score.weight, std=0.01)
nn.init.normal_(self.bbox_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
x = x.flatten_(1)
for layer in self.output_fc:
x = layer(x)
cls_score, bbox_pred = self.cls_score(x), self.bbox_pred(x)
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
bbox_pred = inputs['bbox_pred'].reshape_((0, -1, 4))
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'])
normalizer = cfg.FAST_RCNN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.FAST_RCNN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
outputs = self.get_losses(logits, targets)
outputs['bbox_pred'] = logits['bbox_pred'].data
return outputs
else:
outputs['cls_score'] = nn.functional.softmax(
outputs['cls_score'], dim=1, inplace=True)
return {'rois': torch.cat(inputs['rois']),
'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
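`spatial_scales` maps each pyramid level to the inverse of its stride, so the RoI pooler samples each level at the right resolution; with the common Fast R-CNN levels 2 through 5 this gives 1/4 down to 1/32:

```python
# spatial_scales for FPN levels 2..5 (the common Fast R-CNN defaults).
min_level, max_level = 2, 5
print([1. / (2 ** lvl) for lvl in range(min_level, max_level + 1)])
# [0.25, 0.125, 0.0625, 0.03125]
```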
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class MaskRCNNHead(nn.Module):
"""Mask R-CNN head."""
def __init__(self, in_dims):
super(MaskRCNNHead, self).__init__()
self.dim = cfg.MASK_RCNN.CONV_HEAD_DIM
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.MASK_RCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
for i in range(cfg.MASK_RCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else self.dim
self.output_conv += [conv_module(dim, self.dim)]
self.output_conv += [nn.Sequential(
nn.ConvTranspose2d(self.dim, self.dim, 2, 2),
nn.ReLU(True))]
self.mask_pred = nn.Conv2d(self.dim, len(cfg.MODEL.CLASSES) - 1, 1)
self.pooler = RoIPooler(
pooler_type=cfg.MASK_RCNN.POOLER_TYPE,
resolution=cfg.MASK_RCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.MASK_RCNN.POOLER_SAMPLING_RATIO)
self.mask_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.mask_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
return {'mask_pred': self.mask_pred(x)}
def get_losses(self, inputs, targets):
mask_pred = inputs['mask_pred']
mask_pred = mask_pred.flatten_(0, 1)[targets['mask_inds']]
mask_loss = self.mask_loss(mask_pred, targets['mask_targets'])
return {'mask_loss': mask_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'mask_pred': outputs['mask_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['mask_pred'] = nn.functional.sigmoid(
outputs['mask_pred'], inplace=True).float()
return {'mask_pred': outputs['mask_pred']}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.modules import rcnn
from seetadet.modules import retinanet
from seetadet.modules import ssd
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import types
import codewithgpu
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.utils.profiler import Timer
INFERENCE_MODULES = Registry('inference_modules')
def build_inference(model):
"""Build the inference module."""
return INFERENCE_MODULES.get(cfg.MODEL.TYPE)(model)
class InferenceModule(codewithgpu.InferenceModule):
"""Inference module."""
def __init__(self, model):
super(InferenceModule, self).__init__(model)
self.timers = collections.defaultdict(Timer)
def get_time_diffs(self):
"""Return the time differences."""
return dict((k, v.average_time)
for k, v in self.timers.items())
def trace(self, name, func, example_inputs=None):
"""Trace the function and bound to model."""
if not hasattr(self.model, name):
setattr(self.model, name, torch.jit.trace(
func=types.MethodType(func, self.model),
example_inputs=example_inputs))
return getattr(self.model, name)
@staticmethod
def register(model_type, **kwargs):
"""Register a inference module."""
def decorated(func):
return INFERENCE_MODULES.register(model_type, func, **kwargs)
return decorated
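# Example usage (a minimal sketch; 'my_detector' and MyDetectorInference
# are hypothetical names, not part of the shipped registry):
#
#   @InferenceModule.register('my_detector')
#   class MyDetectorInference(InferenceModule):
#       def get_results(self, imgs):
#           return [{'boxes': []} for _ in imgs]
#
#   module = build_inference(model)  # Resolved via cfg.MODEL.TYPE.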
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RCNN modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.bbox import filter_empty_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register(['faster_rcnn', 'mask_rcnn', 'cascade_rcnn'])
class RCNNInference(InferenceModule):
"""RCNN inference module."""
def __init__(self, model):
super(RCNNInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img, im_info, grid_info:
self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
img_boxes, proposals = self.forward_bbox(imgs)
if getattr(self.model, 'mask_head', None) is None:
return [{'boxes': boxes} for boxes in img_boxes]
proposals = np.concatenate(sum(proposals, []))
mask_pred = self.forward_mask(proposals)
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
img_masks = [[] for _ in range(ims_per_batch)]
batch_inds = proposals[:, :1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
masks, labels = mask_pred[inds], proposals[inds, 5]
num_classes = len(img_boxes[index])
for _ in range(num_classes - len(img_masks[index])):
img_masks[index].append([])
for j in range(1, num_classes):
img_masks[index][j].append(masks[np.where(labels == (j - 1))[0]])
if (i + 1) % num_scales == 0:
v = img_masks[index][j]
img_masks[index][j] = np.vstack(v) if len(v) > 1 else v[0]
return [{'boxes': boxes, 'masks': masks}
for boxes, masks in zip(img_boxes, img_masks)]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.TEST.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_info, grid_info = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
outputs = self.forward_model(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
cls_score, bbox_pred = self.forward_cascade(outputs)
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], [], []) for _ in range(ims_per_batch)]
batch_inds = outputs['rois'][:, :1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(cls_score[inds])
results[index][1].append(boxes)
results[index][2].append(batch_inds[inds])
results = [[np.vstack(x) for x in y] for y in results]
self.timers['im_detect'].toc(n=ims_per_batch)
img_boxes, img_proposals = [], []
for scores, boxes, batch_inds in results:
with self.timers['misc'].tic_and_toc():
cls_boxes, cls_proposals = get_cls_results(
scores, boxes, batch_inds, im_info)
img_boxes.append(cls_boxes)
img_proposals.append(cls_proposals)
return img_boxes, img_proposals
@torch.no_grad()
def forward_mask(self, proposals):
"""Run mask inference."""
lvl_min, lvl_max = cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL
lvls = distribute_boxes(proposals[:, 1:5], lvl_min, lvl_max)
roi_inds = [np.where(lvls == (i + lvl_min))[0]
for i in range(lvl_max - lvl_min + 1)]
rois, labels = [], []
for inds in roi_inds:
rois.append(proposals[inds, :5] if len(inds) > 0 else
np.array([[-1, 0, 0, 1, 1]], 'float32'))
labels.append(proposals[inds, 5].astype('int64')
if len(inds) > 0 else np.array([-1], 'int64'))
self.timers['im_detect_mask'].tic()
inputs = {'features': self.model.outputs['features'],
'rois': [self.model.to_tensor(x) for x in rois]}
mask_pred = self.model.mask_head(inputs)['mask_pred']
num_rois, num_classes = mask_pred.shape[:2]
labels = np.concatenate(labels)
fg_inds = np.where(labels >= 0)[0]
strides = np.arange(num_rois) * num_classes
mask_inds = self.model.to_tensor(strides[fg_inds] + labels[fg_inds])
mask_pred = mask_pred.flatten_(0, 1)[mask_inds].numpy()
mask_pred = mask_pred[np.concatenate(roi_inds).argsort()].copy()
self.timers['im_detect_mask'].toc()
return mask_pred
@torch.no_grad()
def forward_cascade(self, outputs):
"""Run cascade inference."""
if not hasattr(self.model, 'bbox_heads'):
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FAST_RCNN.BBOX_REG_WEIGHTS)
return outputs['cls_score'], bbox_pred
num_stages = len(self.model.bbox_heads)
batch_inds = outputs['rois'][:, :1]
cls_score = outputs['cls_score'].copy()
lvl_slices = np.cumsum([0] + list(x.size(0) for x in self.model.outputs['rois']))
lvl_slices = [slice(lvl_slices[i], lvl_slices[i + 1])
for i in range(len(lvl_slices) - 1)]
inputs = {'features': self.model.outputs['features']}
for i in range(num_stages):
if i > 0:
outputs = self.model.bbox_heads[i](inputs)
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
cls_score += outputs['cls_score']
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=self.model.bbox_reg_weights[i])
if i < num_stages - 1:
proposals = np.hstack((batch_inds, bbox_pred))
rois = [proposals[lvl_slice] for lvl_slice in lvl_slices]
inputs['rois'] = [self.model.to_tensor(x) for x in rois]
cls_score *= 1.0 / num_stages
return cls_score, bbox_pred
def get_cls_results(all_scores, all_boxes, batch_inds, im_info):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
empty_proposals = np.zeros((0, 6), 'float32')
cls_boxes, cls_proposals = [[]], []
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores = all_scores[inds, j]
if cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC:
boxes = all_boxes[inds]
else:
boxes = all_boxes[inds, (j - 1) * 4:j * 4]
keep = filter_empty_boxes(boxes)
if len(keep) == 0:
cls_boxes.append(empty_boxes)
cls_proposals.append(empty_proposals)
continue
scores, boxes = scores[keep], boxes[keep]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
batch_inds_keep = batch_inds[inds][keep]
cls_boxes.append(dets[keep, :])
cls_proposals.append(np.hstack((
batch_inds_keep,
cls_boxes[-1][:, :4] * im_info[batch_inds_keep, 2],
np.ones((len(keep), 1)) * (j - 1))).astype('float32'))
return cls_boxes, cls_proposals
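# Note on the layouts used above: 'cls_proposals' rows are packed as
# (batch_index, x1, y1, x2, y2, foreground_class_index) in the network
# input scale, while 'cls_boxes' rows are (x1, y1, x2, y2, score) in the
# original image scale.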
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register('retinanet')
class RetinaNetInference(InferenceModule):
"""RetinaNet inference module."""
def __init__(self, model):
super(RetinaNetInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img, im_info, grid_info:
self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
results = self.forward_bbox(imgs)
img_boxes = []
for dets in results:
with self.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(dets)
img_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in img_boxes]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.TEST.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_info, grid_info = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
outputs = self.forward_model(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [[] for _ in range(ims_per_batch)]
batch_inds = outputs['dets'][:, 0:1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
results[index].append(outputs['dets'][inds, 1:])
for index in range(ims_per_batch):
try:
results[index] = np.vstack(results[index])
except ValueError:
results[index] = results[index][0]
self.timers['im_detect'].toc(n=ims_per_batch)
return results
def get_cls_results(all_dets):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
labels = all_dets[:, 5].astype('int32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(labels == j)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
dets = all_dets[inds, :5].astype('float32')
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register('ssd')
class SSDInference(InferenceModule):
"""SSD inference module."""
def __init__(self, model):
super(SSDInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img:
self.forward({'img': img}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
results = self.forward_bbox(imgs)
im_boxes = []
for scores, boxes in results:
with self.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_scales = [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, keep_ratio=False)
im_batch += scaled_imgs
im_scales += scales
im_batch = blob_vstack(im_batch, fill_value=cfg.MODEL.PIXEL_MEAN)
return im_batch, im_scales
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_scales = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch)}
outputs = self.forward_model(inputs['img'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
anchors = self.model.bbox_head.targets.generator.grid_anchors
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(ims_per_batch)]
for i in range(ims_per_batch * num_scales):
index = i // num_scales
boxes = bbox_transform_inv(
anchors, outputs['bbox_pred'][i],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= im_scales[i][1]
boxes[:, 1::2] /= im_scales[i][0]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][i])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
self.timers['im_detect'].toc(n=ims_per_batch)
return results
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores, boxes = all_scores[inds, j], all_boxes[inds]
inds = np.argsort(-scores)[:cfg.SSD.PRE_NMS_TOPK]
scores, boxes = scores[inds], boxes[inds]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from seetadet.core.engine.utils import load_library as _load_library
_load_library(os.path.join(os.path.dirname(__file__), '_C'))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.loss import GIoULoss
from seetadet.ops.loss import L1Loss
from seetadet.ops.loss import SmoothL1Loss
from seetadet.ops.loss import SigmoidFocalLoss
from seetadet.ops.normalization import FrozenBatchNorm2d
from seetadet.ops.normalization import TransposedLayerNorm
def build_loss(loss_type, reduction='sum', **kwargs):
    """Build the loss module."""
if isinstance(loss_type, str):
loss_type = loss_type.lower()
if loss_type != 'smooth_l1':
kwargs.pop('beta', None)
loss_type = {
'l1': L1Loss,
'smooth_l1': SmoothL1Loss,
'giou': GIoULoss,
            'cross_entropy': nn.CrossEntropyLoss,
'sigmoid_focal': SigmoidFocalLoss,
}[loss_type]
return loss_type(reduction=reduction, **kwargs)
def build_norm(dim, norm_type):
"""Build the normalization module."""
if isinstance(norm_type, str):
if len(norm_type) == 0:
return nn.Identity()
norm_type = {
'BN': nn.BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d,
'SyncBN': nn.SyncBatchNorm,
'LN': TransposedLayerNorm,
'GN': lambda c: nn.GroupNorm(32, c),
'Affine': lambda c: FrozenBatchNorm2d(c, affine=True),
}[norm_type]
return norm_type(dim)
def build_activation(activation_type, inplace=False):
"""Build the activation module."""
if isinstance(activation_type, str):
if len(activation_type) == 0:
return nn.Identity()
activation_type = getattr(nn, activation_type)
activation = activation_type()
activation.inplace = inplace
return activation
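# Example usage (a minimal sketch of the three builders; the argument
# values are illustrative, not defaults taken from any config):
#
#   loss = build_loss('smooth_l1', reduction='mean', beta=1.0)
#   norm = build_norm(256, 'FrozenBN')  # -> FrozenBatchNorm2d(256)
#   act = build_activation('ReLU', inplace=True)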
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Convolution ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.build import build_norm
class ConvNorm2d(nn.Sequential):
"""2d convolution followed by norm."""
def __init__(
self,
dim_in,
dim_out,
kernel_size,
stride=1,
padding=None,
dilation=1,
groups=1,
bias=True,
conv_type='Conv2d',
norm_type='',
activation_type='',
inplace=True,
):
super(ConvNorm2d, self).__init__()
if padding is None:
padding = kernel_size // 2
if conv_type == 'Conv2d':
layers = [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias and (not norm_type))]
elif conv_type == 'SepConv2d':
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=dim_in,
bias=False),
nn.Conv2d(dim_in, dim_out,
kernel_size=1,
bias=bias and (not norm_type))]
else:
raise ValueError('Unknown conv type: ' + conv_type)
if norm_type:
layers += [build_norm(dim_out, norm_type)]
if activation_type:
layers += [getattr(nn, activation_type)()]
layers[-1].inplace = inplace
for i, layer in enumerate(layers):
self.add_module(str(i), layer)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
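# Example usage (a minimal sketch): a 3x3 conv with frozen BN and ReLU.
# The conv bias is dropped automatically because a norm layer follows.
#
#   block = ConvNorm2d(256, 256, kernel_size=3,
#                      norm_type='FrozenBN', activation_type='ReLU')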
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operator fusions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.registry import Registry
# Pass to fuse adjacent modules.
FUSIONS = Registry('fusions')
@FUSIONS.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'Conv2d+SyncBatchNorm',
'ConvTranspose2d+BatchNorm2d',
'ConvTranspose2d+FrozenBatchNorm2d',
'ConvTranspose2d+SyncBatchNorm',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+SyncBatchNorm'])
def fuse_conv_bn(conv, bn):
"""Fuse Conv and BatchNorm."""
with torch.no_grad():
m = bn.running_mean
if conv.bias is not None:
m.sub_(conv.bias.float())
else:
delattr(conv, 'bias')
bn.forward = lambda x: x
t = bn.weight.div((bn.running_var + bn.eps).sqrt_())
conv._parameters['bias'] = bn.bias.sub(t * m)
t_conv_shape = [1, conv.out_channels] if conv.transposed else [0, 1]
t_conv_shape += [1] * len(conv.kernel_size)
if conv.weight.dtype == 'float16' and t.dtype == 'float32':
conv.bias.half_()
weight = conv.weight.float()
weight.mul_(t.reshape_(t_conv_shape)).half_()
conv.weight.copy_(weight)
else:
conv.weight.mul_(t.reshape_(t_conv_shape))
def get_fusion(*modules):
"""Return the fusion pass between modules."""
key = '+'.join(m.__class__.__name__ for m in modules)
return key, FUSIONS.try_get(key)
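# Example usage (a minimal sketch): look up and apply the fusion pass
# for an adjacent conv/norm pair, if one is registered.
#
#   key, fusion = get_fusion(conv, bn)  # e.g. 'Conv2d+BatchNorm2d'
#   if fusion is not None:
#       fusion(conv, bn)  # Folds BN stats into the conv weight/bias.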
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Loss ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
def transform_inv(self, boxes, deltas):
widths = boxes[:, 2:3] - boxes[:, 0:1]
heights = boxes[:, 3:4] - boxes[:, 1:2]
ctr_x = boxes[:, 0:1] + 0.5 * widths
ctr_y = boxes[:, 1:2] + 0.5 * heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(anchor, input)
x1_, y1_, x2_, y2_ = self.transform_inv(anchor, target)
# Compute the independent area.
pred_area = (x2 - x1) * (y2 - y1)
target_area = (x2_ - x1_) * (y2_ - y1_)
# Compute the intersecting area.
x1_inter = torch.maximum(x1, x1_)
y1_inter = torch.maximum(y1, y1_)
x2_inter = torch.minimum(x2, x2_)
y2_inter = torch.minimum(y2, y2_)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area.
x1_enc = torch.minimum(x1, x1_)
y1_enc = torch.minimum(y1, y1_)
x2_enc = torch.maximum(x2, x2_)
y2_enc = torch.maximum(y2, y2_)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric.
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss.
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
with dragon.variable_scope('IoULossVariable'):
return self.forward_impl(*inputs, **kwargs)
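# The loss above follows the GIoU definition: with union U = |A| + |B|
# - |A ∩ B| and smallest enclosing box C,
#
#   GIoU(A, B) = IoU(A, B) - (|C| - U) / |C|,  loss = 1 - GIoU.
#
# The '+ 1.' terms in forward_impl are smoothing constants added by this
# implementation, not part of the canonical formula.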
class L1Loss(nn.L1Loss):
"""L1 loss."""
def forward(self, input, target, *args):
return super(L1Loss, self).forward(input, target)
class SigmoidFocalLoss(nn.SigmoidFocalLoss):
"""Sigmoid focal loss."""
def __init__(self, reduction='sum'):
super(SigmoidFocalLoss, self).__init__(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
start_index=1, # Foreground index
reduction=reduction)
class SmoothL1Loss(nn.SmoothL1Loss):
"""Smoothed l1 loss."""
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target, beta=self.beta,
reduction=self.reduction)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Normalization ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.engine.utils import get_device
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics or affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, affine=False, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.affine = affine
self.inplace = inplace and (not affine)
if self.affine:
self.weight = torch.nn.Parameter(torch.ones(num_features))
self.bias = torch.nn.Parameter(torch.zeros(num_features))
else:
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}, affine={affine}' \
.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return nn.functional.affine(
input, self.weight, self.bias,
dim=1, out=input if self.inplace else None)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this behavior will break the original stats
# into zero means and one stds.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
class TransposedLayerNorm(nn.LayerNorm):
"""LayerNorm with pre-transposed spatial axes."""
def forward(self, input):
return nn.functional.layer_norm(
input.permute(0, 2, 3, 1), self.normalized_shape,
self.weight, self.bias, self.eps).permute(0, 3, 1, 2)
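# TransposedLayerNorm normalizes over the channel axis of NCHW inputs:
# it permutes to NHWC, applies LayerNorm on the trailing dimension, and
# permutes the result back to NCHW.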
class L2Norm(nn.Module):
"""Parameterized L2 normalize."""
def __init__(self, num_features, init=20., eps=1e-5):
super(L2Norm, self).__init__()
self.eps = eps
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=self.eps)
return nn.functional.affine(out, self.weight, dim=1)
class ToTensor(nn.Module):
"""Convert input to tensor."""
def __init__(self):
super(ToTensor, self).__init__()
self.device = torch.device('cpu')
self.tensor = torch.ones(1)
self.normalize = functools.partial(
nn.functional.channel_norm,
mean=cfg.MODEL.PIXEL_MEAN,
std=cfg.MODEL.PIXEL_STD,
dim=1, dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower())
def _apply(self, fn):
fn(self.tensor)
def forward(self, input, normalize=False):
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
input = input.to(self.tensor.device)
if normalize and not input.is_floating_point():
input = self.normalize(input)
return input
def to_tensor(input, to_device=True):
"""Convert input to tensor."""
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
if to_device:
input = input.to(device=get_device(cfg.GPU_ID))
return input
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ONNX exporters."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.onnx.core import helper
from dragon.vm.onnx.core.exporters import utils as export_util
@export_util.register('RetinaNetDecoder')
def retinanet_decoder_exporter(op_def, context):
node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RetinaNetDecoder')
for arg in op_def.arg:
if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints)
elif arg.name == 'ratios':
helper.add_attribute(node, 'ratios', arg.floats)
elif arg.name == 'scales':
helper.add_attribute(node, 'scales', arg.floats)
elif arg.name == 'pre_nms_topk':
helper.add_attribute(node, 'pre_nms_topk', arg.i)
elif arg.name == 'score_thresh':
helper.add_attribute(node, 'score_thresh', arg.f)
return node, const_tensors
@export_util.register('RPNDecoder')
def rpn_decoder_exporter(op_def, context):
node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RPNDecoder')
for arg in op_def.arg:
if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints)
elif arg.name == 'ratios':
helper.add_attribute(node, 'ratios', arg.floats)
elif arg.name == 'scales':
helper.add_attribute(node, 'scales', arg.floats)
elif arg.name == 'pre_nms_topk':
helper.add_attribute(node, 'pre_nms_topk', arg.i)
elif arg.name == 'post_nms_topk':
helper.add_attribute(node, 'post_nms_topk', arg.i)
elif arg.name == 'nms_thresh':
helper.add_attribute(node, 'nms_thresh', arg.f)
elif arg.name == 'min_level':
helper.add_attribute(node, 'min_level', arg.i)
elif arg.name == 'max_level':
helper.add_attribute(node, 'max_level', arg.i)
return node, const_tensors
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torchvision
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
class RoIPooler(nn.Module):
"""Resample RoI features into a fixed resolution."""
def __init__(self, pooler_type='RoIAlign', resolution=7, sampling_ratio=0):
super(RoIPooler, self).__init__()
if not isinstance(resolution, (tuple, list)):
resolution = (resolution, resolution)
self.pooler_type = pooler_type
self.resolution = resolution
self.sampling_ratio = sampling_ratio
def forward(self, input, boxes, spatial_scale=1.0):
if self.pooler_type == 'RoIPool':
return torchvision.ops.roi_pool(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale)
elif self.pooler_type == 'RoIAlign':
return torchvision.ops.roi_align(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio,
aligned=False)
elif self.pooler_type == 'RoIAlignV2':
return torchvision.ops.roi_align(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio,
aligned=True)
else:
raise NotImplementedError
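# Example usage (a minimal sketch): pool stride-4 FPN features into 7x7
# RoI features; 'rois' follows the (batch_index, x1, y1, x2, y2) layout.
#
#   pooler = RoIPooler('RoIAlignV2', resolution=7, sampling_ratio=2)
#   roi_feats = pooler(features, rois, spatial_scale=1. / 4)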
class NonMaxSuppression(object):
"""Filter out boxes that have high IoU with selected ones."""
@staticmethod
def apply(input, iou_threshold=0.5):
return autograd.Function.apply(
'NonMaxSuppression', input.device, [input],
iou_threshold=float(iou_threshold))
autograd.Function.register(
'NonMaxSuppression', lambda **kwargs: {
'iou_threshold': kwargs.get('iou_threshold', 0.5),
})
class PasteMask(object):
"""Paste a set of masks on an image."""
@staticmethod
def apply(masks, boxes, output_size, mask_threshold=0.5):
if not isinstance(output_size, (tuple, list)):
output_size = (output_size, output_size)
return autograd.Function.apply(
'PasteMask', masks.device, [masks, boxes],
mask_threshold=float(mask_threshold),
num_sizes=len(output_size), sizes=output_size)
autograd.Function.register(
'PasteMask', lambda **kwargs: {
'mask_threshold': kwargs.get('mask_threshold', 0.5),
'sizes_desc': 'int64',
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox.helper import clip_boxes
from seetadet.utils.bbox.helper import clip_tiled_boxes
from seetadet.utils.bbox.helper import distribute_boxes
from seetadet.utils.bbox.helper import filter_empty_boxes
from seetadet.utils.bbox.helper import flip_boxes
from seetadet.utils.bbox.metrics import bbox_overlaps
from seetadet.utils.bbox.metrics import bbox_centerness
from seetadet.utils.bbox.transforms import bbox_transform
from seetadet.utils.bbox.transforms import bbox_transform_inv
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for bounding box."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def clip_boxes(boxes, im_shape):
"""Clip the boxes."""
xmax, ymax = im_shape[1], im_shape[0]
boxes[:, (0, 2)] = np.maximum(np.minimum(boxes[:, (0, 2)], xmax), 0)
boxes[:, (1, 3)] = np.maximum(np.minimum(boxes[:, (1, 3)], ymax), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
"""Clip the tiled boxes."""
xmax, ymax = im_shape[1], im_shape[0]
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], xmax), 0)
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], ymax), 0)
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], xmax), 0)
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], ymax), 0)
return boxes
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2]
boxes_flipped[:, 2] = width - boxes[:, 0]
return boxes_flipped
def filter_empty_boxes(boxes):
"""Return the indices of non-empty boxes."""
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
return np.where((ws > 0) & (hs > 0))[0]
def distribute_boxes(boxes, lvl_min, lvl_max):
"""Return the fpn level of boxes."""
if len(boxes) == 0:
return []
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
s = np.sqrt(ws * hs)
    s0 = 224  # Canonical scale (ImageNet crop size, as in the FPN paper).
    lvl0 = 4  # Target level for boxes of scale s0.
lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(lvls, lvl_min, lvl_max)
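# Example (a minimal sketch of the level assignment, assuming NumPy is
# imported as 'np'): a 224x224 box has scale s = 224, so it maps to
# floor(4 + log2(224 / 224)) = level 4, clipped to [lvl_min, lvl_max].
#
#   >>> distribute_boxes(np.array([[0., 0., 224., 224.]]), 2, 5)
#   array([4.])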
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox import cython_bbox
import numpy as np
def bbox_overlaps(boxes1, boxes2):
"""Return the overlaps between two group of boxes."""
    # Use the explicit 64-bit dtype; the 'np.float' alias was removed
    # from NumPy 1.24 onwards.
    boxes1 = np.ascontiguousarray(boxes1, dtype=np.float64)
    boxes2 = np.ascontiguousarray(boxes2, dtype=np.float64)
return cython_bbox.bbox_overlaps(boxes1, boxes2)
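# bbox_overlaps computes, for every (box1, box2) pair, the Jaccard
# overlap IoU = |A ∩ B| / |A ∪ B|; the Cython kernel is used because
# this pairwise loop is a hotspot during target assignment.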
def bbox_centerness(boxes1, boxes2):
"""Return centerness between two group of boxes."""
ctr_x = (boxes1[:, 2] + boxes1[:, 0]) / 2
ctr_y = (boxes1[:, 3] + boxes1[:, 1]) / 2
l = ctr_x - boxes2[:, 0]
t = ctr_y - boxes2[:, 1]
r = boxes2[:, 2] - ctr_x
b = boxes2[:, 3] - ctr_y
centerness = ((np.minimum(l, r) / np.maximum(l, r)) *
(np.minimum(t, b) / np.maximum(t, b)))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def boxes_area(boxes):
"""Return the area of boxes."""
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box transforms."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
_DEFAULT_SCALE_CLIP = np.log(1000.0 / 16.0)
def bbox_transform(src_boxes, tgt_boxes, weights=(1., 1., 1., 1.)):
"""Return the bbox transformation deltas."""
src_widths = src_boxes[:, 2] - src_boxes[:, 0]
src_heights = src_boxes[:, 3] - src_boxes[:, 1]
src_ctr_x = src_boxes[:, 0] + 0.5 * src_widths
src_ctr_y = src_boxes[:, 1] + 0.5 * src_heights
tgt_widths = tgt_boxes[:, 2] - tgt_boxes[:, 0]
tgt_heights = tgt_boxes[:, 3] - tgt_boxes[:, 1]
tgt_ctr_x = tgt_boxes[:, 0] + 0.5 * tgt_widths
tgt_ctr_y = tgt_boxes[:, 1] + 0.5 * tgt_heights
(wx, wy, ww, wh), deltas = weights, []
deltas += [wx * (tgt_ctr_x - src_ctr_x) / src_widths]
deltas += [wy * (tgt_ctr_y - src_ctr_y) / src_heights]
deltas += [ww * np.log(tgt_widths / src_widths)]
deltas += [wh * np.log(tgt_heights / src_heights)]
return np.vstack(deltas).transpose()
def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.)):
"""Return the boxes transformed from deltas."""
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights
dx = deltas[:, 0::4] / wx
dy = deltas[:, 1::4] / wy
dw = deltas[:, 2::4] / ww
dh = deltas[:, 3::4] / wh
dw = np.minimum(dw, _DEFAULT_SCALE_CLIP)
dh = np.minimum(dh, _DEFAULT_SCALE_CLIP)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, deltas.dtype)
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
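# Example (a minimal sketch of the round trip, with unit weights and
# NumPy imported as 'np'): deltas produced by bbox_transform map the
# source box back onto the target box.
#
#   >>> src = np.array([[0., 0., 10., 10.]])
#   >>> tgt = np.array([[5., 5., 15., 15.]])
#   >>> deltas = bbox_transform(src, tgt)  # [[0.5, 0.5, 0., 0.]]
#   >>> bbox_transform_inv(src, deltas)    # -> [[5., 5., 15., 15.]]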
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Blob utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def blob_vstack(arrays, fill_value=None, dtype=None, size=None, align=None):
"""Stack arrays in sequence vertically."""
if fill_value is None:
return np.vstack(arrays)
# Compute the max stack shape.
max_shape = np.max(np.stack([arr.shape for arr in arrays]), 0)
if size is not None and min(size) > 0:
max_shape[:len(size)] = size
if align is not None and min(align) > 0:
align_size = np.ceil(max_shape[:len(align)] / align)
max_shape[:len(align)] = align_size.astype('int64') * align
# Fill output with the given value.
output_dtype = dtype or arrays[0].dtype
output_shape = [len(arrays)] + list(max_shape)
output = np.empty(output_shape, output_dtype)
output[:] = fill_value
# Copy arrays.
for i, arr in enumerate(arrays):
copy_slices = (slice(0, d) for d in arr.shape)
output[(i,) + tuple(copy_slices)] = arr
return output
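# Example (a minimal sketch, assuming NumPy is imported as 'np'): stack
# two HWC images of different sizes into one padded batch whose spatial
# dims are aligned to multiples of 32 pixels.
#
#   >>> a = np.zeros((600, 800, 3), 'uint8')
#   >>> b = np.zeros((480, 640, 3), 'uint8')
#   >>> blob_vstack([a, b], fill_value=(103, 116, 123),
#   ...             align=(32, 32)).shape
#   (2, 608, 800, 3)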
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Image utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import PIL.Image
import PIL.ImageEnhance
def im_resize(img, size=None, scale=None, mode='linear'):
"""Resize image by the scale or size."""
if size is None:
if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
h, w = img.shape[:2]
size = int(h * scale[0] + .5), int(w * scale[1] + .5)
else:
if not isinstance(size, (tuple, list)):
size = (size, size)
mode = {'linear': PIL.Image.BILINEAR,
'nearest': PIL.Image.NEAREST}[mode]
img = PIL.Image.fromarray(img)
return np.array(img.resize(size[::-1], mode))
def im_rescale(img, scales, max_size=0, keep_ratio=True):
"""Rescale image to match the detecting scales."""
im_shape = img.shape
img_list, img_scales = [], []
if keep_ratio:
size_min = np.min(im_shape[:2])
size_max = np.max(im_shape[:2])
for target_size in scales:
im_scale = float(target_size) / float(size_min)
target_size_max = max_size if max_size > 0 else target_size
if np.round(im_scale * size_max) > target_size_max:
im_scale = float(target_size_max) / float(size_max)
img_list.append(im_resize(img, scale=im_scale))
img_scales.append((im_scale, im_scale))
else:
for target_size in scales:
h_scale = float(target_size) / im_shape[0]
w_scale = float(target_size) / im_shape[1]
img_list.append(im_resize(img, size=target_size))
img_scales.append((h_scale, w_scale))
return img_list, img_scales
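# Example (a minimal sketch of keep_ratio rescaling): a 480x640 image
# with scales=(600,) and max_size=1000 is scaled by 600 / 480 = 1.25,
# since round(1.25 * 640) = 800 <= 1000 keeps the longer side in bounds.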
def color_jitter(img, brightness=None, contrast=None, saturation=None):
    """Distort the color of an image."""
    def add_transform(transforms, transform_type, jitter_range):
        if jitter_range is not None:
            if not isinstance(jitter_range, (tuple, list)):
                jitter_range = (1. - jitter_range, 1. + jitter_range)
            transforms.append((transform_type, jitter_range))
transforms = []
contrast_first = np.random.rand() < 0.5
add_transform(transforms, PIL.ImageEnhance.Brightness, brightness)
if contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
add_transform(transforms, PIL.ImageEnhance.Color, saturation)
if not contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
for transform, jitter_range in transforms:
if isinstance(img, np.ndarray):
img = PIL.Image.fromarray(img)
img = transform(img)
img = img.enhance(np.random.uniform(*jitter_range))
return np.asarray(img)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Logging utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import inspect
import logging as _logging
import os
import sys as _sys
import threading
_logger = None
_logger_lock = threading.Lock()
def get_logger():
global _logger
# Use double-checked locking to avoid taking lock unnecessarily.
if _logger:
return _logger
_logger_lock.acquire()
try:
if _logger:
return _logger
logger = _logging.getLogger('seetadet')
logger.setLevel('INFO')
logger.propagate = False
logger._is_root = True
if True:
# Determine whether we are in an interactive environment.
_interactive = False
try:
# This is only defined in interactive shells.
if _sys.ps1:
_interactive = True
except AttributeError:
# Even now, we may be in an interactive shell with `python -i`.
_interactive = _sys.flags.interactive
# If we are in an interactive environment (like Jupyter), set loglevel
# to INFO and pipe the output to stdout.
if _interactive:
logger.setLevel('INFO')
_logging_target = _sys.stdout
else:
_logging_target = _sys.stderr
# Add the output handler.
_handler = _logging.StreamHandler(_logging_target)
_handler.setFormatter(_logging.Formatter('%(levelname)s %(message)s'))
logger.addHandler(_handler)
_logger = logger
return _logger
finally:
_logger_lock.release()
def _detailed_msg(msg):
file, lineno = inspect.stack()[:3][2][1:3]
return "{}:{}] {}".format(os.path.split(file)[-1], lineno, msg)
def log(level, msg, *args, **kwargs):
get_logger().log(level, _detailed_msg(msg), *args, **kwargs)
def debug(msg, *args, **kwargs):
if is_root():
get_logger().debug(_detailed_msg(msg), *args, **kwargs)
def error(msg, *args, **kwargs):
get_logger().error(_detailed_msg(msg), *args, **kwargs)
assert 0
def fatal(msg, *args, **kwargs):
get_logger().fatal(_detailed_msg(msg), *args, **kwargs)
assert 0
def info(msg, *args, **kwargs):
if is_root():
get_logger().info(_detailed_msg(msg), *args, **kwargs)
def warning(msg, *args, **kwargs):
if is_root():
get_logger().warning(_detailed_msg(msg), *args, **kwargs)
def get_verbosity():
"""Return how much logging output will be produced."""
return get_logger().getEffectiveLevel()
def set_verbosity(v):
"""Set the threshold for what messages will be logged."""
get_logger().setLevel(v)
def set_formatter(fmt=None, datefmt=None):
"""Set the formatter."""
handler = _logging.StreamHandler(_sys.stderr)
handler.setFormatter(_logging.Formatter(fmt, datefmt))
logger = get_logger()
logger.removeHandler(logger.handlers[0])
logger.addHandler(handler)
def set_root(is_root=True):
"""Set logger to the root."""
get_logger()._is_root = is_root
def is_root():
"""Return logger is the root."""
return get_logger()._is_root
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.mask.helper import encode_masks
from seetadet.utils.mask.helper import mask_from
from seetadet.utils.mask.helper import mask_to_polygons
from seetadet.utils.mask.helper import paste_masks
from seetadet.utils.mask.metrics import mask_overlap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for mask."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
from pycocotools.mask import decode
from pycocotools.mask import encode
from pycocotools.mask import merge
from pycocotools.mask import frPyObjects
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import PasteMask
from seetadet.utils.image import im_resize
def mask_from_buffer(buffer, size, box=None):
"""Return a binary mask from the buffer."""
if not isinstance(size, (tuple, list)):
size = (size, size)
rles = [{'counts': buffer, 'size': size}]
mask = decode(rles)
if mask.shape[2] != 1:
raise ValueError('Mask contains {} instances. '
'Merge them before compressing.'
.format(mask.shape[2]))
mask = mask[:, :, 0]
if box is not None:
box = np.round(box).astype('int64')
mask = mask[box[1]:box[3], box[0]:box[2]]
return mask
def mask_from_polygons(polygons, size, box=None):
"""Return a binary mask from the polygons."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
polygons = copy.deepcopy(polygons)
w, h = box[2] - box[0], box[3] - box[1]
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rles = frPyObjects(polygons, size[0], size[1])
return decode(merge(rles))
def mask_from_bitmap(bitmap, size, box=None):
"""Return a binary mask from the bitmap."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
box = np.round(box).astype('int64')
bitmap = bitmap[box[1]:box[3], box[0]:box[2]]
return im_resize(bitmap, size, mode='nearest')
def mask_from(segm, size, box=None):
"""Return a binary mask from the segmentation object."""
if segm is None:
return None
elif isinstance(segm, list):
return mask_from_polygons(segm, size, box)
elif isinstance(segm, np.ndarray):
return mask_from_bitmap(segm, size, box)
elif isinstance(segm, bytes):
return mask_from_buffer(segm, size, box)
else:
        raise TypeError('Unknown segmentation type: ' + str(type(segm)))
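# Example (a minimal sketch, assuming NumPy is imported as 'np'):
# rasterize a polygon instance into a 28x28 mask grid, cropped to its
# bounding box as the mask head expects.
#
#   >>> polygons = [np.array([0., 0., 20., 0., 20., 20., 0., 20.])]
#   >>> mask_from(polygons, 28, box=np.array([0., 0., 20., 20.])).shape
#   (28, 28)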
def mask_to_polygons(mask):
"""Convert a binary mask to a set of polygons."""
mask = np.ascontiguousarray(mask)
res = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
hierarchy = res[-1]
if hierarchy is None:
return []
contours = res[-2]
polygons = [x.flatten() for x in contours]
polygons = [x + 0.5 for x in polygons if len(x) >= 6]
return polygons
def encode_masks(masks):
"""Encode a set of masks to RLEs."""
rles = encode(np.asfortranarray(masks))
for rle in rles:
rle['counts'] = rle['counts'].decode()
return rles
def paste_masks(masks, boxes, img_size, threshold=0.5, channels_last=True):
"""Paste a set of masks on an image by resample."""
masks, boxes = to_tensor(masks), to_tensor(boxes[:, :4])
img_masks = PasteMask.apply(masks, boxes, img_size, threshold)
img_masks = img_masks.numpy().copy()
return img_masks.transpose((1, 2, 0)) if channels_last else img_masks
def paste_masks_old(masks, boxes, img_size, thresh=0.5):
"""Paste a set of masks on an image by resize."""
def scale_boxes(boxes, scale_factor=1.):
"""Scale the boxes."""
w = (boxes[:, 2] - boxes[:, 0]) * 0.5 * scale_factor
h = (boxes[:, 3] - boxes[:, 1]) * 0.5 * scale_factor
x_ctr = (boxes[:, 2] + boxes[:, 0]) * 0.5
y_ctr = (boxes[:, 3] + boxes[:, 1]) * 0.5
boxes_scaled = np.zeros(boxes.shape)
boxes_scaled[:, 0], boxes_scaled[:, 1] = x_ctr - w, y_ctr - h
boxes_scaled[:, 2], boxes_scaled[:, 3] = x_ctr + w, y_ctr + h
return boxes_scaled
num_boxes = boxes.shape[0]
assert masks.shape[0] == num_boxes
img_shape = list(img_size) + [num_boxes]
output = np.zeros(img_shape, 'uint8')
size = masks[0].shape[0]
scale_factor = (size + 2.) / size
boxes = scale_boxes(boxes, scale_factor).astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_boxes):
box, mask = boxes[i, :4], masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = max(box[2] - box[0], 1)
h = max(box[3] - box[1], 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > thresh, 'uint8')
x1, y1 = max(box[0], 0), max(box[1], 0)
x2, y2 = min(box[2], img_size[1]), min(box[3], img_size[0])
mask = mask[y1 - box[1]:y2 - box[1], x1 - box[0]:x2 - box[0]]
output[y1:y2, x1:x2, i] = mask
return output
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1
h = y2 - y1
# Get masks in the intersection part.
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
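# Example (a minimal sketch, assuming NumPy is imported as 'np'): two
# 2x2 all-ones masks on identical boxes intersect completely, so the
# overlap is 4 / (4 + 4 - 4) = 1.0.
#
#   >>> m = np.ones((2, 2), 'uint8')
#   >>> mask_overlap([0, 0, 2, 2], [0, 0, 2, 2], m, m)
#   1.0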
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Non-Maximum Suppression utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.nms.helper import gpu_nms
from seetadet.utils.nms.helper import nms
from seetadet.utils.nms.helper import soft_nms
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions of Non-Maximum Suppression."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import NonMaxSuppression
try:
from seetadet.utils.nms.cython_nms import cpu_nms
from seetadet.utils.nms.cython_nms import cpu_soft_nms
except ImportError:
    cpu_nms = cpu_soft_nms = print  # Sentinel checked before use below.
def gpu_nms(dets, thresh):
"""Filter out the dets using GPU - NMS."""
if dets.shape[0] == 0:
return []
scores = dets[:, 4]
order = scores.argsort()[::-1]
sorted_dets = to_tensor(dets[order, :])
keep = NonMaxSuppression.apply(sorted_dets, iou_threshold=thresh)
return order[keep.numpy()]
def nms(dets, thresh):
"""Filter out the dets using NMS."""
if dets.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(dets, thresh)
def soft_nms(dets, thresh, method='linear', sigma=0.5, score_thresh=0.001):
"""Filter out the dets using Soft - NMS."""
if dets.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method: ' + method)
return cpu_soft_nms(dets, thresh, methods[method], sigma, score_thresh)
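# Example (a minimal sketch, assuming NumPy is imported as 'np'): two
# heavily-overlapping detections; with an IoU threshold of 0.5 only the
# higher-scoring one survives.
#
#   >>> dets = np.array([[0., 0., 10., 10., 0.9],
#   ...                  [1., 1., 10., 10., 0.8]], 'float32')
#   >>> nms(dets, 0.5)  # -> keeps index 0 only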
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Polygon utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.polygon.helper import crop_polygons
from seetadet.utils.polygon.helper import flip_polygons
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for polygon."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import shapely.geometry as geometry
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, p in enumerate(polygons):
p_flipped = p.copy()
p_flipped[0::2] = width - p[0::2]
polygons[i] = p_flipped
return polygons
def crop_polygons(polygons, crop_box):
"""Crop the polygons."""
x, y = crop_box[:2]
crop_box = geometry.box(*crop_box).buffer(0.0)
crop_polygons = []
for p in polygons:
p = p.reshape((-1, 2))
p = geometry.Polygon(p).buffer(0.0)
if not p.is_valid:
continue
cropped = p.intersection(crop_box)
if cropped.is_empty:
continue
cropped = getattr(cropped, 'geoms', [cropped])
for new_p in cropped:
if not isinstance(new_p, geometry.Polygon) or not new_p.is_valid:
continue
coords = np.asarray(new_p.exterior.coords)[:-1]
coords[:, 0] -= x
coords[:, 1] -= y
crop_polygons.append(coords.flatten())
return crop_polygons
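A small sketch of the two helpers, using a hypothetical square (polygons are flattened `[x0, y0, x1, y1, ...]` arrays):

```python
import numpy as np

square = np.array([0., 0., 4., 0., 4., 4., 0., 4.])
flipped = flip_polygons([square.copy()], width=10)   # x -> width - x
cropped = crop_polygons([square], crop_box=(2, 2, 6, 6))
print(flipped[0])  # x coordinates become 10, 6, 6, 10
print(cropped[0])  # the (2,2)-(4,4) corner, shifted to the crop origin
```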
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Profiler utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.profiler.stats import SmoothedValue
from seetadet.utils.profiler.timer import Timer
from seetadet.utils.profiler.timer import get_progress
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Trackable statistics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
class SmoothedValue(object):
"""Track values and provide smoothed report."""
def __init__(self, window_size=None):
self.deque = collections.deque(maxlen=window_size)
self.total = 0.0
self.count = 0
def update(self, value):
self.deque.append(value)
self.count += 1
self.total += value
def mean(self):
return np.mean(self.deque)
def median(self):
return np.median(self.deque)
def average(self):
return self.total / self.count
class ExponentialMovingAverage(object):
"""Track values and provide EMA report."""
def __init__(self, decay=0.9):
self.value = None
self.decay = decay
self.total = 0.0
self.count = 0
def update(self, value):
if self.value is None:
self.value = value
else:
self.value = (self.decay * self.value +
(1.0 - self.decay) * value)
self.total += value
self.count += 1
def global_average(self):
return self.total / self.count
def running_average(self):
return float(self.value)
def __float__(self):
return self.running_average()
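For example, the two trackers can smooth a noisy training loss (the values below are hypothetical):

```python
smoothed = SmoothedValue(window_size=20)
ema = ExponentialMovingAverage(decay=0.9)
for loss in [2.0, 1.5, 1.2, 1.1, 1.05]:
    smoothed.update(loss)
    ema.update(loss)
print(smoothed.median(), smoothed.average())  # windowed median, global mean
print(float(ema), ema.global_average())       # running EMA, global mean
```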
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Timing functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import datetime
import time
class Timer(object):
"""Simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
def add_diff(self, diff, n=1, average=True):
self.total_time += diff
self.calls += n
self.average_time = self.total_time / self.calls
return self.average_time if average else diff
@contextlib.contextmanager
def tic_and_toc(self, n=1, average=True):
try:
yield self.tic()
finally:
self.toc(n, average)
def tic(self):
self.start_time = time.time()
return self
def toc(self, n=1, average=True):
self.diff = time.time() - self.start_time
return self.add_diff(self.diff, n, average)
def get_progress(timer, step, max_steps):
"""Return the progress information."""
eta_seconds = timer.average_time * (max_steps - step)
eta = str(datetime.timedelta(seconds=int(eta_seconds)))
progress = (step + 1.) / max_steps
return ('< PROGRESS: {:.2%} | SPEED: {:.3f}s / iter | ETA: {} >'
.format(progress, timer.average_time, eta))
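A usage sketch of `Timer` and `get_progress` with a stand-in workload:

```python
import time

timer = Timer()
max_steps = 5
for step in range(max_steps):
    with timer.tic_and_toc():
        time.sleep(0.01)  # stand-in for one training iteration
    print(get_progress(timer, step, max_steps))
```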
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Visualization utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.vis.colormap import colormap
from seetadet.utils.vis.visualizer import Visualizer
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Colormap for really neat visualizations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def colormap(rgb=False):
color_list = np.array([
0.000, 0.447, 0.741,
0.850, 0.325, 0.098,
0.929, 0.694, 0.125,
0.494, 0.184, 0.556,
0.466, 0.674, 0.188,
0.301, 0.745, 0.933,
0.635, 0.078, 0.184,
0.300, 0.300, 0.300,
0.600, 0.600, 0.600,
1.000, 0.000, 0.000,
1.000, 0.500, 0.000,
0.749, 0.749, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 1.000,
0.667, 0.000, 1.000,
0.333, 0.333, 0.000,
0.333, 0.667, 0.000,
0.333, 1.000, 0.000,
0.667, 0.333, 0.000,
0.667, 0.667, 0.000,
0.667, 1.000, 0.000,
1.000, 0.333, 0.000,
1.000, 0.667, 0.000,
1.000, 1.000, 0.000,
0.000, 0.333, 0.500,
0.000, 0.667, 0.500,
0.000, 1.000, 0.500,
0.333, 0.000, 0.500,
0.333, 0.333, 0.500,
0.333, 0.667, 0.500,
0.333, 1.000, 0.500,
0.667, 0.000, 0.500,
0.667, 0.333, 0.500,
0.667, 0.667, 0.500,
0.667, 1.000, 0.500,
1.000, 0.000, 0.500,
1.000, 0.333, 0.500,
1.000, 0.667, 0.500,
1.000, 1.000, 0.500,
0.000, 0.333, 1.000,
0.000, 0.667, 1.000,
0.000, 1.000, 1.000,
0.333, 0.000, 1.000,
0.333, 0.333, 1.000,
0.333, 0.667, 1.000,
0.333, 1.000, 1.000,
0.667, 0.000, 1.000,
0.667, 0.333, 1.000,
0.667, 0.667, 1.000,
0.667, 1.000, 1.000,
1.000, 0.000, 1.000,
1.000, 0.333, 1.000,
1.000, 0.667, 1.000,
0.167, 0.000, 0.000,
0.333, 0.000, 0.000,
0.500, 0.000, 0.000,
0.667, 0.000, 0.000,
0.833, 0.000, 0.000,
1.000, 0.000, 0.000,
0.000, 0.167, 0.000,
0.000, 0.333, 0.000,
0.000, 0.500, 0.000,
0.000, 0.667, 0.000,
0.000, 0.833, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 0.167,
0.000, 0.000, 0.333,
0.000, 0.000, 0.500,
0.000, 0.000, 0.667,
0.000, 0.000, 0.833,
0.000, 0.000, 1.000,
0.000, 0.000, 0.000,
0.143, 0.143, 0.143,
0.286, 0.286, 0.286,
0.429, 0.429, 0.429,
0.571, 0.571, 0.571,
0.714, 0.714, 0.714,
0.857, 0.857, 0.857,
1.000, 1.000, 1.000]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
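For instance, picking a distinct color per instance index (rows are BGR by default, matching OpenCV):

```python
colors = colormap()                 # (num_colors, 3), values in [0, 255]
for i in range(5):
    print(colors[i % len(colors)])  # cycle through the palette
```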
# ------------------------------------------------------------
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/visualizer.py>
#
# ------------------------------------------------------------
"""Visualizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import matplotlib.backends.backend_agg
import matplotlib.colors
import matplotlib.figure
import matplotlib.patches
import matplotlib.pyplot
import numpy as np
from seetadet.utils.mask import mask_from
from seetadet.utils.mask import mask_to_polygons
from seetadet.utils.mask import paste_masks
from seetadet.utils.vis.colormap import colormap
_SMALL_OBJECT_AREA_THRESH = 1000
class VisImage(object):
"""VisImage."""
def __init__(self, img, scale=1.0):
self.img = img
self.scale = scale
self.shape = (h, w) = img.shape[:2]
self.font_size = max(np.sqrt(h * w) // 90, 10 // scale)
self._setup_figure(img)
def _setup_figure(self, img):
fig = matplotlib.figure.Figure(frameon=False)
self.dpi = fig.get_dpi()
fig.set_size_inches((self.shape[1] * self.scale + 1e-2) / self.dpi,
(self.shape[0] * self.scale + 1e-2) / self.dpi)
self.canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])
ax.axis('off')
self.fig = fig
self.ax = ax
self.ax.imshow(img)
def save(self, filepath):
cv2.imwrite(filepath, self.get_image())
def get_image(self, rgb=False):
canvas = self.canvas
s, (width, height) = canvas.print_to_buffer()
buffer = np.frombuffer(s, dtype='uint8')
img_rgba = buffer.reshape(height, width, 4)
img_rgb, _ = np.split(img_rgba, [3], axis=2)
img_rgb = img_rgb.astype('uint8', copy=False)
return img_rgb if rgb else img_rgb[:, :, ::-1]
class Visualizer(object):
""""Visualizer."""
def __init__(self, class_names=None, score_thresh=0.7):
self.class_names = class_names
self.score_thresh = score_thresh
self.colormap = colormap(rgb=True) / 255.
self.output = None
def _convert_from_dict_format(self, objects):
boxes, masks, labels = [], [], []
for obj in objects:
score = obj.get('score', 1.0)
name = obj.get('class', 'object')
if score < self.score_thresh:
continue
boxes.append(list(obj['bbox']) + [score])
labels.append('{} {:.0f}%'.format(name, score * 100))
if 'segmentation' in obj:
masks.append(mask_from(obj['segmentation']['counts'].encode(),
obj['segmentation']['size']))
boxes = np.array(boxes, 'float32') if len(boxes) > 0 else boxes
masks = np.stack(masks) if len(masks) > 0 else masks
return boxes, masks, labels
def _convert_from_cls_format(self, cls_boxes=None, cls_masks=None):
boxes, masks, labels = [], [], []
for i, name in enumerate(self.class_names):
if name == '__background__':
continue
if cls_boxes is not None and len(cls_boxes[i]) > 0:
boxes.append(cls_boxes[i])
scores = cls_boxes[i][:, -1].tolist()
labels += ['{} {:.0f}%'.format(name, s * 100) for s in scores]
if cls_masks is not None and len(cls_masks[i]):
masks.append(cls_masks[i])
boxes = np.concatenate(boxes) if len(boxes) > 0 else boxes
masks = np.concatenate(masks) if len(masks) > 0 else masks
return boxes, masks, labels
def overlay_instances(self, boxes, masks, labels):
"""Overlay instances."""
if boxes is None or len(boxes) == 0:
return self.output
# Filter instances.
keep = np.where(boxes[:, -1] > self.score_thresh)[0]
if len(keep) == 0:
return self.output
boxes, labels = boxes[keep], [labels[i] for i in keep]
masks = masks[keep] if len(masks) > 0 else []
# Paste masks.
if len(masks) > 0 and masks.shape[-2:] != self.output.shape[:2]:
masks = paste_masks(masks, boxes, self.output.shape[:2],
channels_last=False)
# Display in largest to smallest order to reduce occlusion.
if boxes.shape[1] == 5:
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
elif boxes.shape[1] == 6:
areas = boxes[:, 2] * boxes[:, 3]
else:
raise ValueError('Expected box4d or box5d.')
keep = np.argsort(-areas)
boxes, labels = boxes[keep], [labels[i] for i in keep]
masks = masks[keep] if len(masks) > 0 else []
colors = self.colormap[np.arange(len(boxes)) % len(self.colormap)]
for i, box in enumerate(boxes):
if boxes.shape[1] == 5:
self.draw_box(box, edge_color=colors[i])
self.draw_box_label(box, labels[i])
if len(masks) > 0:
polygons = mask_to_polygons(masks[i])
for p in polygons:
self.draw_polygon(p.reshape((-1, 2)), color=colors[i])
return self.output
def draw_instances(self, img, boxes, masks):
"""Draw instances."""
self.output = VisImage(img[:, :, ::-1])
assert len(boxes) == len(self.class_names)
boxes, masks, labels = self._convert_from_cls_format(boxes, masks)
self.overlay_instances(boxes, masks, labels)
return self.output
def draw_objects(self, img, objects):
"""Draw objects."""
self.output = VisImage(img[:, :, ::-1])
boxes, masks, labels = self._convert_from_dict_format(objects)
self.overlay_instances(boxes, masks, labels)
return self.output
def draw_box(self, box, alpha=0.5, edge_color='g', line_style='-'):
"""Draw box."""
x0, y0, x1, y1 = box[:4]
width, height = x1 - x0, y1 - y0
line_width = max(self.output.font_size / 4, 1)
self.output.ax.add_patch(
matplotlib.patches.Rectangle(
(x0, y0),
width,
height,
fill=False,
edgecolor=edge_color,
linewidth=line_width * self.output.scale,
alpha=alpha,
linestyle=line_style))
return self.output
def draw_box_label(self, box, label):
"""Draw box label."""
x0, y0, x1, y1 = box[:4]
text_pos = (x0, y0)
instance_area = (y1 - y0) * (x1 - x0)
if (instance_area < _SMALL_OBJECT_AREA_THRESH * self.output.scale
or y1 - y0 < 40 * self.output.scale):
if y1 >= self.output.shape[0] - 5:
text_pos = (x1, y0)
else:
text_pos = (x0, y1)
height_ratio = (y1 - y0) / np.sqrt(self.output.shape[0] * self.output.shape[1])
font_size = (np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2)
* 0.5 * self.output.font_size)
self.draw_text(label, text_pos, font_size=font_size)
return self.output
def draw_text(
self,
text,
position,
font_size=None,
color='w',
horizontal_alignment='left',
rotation=0,
):
"""Draw text."""
if not font_size:
font_size = self.output.font_size
color = np.maximum(list(matplotlib.colors.to_rgb(color)), 0.2)
color[np.argmax(color)] = max(0.8, np.max(color))
x, y = position
self.output.ax.text(
x,
y,
text,
size=font_size * self.output.scale,
family='sans-serif',
bbox={'facecolor': 'black', 'alpha': 0.8,
'pad': 0, 'edgecolor': 'none'},
verticalalignment='top',
horizontalalignment=horizontal_alignment,
color=color,
zorder=10,
rotation=rotation)
return self.output
def draw_polygon(self, segment, color, edge_color=None, alpha=0.5):
"""Draw polygon."""
edge_color = edge_color or color
edge_color = matplotlib.colors.to_rgb(edge_color) + (1,)
polygon = matplotlib.patches.Polygon(
segment,
fill=True,
facecolor=matplotlib.colors.to_rgb(color) + (alpha,),
edgecolor=edge_color,
linewidth=max(self.output.font_size // 15 * self.output.scale, 1))
self.output.ax.add_patch(polygon)
return self.output
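A hedged end-to-end sketch of `Visualizer.draw_objects` on a blank BGR image (the class names and detections below are hypothetical):

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype='uint8')  # BGR, as read by cv2.imread
objects = [{'bbox': [50, 60, 200, 220], 'score': 0.95, 'class': 'person'},
           {'bbox': [300, 100, 420, 260], 'score': 0.88, 'class': 'dog'}]
vis = Visualizer(class_names=['__background__', 'person', 'dog'],
                 score_thresh=0.7)
output = vis.draw_objects(img, objects)  # returns a VisImage
output.save('vis.png')                   # writes a BGR image to disk
```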
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
version = '0.1.0a0'
git_version = 'None'
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Python setup script."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import shutil
import subprocess
import sys
import setuptools
import setuptools.command.build_py
import setuptools.command.install
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser()
parser.add_argument('--version', default=None)
args, unknown = parser.parse_known_args()
sys.argv = [sys.argv[0]] + unknown
args.git_version = None
args.long_description = ''
if args.version is None and os.path.exists('version.txt'):
with open('version.txt', 'r') as f:
args.version = f.read().strip()
if os.path.exists('.git'):
try:
git_version = subprocess.check_output(
['git', 'rev-parse', 'HEAD'], cwd='./')
args.git_version = git_version.decode('ascii').strip()
except (OSError, subprocess.CalledProcessError):
pass
if os.path.exists('README.md'):
with open(os.path.join('README.md'), encoding='utf-8') as f:
args.long_description = f.read()
return args
def build_extensions(parallel=4):
"""Prepare the package files."""
# Compile cxx sources.
py_exec = sys.executable
if subprocess.call(
'cd csrc/cxx && '
'{} setup.py build_ext -b ../../ -f --no-python-abi-suffix=0 -j {} && '
'{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0:
raise RuntimeError('Failed to build the cxx sources.')
# Compile pyx sources.
if subprocess.call(
'cd csrc/pyx && '
'{} setup.py build_ext -b ../../ -f --cython-c-in-temp -j {} && '
'{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0:
raise RuntimeError('Failed to build the pyx sources.')
def clean_builds():
"""Clean the builds."""
for path in ['build', 'seeta_det.egg-info']:
if os.path.exists(path):
shutil.rmtree(path)
def find_packages(top):
"""Return the python sources installed to package."""
packages = []
for root, _, _ in os.walk(top):
if os.path.exists(os.path.join(root, '__init__.py')):
packages.append(root)
return packages
def find_package_data(top):
"""Return the external data installed to package."""
headers, libraries = [], []
if sys.platform == 'win32':
dylib_suffix = '.pyd'
elif sys.platform == 'darwin':
dylib_suffix = '.dylib'
else:
dylib_suffix = '.so'
for root, _, files in os.walk(top):
root = root[len(top + '/'):]
for file in files:
if file.endswith(dylib_suffix):
libraries.append(os.path.join(root, file))
return headers + libraries
class BuildPyCommand(setuptools.command.build_py.build_py):
"""Enhanced 'build_py' command."""
def build_packages(self):
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(args.version, args.git_version))
super(BuildPyCommand, self).build_packages()
def build_package_data(self):
parallel = 4
for k in ('build', 'install'):
v = self.get_finalized_command(k).parallel
parallel = max(parallel, (int(v) if v else v) or 1)
build_extensions(parallel=parallel)
self.package_data = {'seetadet': find_package_data('seetadet')}
super(BuildPyCommand, self).build_package_data()
class InstallCommand(setuptools.command.install.install):
"""Enhanced 'install' command."""
user_options = setuptools.command.install.install.user_options
user_options += [('parallel=', 'j', "number of parallel build jobs")]
def initialize_options(self):
self.parallel = None
super(InstallCommand, self).initialize_options()
self.old_and_unmanageable = True
args = parse_args()
setuptools.setup(
name='seeta-det',
version=args.version,
description='SeetaDet: A platform implementing popular object detection algorithms.',
long_description=args.long_description,
long_description_content_type='text/markdown',
url='https://github.seetatech.com/seetaresearch/seetadet',
author='SeetaTech',
license='BSD 2-Clause',
packages=find_packages('seetadet'),
cmdclass={'build_py': BuildPyCommand, 'install': InstallCommand},
install_requires=['opencv-python',
'Pillow>=7.1',
'pyyaml',
'prettytable',
'matplotlib',
'codewithgpu',
'shapely',
'Cython',
'pycocotools>=2.0.2'],
classifiers=['Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
'Intended Audience :: Education',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: BSD License',
'Programming Language :: C++',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3 :: Only',
'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence'],
)
clean_builds()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Train a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import dragon
import numpy
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import train_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Train a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--tensorboard',
action='store_true',
help='write metrics to tensorboard or not')
return parser.parse_args()
if __name__ == '__main__':
args = parse_args()
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, start_iter = coordinator.get_checkpoint()
cfg.TRAIN.WEIGHTS = checkpoint or cfg.TRAIN.WEIGHTS
# Setup the distributed environment.
world_rank = dragon.distributed.get_rank()
world_size = dragon.distributed.get_world_size()
if cfg.NUM_GPUS != world_size:
raise ValueError(
'Expected to start {} processes, got {}.'
.format(cfg.NUM_GPUS, world_size))
# Setup the logging modules.
logging.set_root(world_rank == 0)
# Select the GPU depending on the rank of process.
cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank]
# Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED + world_rank)
dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset.
dataset_size = build_dataset(cfg.TRAIN.DATASET).size
logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset_size))
# Run training.
logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.path_at('checkpoints')))
with dragon.distributed.new_group(
ranks=[i for i in range(cfg.NUM_GPUS)],
verbose=True).as_default():
train_engine.run_train(
coordinator, start_iter,
enable_tensorboard=args.tensorboard)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Export a detection network into the onnx model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.models.build import build_detector
from seetadet.ops import onnx as _ # noqa
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Export a detection network into the onnx model')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
type=int,
default=0,
help='index of GPU to use')
parser.add_argument(
'--iter',
type=int,
default=None,
help='checkpoint of given step')
parser.add_argument(
'--input_shape',
nargs='+',
type=int,
default=(1, 512, 512, 3),
help='input image shape')
parser.add_argument(
'--opset',
type=int,
default=None,
help='opset version to export')
parser.add_argument(
'--check_model',
type=bool,
default=True,
help='check the exported model or not')
return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for exporting."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
else:
checkpoint, _ = coordinator.get_checkpoint(args.iter)
weights_list.append(checkpoint)
return weights_list[0]
def get_dummy_inputs(args):
"""Return the dummy inputs for exporting."""
n, h, w, c = args.input_shape
im_batch = torch.zeros(n, h, w, c, dtype='uint8')
im_info = torch.tensor([[h, w, 1., 1.] for _ in range(n)], dtype='float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([[h, w]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = torch.tensor(grid_shapes, dtype='int64')
return {'img': im_batch, 'im_info': im_info, 'grid_info': grid_info}
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Run exporting.
weights = find_weights(args, coordinator)
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = args.model_dir or coordinator.path_at('exports')
logging.info('Exports will be saved to ' + output_dir)
detector = build_detector(args.gpu, weights)
inputs = get_dummy_inputs(args)
torch.onnx.export(
model=detector,
args=inputs,
f=os.path.join(output_dir, weights_name + '.onnx'),
verbose=True,
opset_version=args.opset,
enable_onnx_checker=args.check_model,
)
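A hypothetical sketch of running the exported model with onnxruntime (not a SeetaDet dependency). The input names follow `get_dummy_inputs` above; the FPN strides shown are assumptions and must match the exported config:

```python
import numpy as np
import onnxruntime  # shown for illustration only

sess = onnxruntime.InferenceSession('model.onnx',
                                    providers=['CPUExecutionProvider'])
n, h, w, c = 1, 512, 512, 3
strides = np.array([[8], [16], [32], [64], [128]])  # hypothetical FPN strides
grid_shapes = (np.stack([[h, w]] * len(strides)) - 1) // strides + 1
outputs = sess.run(None, {
    'img': np.zeros((n, h, w, c), dtype='uint8'),
    'im_info': np.array([[h, w, 1., 1.]], dtype='float32'),
    'grid_info': grid_shapes.astype('int64')})
```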
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Serve a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import collections
import os
import multiprocessing as mp
import time
import codewithgpu
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import test_engine
from seetadet.utils import logging
from seetadet.utils import profiler
from seetadet.utils.mask import encode_masks
from seetadet.utils.mask import paste_masks
from seetadet.utils.vis import Visualizer
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Serve a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--iter',
type=int,
default=None,
help='iteration of checkpoint')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--score_thresh',
type=float,
default=0.7,
help='score threshold for inference')
parser.add_argument(
'--batch_timeout',
type=float,
default=1,
help='timeout to wait for a batch')
parser.add_argument(
'--queue_size',
type=int,
default=512,
help='size of the memory queue')
parser.add_argument(
'--gpu',
nargs='+',
type=int,
default=None,
help='index of GPUs to use')
parser.add_argument(
'--deterministic',
action='store_true',
help='set cudnn deterministic or not')
parser.add_argument(
'--app',
default='gradio',
help='application framework')
parser.add_argument(
'--processes',
type=int,
default=1,
help='number of flask processes')
parser.add_argument(
'--port',
type=int,
default=5050,
help='listening port')
return parser.parse_args()
class ServingCommand(codewithgpu.ServingCommand):
"""Command to run serving."""
def __init__(self, output_queue, score_thresh=0.7, perf_every=100):
super(ServingCommand, self).__init__(app_library='flask')
self.output_queue = output_queue
self.output_dict = mp.Manager().dict()
self.score_thresh = score_thresh
self.perf_every = perf_every
self.classes = cfg.MODEL.CLASSES
self.max_dets = cfg.TEST.DETECTIONS_PER_IM
def make_objects(self, outputs):
"""Main the detection objects."""
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
objects = []
for j, name in enumerate(self.classes):
if name == '__background__':
continue
inds = np.where(boxes[j][:, 4] > self.score_thresh)[0]
if len(inds) == 0:
continue
for box in boxes[j][inds]:
objects.append({'bbox': box[:4].astype(float).tolist(),
'score': float(box[4]), 'class': name})
if masks is not None:
rles = encode_masks(paste_masks(
masks[j][inds], boxes[j][inds], outputs['im_shape'][:2]))
for i, rle in enumerate(rles[::-1]):
objects[-(i + 1)]['segmentation'] = rle
return objects
def run(self):
"""Main loop to make the serving outputs."""
count, timers = 0, collections.defaultdict(profiler.Timer)
while True:
count += 1
img_id, time_diffs, outputs = self.output_queue.get()
outputs = test_engine.filter_outputs(outputs, self.max_dets)
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
self.output_dict[img_id] = self.make_objects(outputs)
if count % self.perf_every == 0:
logging.info('im_detect: {:d} [{:.3f}s + {:.3f}s]'
.format(count, timers['im_detect'].average_time,
timers['misc'].average_time))
def find_weights(args, coordinator):
"""Return the weights for serving."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if file.endswith('.pkl'):
weights_list.append(os.path.join(args.model_dir, file))
else:
checkpoint, _ = coordinator.get_checkpoint(args.iter)
weights_list.append(checkpoint)
return weights_list[0]
def build_flask_app(queues, command):
"""Build the flask application."""
import flask
app = flask.Flask('seetadet.serve')
logging._logging.getLogger('werkzeug').setLevel('ERROR')
debug_objects = os.environ.get('FLASK_DEBUG', False)
@app.route("/upload", methods=['POST'])
def upload():
img_id, img = command.get_image()
queues[img_id % len(queues)].put((img_id, img))
return flask.jsonify({'image_id': img_id})
@app.route("/get", methods=['POST'])
def get():
def try_get(retry_time=0.005):
try:
req = flask.request.get_json(force=True)
img_id = req['image_id']
except (KeyError, TypeError):
err_msg, img_id = 'Not found "image_id" in data.', ''
flask.abort(flask.Response(err_msg))
while img_id not in command.output_dict:
time.sleep(retry_time)
return img_id, command.output_dict.pop(img_id)
img_id, objects = try_get(retry_time=0.005)
msg = 'ImageId = %d, #Detects = %d' % (img_id, len(objects))
if debug_objects:
msg += (('\n * ' if len(objects) > 0 else '') +
('\n * '.join(str(obj) for obj in objects)))
logging.info(msg)
return flask.jsonify({'objects': objects})
return app
def build_gradio_app(queues, command):
"""Build the gradio application."""
import cv2
import gradio
visualizer = Visualizer(class_names=command.classes, score_thresh=0.0)
def upload_and_get(img_path):
with command.example_id.get_lock():
command.example_id.value += 1
img_id = command.example_id.value
img = cv2.imread(img_path)
queues[img_id % len(queues)].put((img_id, img))
while img_id not in command.output_dict:
time.sleep(0.005)
objects = command.output_dict.pop(img_id)
logging.info('ImageId = %d, #Detects = %d' % (img_id, len(objects)))
vis_img = visualizer.draw_objects(img, objects).get_image(rgb=True)
objects_list = [(i, obj['class'], round(obj['score'], 3),
str(np.round(obj['bbox'], 2).tolist()))
for i, obj in enumerate(objects)]
return vis_img, objects_list
app = gradio.Interface(
fn=upload_and_get,
inputs=gradio.Image(type='filepath', label='Image', show_label=False),
outputs=[gradio.Image(label='Visualization'),
gradio.Dataframe(headers=['Id', 'Category', 'Score', 'BBox'],
label='Objects')],
examples=['../data/images/' + x for x in os.listdir('../data/images')],
css=".h-60 {height: auto}", allow_flagging='never')
app.temp_dirs.add('../data/images')
return app
if __name__ == '__main__':
logging.set_formatter("%(asctime)s %(levelname)s %(message)s")
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Build actors.
weights = find_weights(args, coordinator)
devices = args.gpu if args.gpu else [cfg.GPU_ID]
num_devices = len(devices)
queues = [mp.Queue(args.queue_size) for _ in range(num_devices + 1)]
commands = [test_engine.InferenceCommand(
queues[i], queues[-1], kwargs={
'cfg': cfg,
'device': devices[i],
'weights': weights,
'deterministic': args.deterministic,
'batch_timeout': args.batch_timeout,
'verbose': i == 0,
}) for i in range(num_devices)]
commands += [ServingCommand(queues[-1])]
actors = [mp.Process(target=command.run) for command in commands]
for actor in actors:
actor.start()
# Build app.
if args.app == 'flask':
app = build_flask_app(queues[:-1], commands[-1])
app.run(port=args.port, threaded=args.processes == 1,
processes=args.processes)
elif args.app == 'gradio':
app = build_gradio_app(queues[:-1], commands[-1])
app.queue(concurrency_count=args.processes)
app.launch(server_port=args.port)
else:
raise ValueError('Unsupported application framework: ' + args.app)
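A hypothetical client for the flask app. The `/get` contract follows the route above; the exact `/upload` payload is defined by codewithgpu's `ServingCommand.get_image` and may differ from the raw-bytes POST assumed here:

```python
import requests  # hypothetical client, not part of the server

# Assumption: /upload accepts the encoded image bytes in the request body.
with open('demo.jpg', 'rb') as f:
    resp = requests.post('http://localhost:5050/upload', data=f.read())
img_id = resp.json()['image_id']
# Grounded in the route above: POST {'image_id': ...}, receive the objects.
result = requests.post('http://localhost:5050/get', json={'image_id': img_id})
print(result.json()['objects'])
```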
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Test a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import multiprocessing
import os
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import test_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Test a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
nargs='+',
type=int,
default=None,
help='index of GPUs to use')
parser.add_argument(
'--iter',
nargs='+',
type=int,
default=None,
help='iteration step of checkpoints')
parser.add_argument(
'--last',
type=int,
default=1,
help='last N checkpoints')
parser.add_argument(
'--read_every',
type=int,
default=100,
help='read every-n images for testing')
parser.add_argument(
'--vis',
type=float,
default=0,
help='score threshold for visualization')
parser.add_argument(
'--precision',
default='',
help='compute precision for inference')
parser.add_argument(
'--deterministic',
action='store_true',
help='set cudnn deterministic or not')
return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
return weights_list
if args.iter is not None:
for iter_step in args.iter:
checkpoint, _ = coordinator.get_checkpoint(iter_step, wait=True)
weights_list.append(checkpoint)
return weights_list
for i in range(1, args.last + 1):
checkpoint, _ = coordinator.get_checkpoint(last_idx=i)
if checkpoint is None:
break
weights_list.append(checkpoint)
return weights_list
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
cfg.MODEL.PRECISION = args.precision or cfg.MODEL.PRECISION
logging.info('Using config:\n' + str(cfg))
# Inspect dataset.
dataset_size = build_dataset(cfg.TEST.DATASET).size
logging.info('Dataset({}): {} images will be used to test.'
.format(cfg.TEST.DATASET, dataset_size))
# Run testing.
for weights in find_weights(args, coordinator):
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = coordinator.path_at('results/' + weights_name)
logging.info('Results will be saved to ' + output_dir)
vis_output_dir = None
if args.vis > 0:
vis_output_dir = coordinator.path_at('visualizations/' + weights_name)
logging.info('Visualizations will be saved to ' + vis_output_dir)
process = multiprocessing.Process(
target=test_engine.run_test,
kwargs={'test_cfg': cfg,
'weights': weights,
'output_dir': output_dir,
'devices': args.gpu,
'deterministic': args.deterministic,
'read_every': args.read_every,
'vis_thresh': args.vis,
'vis_output_dir': vis_output_dir})
process.start()
process.join()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Train a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import dragon
import numpy
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import train_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Train a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default=None,
help='experiment dir')
parser.add_argument(
'--tensorboard',
action='store_true',
help='write metrics to tensorboard or not')
return parser.parse_args()
def run_distributed(args, coordinator):
"""Run distributed training."""
import subprocess
cmd = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS)
cmd += '{} {}'.format(sys.executable, 'distributed/train.py')
cmd += ' --cfg {}'.format(os.path.abspath(args.cfg_file))
cmd += ' --exp_dir {}'.format(coordinator.exp_dir)
cmd += ' --tensorboard' if args.tensorboard else ''
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir)
checkpoint, start_iter = coordinator.get_checkpoint()
cfg.TRAIN.WEIGHTS = checkpoint or cfg.TRAIN.WEIGHTS
logging.info('Using config:\n' + str(cfg))
if cfg.NUM_GPUS > 1:
# Run a distributed task.
run_distributed(args, coordinator)
else:
# Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED)
dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset.
dataset_size = build_dataset(cfg.TRAIN.DATASET).size
logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset_size))
# Run training.
logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.path_at('checkpoints')))
train_engine.run_train(coordinator, start_iter,
enable_tensorboard=args.tensorboard)