Commit 696e7be3 by Ting PAN

Initial repository

[flake8]
max-line-length = 120
ignore = E741, # ambiguous variable name
F403, # 'from module import *' used; unable to detect undefined names
F405, # name may be undefined, or defined from star imports: module
F811, # redefinition of unused name from line N
F821, # undefined name
W503, # line break before binary operator
W504 # line break after binary operator
# module imported but unused
per-file-ignores = __init__.py: F401
# Compiled Object files
*.slo
*.lo
*.o
*.cuo
# Compiled Dynamic libraries
*.so
*.dll
*.dylib
# Compiled Static libraries
*.lai
*.la
*.a
*.lib
# Compiled python
*.pyc
__pycache__
# Compiled MATLAB
*.mex*
# IPython notebook checkpoints
.ipynb_checkpoints
# Editor temporaries
*.swp
*~
# Sublime Text settings
*.sublime-workspace
*.sublime-project
# Eclipse Project settings
*.*project
.settings
# QtCreator files
*.user
# VSCode files
.vscode
# IDEA files
.idea
# OSX dir files
.DS_Store
# Android files
.gradle
*.iml
local.properties
Copyright (c) 2017, SeetaTech, Co.,Ltd. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# Benchmark and Model Zoo
## Introduction
### Pretrained Models
Refer to [Pretrained Models](data/pretrained) for details.
## Baselines
### Faster R-CNN
Refer to [Faster R-CNN](configs/faster_rcnn) for details.
### Mask R-CNN
Refer to [Mask R-CNN](configs/mask_rcnn) for details.
### Pascal VOC
Refer to [Pascal VOC](configs/pascal_voc) for details.
# SeetaDet
SeetaDet is a platform implementing popular object detection algorithms.
This platform works with [**SeetaDragon**](https://dragon.seetatech.com), and uses the [**PyTorch**](https://dragon.seetatech.com/api/python/#pytorch) style.
<img src="https://dragon.seetatech.com/download/seetadet/assets/banner.png"/>
## Installation
Install from PyPI:
```bash
pip install seeta-det
```
Or, clone this repository to local disk and install:
```bash
cd seetadet && pip install .
```
You can also install from the remote repository:
```bash
pip install git+ssh://git@github.com/seetaresearch/seetadet.git
```
If you prefer to develop locally, build the package without installing it into ***site-packages***:
```bash
cd seetadet && python setup.py build
```
## Quick Start
### Train a detection model
```bash
cd tools
python train.py --cfg <MODEL_YAML>
```
We have provided default YAML examples in [configs](configs).
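For example, to launch the COCO Faster R-CNN 1x baseline (a sketch assuming the repository layout above, with the datasets and pretrained weights placed under ***data*** as the config expects):
```bash
cd tools
python train.py --cfg ../configs/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x.yml
```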
### Test a detection model
```bash
cd tools
python test.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Export a detection model to ONNX
```bash
cd tools
python export.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
### Serve a detection model
```bash
cd tools
python serve.py --cfg <MODEL_YAML> --exp_dir <EXP_DIR> --iter <ITERATION>
```
## Benchmark and Model Zoo
Results and models are available in the [Model Zoo](MODEL_ZOO.md).
## License
[BSD 2-Clause license](LICENSE)
# Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
## Introduction
```latex
@article{Ren_2017,
title={Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
publisher={Institute of Electrical and Electronics Engineers (IEEE)},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
year={2017},
month={Jun},
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :-----: |
| [R50-FPN](coco_faster_rcnn_R_50_FPN_1x.yml) | 1x | 37.04 | 37.7 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/model_7abb52ab.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_1x/logs.json) |
| [R50-FPN](coco_faster_rcnn_R_50_FPN_3x.yml) | 3x | 37.04 | 39.8 | [model](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_3x/model_04e548ca.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/faster_rcnn/coco_faster_rcnn_R_50_FPN_3x/logs.json) |
NUM_GPUS: 8
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
# Mask R-CNN
## Introduction
```latex
@article{He_2017,
title={Mask R-CNN},
journal={2017 IEEE International Conference on Computer Vision (ICCV)},
publisher={IEEE},
author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
year={2017},
month={Oct}
}
```
## COCO Instance Segmentation Baselines
| Model | Lr sched | Infer time (fps) | box AP | mask AP | Download |
| :---: | :------: | :---------------: | :----: | :-----: | :------: |
| [R50-FPN](coco_mask_rcnn_R_50_FPN_1x.yml) | 1x | 30.30 | 38.3 | 34.9 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/model_b27317db.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_1x/logs.json) |
| [R50-FPN](coco_mask_rcnn_R_50_FPN_3x.yml) | 3x | 30.30 | 40.7 | 36.8 | [model](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_3x/model_6f7e3878.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/mask_rcnn/coco_mask_rcnn_R_50_FPN_3x/logs.json) |
NUM_GPUS: 8
MODEL:
TYPE: 'mask_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'mask_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
SOLVER:
BASE_LR: 0.02
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_mask_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
LOADER: 'mask_train'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
# Pascal VOC
## Introduction
```latex
@Article{Everingham10,
author = "Everingham, M. and Van~Gool, L. and Williams, C. K. I. and Winn, J. and Zisserman, A.",
title = "The Pascal Visual Object Classes (VOC) Challenge",
journal = "International Journal of Computer Vision",
volume = "88",
year = "2010",
number = "2",
month = jun,
pages = "303--338",
}
```
## Object Detection Baselines
### Faster R-CNN
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R50-FPN](voc_faster_rcnn_R_50_FPN_15e.yml) | 15e | 47.62 | 82.1 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_faster_rcnn_R_50_FPN_15e/model_3dcb03f9.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_faster_rcnn_R_50_FPN_15e/logs.json) |
### RetinaNet
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [R50-FPN](voc_retinanet_R_50_FPN_120e.yml) | 120e | 58.82 | 82.4 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_retinanet_R_50_FPN_120e/model_1ae4cd3d.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_retinanet_R_50_FPN_120e/logs.json) |
### SSD
| Model | Lr sched | Infer time (fps) | box AP | Download |
| :---: | :------: | :--------------: | :----: | :------: |
| [VGG16-SSD300](voc_ssd300_VGG_16_120e.yml) | 120e | 125 | 77.8 | [model](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_ssd300_VGG_16_120e/model_3417d961.pkl) &#124; [log](https://dragon.seetatech.com/download/seetadet/pascal_voc/voc_ssd300_VGG_16_120e/logs.json) |
NUM_GPUS: 2
MODEL:
TYPE: 'faster_rcnn'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'resnet50.fpn'
FPN:
MIN_LEVEL: 2
MAX_LEVEL: 6
ANCHOR_GENERATOR:
STRIDES: [4, 8, 16, 32, 64]
FAST_RCNN:
BBOX_REG_LOSS_TYPE: 'smooth_l1'
SOLVER:
BASE_LR: 0.002
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_faster_rcnn_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
USE_DIFF: True
IMS_PER_BATCH: 2
SCALES: [480, 512, 544, 576, 608, 640]
MAX_SIZE: 1000
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [640]
MAX_SIZE: 1000
NMS_THRESH: 0.45
NUM_GPUS: 1
MODEL:
TYPE: 'retinanet'
PRECISION: 'float32'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'resnet50.fpn'
SOLVER:
BASE_LR: 0.01
WARM_UP_STEPS: 3000
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50_in1k_cls90e.pkl'
DATASET: '../data/datasets/voc_trainval0712'
USE_DIFF: True
IMS_PER_BATCH: 16
SCALES: [512]
SCALES_RANGE: [0.1, 2.0]
MAX_SIZE: 512
CROP_SIZE: 512
COLOR_JITTER: 0.5
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
MAX_SIZE: 512
CROP_SIZE: 512
NMS_THRESH: 0.45
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'vgg16_fcn.ssd300'
NORM: ''
FREEZE_AT: 0
COARSEST_STRIDE: 300
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 100, 300]
SIZES: [[30, 60], [60, 110], [110, 162],
[162, 213], [213, 264], [264, 315]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd300_VGG_16'
TRAIN:
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
LOADER: 'ssd_train'
USE_DIFF: True
IMS_PER_BATCH: 16
SCALES: [300]
SCALES_RANGE: [0.25, 1.0]
COLOR_JITTER: 0.5
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 8
SCALES: [300]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
NUM_GPUS: 1
MODEL:
TYPE: 'ssd'
PRECISION: 'float16'
CLASSES: ['__background__',
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
BACKBONE:
TYPE: 'vgg16_fcn.ssd512'
NORM: ''
FREEZE_AT: 0
COARSEST_STRIDE: 512
FPN:
ACTIVATION: 'ReLU'
ANCHOR_GENERATOR:
STRIDES: [8, 16, 32, 64, 128, 256, 512]
SIZES: [[35.84, 76.8],
[76.8, 153.6],
[153.6, 230.4],
[230.4, 307.2],
[307.2, 384.0],
[384.0, 460.8],
[460.8, 537.6]]
ASPECT_RATIOS: [[1, 2, 0.5],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5, 3, 0.33],
[1, 2, 0.5],
[1, 2, 0.5]]
SOLVER:
BASE_LR: 0.001
WEIGHT_DECAY: 0.0005
DECAY_STEPS: [80000, 100000]
MAX_STEPS: 120000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'voc_ssd512_VGG_16'
AUG:
COLOR_JITTER: 0.5
TRAIN:
WEIGHTS: '../data/pretrained/VGG-16-FCN_in1k.pkl'
DATASET: '../data/datasets/voc_trainval0712'
IMS_PER_BATCH: 16
SCALES: [512]
SCALES_RANGE: [0.25, 1.0]
LOADER: 'ssd_train'
TEST:
DATASET: '../data/datasets/voc_test2007'
JSON_DATASET: '../data/datasets/voc_test2007.json'
EVALUATOR: 'voc2007'
IMS_PER_BATCH: 1
SCALES: [512]
NMS_THRESH: 0.45
SCORE_THRESH: 0.01
# Focal Loss for Dense Object Detection
## Introduction
```latex
@inproceedings{lin2017focal,
title={Focal loss for dense object detection},
author={Lin, Tsung-Yi and Goyal, Priya and Girshick, Ross and He, Kaiming and Doll{\'a}r, Piotr},
booktitle={Proceedings of the IEEE international conference on computer vision},
year={2017}
}
```
## COCO Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | box AP | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_1x.yml) | 1x | 0.051 | 37.4 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_1x/model_final.pkl) |
| [R-50-FPN-800](coco_retinanet_R-50-FPN_800_2x.yml) | 2x | 0.051 | 39.1 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/coco_retinanet_R-50-FPN_800_2x/model_final.pkl) |
## Pascal VOC Object Detection Baselines
| Model | Lr sched | Infer time (s/im) | AP@0.5 | Download |
| :---: | :------: | :---------------: | :----: | :------: |
| [R-50-FPN-512](voc_retinanet_R-50-FPN_512_120e.yml) | 120e | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
| [R-50-FPN-640](voc_retinanet_R-50-FPN_640_120e.yml) | 120e | 0.017 | 83.0 | [model](https://dragon.seetatech.com/download/models/seetadet/retinanet/voc_retinanet_R-50-FPN_512/model_final.pkl) |
NUM_GPUS: 8
MODEL:
TYPE: 'retinanet'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [60000, 80000]
MAX_STEPS: 90000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
NUM_GPUS: 8
MODEL:
TYPE: 'retinanet'
PRECISION: 'float16'
CLASSES: ['__background__',
'person', 'bicycle', 'car', 'motorcycle', 'airplane',
'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench',
'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag',
'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball',
'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife',
'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',
'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors',
'teddy bear', 'hair drier', 'toothbrush']
BACKBONE:
TYPE: 'resnet50_v1a.fpn'
SOLVER:
BASE_LR: 0.01
DECAY_STEPS: [210000, 250000]
MAX_STEPS: 270000
SNAPSHOT_EVERY: 5000
SNAPSHOT_PREFIX: 'coco_retinanet_R_50_FPN'
TRAIN:
WEIGHTS: '../data/pretrained/R-50-A_in1k_cls120e.pkl'
DATASET: '../data/datasets/coco_train2017'
IMS_PER_BATCH: 2
SCALES: [640, 672, 704, 736, 768, 800]
MAX_SIZE: 1333
TEST:
DATASET: '../data/datasets/coco_val2017'
JSON_DATASET: '../data/datasets/coco_instances_val2017.json'
EVALUATOR: 'coco'
IMS_PER_BATCH: 1
SCALES: [800]
MAX_SIZE: 1333
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#define DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class PasteMaskOp final : public Operator<Context> {
public:
PasteMaskOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
mask_threshold_(OP_SINGLE_ARG(float, "mask_threshold", 0.5f)) {
INITIALIZE_OP_REPEATED_ARG(int64_t, sizes);
}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float mask_threshold_;
DECLARE_OP_REPEATED_ARG(int64_t, sizes);
};
DEFINE_OP_REPEATED_ARG(int64_t, PasteMaskOp, sizes);
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_MASK_OP_H_
#include <dragon/core/workspace.h>
#include "../operators/mask_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void PasteMaskOp<Context>::DoRunWithType() {
auto &X_masks = Input(0), &X_boxes = Input(1), *Y = Output(0);
vector<int64_t> Y_dims({X_masks.dim(0)});
int num_sizes;
sizes(0, &num_sizes);
for (int i = 0; i < num_sizes; ++i) {
Y_dims.push_back(sizes(i));
}
if (num_sizes == 2) {
detection::PasteMask(
Y_dims[0], // N
Y_dims[1], // H
Y_dims[2], // W
X_masks.dim(1), // mask_h
X_masks.dim(2), // mask_w
mask_threshold_,
X_masks.template data<T, Context>(),
X_boxes.template data<float, Context>(),
Y->Reshape(Y_dims)->template mutable_data<uint8_t, Context>(),
ctx());
} else {
LOG(FATAL) << "PasteMask" << num_sizes << "d is not supported.";
}
}
DEPLOY_CPU_OPERATOR(PasteMask);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(PasteMask);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(PasteMask, PasteMask);
#endif
OPERATOR_SCHEMA(PasteMask).NumInputs(2).NumOutputs(1);
NO_GRADIENT(PasteMask);
} // namespace dragon
#include "../operators/nms_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void NonMaxSuppressionOp<Context>::DoRunWithType() {
auto &X = Input(0), *Y = Output(0);
CHECK(X.ndim() == 2 && X.dim(1) == 5)
<< "\nThe dimensions of boxes should be (num_boxes, 5).";
detection::ApplyNMS(
X.dim(0),
X.dim(0),
0,
iou_threshold_,
X.template mutable_data<T, Context>(),
out_indices_,
ctx());
Y->template CopyFrom<int64_t>(out_indices_);
}
DEPLOY_CPU_OPERATOR(NonMaxSuppression);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(NonMaxSuppression);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(NonMaxSuppression, NonMaxSuppression);
#endif
OPERATOR_SCHEMA(NonMaxSuppression).NumInputs(1).NumOutputs(1);
NO_GRADIENT(NonMaxSuppression);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#define DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class NonMaxSuppressionOp final : public Operator<Context> {
public:
NonMaxSuppressionOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
iou_threshold_(OP_SINGLE_ARG(float, "iou_threshold", 0.5f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(0));
}
template <typename T>
void DoRunWithType();
protected:
float iou_threshold_;
vector<int64_t> out_indices_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_NMS_OP_H_
#include "../operators/retinanet_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RetinaNetDecoderOp<Context>::DoRunWithType() {
auto N = Input(SCORES).dim(0);
auto AxK = Input(SCORES).dim(1);
auto C = Input(SCORES).dim(2);
auto AxKxC = AxK * C;
auto A = int64_t(ratios_.size() * scales_.size());
auto num_lvls = int64_t(strides_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), num_lvls);
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == A * 4) continue;
anchors.resize(A * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(AxK, A, grid_info, grid_args);
// Decode detections.
auto* scores = Input(SCORES).template data<T, Context>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output(0)->Reshape({N * num_lvls * pre_nms_topk_, 7});
auto* dets = Y->template mutable_data<float, CPUContext>();
int64_t size_dets = 0;
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
detection::SelectTopK(
grid_args[lvl_ind].size * C,
pre_nms_topk_,
score_thresh_,
scores + batch_ind * AxKxC + grid_args[lvl_ind].offset * C,
scores_,
indices_,
ctx());
auto* offset_dets = dets + size_dets * 7;
auto num_dets = int64_t(indices_.size());
size_dets += num_dets;
detection::GetAnchors(
num_dets,
A, // num_cell_anchors
C, // num_classes
grid_args[lvl_ind],
cell_anchors_[lvl_ind].data(),
indices_.data(),
offset_dets);
detection::DecodeDetections(
num_dets,
AxK, // num_anchors
C, // num_classes
im_args,
grid_args[lvl_ind],
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_dets);
}
}
// Shrink to the correct dimensions.
Y->Reshape({size_dets, 7});
}
DEPLOY_CPU_OPERATOR(RetinaNetDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RetinaNetDecoder);
#endif
#ifdef USE_MPS
REGISTER_MPS_OPERATOR(RetinaNetDecoder, RetinaNetDecoderOp<CPUContext>);
#endif
OPERATOR_SCHEMA(RetinaNetDecoder).NumInputs(4).NumOutputs(1);
NO_GRADIENT(RetinaNetDecoder);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class RetinaNetDecoderOp final : public Operator<Context> {
public:
RetinaNetDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topk_(OP_SINGLE_ARG(int64_t, "pre_nms_topk", 1000)),
score_thresh_(OP_SINGLE_ARG(float, "score_thresh", 0.05f)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float score_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t pre_nms_topk_;
vector<float> scores_;
vector<int64_t> indices_;
vector<vector<float>> cell_anchors_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_RETINANET_DECODER_OP_H_
#include "../operators/rpn_decoder_op.h"
#include "../utils/detection.h"
namespace dragon {
template <class Context>
template <typename T>
void RPNDecoderOp<Context>::DoRunWithType() {
auto N = Input(SCORES).dim(0);
auto AxK = Input(SCORES).dim(1);
auto A = int64_t(ratios_.size() * scales_.size());
auto num_lvls = int64_t(strides_.size());
// Generate anchors.
CHECK_EQ(Input(GRID_INFO).dim(0), num_lvls);
cell_anchors_.resize(strides_.size());
vector<detection::GridArgs<int64_t>> grid_args(strides_.size());
for (int i = 0; i < strides_.size(); ++i) {
grid_args[i].stride = strides_[i];
auto& anchors = cell_anchors_[i];
if (int64_t(anchors.size()) == A * 4) continue;
anchors.resize(A * 4);
detection::GenerateAnchors(
strides_[i],
int64_t(ratios_.size()),
int64_t(scales_.size()),
ratios_.data(),
scales_.data(),
anchors.data());
}
// Set grid arguments.
auto* grid_info = Input(GRID_INFO).template data<int64_t, CPUContext>();
detection::SetGridArgs(AxK, A, grid_info, grid_args);
// Decode proposals.
auto* scores = Input(SCORES).template data<T, CPUContext>();
auto* deltas = Input(DELTAS).template data<T, CPUContext>();
auto* im_info = Input(IM_INFO).template data<float, CPUContext>();
auto* Y = Output("Y")->Reshape({N * num_lvls * pre_nms_topk_, 5});
auto* dets = Y->template mutable_data<float, CPUContext>();
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
detection::ImageArgs<int64_t> im_args(im_info + batch_ind * 4);
im_args.batch_ind = batch_ind;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
detection::SelectTopK(
grid_args[lvl_ind].size,
pre_nms_topk_,
0.f,
scores + batch_ind * AxK + grid_args[lvl_ind].offset,
scores_,
indices_,
(CPUContext*)nullptr); // Faster.
indices_.resize(pre_nms_topk_, indices_.back());
auto* offset_dets = dets + lvl_ind * pre_nms_topk_ * 5;
detection::GetAnchors(
pre_nms_topk_,
A, // num_cell_anchors
grid_args[lvl_ind],
cell_anchors_[lvl_ind].data(),
indices_.data(),
offset_dets);
detection::DecodeProposals(
pre_nms_topk_,
AxK, // num_anchors
im_args,
grid_args[lvl_ind],
scores_.data(),
deltas + batch_ind * Input(DELTAS).stride(0),
indices_.data(),
offset_dets);
detection::SortBoxes<T, detection::Box5d<T>>(pre_nms_topk_, offset_dets);
}
}
// Apply NMS.
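// NMS runs independently on each level's pre_nms_topk_ sorted proposals;
// survivors from all levels then compete in a per-image priority queue so
// that at most post_nms_topk_ highest-scoring boxes are kept per image.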
auto* dets_v2 = Y->template data<float, Context>();
int64_t size_rois = 0;
scores_.resize(N * post_nms_topk_);
indices_.resize(N * post_nms_topk_);
for (int batch_ind = 0; batch_ind < N; ++batch_ind) {
std::priority_queue<std::pair<float, int64_t>> pq;
for (int lvl_ind = 0; lvl_ind < num_lvls; ++lvl_ind) {
const auto offset = lvl_ind * pre_nms_topk_;
detection::ApplyNMS(
pre_nms_topk_, // N
pre_nms_topk_, // K
offset * 5, // boxes_offset
nms_thresh_,
dets_v2,
nms_indices_,
ctx());
for (size_t i = 0; i < nms_indices_.size(); ++i) {
const auto index = nms_indices_[i] + offset;
pq.push(std::make_pair(*(dets + index * 5 + 4), index));
}
}
for (int i = 0; i < post_nms_topk_ && !pq.empty(); ++i) {
scores_[size_rois] = batch_ind;
indices_[size_rois++] = pq.top().second;
pq.pop();
}
}
// Assign the kept RoIs to FPN levels via the scale histogram.
detection::ApplyHistogram(
size_rois,
min_level_,
max_level_,
canonical_level_,
canonical_scale_,
dets,
scores_.data(),
indices_.data(),
output_rois_);
// Copy to outputs.
for (int i = 0; i < OutputSize(); ++i) {
const auto& rois = output_rois_[i];
vector<int64_t> dims({int64_t(rois.size()) / 5, 5});
auto* Yi = Output(i)->Reshape(dims);
std::memcpy(
Yi->template mutable_data<T, CPUContext>(),
rois.data(),
sizeof(T) * rois.size());
}
}
DEPLOY_CPU_OPERATOR(RPNDecoder);
#ifdef USE_CUDA
DEPLOY_CUDA_OPERATOR(RPNDecoder);
#endif
#ifdef USE_MPS
DEPLOY_MPS_OPERATOR(RPNDecoder, RPNDecoder);
#endif
OPERATOR_SCHEMA(RPNDecoder).NumInputs(4).NumOutputs(1, INT_MAX);
NO_GRADIENT(RPNDecoder);
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#define DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
#include <dragon/core/operator.h>
namespace dragon {
template <class Context>
class RPNDecoderOp final : public Operator<Context> {
public:
RPNDecoderOp(const OperatorDef& def, Workspace* ws)
: Operator<Context>(def, ws),
strides_(OP_REPEATED_ARG(int64_t, "strides")),
ratios_(OP_REPEATED_ARG(float, "ratios")),
scales_(OP_REPEATED_ARG(float, "scales")),
pre_nms_topk_(OP_SINGLE_ARG(int64_t, "pre_nms_topk", 1000)),
post_nms_topk_(OP_SINGLE_ARG(int64_t, "post_nms_topk", 1000)),
nms_thresh_(OP_SINGLE_ARG(float, "nms_thresh", 0.7f)),
min_level_(OP_SINGLE_ARG(int64_t, "min_level", 2)),
max_level_(OP_SINGLE_ARG(int64_t, "max_level", 5)),
canonical_level_(OP_SINGLE_ARG(int64_t, "canonical_level", 4)),
canonical_scale_(OP_SINGLE_ARG(int64_t, "canonical_scale", 224)) {}
USE_OPERATOR_FUNCTIONS;
void RunOnDevice() override {
DispatchHelper<dtypes::TypesBase<float>>::Call(this, Input(SCORES));
}
template <typename T>
void DoRunWithType();
enum INPUT_TAGS { SCORES = 0, DELTAS = 1, IM_INFO = 2, GRID_INFO = 3 };
protected:
float nms_thresh_;
vector<int64_t> strides_;
vector<float> ratios_, scales_;
int64_t min_level_, max_level_;
int64_t pre_nms_topk_, post_nms_topk_;
int64_t canonical_level_, canonical_scale_;
vector<float> scores_;
vector<int64_t> indices_, nms_indices_;
vector<vector<float>> cell_anchors_;
vector<vector<float>> output_rois_;
};
} // namespace dragon
#endif // DRAGON_EXTENSION_OPERATORS_RPN_DECODER_OP_H_
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cpp extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import glob
import dragon
from dragon.utils import cpp_extension
from setuptools import setup
Extension = cpp_extension.CppExtension
if (dragon.cuda.is_available() and
cpp_extension.CUDA_HOME is not None):
Extension = cpp_extension.CUDAExtension
elif dragon.mps.is_available():
Extension = cpp_extension.MPSExtension
def find_sources(*dirs):
ext_suffixes = ['.cc']
if Extension is cpp_extension.CUDAExtension:
ext_suffixes.append('.cu')
elif Extension is cpp_extension.MPSExtension:
ext_suffixes.append('.mm')
sources = []
for path in dirs:
for ext_suffix in ext_suffixes:
sources += glob.glob(path + '/*' + ext_suffix, recursive=True)
return sources
ext_modules = [
Extension(
name='seetadet.ops._C',
sources=find_sources('**'),
),
]
setup(
name='seetadet',
ext_modules=ext_modules,
cmdclass={'build_ext': cpp_extension.BuildExtension},
)
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_H_
#include "../utils/detection/anchors.h"
#include "../utils/detection/bbox.h"
#include "../utils/detection/mask.h"
#include "../utils/detection/nms.h"
#include "../utils/detection/proposals.h"
#include "../utils/detection/types.h"
#endif // DRAGON_EXTENSION_UTILS_DETECTION_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*!
* Anchor Functions.
*/
template <typename IndexT>
inline void SetGridArgs(
const int num_anchors,
const int num_cell_anchors,
const IndexT* grid_info,
vector<GridArgs<IndexT>>& grid_args) {
IndexT grid_offset = 0;
for (int i = 0; i < grid_args.size(); ++i, grid_info += 2) {
auto& args = grid_args[i];
args.h = grid_info[0];
args.w = grid_info[1];
args.size = num_cell_anchors * args.h * args.w;
args.offset = grid_offset;
grid_offset += args.size;
}
std::stringstream ss;
if (grid_offset != num_anchors) {
ss << "Mismatched number of anchors. (Excepted ";
ss << num_anchors << ", Got " << grid_offset << ")";
for (int i = 0; i < grid_args.size(); ++i) {
ss << "\nGrid #" << i << ": "
<< "A=" << num_cell_anchors << ", H=" << grid_args[i].h
<< ", W=" << grid_args[i].w << "\n";
}
}
if (!ss.str().empty()) LOG(FATAL) << ss.str();
}
template <typename T>
inline void GenerateAnchors(
const int stride,
const int num_ratios,
const int num_scales,
const T* ratios,
const T* scales,
T* anchors) {
T* offset_anchors = anchors;
T x = T(0.5) * T(stride), y = T(0.5) * T(stride);
for (int i = 0; i < num_ratios; ++i) {
const T ratio_w = std::sqrt(T(1) / ratios[i]);
const T ratio_h = ratio_w * ratios[i];
for (int j = 0; j < num_scales; ++j) {
offset_anchors[0] = -x * ratio_w * scales[j];
offset_anchors[1] = -y * ratio_h * scales[j];
offset_anchors[2] = x * ratio_w * scales[j];
offset_anchors[3] = y * ratio_h * scales[j];
offset_anchors += 4;
}
}
}
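// Each cell anchor spans width stride*scale/sqrt(ratio) and height
// stride*scale*sqrt(ratio) around the origin, i.e. area (stride*scale)^2
// with h/w = ratio. E.g., stride=16, ratio=0.5, scale=8 gives a 181x91 box.
// GetAnchors() below shifts these cell anchors onto the feature grid.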
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i];
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 5;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
template <typename T>
inline void GetAnchors(
const int num_anchors,
const int num_cell_anchors,
const int num_classes,
const GridArgs<int64_t>& args,
const T* cell_anchors,
const int64_t* indices,
T* anchors) {
for (int i = 0; i < num_anchors; ++i) {
auto index = indices[i];
index /= num_classes;
const auto w = index % args.w;
index /= args.w;
const auto h = index % args.h;
index /= args.h;
const auto shift_x = T(w * args.stride);
const auto shift_y = T(h * args.stride);
auto* offset_anchors = anchors + i * 7 + 1;
const auto* offset_cell_anchors = cell_anchors + index * 4;
offset_anchors[0] = shift_x + offset_cell_anchors[0];
offset_anchors[1] = shift_y + offset_cell_anchors[1];
offset_anchors[2] = shift_x + offset_cell_anchors[2];
offset_anchors[3] = shift_y + offset_cell_anchors[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ANCHORS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include "../../utils/detection/types.h"
#if defined(__CUDACC__)
#define HOSTDEVICE_DECL inline __host__ __device__
#else
#define HOSTDEVICE_DECL inline
#endif
namespace dragon {
namespace detection {
/*
* BBox Functions.
*/
template <typename T, class BoxT>
inline void SortBoxes(const int N, T* data, bool descend = true) {
auto* boxes = reinterpret_cast<BoxT*>(data);
std::sort(boxes, boxes + N, [descend](BoxT lhs, BoxT rhs) {
return descend ? (lhs.score > rhs.score) : (lhs.score < rhs.score);
});
}
/*
* BBox Utilities.
*/
namespace utils {
template <typename T>
HOSTDEVICE_DECL bool CheckIoU(const T thresh, const T* a, const T* b) {
#if defined(__CUDACC__)
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1);
const T height = max(T(0), y2 - y1);
#else
const T x1 = std::max(a[0], b[0]);
const T y1 = std::max(a[1], b[1]);
const T x2 = std::min(a[2], b[2]);
const T y2 = std::min(a[3], b[3]);
const T width = std::max(T(0), x2 - x1);
const T height = std::max(T(0), y2 - y1);
#endif
const T inter = width * height;
const T Sa = (a[2] - a[0]) * (a[3] - a[1]);
const T Sb = (b[2] - b[0]) * (b[3] - b[1]);
return inter >= thresh * (Sa + Sb - inter);
}
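// Applies the R-CNN box parameterization: the predicted center is
// ctr + d*size and the predicted size is exp(d)*size; the box is then
// clipped to the padded image and rescaled back to the original resolution.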
template <typename T>
inline void BBoxTransform(
const T dx,
const T dy,
const T dw,
const T dh,
const T im_w,
const T im_h,
const T im_scale_h,
const T im_scale_w,
T* bbox) {
const T w = bbox[2] - bbox[0];
const T h = bbox[3] - bbox[1];
const T ctr_x = bbox[0] + T(0.5) * w;
const T ctr_y = bbox[1] + T(0.5) * h;
const T pred_ctr_x = dx * w + ctr_x;
const T pred_ctr_y = dy * h + ctr_y;
const T pred_w = std::exp(dw) * w;
const T pred_h = std::exp(dh) * h;
const T x1 = pred_ctr_x - T(0.5) * pred_w;
const T y1 = pred_ctr_y - T(0.5) * pred_h;
const T x2 = pred_ctr_x + T(0.5) * pred_w;
const T y2 = pred_ctr_y + T(0.5) * pred_h;
bbox[0] = std::max(T(0), std::min(x1, im_w)) / im_scale_w;
bbox[1] = std::max(T(0), std::min(y1, im_h)) / im_scale_h;
bbox[2] = std::max(T(0), std::min(x2, im_w)) / im_scale_w;
bbox[3] = std::max(T(0), std::min(y2, im_h)) / im_scale_h;
}
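// FPN level assignment, as in Eq. (1) of the FPN paper:
// lvl = lvl0 + log2(sqrt(w*h)/s0), clamped to [lvl_min, lvl_max];
// degenerate boxes with a non-positive extent return -1.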
template <typename T>
inline int GetBBoxLevel(
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
T* bbox) {
const T w = bbox[2] - bbox[0];
const T h = bbox[3] - bbox[1];
if (w <= T(0) || h <= T(0)) return -1;
const T s = std::sqrt(w * h);
const int lvl = lvl0 + std::log2(s / s0 + T(1e-6));
return std::min(std::max(lvl, lvl_min), lvl_max);
}
} // namespace utils
} // namespace detection
} // namespace dragon
#undef HOSTDEVICE_DECL
#endif // DRAGON_EXTENSION_UTILS_DETECTION_BBOX_H_
#include <dragon/core/context.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
template <typename IndexT>
inline bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
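// Pastes each mask into its box on the output canvas: every output pixel
// is mapped back into mask coordinates, bilinearly sampled from its four
// neighbors, and thresholded to a binary value.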
template <typename T>
void _PasteMask(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const T thresh,
const T* masks,
const float* boxes,
uint8_t* im) {
const auto HxW = H * W;
for (int n = 0; n < N; ++n) {
const float* box = boxes + n * 4;
const T* mask = masks + n * mask_h * mask_w;
uint8_t* offset_im = im + n * H * W;
const float box_w_half = (box[2] - box[0]) * 0.5f;
const float box_h_half = (box[3] - box[1]) * 0.5f;
const float mask_w_half = float(mask_w) * 0.5f;
const float mask_h_half = float(mask_h) * 0.5f;
for (int index = 0; index < HxW; ++index) {
const int w = index % W;
const int h = index / W;
const float gx = (float(w) + 0.5f - box[0]) / box_w_half;
const float gy = (float(h) + 0.5f - box[1]) / box_h_half;
const float ix = gx * mask_w_half - 0.5f;
const float iy = gy * mask_h_half - 0.5f;
const int ix_nw = floorf(ix);
const int iy_nw = floorf(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, mask_h, mask_w)) {
val += mask[iy_nw * mask_w + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, mask_h, mask_w)) {
val += mask[iy_ne * mask_w + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, mask_h, mask_w)) {
val += mask[iy_sw * mask_w + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, mask_h, mask_w)) {
val += mask[iy_se * mask_w + ix_se] * se;
}
*(offset_im++) = (val >= thresh ? uint8_t(1) : uint8_t(0));
}
}
}
} // namespace
template <>
void PasteMask<float, CPUContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
CPUContext* ctx) {
_PasteMask(N, H, W, mask_h, mask_w, thresh, masks, boxes, im);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context.h>
#include "../../../utils/detection/bbox.h"
#include "../../../utils/detection/nms.h"
namespace dragon {
namespace detection {
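// Greedy sequential NMS: boxes are expected in descending score order;
// each kept box suppresses later boxes whose IoU exceeds thresh, and at
// most K of the N candidates are returned.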
template <>
void ApplyNMS<float, CPUContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CPUContext* ctx) {
boxes = boxes + boxes_offset;
int num_selected = 0;
indices.resize(K);
vector<char> is_dead(N, 0);
for (int i = 0; i < N; ++i) {
if (is_dead[i]) continue;
indices[num_selected++] = i;
if (num_selected >= K) break;
for (int j = i + 1; j < N; ++j) {
if (is_dead[j]) continue;
if (!utils::CheckIoU(thresh, &boxes[i * 5], &boxes[j * 5])) continue;
is_dead[j] = 1;
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context.h>
#include "../../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
inline void
ArgPartition(const int N, const int K, const ValueT* values, KeyT* keys) {
std::nth_element(keys, keys + K, keys + N, [&values](KeyT lhs, KeyT rhs) {
return values[lhs] > values[rhs];
});
}
} // namespace
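// Selects up to K candidates: optionally filters by score threshold, then
// uses std::nth_element so only the top-K partition boundary is established,
// avoiding a full sort of all N scores.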
template <>
void SelectTopK<float, CPUContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CPUContext* ctx) {
int num_selected = 0;
out_indices.resize(N);
if (thresh > 0.f) {
for (int i = 0; i < N; ++i) {
if (scores[i] > thresh) {
out_indices[num_selected++] = i;
}
}
} else {
num_selected = N;
std::iota(out_indices.begin(), out_indices.end(), 0);
}
if (num_selected > K) {
ArgPartition(num_selected, K, scores, out_indices.data());
out_scores.resize(K);
out_indices.resize(K);
for (int i = 0; i < K; ++i) {
out_scores[i] = scores[out_indices[i]];
}
} else {
out_scores.resize(num_selected);
out_indices.resize(num_selected);
for (int i = 0; i < num_selected; ++i) {
out_scores[i] = scores[out_indices[i]];
}
}
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
template <typename IndexT>
inline __device__ bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
template <typename T>
__global__ void _PasteMask(
const int nthreads,
const int H,
const int W,
const int mask_h,
const int mask_w,
const T thresh,
const T* masks,
const float* boxes,
uint8_t* im) {
CUDA_1D_KERNEL_LOOP(index, nthreads) {
const int w = index % W;
const int h = index / W % H;
const int n = index / (H * W);
const float* box = boxes + n * 4;
const T* mask = masks + n * mask_h * mask_w;
const float gx = (float(w) + 0.5f - box[0]) / (box[2] - box[0]) * 2.f;
const float gy = (float(h) + 0.5f - box[1]) / (box[3] - box[1]) * 2.f;
const float ix = (gx * float(mask_w) - 1.f) * 0.5f;
const float iy = (gy * float(mask_h) - 1.f) * 0.5f;
const int ix_nw = floorf(ix);
const int iy_nw = floorf(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, mask_h, mask_w)) {
val += mask[iy_nw * mask_w + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, mask_h, mask_w)) {
val += mask[iy_ne * mask_w + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, mask_h, mask_w)) {
val += mask[iy_sw * mask_w + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, mask_h, mask_w)) {
val += mask[iy_se * mask_w + ix_se] * se;
}
im[index] = (val >= thresh ? uint8_t(1) : uint8_t(0));
}
}
} // namespace
template <>
void PasteMask<float, CUDAContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
CUDAContext* ctx) {
const auto NxHxW = N * H * W;
_PasteMask<<<CUDA_BLOCKS(NxHxW), CUDA_THREADS, 0, ctx->cuda_stream()>>>(
NxHxW, H, W, mask_h, mask_w, thresh, masks, boxes, im);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include "../../../utils/detection/bbox.h"
#include "../../../utils/detection/nms.h"
#include "../../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
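// Blocked NMS: boxes are tiled into 64-box groups; each (row, col) thread
// block compares one row box against a column group and records the
// suppressed boxes as bits of a 64-bit mask, reduced sequentially on host.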
template <typename T>
__global__ void _NonMaxSuppression(
const int N,
const T thresh,
const T* boxes,
uint64_t* mask) {
const int row_start = blockIdx.y;
const int col_start = blockIdx.x;
if (row_start > col_start) return;
const int row_size = min(N - row_start * NUM_THREADS, NUM_THREADS);
const int col_size = min(N - col_start * NUM_THREADS, NUM_THREADS);
__shared__ T block_boxes[NUM_THREADS * 4];
if (threadIdx.x < col_size) {
auto* offset_block_boxes = block_boxes + threadIdx.x * 4;
auto* offset_boxes = boxes + (col_start * NUM_THREADS + threadIdx.x) * 5;
#pragma unroll
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
__syncthreads();
if (threadIdx.x < row_size) {
const int index = row_start * NUM_THREADS + threadIdx.x;
const T* offset_boxes = boxes + index * 5;
uint64_t val = 0;
const int start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (int i = start; i < col_size; ++i) {
if (utils::CheckIoU(thresh, offset_boxes, block_boxes + i * 4)) {
val |= (uint64_t(1) << i);
}
}
mask[index * gridDim.x + col_start] = val;
}
}
} // namespace
template <>
void ApplyNMS<float, CUDAContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
CUDAContext* ctx) {
boxes = boxes + boxes_offset;
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
auto* NMS_mask = ctx->workspace()->CreateTensor("NMS_mask");
NMS_mask->Reshape({N * num_blocks});
auto* mask = reinterpret_cast<uint64_t*>(
NMS_mask->template mutable_data<int64_t, CUDAContext>());
vector<uint64_t> mask_host(N * num_blocks);
_NonMaxSuppression<<<
dim3(num_blocks, num_blocks),
NUM_THREADS,
0,
ctx->cuda_stream()>>>(N, thresh, boxes, mask);
CUDA_CHECK(cudaMemcpyAsync(
mask_host.data(),
mask,
mask_host.size() * sizeof(uint64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
ctx->FinishDeviceComputation();
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS, inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (uint64_t(1) << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = &mask_host[0] + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
#include <dragon/core/context_cuda.h>
#include <dragon/core/workspace.h>
#include <dragon/utils/device/common_thrust.h>
#include "../../../utils/detection/iterator.h"
#include "../../../utils/detection/proposals.h"
namespace dragon {
namespace detection {
namespace {
template <typename KeyT, typename ValueT>
struct ThresholdFunctor {
ThresholdFunctor(ValueT thresh) : thresh_(thresh) {}
inline __device__ bool operator()(
const thrust::tuple<KeyT, ValueT>& kv) const {
return thrust::get<1>(kv) > thresh_;
}
ValueT thresh_;
};
template <typename IterT>
inline void ArgPartition(const int N, const int K, IterT data) {
std::nth_element(
data,
data + K,
data + N,
[](const typename IterT::value_type& lhs,
const typename IterT::value_type& rhs) {
return *lhs.value_ptr > *rhs.value_ptr;
});
}
} // namespace
template <>
void SelectTopK<float, CUDAContext>(
const int N,
const int K,
const float thresh,
const float* scores,
vector<float>& out_scores,
vector<int64_t>& out_indices,
CUDAContext* ctx) {
int num_selected = N;
int64_t* indices = nullptr;
if (thresh > 0.f) {
indices = ctx->workspace()->data<int64_t, CUDAContext>(N, "BufferKernel");
auto policy = thrust::cuda::par.on(ctx->cuda_stream());
auto functor = ThresholdFunctor<int64_t, float>(thresh);
thrust::sequence(policy, indices, indices + N);
auto kv = thrust::make_tuple(indices, const_cast<float*>(scores));
auto first = thrust::make_zip_iterator(kv);
auto last = thrust::partition(policy, first, first + N, functor);
num_selected = last - first;
}
out_scores.resize(num_selected);
out_indices.resize(num_selected);
CUDA_CHECK(cudaMemcpyAsync(
out_scores.data(),
scores,
num_selected * sizeof(float),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
if (thresh > 0.f) {
CUDA_CHECK(cudaMemcpyAsync(
out_indices.data(),
indices,
num_selected * sizeof(int64_t),
cudaMemcpyDeviceToHost,
ctx->cuda_stream()));
} else {
std::iota(out_indices.begin(), out_indices.end(), 0);
}
ctx->FinishDeviceComputation();
if (num_selected > K) {
auto iter = KeyValueMapIterator<KeyValueMap<int64_t, float>>(
out_indices.data(), out_scores.data());
ArgPartition(num_selected, K, iter);
out_scores.resize(K);
out_indices.resize(K);
}
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename MapT>
class KeyValueMapIterator
: public std::iterator<std::input_iterator_tag, MapT> {
public:
typedef KeyValueMapIterator self_type;
typedef ptrdiff_t difference_type;
typedef MapT value_type;
typedef MapT& reference;
KeyValueMapIterator(
typename MapT::key_type* key_ptr,
typename MapT::value_type* value_ptr)
: key_ptr_(key_ptr), value_ptr_(value_ptr) {}
self_type operator++(int) {
self_type ret = *this;
key_ptr_++;
value_ptr_++;
return ret;
}
self_type operator++() {
key_ptr_++;
value_ptr_++;
return *this;
}
self_type operator--() {
key_ptr_--;
value_ptr_--;
return *this;
}
self_type operator--(int) {
self_type ret = *this;
key_ptr_--;
value_ptr_--;
return ret;
}
reference operator*() const {
if (map_.key_ptr != key_ptr_) {
map_.key_ptr = key_ptr_;
map_.value_ptr = value_ptr_;
}
return map_;
}
self_type operator+(difference_type n) const {
return self_type(key_ptr_ + n, value_ptr_ + n);
}
self_type& operator+=(difference_type n) {
key_ptr_ += n;
value_ptr_ += n;
return *this;
}
self_type operator-(difference_type n) const {
return self_type(key_ptr_ - n, value_ptr_ - n);
}
self_type& operator-=(difference_type n) {
key_ptr_ -= n;
value_ptr_ -= n;
return *this;
}
difference_type operator-(self_type other) const {
return key_ptr_ - other.key_ptr_;
}
bool operator<(const self_type& rhs) const {
return key_ptr_ < rhs.key_ptr_;
}
bool operator<=(const self_type& rhs) const {
return key_ptr_ <= rhs.key_ptr_;
}
bool operator==(const self_type& rhs) const {
return key_ptr_ == rhs.key_ptr_;
}
bool operator!=(const self_type& rhs) const {
return key_ptr_ != rhs.key_ptr_;
}
private:
mutable MapT map_;
typename MapT::key_type* key_ptr_;
typename MapT::value_type* value_ptr_;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_ITERATOR_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
/*
* Mask Functions.
*/
template <typename T, class Context>
void PasteMask(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const T* masks,
const float* boxes,
uint8_t* im,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_MASK_H_
#include <dragon/core/context_mps.h>
#include "../../../utils/detection/mask.h"
namespace dragon {
namespace detection {
namespace {
const static string METAL_SHADERS = R"(
#include <metal_stdlib>
using namespace metal;
constant int int_arg1 [[function_constant(0)]]; // H
constant int int_arg2 [[function_constant(1)]]; // W
constant int int_arg3 [[function_constant(2)]]; // mask_h
constant int int_arg4 [[function_constant(3)]]; // mask_w
constant float float_arg1 [[function_constant(4)]]; // thresh
template <typename IndexT>
bool WithinBounds2d(IndexT h, IndexT w, IndexT H, IndexT W) {
return h >= IndexT(0) && h < H && w >= IndexT(0) && w < W;
}
template <typename T>
kernel void PasteMask(
device const T* masks,
device const float* boxes,
device uint8_t* im,
const uint index [[thread_position_in_grid]]) {
const int w = int(index) % int_arg2;
const int h = int(index) / int_arg2 % int_arg1;
const int n = int(index) / (int_arg2 * int_arg1);
device const float* box = boxes + n * 4;
device const T* mask = masks + n * int_arg3 * int_arg4;
const float gx = (float(w) + 0.5f - box[0]) / (box[2] - box[0]) * 2.f;
const float gy = (float(h) + 0.5f - box[1]) / (box[3] - box[1]) * 2.f;
const float ix = (gx * float(int_arg4) - 1.f) * 0.5f;
const float iy = (gy * float(int_arg3) - 1.f) * 0.5f;
const int ix_nw = floor(ix);
const int iy_nw = floor(iy);
const int ix_ne = ix_nw + 1;
const int iy_ne = iy_nw;
const int ix_sw = ix_nw;
const int iy_sw = iy_nw + 1;
const int ix_se = ix_nw + 1;
const int iy_se = iy_nw + 1;
T nw = T((ix_se - ix) * (iy_se - iy));
T ne = T((ix - ix_sw) * (iy_sw - iy));
T sw = T((ix_ne - ix) * (iy - iy_ne));
T se = T((ix - ix_nw) * (iy - iy_nw));
T val = T(0);
if (WithinBounds2d(iy_nw, ix_nw, int_arg3, int_arg4)) {
val += mask[iy_nw * int_arg4 + ix_nw] * nw;
}
if (WithinBounds2d(iy_ne, ix_ne, int_arg3, int_arg4)) {
val += mask[iy_ne * int_arg4 + ix_ne] * ne;
}
if (WithinBounds2d(iy_sw, ix_sw, int_arg3, int_arg4)) {
val += mask[iy_sw * int_arg4 + ix_sw] * sw;
}
if (WithinBounds2d(iy_se, ix_se, int_arg3, int_arg4)) {
val += mask[iy_se * int_arg4 + ix_se] * se;
}
im[index] = (val >= T(float_arg1) ? uint8_t(1) : uint8_t(0));
}
#define INSTANTIATE_KERNEL(T) \
template [[host_name("PasteMask_"#T)]] \
kernel void PasteMask( \
device const T*, device const float*, device uint8_t*, uint);
INSTANTIATE_KERNEL(float);
#undef INSTANTIATE_KERNEL
)";
} // namespace
template <>
void PasteMask<float, MPSContext>(
const int N,
const int H,
const int W,
const int mask_h,
const int mask_w,
const float thresh,
const float* masks,
const float* boxes,
uint8_t* im,
MPSContext* ctx) {
auto kernel = MPSKernel::TypedString<float>("PasteMask");
auto args = vector<MPSConstant>({
MPSConstant(&H, MTLDataTypeInt, 0),
MPSConstant(&W, MTLDataTypeInt, 1),
MPSConstant(&mask_h, MTLDataTypeInt, 2),
MPSConstant(&mask_w, MTLDataTypeInt, 3),
MPSConstant(&thresh, MTLDataTypeFloat, 4),
});
auto* command_buffer = ctx->mps_stream()->command_buffer();
auto* encoder = [command_buffer computeCommandEncoder];
auto* pso = MPSKernel(kernel, METAL_SHADERS).GetState(ctx, args);
[encoder setComputePipelineState:pso];
[encoder setBuffer:id<MTLBuffer>(masks) offset:0 atIndex:0];
[encoder setBuffer:id<MTLBuffer>(boxes) offset:0 atIndex:1];
[encoder setBuffer:id<MTLBuffer>(im) offset:0 atIndex:2];
MPSDispatchThreads((N * H * W), encoder, pso);
[encoder endEncoding];
[encoder release];
}
} // namespace detection
} // namespace dragon
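The Metal kernel above scales each ROI mask onto the image grid with bilinear sampling, zero-padding outside the mask as `WithinBounds2d` does, then thresholds to a binary map. A NumPy sketch of the same arithmetic for a single box (`paste_mask` is a hypothetical helper for illustration):

```python
import numpy as np

def paste_mask(mask, box, im_h, im_w, thresh=0.5):
    """mask: (mask_h, mask_w) floats; box: (x1, y1, x2, y2) in pixels."""
    mask_h, mask_w = mask.shape
    ys, xs = np.meshgrid(np.arange(im_h), np.arange(im_w), indexing='ij')
    # Pixel centers normalized to [0, 2] across the box, then mask coords.
    gx = (xs + 0.5 - box[0]) / (box[2] - box[0]) * 2.0
    gy = (ys + 0.5 - box[1]) / (box[3] - box[1]) * 2.0
    ix = (gx * mask_w - 1.0) * 0.5
    iy = (gy * mask_h - 1.0) * 0.5
    ix_nw, iy_nw = np.floor(ix).astype(int), np.floor(iy).astype(int)
    val = np.zeros((im_h, im_w), dtype=mask.dtype)
    # Accumulate the four bilinear taps (nw, ne, sw, se) with zero padding.
    for dy, dx, w in [(0, 0, (ix_nw + 1 - ix) * (iy_nw + 1 - iy)),
                      (0, 1, (ix - ix_nw) * (iy_nw + 1 - iy)),
                      (1, 0, (ix_nw + 1 - ix) * (iy - iy_nw)),
                      (1, 1, (ix - ix_nw) * (iy - iy_nw))]:
        yy, xx = iy_nw + dy, ix_nw + dx
        ok = (yy >= 0) & (yy < mask_h) & (xx >= 0) & (xx < mask_w)
        val[ok] += mask[yy[ok], xx[ok]] * w[ok]
    return (val >= thresh).astype(np.uint8)
```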
#include <dragon/core/context_mps.h>
#include <dragon/core/workspace.h>
#include "../../../utils/detection/nms.h"
#include "../../../utils/detection/utils.h"
namespace dragon {
namespace detection {
namespace {
#define NUM_THREADS 64
const static string METAL_SHADERS = R"(
#include <metal_stdlib>
using namespace metal;
constant uint uint_arg1 [[function_constant(0)]];
constant float float_arg1 [[function_constant(1)]];
template <typename T>
bool CheckIoU(const T thresh, device const T* a, threadgroup T* b) {
const T x1 = max(a[0], b[0]);
const T y1 = max(a[1], b[1]);
const T x2 = min(a[2], b[2]);
const T y2 = min(a[3], b[3]);
const T width = max(T(0), x2 - x1);
const T height = max(T(0), y2 - y1);
const T inter = width * height;
const T Sa = (a[2] - a[0]) * (a[3] - a[1]);
const T Sb = (b[2] - b[0]) * (b[3] - b[1]);
return inter >= thresh * (Sa + Sb - inter);
}
template <typename T>
kernel void NonMaxSuppression(
device const T* boxes,
device uint64_t* mask,
const uint2 gridDim [[threadgroups_per_grid]],
const uint2 blockIdx [[threadgroup_position_in_grid]],
const uint2 threadIdx [[thread_position_in_threadgroup]]) {
const uint row_start = blockIdx.y;
const uint col_start = blockIdx.x;
if (row_start > col_start) return;
const uint row_size = min(uint_arg1 - row_start * uint(64), uint(64));
const uint col_size = min(uint_arg1 - col_start * uint(64), uint(64));
threadgroup T block_boxes[256];
if (threadIdx.x < col_size) {
threadgroup T* offset_block_boxes = block_boxes + threadIdx.x * 4;
device const T* offset_boxes = boxes + (col_start * uint(64) + threadIdx.x) * 5;
for (int i = 0; i < 4; ++i) {
*(offset_block_boxes++) = *(offset_boxes++);
}
}
threadgroup_barrier(mem_flags::mem_threadgroup);
if (threadIdx.x < row_size) {
const uint index = row_start * uint(64) + threadIdx.x;
device const T* offset_boxes = boxes + index * 5;
uint64_t val = 0;
const uint start = (row_start == col_start) ? (threadIdx.x + 1) : 0;
for (uint i = start; i < col_size; ++i) {
if (CheckIoU(T(float_arg1), offset_boxes, block_boxes + i * 4)) {
val |= (uint64_t(1) << i);
}
}
mask[index * gridDim.x + col_start] = val;
}
}
#define INSTANTIATE_KERNEL(T) \
template [[host_name("NonMaxSuppression_"#T)]] \
kernel void NonMaxSuppression( \
device const T*, device uint64_t*, uint2, uint2, uint2);
INSTANTIATE_KERNEL(float);
#undef INSTANTIATE_KERNEL
)";
} // namespace
template <>
void ApplyNMS<float, MPSContext>(
const int N,
const int K,
const int boxes_offset,
const float thresh,
const float* boxes,
vector<int64_t>& indices,
MPSContext* ctx) {
const auto num_blocks = utils::DivUp(N, NUM_THREADS);
auto* NMS_mask = ctx->workspace()->CreateTensor("NMS_mask");
NMS_mask->Reshape({N * num_blocks});
auto* mask = reinterpret_cast<uint64_t*>(
NMS_mask->template mutable_data<int64_t, MPSContext>());
auto kernel = MPSKernel::TypedString<float>("NonMaxSuppression");
const uint arg1 = N;
auto args = vector<MPSConstant>({
MPSConstant(&arg1, MTLDataTypeUInt, 0),
MPSConstant(&thresh, MTLDataTypeFloat, 1),
});
auto* command_buffer = ctx->mps_stream()->command_buffer();
auto* encoder = [command_buffer computeCommandEncoder];
auto* pso = MPSKernel(kernel, METAL_SHADERS).GetState(ctx, args);
[encoder setComputePipelineState:pso];
[encoder setBuffer:id<MTLBuffer>(boxes) offset:boxes_offset * 4 atIndex:0];
[encoder setBuffer:id<MTLBuffer>(mask) offset:0 atIndex:1];
[encoder dispatchThreadgroups:MTLSizeMake(num_blocks, num_blocks, 1)
threadsPerThreadgroup:MTLSizeMake(NUM_THREADS, 1, 1)];
[encoder endEncoding];
[encoder release];
ctx->FinishDeviceComputation();
mask = reinterpret_cast<uint64_t*>(
const_cast<int64_t*>(NMS_mask->template data<int64_t, CPUContext>()));
vector<uint64_t> is_dead(num_blocks);
memset(&is_dead[0], 0, sizeof(uint64_t) * num_blocks);
int num_selected = 0;
indices.resize(K);
for (int i = 0; i < N; ++i) {
const int nblock = i / NUM_THREADS, inblock = i % NUM_THREADS;
if (!(is_dead[nblock] & (uint64_t(1) << inblock))) {
indices[num_selected++] = i;
if (num_selected >= K) break;
auto* offset_mask = mask + i * num_blocks;
for (int j = nblock; j < num_blocks; ++j) {
is_dead[j] |= offset_mask[j];
}
}
}
indices.resize(num_selected);
}
} // namespace detection
} // namespace dragon
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void ApplyNMS(
const int N,
const int K,
const int boxes_offset,
const T thresh,
const T* boxes,
vector<int64_t>& indices,
Context* ctx);
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_NMS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
#include "../../utils/detection/bbox.h"
#include "../../utils/detection/types.h"
namespace dragon {
namespace detection {
template <typename T, class Context>
void SelectTopK(
const int N,
const int K,
const float thresh,
const T* input_scores,
vector<T>& output_scores,
vector<int64_t>& output_indices,
Context* ctx);
template <typename T>
void DecodeProposals(
const int num_proposals,
const int num_anchors,
const ImageArgs<int64_t>& im_args,
const GridArgs<int64_t>& grid_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* proposals) {
T* offset_proposals = proposals;
const int64_t index_min = grid_args.offset;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_proposals; ++i) {
const auto index = indices[i] + index_min;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(1),
T(1),
offset_proposals);
offset_proposals[4] = scores[i];
offset_proposals += 5;
}
}
template <typename T>
void DecodeDetections(
const int num_dets,
const int num_anchors,
const int num_classes,
const ImageArgs<int64_t>& im_args,
const GridArgs<int64_t>& grid_args,
const T* scores,
const T* deltas,
const int64_t* indices,
T* dets) {
T* offset_dets = dets;
const int64_t index_min = num_classes * grid_args.offset;
const T* offset_dx = deltas;
const T* offset_dy = deltas + num_anchors;
const T* offset_dw = deltas + num_anchors * 2;
const T* offset_dh = deltas + num_anchors * 3;
for (int i = 0; i < num_dets; ++i) {
const auto index = (indices[i] + index_min) / num_classes;
utils::BBoxTransform(
offset_dx[index],
offset_dy[index],
offset_dw[index],
offset_dh[index],
T(im_args.w),
T(im_args.h),
T(im_args.scale_h),
T(im_args.scale_w),
offset_dets + 1);
offset_dets[0] = T(im_args.batch_ind);
offset_dets[5] = scores[i];
offset_dets[6] = T((indices[i] + index_min) % num_classes + 1);
offset_dets += 7;
}
}
template <typename T>
inline void ApplyHistogram(
const int N,
const int lvl_min,
const int lvl_max,
const int lvl0,
const int s0,
const T* boxes,
const T* batch_indices,
const int64_t* box_indices,
vector<vector<T>>& output_rois) {
int K = 0;
vector<int> keep_indices(N), bin_indices(N);
vector<int> bin_count(lvl_max - lvl_min + 1, 0);
for (int i = 0; i < N; ++i) {
const T* offset_boxes = boxes + box_indices[i] * 5;
auto lvl = utils::GetBBoxLevel(lvl_min, lvl_max, lvl0, s0, offset_boxes);
if (lvl < 0) continue; // Empty.
keep_indices[K++] = i;
bin_indices[i] = lvl - lvl_min;
bin_count[lvl - lvl_min]++;
}
keep_indices.resize(K);
output_rois.resize(lvl_max - lvl_min + 1);
for (int i = 0; i < output_rois.size(); ++i) {
auto& rois = output_rois[i];
rois.resize(std::max(bin_count[i], 1) * 5, T(0));
if (bin_count[i] == 0) rois[0] = T(-1); // Ignored.
}
for (auto i : keep_indices) {
const T* offset_boxes = boxes + box_indices[i] * 5;
const auto bin_index = bin_indices[i];
const auto roi_index = --bin_count[bin_index];
auto& rois = output_rois[bin_index];
T* offset_rois = rois.data() + roi_index * 5;
offset_rois[0] = batch_indices[i];
offset_rois[1] = offset_boxes[0];
offset_rois[2] = offset_boxes[1];
offset_rois[3] = offset_boxes[2];
offset_rois[4] = offset_boxes[3];
}
}
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_PROPOSALS_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
#include <dragon/core/common.h>
namespace dragon {
namespace detection {
template <typename T>
struct Box4d {
T x1, y1, x2, y2;
};
template <typename T>
struct Box5d {
T x1, y1, x2, y2, score;
};
template <typename IndexT>
struct ImageArgs {
ImageArgs(const float* im_info) {
h = im_info[0], w = im_info[1];
scale_h = im_info[2], scale_w = im_info[3];
}
IndexT batch_ind, h, w;
float scale_h, scale_w;
};
template <typename IndexT>
struct GridArgs {
IndexT h, w, stride, size, offset;
};
template <typename KeyT, typename ValueT>
struct KeyValueMap {
typedef KeyT key_type;
typedef ValueT value_type;
friend void swap(KeyValueMap& x, KeyValueMap& y) {
std::swap(*x.key_ptr, *y.key_ptr);
std::swap(*x.value_ptr, *y.value_ptr);
}
KeyT* key_ptr = nullptr;
ValueT* value_ptr = nullptr;
};
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_TYPES_H_
/*!
* Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
*
* Licensed under the BSD 2-Clause License.
* You should have received a copy of the BSD 2-Clause License
* along with the software. If not, See,
*
* <https://opensource.org/licenses/BSD-2-Clause>
*
* ------------------------------------------------------------
*/
#ifndef DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
#define DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
namespace dragon {
namespace detection {
/*
* Detection Utilities.
*/
namespace utils {
template <typename T>
inline T DivUp(const T a, const T b) {
return (a + b - T(1)) / b;
}
} // namespace utils
} // namespace detection
} // namespace dragon
#endif // DRAGON_EXTENSION_UTILS_DETECTION_UTILS_H_
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Sergey Karayev
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
DTYPE = np.float64
ctypedef np.float64_t DTYPE_t
@cython.boundscheck(False)
def bbox_overlaps(
np.ndarray[DTYPE_t, ndim=2] boxes,
np.ndarray[DTYPE_t, ndim=2] query_boxes):
"""
Parameters
----------
boxes: (N, 4) ndarray of float
query_boxes: (K, 4) ndarray of float
Returns
-------
overlaps: (N, K) ndarray of overlap between boxes and query_boxes
"""
cdef unsigned int N = boxes.shape[0]
cdef unsigned int K = query_boxes.shape[0]
cdef np.ndarray[DTYPE_t, ndim=2] overlaps = np.zeros((N, K), dtype=DTYPE)
cdef DTYPE_t iw, ih, box_area
cdef DTYPE_t ua
cdef unsigned int k, n
with nogil:
for k in range(K):
box_area = (
(query_boxes[k, 2] - query_boxes[k, 0]) *
(query_boxes[k, 3] - query_boxes[k, 1])
)
for n in range(N):
iw = (
min(boxes[n, 2], query_boxes[k, 2]) -
max(boxes[n, 0], query_boxes[k, 0])
)
if iw > 0:
ih = (
min(boxes[n, 3], query_boxes[k, 3]) -
max(boxes[n, 1], query_boxes[k, 1])
)
if ih > 0:
ua = float(
(boxes[n, 2] - boxes[n, 0]) *
(boxes[n, 3] - boxes[n, 1]) +
box_area - iw * ih
)
overlaps[n, k] = iw * ih / ua
return overlaps
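A vectorized NumPy equivalent, useful as a cross-check against the Cython loop above (illustration only; the loop avoids materializing the (N, K) temporaries):

```python
import numpy as np

def bbox_overlaps_np(boxes, query_boxes):
    """boxes: (N, 4); query_boxes: (K, 4) -> (N, K) IoU matrix."""
    lt = np.maximum(boxes[:, None, :2], query_boxes[None, :, :2])
    rb = np.minimum(boxes[:, None, 2:4], query_boxes[None, :, 2:4])
    wh = np.clip(rb - lt, 0.0, None)          # intersection width/height
    inter = wh[..., 0] * wh[..., 1]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    query_areas = (query_boxes[:, 2] - query_boxes[:, 0]) * \
                  (query_boxes[:, 3] - query_boxes[:, 1])
    union = areas[:, None] + query_areas[None, :] - inter
    overlaps = np.zeros_like(inter)
    pos = inter > 0
    overlaps[pos] = inter[pos] / union[pos]
    return overlaps
```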
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
cimport cython
import numpy as np
cimport numpy as np
cdef inline np.float32_t max(np.float32_t a, np.float32_t b):
return a if a >= b else b
cdef inline np.float32_t min(np.float32_t a, np.float32_t b):
return a if a <= b else b
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def cpu_nms(np.ndarray[np.float32_t, ndim=2] dets, float thresh):
cdef np.ndarray[np.float32_t, ndim=1] x1 = dets[:, 0]
cdef np.ndarray[np.float32_t, ndim=1] y1 = dets[:, 1]
cdef np.ndarray[np.float32_t, ndim=1] x2 = dets[:, 2]
cdef np.ndarray[np.float32_t, ndim=1] y2 = dets[:, 3]
cdef np.ndarray[np.float32_t, ndim=1] scores = dets[:, 4]
cdef np.ndarray[np.float32_t, ndim=1] areas = (x2 - x1) * (y2 - y1)
cdef np.ndarray[np.intp_t, ndim=1] order = scores.argsort()[::-1]
cdef int ndets = dets.shape[0]
    cdef np.ndarray[np.intp_t, ndim=1] suppressed = \
        np.zeros((ndets), dtype=np.intp)
# nominal indices
cdef int _i, _j
# sorted indices
cdef int i, j
# temp variables for box i's (the box currently under consideration)
cdef np.float32_t ix1, iy1, ix2, iy2, iarea
# variables for computing overlap with box j (lower scoring box)
cdef np.float32_t xx1, yy1, xx2, yy2
cdef np.float32_t w, h
cdef np.float32_t inter, ovr
keep = []
for _i in range(ndets):
i = order[_i]
if suppressed[i] == 1:
continue
keep.append(i)
ix1 = x1[i]
iy1 = y1[i]
ix2 = x2[i]
iy2 = y2[i]
iarea = areas[i]
for _j in range(_i + 1, ndets):
j = order[_j]
if suppressed[j] == 1:
continue
xx1 = max(ix1, x1[j])
yy1 = max(iy1, y1[j])
xx2 = min(ix2, x2[j])
yy2 = min(iy2, y2[j])
w = max(0.0, xx2 - xx1)
h = max(0.0, yy2 - yy1)
inter = w * h
ovr = inter / (iarea + areas[j] - inter)
if ovr >= thresh:
suppressed[j] = 1
return keep
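Once the extensions are built as `seetadet.utils.nms.cython_nms` (see the `setup.py` further below), usage is a one-liner; `dets` packs `(x1, y1, x2, y2, score)` rows. A minimal check, assuming the built module is importable:

```python
import numpy as np
from seetadet.utils.nms.cython_nms import cpu_nms

dets = np.array([[10, 10, 50, 50, 0.9],
                 [12, 12, 52, 52, 0.8],     # IoU ~0.82 with the first box
                 [100, 100, 150, 150, 0.7]], dtype=np.float32)
keep = cpu_nms(dets, 0.5)  # -> [0, 2]; the overlapping box is suppressed
```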
@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
def cpu_soft_nms(np.ndarray[float, ndim=2] boxes, float thresh,
unsigned int method=0, float sigma=0.5, float score_thresh=0.001):
cdef unsigned int N = boxes.shape[0]
cdef float iw, ih, box_area
cdef float ua
cdef int pos = 0
cdef float maxscore = 0
cdef int maxpos = 0
    cdef float x1, x2, y1, y2, tx1, tx2, ty1, ty2, ts, s, area, weight, ov
for i in range(N):
maxscore = boxes[i, 4]
maxpos = i
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
pos = i + 1
# get max box
while pos < N:
if maxscore < boxes[pos, 4]:
maxscore = boxes[pos, 4]
maxpos = pos
pos = pos + 1
# add max box as a detection
boxes[i,0] = boxes[maxpos,0]
boxes[i,1] = boxes[maxpos,1]
boxes[i,2] = boxes[maxpos,2]
boxes[i,3] = boxes[maxpos,3]
boxes[i,4] = boxes[maxpos,4]
# swap ith box with position of max box
boxes[maxpos,0] = tx1
boxes[maxpos,1] = ty1
boxes[maxpos,2] = tx2
boxes[maxpos,3] = ty2
boxes[maxpos,4] = ts
tx1 = boxes[i,0]
ty1 = boxes[i,1]
tx2 = boxes[i,2]
ty2 = boxes[i,3]
ts = boxes[i,4]
pos = i + 1
# NMS iterations, note that N changes if detection boxes fall below threshold
while pos < N:
x1 = boxes[pos, 0]
y1 = boxes[pos, 1]
x2 = boxes[pos, 2]
y2 = boxes[pos, 3]
s = boxes[pos, 4]
area = (x2 - x1) * (y2 - y1)
iw = min(tx2, x2) - max(tx1, x1)
if iw > 0:
ih = min(ty2, y2) - max(ty1, y1)
if ih > 0:
ua = float((tx2 - tx1) * (ty2 - ty1) + area - iw * ih)
ov = iw * ih / ua #iou between max box and detection box
if method == 1: # linear
if ov > thresh:
weight = 1 - ov
else:
weight = 1
elif method == 2: # gaussian
weight = np.exp(-(ov * ov) / sigma)
else: # original NMS
if ov > thresh:
weight = 0
else:
weight = 1
boxes[pos, 4] = weight * boxes[pos, 4]
# if box score falls below threshold, discard the box by swapping with last box
# update N
if boxes[pos, 4] < score_thresh:
boxes[pos,0] = boxes[N-1, 0]
boxes[pos,1] = boxes[N-1, 1]
boxes[pos,2] = boxes[N-1, 2]
boxes[pos,3] = boxes[N-1, 3]
boxes[pos,4] = boxes[N-1, 4]
N = N - 1
pos = pos - 1
pos = pos + 1
keep = [i for i in range(N)]
return keep
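The three `method` branches differ only in how a box's score is decayed by its IoU `ov` against the current max box: classical NMS zeroes it past the threshold, `linear` scales by `1 - ov`, and `gaussian` scales by `exp(-ov^2 / sigma)`. The decay rule in isolation (a sketch; `soft_nms_weight` is a hypothetical name):

```python
import math

def soft_nms_weight(ov, method=0, thresh=0.3, sigma=0.5):
    """Score multiplier applied to a box with IoU `ov` vs. the max box."""
    if method == 1:                          # linear decay
        return 1.0 - ov if ov > thresh else 1.0
    if method == 2:                          # gaussian decay
        return math.exp(-(ov * ov) / sigma)
    return 0.0 if ov > thresh else 1.0       # original hard NMS
```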
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build cython extensions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from distutils.core import setup
from distutils.extension import Extension
import os
from Cython.Build import cythonize
from Cython.Distutils import build_ext
import numpy as np
def clean_builds():
"""Clean the builds."""
for file in os.listdir('./'):
if file.endswith('.c'):
os.remove(file)
ext_modules = [
Extension(
'seetadet.utils.bbox.cython_bbox',
['cython_bbox.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()],
),
Extension(
'seetadet.utils.nms.cython_nms',
['cython_nms.pyx'],
extra_compile_args=['-w'],
include_dirs=[np.get_include()],
),
]
setup(
name='seetadet',
ext_modules=cythonize(
ext_modules, compiler_directives={'language_level': '3'}),
cmdclass={'build_ext': build_ext},
)
clean_builds()
# Datasets
## Introduction
This folder is kept for the record and json datasets.
Please prepare the datasets following the [documentation](../../scripts/datasets/README.md).
# Pretrained Models
## Introduction
This folder is kept for the pretrained models.
## ImageNet Pretrained Models
### Training settings
- ResNet models trained for 200 epochs follow the procedure in arXiv:1812.01187.
### ResNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [R50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls90e.pkl) | 90e | 76.53 | 93.16 | Ours |
| [R50](https://dragon.seetatech.com/download/seetadet/pretrained/R-50_in1k_cls200e.pkl) | 200e | 78.64 | 94.30 | Ours |
| [R50-A](https://dragon.seetatech.com/download/seetadet/pretrained/R-50-A_in1k_cls120e.pkl) | 120e | 75.30 | 92.20 | MSRA |
### MobileNet
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [MobileNetV2](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV2_in1k_cls300e.pkl) | 300e | 71.88 | 90.29 | TorchVision |
| [MobileNetV3L](https://dragon.seetatech.com/download/seetadet/pretrained/MobileNetV3L_in1k_cls600e.pkl) | 600e | 74.04 | 91.34 | TorchVision |
### VGG
| Model | Lr sched | Acc@1 | Acc@5 | Source |
| :---: | :------: | :---: | :---: | :----: |
| [VGG16-FCN](https://dragon.seetatech.com/download/seetadet/pretrained/VGG-16-FCN_in1k.pkl) | - | - | - | weiliu89 |
# Python dependencies required for development.
opencv-python
Pillow
pyyaml
prettytable
matplotlib
codewithgpu
shapely
Cython
pycocotools
# Prepare Datasets
## Create Datasets for PASCAL VOC
We assume that the raw dataset has the following structure:
```
VOC<year>
|_ JPEGImages
| |_ <im-1-name>.jpg
| |_ ...
| |_ <im-N-name>.jpg
|_ Annotations
| |_ <im-1-name>.xml
| |_ ...
| |_ <im-N-name>.xml
|_ ImageSets
| |_ Main
| | |_ trainval.txt
| | |_ test.txt
| | |_ ...
```
Create the record and JSON datasets by:
```bash
python pascal_voc.py \
--rec /path/to/datasets/voc_trainval0712 \
--gt /path/to/datasets/voc_trainval0712.json \
--images /path/to/VOC2007/JPEGImages \
/path/to/VOC2012/JPEGImages \
--annotations /path/to/VOC2007/Annotations \
/path/to/VOC2012/Annotations \
--splits /path/to/VOC2007/ImageSets/Main/trainval.txt \
/path/to/VOC2012/ImageSets/Main/trainval.txt
```
## Create Datasets for COCO
We assume that the raw dataset has the following structure:
```
COCO
|_ images
| |_ train2017
| | |_ <im-1-name>.jpg
| | |_ ...
| | |_ <im-N-name>.jpg
|_ annotations
| |_ instances_train2017.json
| |_ ...
```
Create the record dataset by:
```bash
python coco.py \
--rec /path/to/datasets/coco_train2017 \
--images /path/to/COCO/images/train2017 \
--annotations /path/to/COCO/annotations/instances_train2017.json
```
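Both scripts write a `codewithgpu` record dataset that can be iterated directly, as the `json_dataset.py` script below does; a quick sanity check after creation (paths are placeholders):

```python
import codewithgpu

dataset = codewithgpu.RecordDataset('/path/to/datasets/coco_train2017')
for i, example in enumerate(dataset):
    # Each example carries id/height/width/depth/content plus objects
    # with name, xmin/ymin/xmax/ymax (and mask/polygons for COCO).
    print(example['id'], example['height'], example['width'],
          len(example['object']))
    if i == 4:
        break
```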
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare MS COCO datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import codewithgpu
from pycocotools.coco import COCO
from pycocotools.mask import frPyObjects
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare MS COCO datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_id, img_file, cocoGt):
"""Return the record example."""
img_meta = cocoGt.imgs[img_id]
img_anns = cocoGt.loadAnns(cocoGt.getAnnIds(imgIds=[img_id]))
cat_id_to_cat = dict((v['id'], v['name'])
for v in cocoGt.cats.values())
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
height, width = img_meta['height'], img_meta['width']
example = {'id': str(img_id), 'height': height, 'width': width,
'depth': 3, 'content': img_bytes, 'object': []}
for ann in img_anns:
x1 = float(max(0, ann['bbox'][0]))
y1 = float(max(0, ann['bbox'][1]))
x2 = float(min(width, x1 + max(0, ann['bbox'][2])))
y2 = float(min(height, y1 + max(0, ann['bbox'][3])))
mask, polygons = b'', []
segm = ann.get('segmentation', None)
if segm is not None and isinstance(segm, list):
for p in ann['segmentation']:
if len(p) < 6:
                    print('Removing invalid polygon (fewer than 3 points).')
# Valid polygons have >= 3 points, so require >= 6 coordinates
polygons = [p for p in ann['segmentation'] if len(p) >= 6]
elif segm is not None:
# Crowd masks.
# Some are encoded with wrong height or width.
# Do not use them or decoding error is inevitable.
rle = frPyObjects(ann['segmentation'], height, width)
assert type(rle) == dict
mask = rle['counts']
example['object'].append({
'name': cat_id_to_cat[ann['category_id']],
'xmin': x1, 'ymin': y1, 'xmax': x2, 'ymax': y2,
'mask': mask, 'polygons': polygons,
'difficult': ann.get('iscrowd', 0)})
return example
def write_dataset(args):
assert len(args.images) == len(args.annotations)
if os.path.exists(args.rec):
        raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = codewithgpu.RecordWriter(
path=args.rec,
features={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'mask': 'bytes',
'polygons': [['float64']],
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries, cocoGts = [], []
for ann_file in args.annotations:
cocoGts.append(COCO(ann_file))
if args.splits is not None:
assert len(args.splits) == len(args.images)
for i, split in enumerate(args.splits):
f = open(split, 'r')
for line in f.readlines():
filename = line.strip()
img_id = int(filename)
img_file = os.path.join(args.images[i], filename + '.jpg')
entries.append((img_id, img_file, cocoGts[i]))
f.close()
else:
for i, cocoGt in enumerate(cocoGts):
for info in cocoGt.imgs.values():
img_id = info['id']
img_file = os.path.join(args.images[i], info['file_name'])
entries.append((img_id, img_file, cocoGts[i]))
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, entry in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(*entry))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/00000.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare JSON datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import json
import os
import sys
import codewithgpu
def parse_args():
"""Parse arguments."""
    parser = argparse.ArgumentParser(
        description='Prepare JSON datasets')
parser.add_argument(
'--rec',
default=None,
help='path to read record')
parser.add_argument(
'--gt',
default=None,
help='path to write json ground-truth')
parser.add_argument(
'--categories',
nargs='+',
type=str,
default=None,
help='dataset object categories')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def get_image_id(image_name):
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
def write_dataset(args):
dataset = {'images': [], 'categories': [], 'annotations': []}
record_dataset = codewithgpu.RecordDataset(args.rec)
cat_to_cat_id = dict(zip(args.categories,
range(1, len(args.categories) + 1)))
print('Writing json dataset to {}'.format(args.gt))
for cat in args.categories:
dataset['categories'].append({
'name': cat, 'id': cat_to_cat_id[cat]})
for example in record_dataset:
image_id = get_image_id(example['id'])
dataset['images'].append({
'id': image_id, 'height': example['height'],
'width': example['width']})
for obj in example['object']:
if 'x2' in obj:
x1, y1, x2, y2 = obj['x1'], obj['y1'], obj['x2'], obj['y2']
elif 'xmin' in obj:
x1, y1, x2, y2 = obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']
else:
x1, y1, x2, y2 = obj['bbox']
w, h = x2 - x1, y2 - y1
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x1, y1, w, h],
'area': w * h,
'iscrowd': obj.get('difficult', 0),
'image_id': image_id,
'category_id': cat_to_cat_id[obj['name']]})
with open(args.gt, 'w') as f:
json.dump(dataset, f)
if __name__ == '__main__':
args = parse_args()
if args.rec is None or not os.path.exists(args.rec):
raise ValueError('Specify the prepared record dataset.')
if args.gt is None:
raise ValueError('Specify the path to write json dataset.')
write_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Prepare PASCAL VOC datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import time
import codewithgpu
import cv2
import numpy as np
import xml.etree.ElementTree
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Prepare PASCAL VOC datasets')
parser.add_argument(
'--rec',
default=None,
help='path to write record dataset')
parser.add_argument(
'--gt',
default=None,
help='path to write json dataset')
parser.add_argument(
'--images',
nargs='+',
type=str,
default=None,
help='path of images folder')
parser.add_argument(
'--annotations',
nargs='+',
type=str,
default=None,
help='path of annotations folder')
parser.add_argument(
'--splits',
nargs='+',
type=str,
default=None,
help='path of split file')
if len(sys.argv) == 1:
parser.print_help()
sys.exit(1)
return parser.parse_args()
def make_example(img_file, xml_file):
"""Return the record example."""
tree = xml.etree.ElementTree.parse(xml_file)
filename = os.path.split(xml_file)[-1]
objects = tree.findall('object')
size = tree.find('size')
example = {'id': filename.split('.')[0], 'object': []}
with open(img_file, 'rb') as f:
img_bytes = bytes(f.read())
if size is not None:
example['height'] = int(size.find('height').text)
example['width'] = int(size.find('width').text)
example['depth'] = int(size.find('depth').text)
else:
img = cv2.imdecode(np.frombuffer(img_bytes, 'uint8'), 3)
example['height'], example['width'], example['depth'] = img.shape
example['content'] = img_bytes
for obj in objects:
bbox = obj.find('bndbox')
is_diff = 0
if obj.find('difficult') is not None:
is_diff = int(obj.find('difficult').text) == 1
example['object'].append({
'name': obj.find('name').text.strip(),
'xmin': float(bbox.find('xmin').text),
'ymin': float(bbox.find('ymin').text),
'xmax': float(bbox.find('xmax').text),
'ymax': float(bbox.find('ymax').text),
'difficult': is_diff})
return example
def write_dataset(args):
"""Write the record dataset."""
assert len(args.splits) == len(args.images)
assert len(args.splits) == len(args.annotations)
if os.path.exists(args.rec):
        raise ValueError('The record path already exists.')
os.makedirs(args.rec)
print('Write record dataset to {}'.format(args.rec))
writer = codewithgpu.RecordWriter(
path=args.rec,
features={
'id': 'string',
'content': 'bytes',
'height': 'int64',
'width': 'int64',
'depth': 'int64',
'object': [{
'name': 'string',
'xmin': 'float64',
'ymin': 'float64',
'xmax': 'float64',
'ymax': 'float64',
'difficult': 'int64',
}]
}
)
# Scan all available entries.
print('Scan entries...')
entries = []
for i, split in enumerate(args.splits):
with open(split, 'r') as f:
lines = f.readlines()
for line in lines:
filename = line.strip()
img_file = os.path.join(args.images[i], filename + '.jpg')
ann_file = os.path.join(args.annotations[i], filename + '.xml')
entries.append((img_file, ann_file))
# Parse and write into record file.
print('Start Time:', time.strftime("%a, %d %b %Y %H:%M:%S", time.gmtime()))
start_time = time.time()
for i, (img_file, xml_file) in enumerate(entries):
if i > 0 and i % 2000 == 0:
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
i, len(entries), now_time - start_time))
writer.write(make_example(img_file, xml_file))
now_time = time.time()
print('{} / {} in {:.2f} sec'.format(
len(entries), len(entries), now_time - start_time))
writer.close()
end_time = time.time()
data_size = os.path.getsize(args.rec + '/00000.data') * 1e-6
print('{} images take {:.2f} MB in {:.2f} sec.'
.format(len(entries), data_size, end_time - start_time))
def write_json_dataset(args):
"""Write the json dataset."""
categories = ['aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor']
import subprocess
    script = os.path.dirname(os.path.abspath(__file__)) + '/json_dataset.py'
    cmd = '{} {} '.format(sys.executable, script)
cmd += '--rec {} --gt {} '.format(args.rec, args.gt)
cmd += '--categories {} '.format(' '.join(categories))
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
if args.rec is not None:
write_dataset(args)
if args.gt is not None:
write_json_dataset(args)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""A platform implementing popular object detection algorithms."""
from __future__ import absolute_import as _absolute_import
from __future__ import division as _division
from __future__ import print_function as _print_function
# Version
from seetadet.version import version as __version__
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Platform configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Variables
from seetadet.core.config.defaults import cfg # noqa
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Default configurations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config.yacs import CfgNode
_C = cfg = CfgNode()
# ------------------------------------------------------------
# Training options
# ------------------------------------------------------------
_C.TRAIN = CfgNode()
# Initialize network with weights from this file
_C.TRAIN.WEIGHTS = ''
# The train dataset
_C.TRAIN.DATASET = ''
# The loader type for training
_C.TRAIN.LOADER = 'det_train'
# The number of workers to load train data
_C.TRAIN.NUM_WORKERS = 3
# Scales to use during training (can list multiple scales)
# Each scale is the pixel size of an image shortest side
_C.TRAIN.SCALES = (640,)
# Range to jitter the image scales randomly
_C.TRAIN.SCALES_RANGE = (1.0, 1.0)
# Longest side to resize the input image
_C.TRAIN.MAX_SIZE = 1000
# Size to crop the input image
_C.TRAIN.CROP_SIZE = 0
# Images to use per mini-batch
_C.TRAIN.IMS_PER_BATCH = 1
# Use the difficult (occluded/crowd) objects
_C.TRAIN.USE_DIFF = False
# The probability to distort the color
_C.TRAIN.COLOR_JITTER = 0.0
# ------------------------------------------------------------
# Testing options
# ------------------------------------------------------------
_C.TEST = CfgNode()
# The test dataset
_C.TEST.DATASET = ''
# The JSON dataset with annotations for evaluation
_C.TEST.JSON_DATASET = ''
# The loader type for testing
_C.TEST.LOADER = 'det_test'
# The evaluator type for dataset
_C.TEST.EVALUATOR = ''
# Scales to use during testing (can list multiple scales)
# Each scale is the pixel size of an image's shortest side
_C.TEST.SCALES = (640,)
# Max pixel size of the longest side of a scaled input image
_C.TEST.MAX_SIZE = 1000
# Size to crop the input image
_C.TEST.CROP_SIZE = 0
# Images to use per mini-batch
_C.TEST.IMS_PER_BATCH = 1
# The threshold for predicting boxes
_C.TEST.SCORE_THRESH = 0.05
# Overlap threshold used for NMS
_C.TEST.NMS_THRESH = 0.5
# Maximum number of detections to return per image
# 100 is based on the limit established for the COCO dataset
_C.TEST.DETECTIONS_PER_IM = 100
# ------------------------------------------------------------
# Model options
# ------------------------------------------------------------
_C.MODEL = CfgNode()
# The model type
_C.MODEL.TYPE = ''
# The compute precision
_C.MODEL.PRECISION = 'float32'
# The name for each object class
_C.MODEL.CLASSES = ['__background__']
# Pixel mean and stddev values for image normalization (BGR order)
_C.MODEL.PIXEL_MEAN = [103.53, 116.28, 123.675]
_C.MODEL.PIXEL_STD = [57.375, 57.12, 58.395]
# Focal loss parameters
_C.MODEL.FOCAL_LOSS_ALPHA = 0.25
_C.MODEL.FOCAL_LOSS_GAMMA = 2.0
# ------------------------------------------------------------
# Backbone options
# ------------------------------------------------------------
_C.BACKBONE = CfgNode()
# The backbone type
_C.BACKBONE.TYPE = ''
# The normalization in backbone modules
_C.BACKBONE.NORM = 'FrozenBN'
# The drop path rate in backbone
_C.BACKBONE.DROP_PATH_RATE = 0.0
# Freeze the first stages/blocks of backbone
_C.BACKBONE.FREEZE_AT = 2
# Stride of the coarsest feature
# This is needed so the input can be padded properly
_C.BACKBONE.COARSEST_STRIDE = 32
# ------------------------------------------------------------
# FPN options
# ------------------------------------------------------------
_C.FPN = CfgNode()
# Finest level of the FPN pyramid
_C.FPN.MIN_LEVEL = 3
# Coarsest level of the FPN pyramid
_C.FPN.MAX_LEVEL = 7
# Starting level of the top-down fusing
_C.FPN.FUSE_LEVEL = 5
# Number of blocks to stack in the FPN
_C.FPN.NUM_BLOCKS = 1
# Channel dimension of the FPN feature levels
_C.FPN.DIM = 256
# The FPN conv module
_C.FPN.CONV = 'Conv2d'
# The fpn normalization module
_C.FPN.NORM = ''
# The fpn activation module
_C.FPN.ACTIVATION = ''
# The feature fusion method
_C.FPN.FUSE_TYPE = 'sum'
# ------------------------------------------------------------
# Anchor generator options
# ------------------------------------------------------------
_C.ANCHOR_GENERATOR = CfgNode()
# The stride of each level
_C.ANCHOR_GENERATOR.STRIDES = [8, 16, 32, 64, 128]
# The anchor size of each stride
_C.ANCHOR_GENERATOR.SIZES = [[32], [64], [128], [256], [512]]
# The aspect ratios of each stride
_C.ANCHOR_GENERATOR.ASPECT_RATIOS = [[0.5, 1.0, 2.0]]
# ------------------------------------------------------------
# RPN options
# ------------------------------------------------------------
_C.RPN = CfgNode()
# Total number of rpn training anchors per image
_C.RPN.BATCH_SIZE = 256
# Fraction of foreground anchors per training batch
_C.RPN.POSITIVE_FRACTION = 0.5
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RPN.POSITIVE_OVERLAP = 0.7
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RPN.NEGATIVE_OVERLAP = 0.3
# NMS threshold used on RPN proposals
_C.RPN.NMS_THRESH = 0.7
# Number of top scoring boxes to keep before NMS to RPN proposals
_C.RPN.PRE_NMS_TOPK_TRAIN = 2000
_C.RPN.PRE_NMS_TOPK_TEST = 1000
# Number of top scoring boxes to keep after NMS to RPN proposals
_C.RPN.POST_NMS_TOPK_TRAIN = 1000
_C.RPN.POST_NMS_TOPK_TEST = 1000
# The number of conv layers to stack in the head
_C.RPN.NUM_CONV = 1
# The optional loss for bbox regression
_C.RPN.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.RPN.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# RetinaNet options
# ------------------------------------------------------------
_C.RETINANET = CfgNode()
# Number of conv layers to stack in the head
_C.RETINANET.NUM_CONV = 4
# The head conv module
_C.RETINANET.CONV = 'Conv2d'
# The head normalization module
_C.RETINANET.NORM = ''
# The head activation module
_C.RETINANET.ACTIVATION = 'ReLU'
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.RETINANET.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.RETINANET.NEGATIVE_OVERLAP = 0.4
# Number of top scoring boxes to keep before NMS
_C.RETINANET.PRE_NMS_TOPK = 1000
# The bbox regression loss type
_C.RETINANET.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.RETINANET.BBOX_REG_LOSS_WEIGHT = 1.0
# ------------------------------------------------------------
# FastRCNN options
# ------------------------------------------------------------
_C.FAST_RCNN = CfgNode()
# Total number of training RoIs per image
_C.FAST_RCNN.BATCH_SIZE = 512
# The finest level of RoI feature
_C.FAST_RCNN.MIN_LEVEL = 2
# The coarsest level of RoI feature
_C.FAST_RCNN.MAX_LEVEL = 5
# Fraction of foreground RoIs per training batch
_C.FAST_RCNN.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.FAST_RCNN.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling a RoI as negative
# RoIs with < iou overlap are labeled negative
_C.FAST_RCNN.NEGATIVE_OVERLAP = 0.5
# RoI pooler type
_C.FAST_RCNN.POOLER_TYPE = 'RoIAlignV2'
# The output size of the RoI pooler
_C.FAST_RCNN.POOLER_RESOLUTION = 7
# The resampling window size of RoI pooler
_C.FAST_RCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.FAST_RCNN.NUM_CONV = 0
# The number of fc layers to stack in the head
_C.FAST_RCNN.NUM_FC = 2
# The hidden dimension of conv head
_C.FAST_RCNN.CONV_HEAD_DIM = 256
# The hidden dimension of fc head
_C.FAST_RCNN.FC_HEAD_DIM = 1024
# The head normalization module
_C.FAST_RCNN.NORM = ''
# Use class agnostic for bbox regression or not
_C.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC = False
# The bbox regression loss type
_C.FAST_RCNN.BBOX_REG_LOSS_TYPE = 'l1'
# The weight for bbox regression loss
_C.FAST_RCNN.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.FAST_RCNN.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# MaskRCNN options
# ------------------------------------------------------------
_C.MASK_RCNN = CfgNode()
# RoI pooler type
_C.MASK_RCNN.POOLER_TYPE = 'RoIAlignV2'
# The output size of the RoI pooler
_C.MASK_RCNN.POOLER_RESOLUTION = 14
# The resampling window size of RoI pooler
_C.MASK_RCNN.POOLER_SAMPLING_RATIO = 0
# The number of conv layers to stack in the head
_C.MASK_RCNN.NUM_CONV = 4
# The hidden dimension of conv head
_C.MASK_RCNN.CONV_HEAD_DIM = 256
# The head normalization module
_C.MASK_RCNN.NORM = ''
# ------------------------------------------------------------
# CascadeRCNN options
# ------------------------------------------------------------
_C.CASCADE_RCNN = CfgNode()
# Make mask predictions or not
_C.CASCADE_RCNN.MASK_ON = False
# IoU overlap ratios for labeling a RoI as positive
# RoIs with >= iou overlap are labeled positive
_C.CASCADE_RCNN.POSITIVE_OVERLAP = (0.5, 0.6, 0.7)
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.CASCADE_RCNN.BBOX_REG_WEIGHTS = (
(10.0, 10.0, 5.0, 5.0),
(20.0, 20.0, 10.0, 10.0),
(30.0, 30.0, 15.0, 15.0),
)
# ------------------------------------------------------------
# SSD options
# ------------------------------------------------------------
_C.SSD = CfgNode()
# Fraction of foreground anchors per training batch
_C.SSD.POSITIVE_FRACTION = 0.25
# IoU overlap ratio for labeling an anchor as positive
# Anchors with >= iou overlap are labeled positive
_C.SSD.POSITIVE_OVERLAP = 0.5
# IoU overlap ratio for labeling an anchor as negative
# Anchors with < iou overlap are labeled negative
_C.SSD.NEGATIVE_OVERLAP = 0.5
# Number of top scoring boxes to keep before NMS
_C.SSD.PRE_NMS_TOPK = 300
# The optional loss for bbox regression
# Values supported: 'l1', 'smooth_l1', 'giou'
_C.SSD.BBOX_REG_LOSS_TYPE = 'l1'
# Weight for bbox regression loss
_C.SSD.BBOX_REG_LOSS_WEIGHT = 1.0
# The weights on (dx, dy, dw, dh) for normalizing bbox regression targets
_C.SSD.BBOX_REG_WEIGHTS = (10., 10., 5., 5.)
# ------------------------------------------------------------
# Solver options
# ------------------------------------------------------------
_C.SOLVER = CfgNode()
# The interval to display logs
_C.SOLVER.DISPLAY = 20
# The interval to snapshot a model
_C.SOLVER.SNAPSHOT_EVERY = 5000
# Prefix to yield the path: <prefix>_iter_XYZ.pkl
_C.SOLVER.SNAPSHOT_PREFIX = ''
# Loss scaling factor for mixed precision training
_C.SOLVER.LOSS_SCALE = 1024.0
# Maximum number of SGD iterations
_C.SOLVER.MAX_STEPS = 40000
# Base learning rate for the specified scheduler
_C.SOLVER.BASE_LR = 0.001
# Minimal learning rate for the specified scheduler
_C.SOLVER.MIN_LR = 0.0
# The decay intervals for LRScheduler
_C.SOLVER.DECAY_STEPS = []
# The decay factor for exponential LRScheduler
_C.SOLVER.DECAY_GAMMA = 0.1
# Warm up to ``BASE_LR`` over this number of steps
_C.SOLVER.WARM_UP_STEPS = 1000
# Start the warm up from ``BASE_LR`` * ``FACTOR``
_C.SOLVER.WARM_UP_FACTOR = 1.0 / 1000
# The type of optimizer
_C.SOLVER.OPTIMIZER = 'SGD'
# The type of lr scheduler
_C.SOLVER.LR_POLICY = 'steps_with_decay'
# The layer-wise lr decay
_C.SOLVER.LAYER_LR_DECAY = 1.0
# Momentum to use with SGD
_C.SOLVER.MOMENTUM = 0.9
# L2 regularization for weight parameters
_C.SOLVER.WEIGHT_DECAY = 0.0001
# L2 norm factor for clipping gradients
_C.SOLVER.CLIP_NORM = 0.0
# ------------------------------------------------------------
# Misc options
# ------------------------------------------------------------
# Number of GPUs for distributed training
_C.NUM_GPUS = 1
# Random seed for reproducibility
_C.RNG_SEED = 3
# Default GPU device index
_C.GPU_ID = 0
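These defaults are plain `CfgNode` entries, so a model YAML (or a flat key/value list on the command line) only needs to name what it overrides. A minimal sketch using the merge helpers defined in the YACS module below (the YAML filename is a placeholder):

```python
from seetadet.core.config import cfg

cfg.merge_from_file('faster_rcnn.yml')      # override defaults from YAML
cfg.merge_from_list(['SOLVER.BASE_LR', '0.02',
                     'TRAIN.IMS_PER_BATCH', '2'])
assert cfg.SOLVER.BASE_LR == 0.02
cfg.freeze()                                # lock the config for training
```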
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/rbgirshick/yacs/blob/master/yacs/config.py>
#
# ------------------------------------------------------------
"""Yet Another Configuration System (YACS)."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import numpy as np
import yaml
class CfgNode(dict):
"""Node for configuration options."""
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(CfgNode, self).__init__(*args, **kwargs)
self.__dict__[CfgNode.IMMUTABLE] = False
def clone(self):
"""Recursively copy this CfgNode."""
return copy.deepcopy(self)
def freeze(self):
"""Make this CfgNode and all of its children immutable."""
self._immutable(True)
def is_frozen(self):
"""Return mutability."""
return self.__dict__[CfgNode.IMMUTABLE]
def merge_from_file(self, cfg_filename):
"""Load a yaml config file and merge it into this CfgNode."""
with open(cfg_filename, 'r') as f:
other_cfg = CfgNode(yaml.safe_load(f))
self.merge_from_other_cfg(other_cfg)
def merge_from_list(self, cfg_list):
"""Merge config (keys, values) in a list into this CfgNode."""
assert len(cfg_list) % 2 == 0
from ast import literal_eval
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = self
for sub_key in key_list[:-1]:
assert sub_key in d
d = d[sub_key]
sub_key = key_list[-1]
assert sub_key in d
try:
value = literal_eval(v)
except: # noqa
# Handle the case when v is a string literal
value = v
if type(value) != type(d[sub_key]): # noqa
raise TypeError('Type {} does not match original type {}'
.format(type(value), type(d[sub_key])))
d[sub_key] = value
def merge_from_other_cfg(self, other_cfg):
"""Merge ``other_cfg`` into this CfgNode."""
_merge_a_into_b(other_cfg, self)
def _immutable(self, is_immutable):
"""Set immutability recursively to all nested CfgNode."""
self.__dict__[CfgNode.IMMUTABLE] = is_immutable
for v in self.__dict__.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
for v in self.values():
if isinstance(v, CfgNode):
v._immutable(is_immutable)
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __repr__(self):
return "{}({})".format(self.__class__.__name__,
super(CfgNode, self).__repr__())
def __setattr__(self, name, value):
if not self.__dict__[CfgNode.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but CfgNode is immutable'
.format(name, value))
def __str__(self):
def _indent(s_, num_spaces):
s = s_.split("\n")
if len(s) == 1:
return s_
first = s.pop(0)
s = [(num_spaces * " ") + line for line in s]
s = "\n".join(s)
s = first + "\n" + s
return s
r = ""
s = []
for k, v in sorted(self.items()):
            separator = "\n" if isinstance(v, CfgNode) else " "
            attr_str = "{}:{}{}".format(str(k), separator, str(v))
attr_str = _indent(attr_str, 2)
s.append(attr_str)
r += "\n".join(s)
return r
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a."""
if not isinstance(a, dict):
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# The types must match, too
v = _check_and_coerce_cfg_value_type(v, b[k], k)
# Recursively merge dicts
if type(v) is CfgNode:
try:
_merge_a_into_b(a[k], b[k])
except: # noqa
print('Error under config key: {}'.format(k))
raise
else:
b[k] = v
def _check_and_coerce_cfg_value_type(value_a, value_b, key):
"""Check if the value type matched."""
type_a, type_b = type(value_a), type(value_b)
if type_a is type_b:
return value_a
if type_b is float and type_a is int:
return float(value_a)
# Exceptions: numpy arrays, strings, tuple<->list
if isinstance(value_b, np.ndarray):
value_a = np.array(value_a, dtype=value_b.dtype)
elif isinstance(value_a, tuple) and isinstance(value_b, list):
value_a = list(value_a)
elif isinstance(value_a, list) and isinstance(value_b, tuple):
value_a = tuple(value_a)
elif isinstance(value_a, dict) and isinstance(value_b, CfgNode):
value_a = CfgNode(value_a)
else:
raise ValueError(
'Type mismatch ({} vs. {}) with values ({} vs. {}) for config '
'key: {}'.format(type_b, type_a, value_b, value_a, key))
return value_a
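The coercion rules above let YAML lists stand in for tuple defaults and ints for floats, while any other mismatch raises a `ValueError`; for example:

```python
from seetadet.core.config.yacs import CfgNode

base = CfgNode({'SCALES': (640,), 'BASE_LR': 0.001})
base.merge_from_other_cfg(CfgNode({'SCALES': [800], 'BASE_LR': 1}))
assert base.SCALES == (800,)   # list coerced back to the tuple type
assert base.BASE_LR == 1.0     # int promoted to float
```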
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Experiment coordinator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import time
import numpy as np
from seetadet.core.config import cfg
from seetadet.utils import logging
class Coordinator(object):
"""Manage the unique experiments."""
def __init__(self, cfg_file, exp_dir=None):
cfg.merge_from_file(cfg_file)
if exp_dir is None:
name = time.strftime('%Y%m%d_%H%M%S',
time.localtime(time.time()))
exp_dir = '../experiments/{}'.format(name)
if not osp.exists(exp_dir):
os.makedirs(exp_dir)
else:
if not osp.exists(exp_dir):
raise ValueError('Invalid experiment dir: ' + exp_dir)
self.exp_dir = exp_dir
def path_at(self, file, auto_create=True):
try:
path = osp.abspath(osp.join(self.exp_dir, file))
if auto_create and not osp.exists(path):
os.makedirs(path)
except OSError:
path = osp.abspath(osp.join('/tmp', file))
if auto_create and not osp.exists(path):
os.makedirs(path)
return path
def get_checkpoint(self, step=None, last_idx=1, wait=False):
path = self.path_at('checkpoints')
def locate(last_idx=None):
files = os.listdir(path)
files = list(filter(lambda x: '_iter_' in x and
x.endswith('.pkl'), files))
file_steps = []
for i, file in enumerate(files):
file_step = int(file.split('_iter_')[-1].split('.')[0])
if step == file_step:
return osp.join(path, files[i]), file_step
file_steps.append(file_step)
if step is None:
if len(files) == 0:
return None, 0
if last_idx > len(files):
return None, 0
file = files[np.argsort(file_steps)[-last_idx]]
file_step = file_steps[np.argsort(file_steps)[-last_idx]]
return osp.join(path, file), file_step
return None, 0
file, file_step = locate(last_idx)
while file is None and wait:
            logging.info('Waiting for the checkpoint at step {}.'.format(step))
time.sleep(10)
file, file_step = locate(last_idx)
return file, file_step
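if __name__ == '__main__':
    # A minimal usage sketch; 'config.yml' and the experiment directory
    # are hypothetical paths that should exist on disk.
    coordinator = Coordinator('config.yml', exp_dir='../experiments/example')
    file, file_step = coordinator.get_checkpoint(last_idx=1)
    print('Latest checkpoint: {} (step {})'.format(file, file_step))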
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for training library."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.engine import lr_scheduler
def build_optimizer(params, **kwargs):
"""Build the optimizer."""
args = {'lr': cfg.SOLVER.BASE_LR,
'weight_decay': cfg.SOLVER.WEIGHT_DECAY,
'clip_norm': cfg.SOLVER.CLIP_NORM,
'grad_scale': 1.0 / cfg.SOLVER.LOSS_SCALE}
optimizer = kwargs.pop('optimizer', cfg.SOLVER.OPTIMIZER)
if optimizer == 'SGD':
args['momentum'] = cfg.SOLVER.MOMENTUM
args.update(kwargs)
return getattr(torch.optim, optimizer)(params, **args)
def build_lr_scheduler(**kwargs):
"""Build the LR scheduler."""
args = {'lr_max': cfg.SOLVER.BASE_LR,
'lr_min': cfg.SOLVER.MIN_LR,
'warmup_steps': cfg.SOLVER.WARM_UP_STEPS,
'warmup_factor': cfg.SOLVER.WARM_UP_FACTOR}
policy = kwargs.pop('policy', cfg.SOLVER.LR_POLICY)
args.update(kwargs)
if policy == 'steps_with_decay':
return lr_scheduler.MultiStepLR(
decay_steps=cfg.SOLVER.DECAY_STEPS,
decay_gamma=cfg.SOLVER.DECAY_GAMMA, **args)
elif policy == 'linear_decay':
return lr_scheduler.LinearLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
elif policy == 'cosine_decay':
return lr_scheduler.CosineLR(
decay_step=(cfg.SOLVER.DECAY_STEPS or [1])[0],
max_steps=cfg.SOLVER.MAX_STEPS, **args)
return lr_scheduler.ConstantLR(**args)
def build_tensorboard(log_dir):
"""Build the tensorboard."""
try:
from dragon.utils.tensorboard import tf
from dragon.utils.tensorboard import TensorBoard
        # Avoid GPU allocation by the TF API.
if tf is not None:
tf.config.set_visible_devices([], 'GPU')
return TensorBoard(log_dir)
except ImportError:
return None
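if __name__ == '__main__':
    # A minimal usage sketch, assuming the default solver configuration and
    # that ``torch.nn.Linear`` is available in the dragon PyTorch-style API.
    model = torch.nn.Linear(8, 2)
    optimizer = build_optimizer(model.parameters(), optimizer='SGD')
    scheduler = build_lr_scheduler(policy='steps_with_decay')
    print(type(optimizer).__name__, scheduler.get_lr())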
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Learning rate schedulers."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
class ConstantLR(object):
"""Constant LR scheduler."""
def __init__(self, **kwargs):
self._lr_max = kwargs.pop('lr_max')
self._lr_min = kwargs.pop('lr_min', 0)
self._warmup_steps = kwargs.pop('warmup_steps', 0)
self._warmup_factor = kwargs.pop('warmup_factor', 0)
if kwargs:
raise ValueError('Unexpected arguments: ' + ','.join(v for v in kwargs))
self._step_count = 0
self._last_decay = 1.
def step(self):
self._step_count += 1
def get_lr(self):
if self._step_count < self._warmup_steps:
alpha = (self._step_count + 1.) / self._warmup_steps
return self._lr_max * (alpha + (1. - alpha) * self._warmup_factor)
return self._lr_min + (self._lr_max - self._lr_min) * self.get_decay()
def get_decay(self):
return self._last_decay
class CosineLR(ConstantLR):
"""LR scheduler with cosine decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(CosineLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = .5 * (1. + math.cos(math.pi * t / t_max))
return self._last_decay
class MultiStepLR(ConstantLR):
"""LR scheduler with multi-steps decay."""
def __init__(self, lr_max, decay_steps, decay_gamma, **kwargs):
super(MultiStepLR, self).__init__(lr_max=lr_max, **kwargs)
self._decay_steps = decay_steps
self._decay_gamma = decay_gamma
self._stage_count = 0
self._num_stages = len(decay_steps)
def get_decay(self):
if self._stage_count < self._num_stages:
k = self._decay_steps[self._stage_count]
while self._step_count >= k:
self._stage_count += 1
if self._stage_count >= self._num_stages:
break
k = self._decay_steps[self._stage_count]
self._last_decay = self._decay_gamma ** self._stage_count
return self._last_decay
class LinearLR(ConstantLR):
"""LR scheduler with linear decay."""
def __init__(self, lr_max, max_steps, lr_min=0, decay_step=1, **kwargs):
super(LinearLR, self).__init__(lr_max=lr_max, lr_min=lr_min, **kwargs)
self._decay_step = decay_step
self._max_steps = max_steps
def get_decay(self):
t = self._step_count - self._warmup_steps
t_max = self._max_steps - self._warmup_steps
if t > 0 and t % self._decay_step == 0:
self._last_decay = 1. - float(t) / t_max
return self._last_decay
if __name__ == '__main__':
def extract_label(scheduler):
class_name = scheduler.__class__.__name__
label = class_name + '('
        if class_name in ('CosineLR', 'LinearLR'):
            label += 'α=' + str(scheduler._decay_step)
        elif class_name == 'MultiStepLR':
            label += 'α=' + str(scheduler._decay_steps) + ', '
            label += 'γ=' + str(scheduler._decay_gamma)
label += ')'
return label
vis = True
max_steps = 120
shared_args = {
'lr_max': 0.0004,
'warmup_steps': 0,
'warmup_factor': 0.,
}
schedulers = [
# CosineLR(lr_min=0., decay_step=1, max_steps=max_steps, **shared_args),
CosineLR(lr_min=1e-6, decay_step=1, max_steps=140, **shared_args),
]
for i in range(max_steps):
info = 'Step = %d\n' % i
for scheduler in schedulers:
if i == 0:
scheduler.lr_seq = []
info += ' * {}: {}\n'.format(
extract_label(scheduler),
scheduler.get_lr())
scheduler.lr_seq.append(scheduler.get_lr())
scheduler.step()
if not vis:
print(info)
if vis:
import matplotlib.pyplot as plt
plt.figure(1)
plt.title('Visualization of different LR Schedulers')
plt.xlabel('Step')
plt.ylabel('Learning Rate')
line = '-'
colors = ['r', 'g', 'b', 'c', 'm', 'y', 'k']
for i, scheduler in enumerate(schedulers):
plt.plot(
range(max_steps),
scheduler.lr_seq,
colors[i] + line,
linewidth=1.,
label=extract_label(scheduler),
)
plt.legend()
plt.grid(linestyle='--')
        plt.savefig('x.png')
        plt.show()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Testing engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import multiprocessing as mp
import codewithgpu
from dragon.vm import torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import build_evaluator
from seetadet.models.build import build_detector
from seetadet.modules.build import build_inference
from seetadet.utils import logging
from seetadet.utils import profiler
from seetadet.utils import vis
class InferenceCommand(codewithgpu.InferenceCommand):
"""Command to run inference."""
def __init__(self, input_queue, output_queue, kwargs):
super(InferenceCommand, self).__init__(input_queue, output_queue)
self.kwargs = kwargs
def build_env(self):
"""Build the environment."""
cfg.merge_from_other_cfg(self.kwargs['cfg'])
cfg.GPU_ID = self.kwargs['device']
cfg.freeze()
logging.set_root(self.kwargs.get('verbose', True))
self.batch_size = cfg.TEST.IMS_PER_BATCH
self.batch_timeout = self.kwargs.get('batch_timeout', None)
if self.kwargs.get('deterministic', False):
torch.backends.cudnn.deterministic = True
def build_model(self):
"""Build and return the model."""
return build_detector(self.kwargs['device'], self.kwargs['weights'])
def build_module(self, model):
"""Build and return the inference module."""
return build_inference(model)
def send_results(self, module, indices, imgs):
"""Send the batch results."""
results = module.get_results(imgs)
time_diffs = module.get_time_diffs()
time_diffs['im_detect'] += time_diffs.pop('im_detect_mask', 0.)
for i, outputs in enumerate(results):
outputs['im_shape'] = imgs[i].shape
self.output_queue.put((indices[i], time_diffs, outputs))
def filter_outputs(outputs, max_dets=100):
"""Limit the max number of detections."""
if max_dets <= 0:
return outputs
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
scores, num_classes = [], len(boxes)
for i in range(num_classes):
if len(boxes[i]) > 0:
scores.append(boxes[i][:, -1])
scores = np.hstack(scores) if len(scores) > 0 else []
if len(scores) > max_dets:
thr = np.sort(scores)[-max_dets]
for i in range(num_classes):
if len(boxes[i]) < 1:
continue
keep = np.where(boxes[i][:, -1] >= thr)[0]
boxes[i] = boxes[i][keep]
if masks is not None:
masks[i] = masks[i][keep]
outputs['boxes'] = boxes
outputs['masks'] = masks
return outputs
def extend_results(index, collection, results):
"""Add image results to the collection."""
if results is None:
return
for _ in range(len(results) - len(collection)):
collection.append([])
for i in range(1, len(results)):
for _ in range(index - len(collection[i]) + 1):
collection[i].append([])
collection[i][index] = results[i]
def run_test(
test_cfg,
weights,
output_dir,
devices,
deterministic=False,
read_every=100,
vis_thresh=0,
vis_output_dir=None,
):
"""Run a model testing.
Parameters
----------
test_cfg : CfgNode
The cfg for testing.
weights : str
The path of model weights to load.
output_dir : str
The path to save results.
devices : Sequence[int]
        The indices of the computing devices.
deterministic : bool, optional, default=False
Set cudnn deterministic or not.
read_every : int, optional, default=100
Read every N images to distribute to devices.
vis_thresh : float, optional, default=0
The score threshold for visualization.
vis_output_dir : str, optional
The path to save visualizations.
"""
cfg.merge_from_other_cfg(test_cfg)
evaluator = build_evaluator(output_dir)
devices = devices if devices else [cfg.GPU_ID]
num_devices = len(devices)
num_images = evaluator.num_images
max_dets = cfg.TEST.DETECTIONS_PER_IM
read_stride = float(num_devices * cfg.TEST.IMS_PER_BATCH)
read_every = int(np.ceil(read_every / read_stride) * read_stride)
visualizer = vis.Visualizer(cfg.MODEL.CLASSES, vis_thresh)
queues = [mp.Queue() for _ in range(num_devices + 1)]
commands = [InferenceCommand(
queues[i], queues[-1], kwargs={
'cfg': test_cfg,
'weights': weights,
'device': devices[i],
'deterministic': deterministic,
'verbose': i == 0,
}) for i in range(num_devices)]
actors = [mp.Process(target=command.run) for command in commands]
for actor in actors:
actor.start()
timers = collections.defaultdict(profiler.Timer)
all_boxes, all_masks, vis_images = [], [], {}
for count in range(1, num_images + 1):
img_id, img = evaluator.get_image()
queues[count % num_devices].put((count - 1, img))
if vis_thresh > 0 and vis_output_dir:
filename = vis_output_dir + '/%s.png' % img_id
vis_images[count - 1] = (filename, img)
if count % read_every > 0 and count < num_images:
continue
if count == num_images:
for i in range(num_devices):
queues[i].put((-1, None))
for _ in range(((count - 1) % read_every + 1)):
index, time_diffs, outputs = queues[-1].get()
outputs = filter_outputs(outputs, max_dets)
extend_results(index, all_boxes, outputs['boxes'])
extend_results(index, all_masks, outputs.get('masks', None))
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
if vis_thresh > 0 and vis_output_dir:
filename, img = vis_images[index]
visualizer.draw_instances(
img=img,
boxes=outputs['boxes'],
masks=outputs.get('masks', None)).save(filename)
del vis_images[index]
avg_time = sum([t.average_time for t in timers.values()])
eta_seconds = avg_time * (num_images - count)
print('\rim_detect: {:d}/{:d} [{:.3f}s + {:.3f}s] (eta: {})'
.format(count, num_images,
timers['im_detect'].average_time,
timers['misc'].average_time,
str(datetime.timedelta(seconds=int(eta_seconds)))),
end='')
print('\nEvaluating detections...')
evaluator.eval_bbox(all_boxes)
if len(all_masks) > 0:
print('Evaluating segmentations...')
evaluator.eval_segm(all_boxes, all_masks)
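if __name__ == '__main__':
    # A minimal sketch of ``filter_outputs`` on toy detections, where the
    # per-class boxes are laid out as [x1, y1, x2, y2, score].
    toy_outputs = {'boxes': [np.zeros((0, 5), 'float32'),
                             np.array([[0., 0., 10., 10., 0.9],
                                       [1., 1., 12., 12., 0.3]], 'float32')]}
    toy_outputs = filter_outputs(toy_outputs, max_dets=1)
    assert len(toy_outputs['boxes'][1]) == 1  # keep the highest score only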
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Training engine."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import os
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.engine.build import build_lr_scheduler
from seetadet.core.engine.build import build_optimizer
from seetadet.core.engine.build import build_tensorboard
from seetadet.core.engine.utils import count_params
from seetadet.core.engine.utils import get_device
from seetadet.core.engine.utils import get_param_groups
from seetadet.data.build import build_loader_train
from seetadet.models.build import build_detector
from seetadet.utils import logging
from seetadet.utils import profiler
class Trainer(object):
"""Schedule the iterative model training."""
def __init__(self, coordinator, start_iter=0):
# Build loader.
self.loader = build_loader_train()
# Build model.
self.model = build_detector(training=True)
self.model.load_weights(cfg.TRAIN.WEIGHTS, strict=start_iter > 0)
self.model.to(device=get_device(cfg.GPU_ID))
if cfg.MODEL.PRECISION.lower() == 'float16':
self.model.half()
# Build optimizer.
self.loss_scale = cfg.SOLVER.LOSS_SCALE
param_groups_getter = get_param_groups
if cfg.SOLVER.LAYER_LR_DECAY < 1.0:
lr_scale_getter = functools.partial(
self.model.backbone.get_lr_scale,
decay=cfg.SOLVER.LAYER_LR_DECAY)
param_groups_getter = functools.partial(
param_groups_getter, lr_scale_getter=lr_scale_getter)
self.optimizer = build_optimizer(param_groups_getter(self.model))
self.scheduler = build_lr_scheduler()
# Build monitor.
self.coordinator = coordinator
self.metrics = collections.OrderedDict()
self.board = None
@property
def iter(self):
return self.scheduler._step_count
def snapshot(self):
"""Save the checkpoint of current iterative step."""
f = cfg.SOLVER.SNAPSHOT_PREFIX
f += '_iter_{}.pkl'.format(self.iter)
f = os.path.join(self.coordinator.path_at('checkpoints'), f)
if logging.is_root() and not os.path.exists(f):
torch.save(self.model.state_dict(), f, pickle_protocol=4)
logging.info('Wrote snapshot to: {:s}'.format(f))
def add_metrics(self, stats):
"""Add or update the metrics."""
for k, v in stats['metrics'].items():
if k not in self.metrics:
self.metrics[k] = profiler.SmoothedValue()
self.metrics[k].update(v)
def display_metrics(self, stats):
"""Send metrics to the monitor."""
logging.info('Iteration %d, lr = %.8f, time = %.2fs'
% (stats['iter'], stats['lr'], stats['time']))
for k, v in self.metrics.items():
logging.info(' ' * 4 + 'Train net output({}): {:.4f} ({:.4f})'
.format(k, stats['metrics'][k], v.average()))
if self.board is not None:
self.board.scalar_summary('lr', stats['lr'], stats['iter'])
self.board.scalar_summary('time', stats['time'], stats['iter'])
for k, v in self.metrics.items():
self.board.scalar_summary(k, v.average(), stats['iter'])
def step(self):
stats = {'iter': self.iter}
metrics = collections.defaultdict(float)
# Run forward.
timer = profiler.Timer().tic()
inputs = self.loader()
outputs, losses = self.model(inputs), []
for k, v in outputs.items():
if 'loss' in k:
if isinstance(v, (tuple, list)):
losses.append(sum(v[1:], v[0]).mul_(1. / len(v)))
metrics.update(dict(('stage%d_' % (i + 1) + k, float(x))
for i, x in enumerate(v)))
else:
losses.append(v)
metrics[k] += float(v)
# Run backward.
losses = sum(losses[1:], losses[0])
if self.loss_scale != 1.0:
losses *= self.loss_scale
losses.backward()
# Apply update.
stats['lr'] = self.scheduler.get_lr()
for group in self.optimizer.param_groups:
group['lr'] = stats['lr'] * group.get('lr_scale', 1.0)
self.optimizer.step()
self.scheduler.step()
stats['time'] = timer.toc()
stats['metrics'] = collections.OrderedDict(sorted(metrics.items()))
return stats
def train_model(self, start_iter=0):
"""Network training loop."""
timer = profiler.Timer()
max_steps = cfg.SOLVER.MAX_STEPS
display_every = cfg.SOLVER.DISPLAY
progress_every = 10 * display_every
snapshot_every = cfg.SOLVER.SNAPSHOT_EVERY
self.scheduler._step_count = start_iter
while self.iter < max_steps:
with timer.tic_and_toc():
stats = self.step()
self.add_metrics(stats)
if stats['iter'] % display_every == 0:
self.display_metrics(stats)
if self.iter % progress_every == 0:
logging.info(profiler.get_progress(timer, self.iter, max_steps))
if self.iter % snapshot_every == 0:
self.snapshot()
self.metrics.clear()
def run_train(coordinator, start_iter=0, enable_tensorboard=False):
"""Start a network training task."""
trainer = Trainer(coordinator, start_iter=start_iter)
if enable_tensorboard and logging.is_root():
trainer.board = build_tensorboard(coordinator.path_at('logs'))
logging.info('#Params: %.2fM' % count_params(trainer.model))
logging.info('Start training...')
trainer.train_model(start_iter)
trainer.snapshot()
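if __name__ == '__main__':
    # A minimal usage sketch; 'config.yml' is a hypothetical path and the
    # ``Coordinator`` import assumes the module layout of this package.
    from seetadet.core.coordinator import Coordinator
    coordinator = Coordinator('config.yml')
    run_train(coordinator, enable_tensorboard=True)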
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Engine utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import importlib.machinery
import os
import dragon
from dragon.core.framework import backend
from dragon.vm import torch
def count_params(module):
"""Return the number of parameters in MB."""
return sum([v.size().numel() for v in module.parameters()]) / 1e6
def freeze_module(module):
"""Freeze parameters of given module."""
module.eval()
for param in module.parameters():
param.requires_grad = False
def get_device(index):
"""Create the available device object."""
if torch.cuda.is_available():
return torch.device('cuda', index)
try:
if torch.backends.mps.is_available():
return torch.device('mps', index)
except AttributeError:
pass
return torch.device('cpu')
def get_param_groups(module, lr_scale_getter=None):
"""Separate parameters into groups."""
memo, groups = {}, collections.OrderedDict()
for name, param in module.named_parameters():
if not param.requires_grad:
continue
attrs = collections.OrderedDict()
if lr_scale_getter:
attrs['lr_scale'] = lr_scale_getter(name)
memo[name] = param.shape
no_weight_decay = not (name.endswith('weight') and param.dim() > 1)
no_weight_decay = getattr(param, 'no_weight_decay', no_weight_decay)
if no_weight_decay:
attrs['weight_decay'] = 0
group_name = '/'.join(['%s:%s' % (v[0], v[1]) for v in list(attrs.items())])
if group_name not in groups:
groups[group_name] = {'params': []}
groups[group_name].update(attrs)
groups[group_name]['params'].append(param)
return list(groups.values())
def load_library(library_prefix):
"""Load a shared library."""
loader_details = (importlib.machinery.ExtensionFileLoader,
importlib.machinery.EXTENSION_SUFFIXES)
library_prefix = os.path.abspath(library_prefix)
lib_dir, fullname = os.path.split(library_prefix)
finder = importlib.machinery.FileFinder(lib_dir, loader_details)
ext_specs = finder.find_spec(fullname)
if ext_specs is None:
raise ImportError('Could not find the pre-built library '
'for <%s>.' % library_prefix)
backend.load_library(ext_specs.origin)
def synchronize_device(device):
"""Synchronize the computation of device."""
if device.type == 'cuda':
torch.cuda.synchronize(device)
elif device.type == 'mps':
dragon.mps.synchronize(device.index)
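if __name__ == '__main__':
    # A minimal usage sketch: biases fall into a zero weight-decay group,
    # while matrix weights keep the default decay (toy linear module).
    model = torch.nn.Linear(16, 4)
    groups = get_param_groups(model)
    print('%d param groups, %.6fM params' % (len(groups), count_params(model)))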
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Registry class."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
class Registry(object):
"""Registry class."""
def __init__(self, name):
self.name = name
self.registry = collections.OrderedDict()
def has(self, key):
return key in self.registry
def register(self, name, func=None, **kwargs):
def decorated(inner_function):
for key in (name if isinstance(
name, (tuple, list)) else [name]):
self.registry[key] = \
functools.partial(inner_function, **kwargs)
return inner_function
if func is not None:
return decorated(func)
return decorated
def get(self, name, default=None):
if name is None:
return None
if not self.has(name):
if default is not None:
return default
raise KeyError("`%s` is not registered in <%s>."
% (name, self.name))
return self.registry[name]
def try_get(self, name):
if self.has(name):
return self.get(name)
return None
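if __name__ == '__main__':
    # A minimal usage sketch with a throwaway registry: keyword arguments
    # given at registration are bound into the stored factory.
    ACTIVATIONS = Registry('activations')

    @ACTIVATIONS.register('relu6', max_value=6)
    def relu(x, max_value=None):
        return min(max(x, 0), max_value) if max_value else max(x, 0)

    assert ACTIVATIONS.get('relu6')(7) == 6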
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data import datasets
from seetadet.data import evaluators
from seetadet.data import pipelines
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios,
scales_per_octave=1):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
for i in range(len(self.sizes)):
octave_sizes = []
for j in range(1, scales_per_octave):
scale = 2 ** (float(j) / scales_per_octave)
octave_sizes += [x * scale for x in self.sizes[i]]
self.sizes[i] += octave_sizes
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
strides[i], self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
self.grid_coords = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_coords = self.get_coords(self.grid_shapes)
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def num_anchors(self, shapes):
"""Return the number of grid anchors."""
return sum(self.cell_anchors[i].shape[0] * np.prod(shapes[i])
for i in range(len(shapes)))
def get_slices(self, shapes):
slices, offset = [], 0
for i, shape in enumerate(shapes):
num = self.cell_anchors[i].shape[0] * np.prod(shape)
slices.append(slice(offset, offset + num))
offset = offset + num
return slices
def get_coords(self, shapes):
"""Return the x-y coordinates of grid anchors."""
xs, ys = [], []
for i in range(len(shapes)):
height, width = shapes[i]
x, y = np.arange(0, width), np.arange(0, height)
x, y = np.meshgrid(x, y)
            # Tile the K cell coords once per each of the A anchors
            # to get shift coords of length A * K.
xs.append(np.tile(x.flatten(), self.cell_anchors[i].shape[0]))
ys.append(np.tile(y.flatten(), self.cell_anchors[i].shape[0]))
return np.concatenate(xs), np.concatenate(ys)
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = np.arange(0, w) * self.strides[i]
shift_y = np.arange(0, h) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (A, 1, 4) to K cell shifts (1, K, 4)
# to get shift anchors (A, K, 4)
a, k = self.num_cell_anchors(i), shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((a, 1, 4)) +
shifts.reshape((1, k, 4)))
grid_anchors.append(anchors.reshape((a * k, 4)))
return np.vstack(grid_anchors)
def narrow_anchors(self, shapes, inds, return_anchors=False):
"""Return the valid anchors on given shapes."""
max_shapes = self.grid_shapes
anchors = self.grid_anchors
x_coords, y_coords = self.grid_coords
offset1 = offset2 = num1 = num2 = 0
out_inds, out_anchors = [], []
for i in range(len(max_shapes)):
num1 += self.num_cell_anchors(i) * np.prod(max_shapes[i])
num2 += self.num_cell_anchors(i) * np.prod(shapes[i])
inds_keep = inds[np.where((inds >= offset1) & (inds < num1))[0]]
anchors_keep = anchors[inds_keep] if return_anchors else None
x, y = x_coords[inds_keep], y_coords[inds_keep]
z = ((inds_keep - offset1) // max_shapes[i][1]) // max_shapes[i][0]
keep = np.where((x < shapes[i][1]) & (y < shapes[i][0]))[0]
inds_keep = (z * shapes[i][0] + y) * shapes[i][1] + x + offset2
out_inds.append(inds_keep[keep])
out_anchors.append(anchors_keep[keep] if return_anchors else None)
offset1, offset2 = num1, num2
outputs = [np.concatenate(out_inds)]
if return_anchors:
outputs += [np.concatenate(out_anchors)]
return outputs[0] if len(outputs) == 1 else outputs
def generate_anchors(stride=16, ratios=(0.5, 1, 2), sizes=(32, 64, 128, 256, 512)):
"""Generate anchors by enumerating aspect ratios and sizes."""
scales = np.array(sizes) / stride
base_anchor = np.array([-stride / 2., -stride / 2., stride / 2., stride / 2.])
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0]
h = anchor[3] - anchor[1]
x_ctr = anchor[0] + 0.5 * w
y_ctr = anchor[1] + 0.5 * h
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = np.sqrt(w * h / ratios)
hs = ws * ratios
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _scale_enum(anchor, scales):
"""Enumerate a set of anchors by scales."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws, hs = w * scales, h * scales
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
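if __name__ == '__main__':
    # A minimal usage sketch: one size and three aspect ratios per stride
    # yield three cell anchors on every level.
    anchor_generator = AnchorGenerator(
        strides=(8, 16, 32),
        sizes=((32,), (64,), (128,)),
        aspect_ratios=((0.5, 1., 2.),))
    anchor_generator.reset_grid(max_size=64)
    assert anchor_generator.num_cell_anchors() == 3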
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Anchor generator for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
class AnchorGenerator(object):
"""Generate anchors for bbox regression."""
def __init__(self, strides, sizes, aspect_ratios):
self.strides = strides
self.sizes = _align_args(strides, sizes)
self.aspect_ratios = _align_args(strides, aspect_ratios)
self.scales = [[x / y for x in z] for y, z in zip(strides, self.sizes)]
self.cell_anchors = []
for i in range(len(strides)):
self.cell_anchors.append(generate_anchors(
self.aspect_ratios[i], self.sizes[i]))
self.grid_shapes = None
self.grid_anchors = None
def reset_grid(self, max_size):
"""Reset the grid."""
self.grid_shapes = [(int(np.ceil(max_size / x)),) * 2 for x in self.strides]
self.grid_anchors = self.get_anchors(self.grid_shapes)
def num_cell_anchors(self, index=0):
"""Return number of cell anchors."""
return self.cell_anchors[index].shape[0]
def get_anchors(self, shapes):
"""Return the grid anchors."""
grid_anchors = []
for i in range(len(shapes)):
h, w = shapes[i]
shift_x = (np.arange(0, w) + 0.5) * self.strides[i]
shift_y = (np.arange(0, h) + 0.5) * self.strides[i]
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(),
shift_x.ravel(), shift_y.ravel())).transpose()
shifts = shifts.astype(self.cell_anchors[i].dtype)
            # Add A anchors (1, A, 4) to K cell shifts (K, 1, 4)
# to get shift anchors (K, A, 4) and reshape to (K * A, 4)
a = self.cell_anchors[i].shape[0]
k = shifts.shape[0]
anchors = (self.cell_anchors[i].reshape((1, a, 4)) +
shifts.reshape((1, k, 4)).transpose((1, 0, 2)))
grid_anchors.append(anchors.reshape((k * a, 4)))
return np.vstack(grid_anchors)
def generate_anchors(ratios, sizes):
"""Generate anchors by enumerating aspect ratios and sizes."""
min_size, max_size = sizes
base_anchor = np.array([-min_size / 2., -min_size / 2.,
min_size / 2., min_size / 2.])
ratio_anchors = _ratio_enum(base_anchor, ratios)
size_anchors = _size_enum(base_anchor, min_size, max_size)
anchors = np.vstack([ratio_anchors[:1], size_anchors, ratio_anchors[1:]])
return anchors.astype('float32')
def _whctrs(anchor):
"""Return the xywh of an anchor."""
w = anchor[2] - anchor[0]
h = anchor[3] - anchor[1]
x_ctr = anchor[0] + 0.5 * w
y_ctr = anchor[1] + 0.5 * h
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""Return a sef of anchors by widths, heights and center."""
ws, hs = ws[:, np.newaxis], hs[:, np.newaxis]
return np.hstack((x_ctr - 0.5 * ws, y_ctr - 0.5 * hs,
x_ctr + 0.5 * ws, y_ctr + 0.5 * hs))
def _ratio_enum(anchor, ratios):
"""Enumerate a set of anchors by aspect ratios."""
w, h, x_ctr, y_ctr = _whctrs(anchor)
hs = np.sqrt(w * h / ratios)
ws = hs * ratios
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _size_enum(anchor, min_size, max_size):
"""Enumerate a anchor for size wrt base_anchor."""
_, _, x_ctr, y_ctr = _whctrs(anchor)
ws = hs = np.sqrt([min_size * max_size])
return _mkanchors(ws, hs, x_ctr, y_ctr)
def _align_args(strides, args):
"""Align the args to the strides."""
args = (args * len(strides)) if len(args) == 1 else args
assert len(args) == len(strides)
return [[x] if not isinstance(x, (tuple, list)) else x[:] for x in args]
if __name__ == '__main__':
anchor_generator = AnchorGenerator(
strides=(8, 16, 32, 64, 100, 300),
sizes=((30, 60), (60, 110), (110, 162),
(162, 213), (213, 264), (264, 315)),
aspect_ratios=((1, 2, 0.5),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5, 3, 0.33),
(1, 2, 0.5),
(1, 2, 0.5)))
anchor_generator.reset_grid(max_size=300)
assert anchor_generator.grid_anchors.shape == (8732, 4)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Ground-truth assigners."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from seetadet.utils.bbox import bbox_overlaps
class MaxIoUAssigner(object):
"""Assign ground-truth to boxes according to the IoU."""
def __init__(
self,
pos_iou_thr=0.5,
neg_iou_thr=0.5,
min_pos_iou=0.0,
match_low_quality=True,
gt_max_assign_all=True,
):
"""Create a ``MaxIoUAssigner``.
Parameters
----------
pos_iou_thr : float, optional, default=0.5
The minimum IoU overlap to label positives.
neg_iou_thr : float, optional, default=0.5
The maximum IoU overlap to label negatives.
min_pos_iou : float, optional, default=0.0
The minimum IoU overlap to match low quality.
        match_low_quality : bool, optional, default=True
            Whether to match a box for each gt even below ``pos_iou_thr``.
        gt_max_assign_all : bool, optional, default=True
            Whether to assign all boxes sharing the max overlap of a gt.
"""
self.pos_iou_thr = pos_iou_thr
self.neg_iou_thr = neg_iou_thr
self.min_pos_iou = min_pos_iou
self.match_low_quality = match_low_quality
self.gt_max_assign_all = gt_max_assign_all
def assign(self, boxes, gt_boxes):
# Initialize assigns with ignored index "-1".
num_boxes = len(boxes)
labels = np.empty((num_boxes,), 'int8')
labels.fill(-1)
# Overlaps between the anchors and the gt boxes.
overlaps = bbox_overlaps(boxes, gt_boxes)
max_overlaps = overlaps.max(axis=1)
# Background: below threshold IoU.
labels[max_overlaps < self.neg_iou_thr] = 0
# Foreground: above threshold IoU.
labels[max_overlaps >= self.pos_iou_thr] = 1
# Foreground: for each gt, assign anchor(s) with highest overlap.
if self.match_low_quality:
if self.gt_max_assign_all:
gt_max_overlaps = overlaps.max(axis=0)
if self.min_pos_iou > 0:
for i in np.where(gt_max_overlaps >= self.min_pos_iou)[0]:
labels[overlaps[:, i] == gt_max_overlaps[i]] = 1
else:
labels[np.where(overlaps == gt_max_overlaps)[0]] = 1
else:
labels[overlaps.argmax(axis=0)] = 1
        # Return the assigned labels.
return labels
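if __name__ == '__main__':
    # A minimal usage sketch, assuming ``bbox_overlaps`` accepts plain
    # (N, 4) float arrays: one matching box and one far-away box.
    assigner = MaxIoUAssigner(pos_iou_thr=0.5, neg_iou_thr=0.3)
    boxes = np.array([[0, 0, 10, 10], [20, 20, 30, 30]], 'float32')
    gt_boxes = np.array([[0, 0, 10, 10]], 'float32')
    print(assigner.assign(boxes, gt_boxes))  # expected labels: [1, 0]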
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for data."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
LOADERS = Registry('loaders')
DATASETS = Registry('datasets')
EVALUATORS = Registry('evaluators')
ANCHOR_SAMPLERS = Registry('anchor_samplers')
def build_anchor_sampler():
"""Build the anchor sampler."""
    sampler = ANCHOR_SAMPLERS.try_get(cfg.MODEL.TYPE)
    return sampler() if sampler else None
def build_dataset(path):
"""Build the dataset."""
keys = path.split('://')
if len(keys) >= 2:
return DATASETS.get(keys[0])(keys[1])
return DATASETS.get('default')(path)
def build_loader_train(**kwargs):
"""Build the train loader."""
args = {'dataset': cfg.TRAIN.DATASET,
'batch_size': cfg.TRAIN.IMS_PER_BATCH,
'num_workers': cfg.TRAIN.NUM_WORKERS,
'shuffle': True, 'contiguous': True}
args.update(kwargs)
return LOADERS.get(cfg.TRAIN.LOADER)(**args)
def build_loader_test(**kwargs):
"""Build the test loader."""
args = {'dataset': cfg.TEST.DATASET,
'batch_size': cfg.TEST.IMS_PER_BATCH,
'shuffle': False, 'contiguous': False}
args.update(kwargs)
return LOADERS.get(cfg.TEST.LOADER)(**args)
def build_evaluator(output_dir, **kwargs):
"""Build the evaluator."""
evaluator_type = cfg.TEST.EVALUATOR
if not evaluator_type:
return None
args = {'output_dir': output_dir,
'classes': cfg.MODEL.CLASSES}
if evaluator_type == 'voc2007':
args['use_07_metric'] = True
args.update(kwargs)
evaluator = EVALUATORS.get(evaluator_type)(**args)
ann_file = cfg.TEST.JSON_DATASET
if ann_file:
evaluator.load_annotations(ann_file)
return evaluator
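if __name__ == '__main__':
    # A minimal usage sketch; '/data/train' is a hypothetical path. A plain
    # path routes to the 'default' dataset type, while a 'scheme://path'
    # string routes to the type registered under 'scheme'.
    from seetadet.data import build  # trigger dataset registrations
    dataset = build.build_dataset('/data/train')
    print(type(dataset).__name__, dataset.source)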
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Datasets."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.datasets import dataset
from seetadet.data.datasets.datum import AnnotatedDatum
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import codewithgpu
from seetadet.core.config import cfg
from seetadet.data.build import DATASETS
class Dataset(object):
"""Base dataset class."""
def __init__(self, source):
self.source = source
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_to_ind = dict(zip(self.classes, range(self.num_classes)))
@property
def getter(self):
"""Return the dataset getter."""
return type(self)
@property
def size(self):
"""Return the dataset size."""
return 0
@DATASETS.register('default')
class RecordDataset(Dataset):
def __init__(self, source):
super(RecordDataset, self).__init__(source)
@property
def getter(self):
"""Return the dataset getter."""
return codewithgpu.RecordDataset
@property
def size(self):
"""Return the dataset size."""
return self.getter(self.source).size
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Annotated datum."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
class AnnotatedDatum(object):
"""Wrapper for annotated datum."""
def __init__(self, example):
self._example = example
self._img = None
@property
def id(self):
"""Return the example id."""
return self._example['id']
@property
def height(self):
"""Return the image height."""
return self._example['height']
@property
def width(self):
"""Return the image width."""
return self._example['width']
@property
def img(self):
"""Return the image array."""
if self._img is None:
img_bytes = np.frombuffer(self._example['content'], 'uint8')
self._img = cv2.imdecode(img_bytes, cv2.IMREAD_COLOR)
return self._img
@property
def objects(self):
"""Return the annotated objects."""
objects = []
for obj in self._example['object']:
mask = obj.get('mask', None)
polygons = obj.get('polygons', None)
if 'x3' in obj:
poly = np.array([obj['x1'], obj['y1'],
obj['x2'], obj['y2'],
obj['x3'], obj['y3'],
obj['x4'], obj['y4']], 'float32')
x, y, w, h = cv2.boundingRect(poly.reshape((-1, 2)))
bbox = [x, y, x + w, y + h]
polygons = [poly]
elif 'x2' in obj:
bbox = [obj['x1'], obj['y1'], obj['x2'], obj['y2']]
elif 'xmin' in obj:
bbox = [obj['xmin'], obj['ymin'], obj['xmax'], obj['ymax']]
else:
bbox = obj['bbox']
objects.append({'name': obj['name'],
'bbox': bbox,
'difficult': obj.get('difficult', 0)})
if mask is not None and len(mask) > 0:
objects[-1]['mask'] = mask
elif polygons is not None and len(polygons) > 0:
objects[-1]['polygons'] = [np.array(p) for p in polygons]
return objects
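if __name__ == '__main__':
    # A minimal usage sketch with a hand-made example; the image content
    # stays undecoded as long as ``datum.img`` is not accessed.
    example = {'id': '000001', 'height': 4, 'width': 4, 'content': b'',
               'object': [{'name': 'person', 'xmin': 0, 'ymin': 0,
                           'xmax': 2, 'ymax': 2}]}
    datum = AnnotatedDatum(example)
    print(datum.id, datum.objects[0]['bbox'])  # 000001 [0, 0, 2, 2]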
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.evaluators import coco_evaluator
from seetadet.data.evaluators import voc_evaluator
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""COCO dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import prettytable
from pycocotools.cocoeval import COCOeval
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
@EVALUATORS.register('coco')
class COCOEvaluator(Evaluator):
"""Evaluator for MS COCO dataset."""
def __init__(self, output_dir, classes):
super(COCOEvaluator, self).__init__(output_dir, classes, COCOeval)
def print_eval_results(self, coco_eval):
def get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = get_thr_ind(coco_eval, 0.5)
ind_hi = get_thr_ind(coco_eval, 0.95)
# Precision: (iou, recall, cls, area range, max dets)
# Recall: (iou, cls, area range, max dets)
# Area range index 0: all area ranges
# Max dets index 2: 100 per image
all_prec = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
all_recall = coco_eval.eval['recall'][ind_lo:(ind_hi + 1), :, 0, 2]
metrics = collections.OrderedDict([
('AP@[IoU=0.5:0.95]', []), ('AR@[IoU=0.5:0.95]', [])])
class_table = prettytable.PrettyTable()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
ap = np.mean(all_prec[:, :, cls_ind - 1]) # (iou, recall, cls)
recall = np.mean(all_recall[:, cls_ind - 1]) # (iou, cls)
metrics['AP@[IoU=0.5:0.95]'].append(ap)
metrics['AR@[IoU=0.5:0.95]'].append(recall)
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:')
coco_eval.summarize()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import json
import os
import numpy as np
from pycocotools.coco import COCO
from seetadet.data.build import build_loader_test
from seetadet.utils import logging
from seetadet.utils.mask import encode_masks
from seetadet.utils.mask import paste_masks
class Evaluator(object):
"""Evaluator using COCO json dataset format."""
def __init__(self, output_dir, classes, eval_type=None):
self.output_dir = output_dir
self.classes = classes
self.num_classes = len(self.classes)
self.class_to_cat_id = dict(zip(self.classes, range(self.num_classes)))
self.eval_type = eval_type
self.cocoGt = None
self.loader = build_loader_test()
self.num_images = self.loader.dataset_size
self.cached_inputs = []
self.records = collections.OrderedDict()
def eval_bbox(self, boxes):
"""Evaluate bbox results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
self.verify_records()
res_file = self.write_bbox_results(boxes)
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'bbox')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def eval_segm(self, boxes, masks):
"""Evaluate segmentation results."""
if len(self.cocoGt.dataset['annotations']) == 0:
logging.info('No annotations. Skip evaluation.')
return
self.verify_records()
res_file = self.write_segm_results(boxes, masks)
cocoDt = self.cocoGt.loadRes(res_file)
coco_eval = self.eval_type(self.cocoGt, cocoDt, 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self.print_eval_results(coco_eval)
def get_image(self):
"""Return the next image for evaluation."""
if len(self.cached_inputs) == 0:
inputs = self.loader()
for i, img_meta in enumerate(inputs['img_meta']):
self.cached_inputs.append({
'img': inputs['img'][i],
'objects': inputs['objects'][i],
'id': img_meta['id'],
'height': img_meta['height'],
'width': img_meta['width']})
inputs = self.cached_inputs.pop(0)
img_id, img = inputs.pop('id'), inputs.pop('img')
self.records[img_id] = inputs
return img_id, img
def load_annotations(self, ann_file=None):
"""Load annotations."""
self.cocoGt = COCO(ann_file)
if len(self.cocoGt.dataset) > 0:
self.class_to_cat_id = dict((v['name'], v['id'])
for v in self.cocoGt.cats.values())
def verify_records(self):
"""Verify loaded records."""
if len(self.records) != self.num_images:
raise RuntimeError(
'Mismatched number of records and images. ({} vs. {}).'
                '\nCheck whether duplicate image ids exist.'
.format(len(self.records), self.num_images))
if self.cocoGt is None:
            ann_file = self.write_annotations()
self.load_annotations(ann_file)
def print_eval_results(self, coco_eval):
"""Print the evaluation results."""
def bbox_results_one_category(self, boxes, cat_id):
"""Write bbox results of a specific category."""
results = []
for i, img_id in enumerate(self.records.keys()):
dets = boxes[i].astype('float64')
if len(dets) == 0:
continue
xs, ys = dets[:, 0], dets[:, 1]
ws, hs = dets[:, 2] - xs, dets[:, 3] - ys
scores = dets[:, -1]
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'bbox': [xs[j], ys[j], ws[j], hs[j]],
'score': scores[j],
} for j in range(dets.shape[0])])
return results
def segm_results_one_category(self, boxes, masks, cat_id):
"""Write segm results of a specific category."""
results = []
for i, (img_id, rec) in enumerate(self.records.items()):
dets = boxes[i]
if len(dets) == 0:
continue
scores = dets[:, -1]
rles = encode_masks(paste_masks(
masks[i], dets, (rec['height'], rec['width'])))
results.extend([{
'image_id': self.get_image_id(img_id),
'category_id': cat_id,
'segmentation': rles[j],
'score': float(scores[j]),
} for j in range(dets.shape[0])])
return results
def write_bbox_results(self, boxes):
"""Write bbox results."""
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.bbox_results_one_category(
boxes[cls_ind], self.class_to_cat_id[cls]))
res_file = self.get_res_file(type='bbox')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as f:
json.dump(results, f)
return res_file
def write_segm_results(self, boxes, masks):
"""Write segm results."""
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'
.format(cls, cls_ind, self.num_classes - 1))
results.extend(self.segm_results_one_category(
boxes[cls_ind], masks[cls_ind], self.class_to_cat_id[cls]))
res_file = self.get_res_file(type='segm')
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as fid:
json.dump(results, fid)
return res_file
def write_annotations(self):
"""Write annotations."""
dataset = {'images': [], 'categories': [], 'annotations': []}
for img_id, rec in self.records.items():
dataset['images'].append({
'id': self.get_image_id(img_id),
'height': rec['height'], 'width': rec['width']})
for cls in self.classes:
if cls == '__background__':
continue
dataset['categories'].append({
'name': cls, 'id': self.class_to_cat_id[cls]})
for img_id, rec in self.records.items():
img_size = (rec['height'], rec['width'])
for obj in rec['objects']:
x, y = obj['bbox'][0], obj['bbox'][1]
w, h = obj['bbox'][2] - x, obj['bbox'][3] - y
dataset['annotations'].append({
'id': str(len(dataset['annotations'])),
'bbox': [x, y, w, h],
'area': w * h,
'iscrowd': obj['difficult'],
'image_id': self.get_image_id(img_id),
'category_id': self.class_to_cat_id[obj['name']]})
if 'mask' in obj:
segm = {'size': img_size, 'counts': obj['mask']}
dataset['annotations'][-1]['segmentation'] = segm
elif 'polygons' in obj:
segm = []
for poly in obj['polygons']:
if isinstance(poly, np.ndarray):
poly = poly.tolist()
segm.append(poly)
dataset['annotations'][-1]['segmentation'] = segm
ann_file = self.get_ann_file()
print('Writing annotations json to {}'.format(ann_file))
with open(ann_file, 'w') as f:
json.dump(dataset, f)
return ann_file
def get_ann_file(self):
"""Return the ann filename."""
filename = 'annotations.json'
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
return os.path.join(self.output_dir, filename)
def get_res_file(self, type='bbox'):
"""Return the result filename."""
prefix = ''
if type == 'bbox':
prefix = 'detections'
elif type == 'segm':
prefix = 'segmentations'
elif type == 'kpt':
prefix = 'keypoints'
filename = prefix + '.json'
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
return os.path.join(self.output_dir, filename)
@staticmethod
def get_image_id(image_name):
"""Return the image name from the id."""
image_id = image_name.split('_')[-1].split('.')[0]
try:
return int(image_id)
except ValueError:
return image_name
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Evaluation API on the Pascal VOC dataset."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import datetime
import itertools
import time
import numpy as np
from pycocotools import mask as maskUtils
def voc_ap(rec, prec, use_07_metric=False):
"""Compute VOC AP given precision and recall."""
if use_07_metric:
# 11 point metric.
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# Correct AP calculation.
# First append sentinel values at the end.
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# Compute the precision envelope.
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# To calculate area under PR curve, look for points.
# where X axis (recall) changes value.
i = np.where(mrec[1:] != mrec[:-1])[0]
# And sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
class VOCeval(object):
"""Interface for evaluating detection via COCO object."""
def __init__(self, cocoGt=None, cocoDt=None, iouType='bbox',
iouThrs=[0.5, 0.7], use_07_metric=False):
self.cocoGt = cocoGt
self.cocoDt = cocoDt
self.params = Params(iouType)
self.params.iouThrs = iouThrs
self.params.use_07_metric = use_07_metric
if cocoGt is not None:
self.params.imgIds = sorted(cocoGt.getImgIds())
self.params.catIds = sorted(cocoGt.getCatIds())
self.ious = {}
def _prepare(self):
p = self.params
gts = self.cocoGt.loadAnns(
self.cocoGt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
dts = self.cocoDt.loadAnns(
self.cocoDt.getAnnIds(imgIds=p.imgIds, catIds=p.catIds))
for gt in gts:
            # Respect an explicit 'ignore' flag and also ignore crowd boxes.
            gt['ignore'] = gt.get('ignore', 0) or ('iscrowd' in gt and gt['iscrowd'])
self._gts = collections.defaultdict(list)
self._dts = collections.defaultdict(list)
for gt in gts:
self._gts[gt['image_id'], gt['category_id']].append(gt)
for dt in dts:
self._dts[dt['image_id'], dt['category_id']].append(dt)
self.eval = {}
def evaluate(self):
tic = time.time()
print('Running per image evaluation...')
p = self.params
print('Evaluate annotation type *{}*'.format(p.iouType))
p.imgIds = list(np.unique(p.imgIds))
p.catIds = list(np.unique(p.catIds))
self._prepare()
self.ious = {(imgId, catId): self.computeIoU(imgId, catId)
for imgId in p.imgIds for catId in p.catIds}
self.evalImgs = [self.evaluateImg(imgId, catId)
for catId in p.catIds for imgId in p.imgIds]
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def accumulate(self, p=None):
print('Accumulating evaluation results...')
tic = time.time()
if not self.evalImgs:
print('Please run evaluate() first')
if p is None:
p = self.params
print('VOC07 metric? ' + ('Yes' if p.use_07_metric else 'No'))
T, K, I = len(p.iouThrs), len(p.catIds), len(p.imgIds)
recall, ap = np.zeros((T, K)), np.zeros((T, K))
for k in range(K):
E = [self.evalImgs[k * I + i] for i in range(I)]
E = [e for e in E if e is not None]
if len(E) == 0:
continue
dtScores = np.concatenate([e['dtScores'] for e in E])
inds = np.argsort(-dtScores)
dtm = np.concatenate([e['dtMatches'] for e in E], axis=1)[:, inds]
dtIg = np.concatenate([e['dtIgnore'] for e in E], axis=1)[:, inds]
gtIg = np.concatenate([e['gtIgnore'] for e in E])
npig = np.count_nonzero(gtIg == 0)
if npig == 0:
continue
tps = np.logical_and(dtm, np.logical_not(dtIg))
fps = np.logical_and(np.logical_not(dtm), np.logical_not(dtIg))
            # Use an explicit dtype; the ``np.float`` alias is removed in NumPy 1.24+.
            tp_sum = np.cumsum(tps, axis=1).astype(np.float64)
            fp_sum = np.cumsum(fps, axis=1).astype(np.float64)
for t, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
nd = len(tp)
rc = tp / npig
pr = tp / np.maximum(tp + fp, np.spacing(1))
recall[t, k] = rc[-1] if nd else 0
ap[t, k] = voc_ap(rc, pr, use_07_metric=p.use_07_metric)
self.eval = {'counts': [T, K],
'date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'ap': ap, 'recall': recall}
toc = time.time()
print('DONE (t={:0.2f}s).'.format(toc - tic))
def computeIoU(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return []
inds = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in inds]
if p.iouType == 'segm':
g = [g['segmentation'] for g in gt]
d = [d['segmentation'] for d in dt]
elif p.iouType == 'bbox':
g = [g['bbox'] for g in gt]
d = [d['bbox'] for d in dt]
else:
raise Exception('unknown iouType for iou computation')
iscrowd = [int(o['iscrowd']) for o in gt]
return maskUtils.iou(d, g, iscrowd)
def evaluateImg(self, imgId, catId):
p = self.params
gt = self._gts[imgId, catId]
dt = self._dts[imgId, catId]
if len(gt) == 0 and len(dt) == 0:
return None
for g in gt:
g['_ignore'] = g['ignore']
gtind = np.argsort([g['_ignore'] for g in gt], kind='mergesort')
gt = [gt[i] for i in gtind]
dtind = np.argsort([-d['score'] for d in dt], kind='mergesort')
dt = [dt[i] for i in dtind]
iscrowd = [int(o['iscrowd']) for o in gt]
ious = (self.ious[imgId, catId][:, gtind]
if len(self.ious[imgId, catId]) > 0 else self.ious[imgId, catId])
T, G, D = len(p.iouThrs), len(gt), len(dt)
gtm, dtm = np.zeros((T, G)), np.zeros((T, D))
gtIg, dtIg = np.array([g['_ignore'] for g in gt]), np.zeros((T, D))
for (tind, iou), (dind, d) in itertools.product(
enumerate(p.iouThrs), enumerate(dt)):
m = -1
for gind, g in enumerate(gt):
if gtm[tind, gind] > 0 and not iscrowd[gind]:
continue
if m > -1 and gtIg[m] == 0 and gtIg[gind] == 1:
break
if ious[dind, gind] <= iou:
continue
m = gind
if m == -1:
continue
dtIg[tind, dind] = gtIg[m]
dtm[tind, dind] = gt[m]['id']
gtm[tind, m] = d['id']
return {'image_id': imgId,
'category_id': catId,
'dtMatches': dtm,
'dtScores': [d['score'] for d in dt],
'gtIgnore': gtIg,
'dtIgnore': dtIg}
class Params(object):
"""Params for evaluation API."""
def setDetParams(self):
self.imgIds = []
self.catIds = []
self.iouThrs = [0.5]
self.use_07_metric = False
def __init__(self, iouType='segm'):
if iouType == 'segm' or iouType == 'bbox':
self.setDetParams()
else:
raise Exception('iouType not supported')
self.iouType = iouType
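# A minimal driving sketch (assuming ``cocoGt`` and ``cocoDt`` are pycocotools
# COCO objects holding the ground-truths and detections, as the constructor
# expects):
#
#   voc_eval = VOCeval(cocoGt, cocoDt, iouType='bbox',
#                      iouThrs=[0.5], use_07_metric=True)
#   voc_eval.evaluate()
#   voc_eval.accumulate()
#   voc_eval.eval['ap']      # (T, K) array: AP per IoU threshold and class
#   voc_eval.eval['recall']  # (T, K) array: final recall per threshold/class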
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VOC dataset evaluator."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import functools
import numpy as np
import prettytable
from seetadet.data.build import EVALUATORS
from seetadet.data.evaluators.evaluator import Evaluator
from seetadet.data.evaluators.voc_eval import VOCeval
@EVALUATORS.register(['voc', 'voc2007', 'voc2010', 'voc2012'])
class VOCEvaluator(Evaluator):
"""Evaluator for Pascal VOC dataset."""
def __init__(self, output_dir, classes, use_07_metric=False):
eval_type = functools.partial(
VOCeval, iouThrs=[0.5], use_07_metric=use_07_metric)
super(VOCEvaluator, self).__init__(output_dir, classes, eval_type)
def print_eval_results(self, coco_eval):
metrics = collections.OrderedDict()
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
            for k, name in zip(('ap', 'recall'), ('AP', 'AR')):
                for i, iou in enumerate(coco_eval.params.iouThrs):
                    # Use a fresh key so ``name`` is not clobbered when
                    # iterating over multiple IoU thresholds.
                    metric_name = '%s@[IoU=%s]' % (name, str(iou))
                    v = coco_eval.eval[k][i, cls_ind - 1]
                    if metric_name not in metrics:
                        metrics[metric_name] = []
                    metrics[metric_name].append(v)
class_table = prettytable.PrettyTable()
summary_list = []
for k, v in metrics.items():
v = np.nan_to_num(v, nan=0)
class_table.add_column(k, np.round(v * 100, 2))
iStr = ' {:<18} {} @[ IoU={:<9} | area={:>6s} | maxDets={:>3d} ] = {:0.3f}'
titleStr, typeStr = 'Average Precision', '(AP)'
if k.startswith('AR'):
titleStr, typeStr = 'Average Recall', '(AR)'
iouStr = '{:0.2f}'.format(float(k.split('IoU=')[-1][:-1]))
summary_list.append(iStr.format(titleStr, typeStr, iouStr, 'all', -1, np.mean(v)))
class_table.add_column('Class', self.classes[1:])
print('Per class results:\n' + class_table.get_string(), '\n')
print('Summary:\n' + '\n'.join(summary_list))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loader."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import multiprocessing as mp
import time
import threading
import queue
import codewithgpu
import dragon
from seetadet.core.config import cfg
from seetadet.data.build import build_dataset
from seetadet.utils import logging
from seetadet.utils.blob import blob_vstack
class BalancedQueues(object):
"""Balanced queues."""
def __init__(self, base_queue, num=1):
self.queues = [base_queue]
self.queues += [mp.Queue(base_queue._maxsize) for _ in range(num - 1)]
self.index = 0
def put(self, obj, block=True, timeout=None):
q = self.queues[self.index]
q.put(obj, block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
def get(self, block=True, timeout=None):
q = self.queues[self.index]
obj = q.get(block=block, timeout=timeout)
self.index = (self.index + 1) % len(self.queues)
return obj
def get_n(self, num=1):
outputs = []
while len(outputs) < num:
obj = self.get()
if obj is not None:
outputs.append(obj)
return outputs
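# A minimal sketch of the round-robin balancing (hypothetical standalone
# usage; in the loader below, producers and consumers each hold their own
# rotation index):
#
#   base = mp.Queue(8)
#   queues = BalancedQueues(base, num=3)
#   for obj in ('a', 'b', 'c', 'd'):
#       queues.put(obj)   # 'a'->q0, 'b'->q1, 'c'->q2, 'd'->q0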
class DataLoaderBase(threading.Thread):
"""Base class of data loader."""
def __init__(self, worker, **kwargs):
super(DataLoaderBase, self).__init__(daemon=True)
self.batch_size = kwargs.get('batch_size', 2)
self.num_readers = kwargs.get('num_readers', 1)
self.num_workers = kwargs.get('num_workers', 3)
self.queue_depth = kwargs.get('queue_depth', 2)
# Initialize distributed group.
rank, group_size = 0, 1
dist_group = dragon.distributed.get_group()
if dist_group is not None:
group_size = dist_group.size
rank = dragon.distributed.get_rank(dist_group)
# Build queues.
self.reader_queue = mp.Queue(self.queue_depth * self.batch_size)
self.worker_queue = mp.Queue(self.queue_depth * self.batch_size)
self.batch_queue = queue.Queue(self.queue_depth)
self.reader_queue = BalancedQueues(self.reader_queue, self.num_workers)
self.worker_queue = BalancedQueues(self.worker_queue, self.num_workers)
# Build readers.
self.readers = []
for i in range(self.num_readers):
partition_id = i
num_partitions = self.num_readers
num_partitions *= group_size
partition_id += rank * self.num_readers
self.readers.append(codewithgpu.DatasetReader(
output_queue=self.reader_queue,
partition_id=partition_id,
num_partitions=num_partitions,
seed=cfg.RNG_SEED + partition_id, **kwargs))
self.readers[i].start()
time.sleep(0.1)
# Build workers.
self.workers = []
for i in range(self.num_workers):
p = worker(**kwargs)
p.seed += (i + rank * self.num_workers)
p.reader_queue = self.reader_queue.queues[i]
p.worker_queue = self.worker_queue.queues[i]
p.start()
self.workers.append(p)
time.sleep(0.1)
# Register cleanup callbacks.
def cleanup():
def terminate(processes):
for p in processes:
p.terminate()
p.join()
terminate(self.workers)
terminate(self.readers)
import atexit
atexit.register(cleanup)
# Start batch prefetching.
self.start()
def next(self):
"""Return the next batch of data."""
return self.__next__()
def run(self):
"""Main loop."""
def __call__(self):
return self.next()
def __iter__(self):
"""Return the iterator self."""
return self
def __next__(self):
"""Return the next batch of data."""
return self.batch_queue.get()
class DataLoader(DataLoaderBase):
"""Loader to return the batch of data."""
def __init__(self, dataset, worker, **kwargs):
dataset = build_dataset(dataset)
self.dataset_size = dataset.size
self.contiguous = kwargs.get('contiguous', True)
self.prefetch_count = kwargs.get('prefetch_count', 50)
self.img_mean = cfg.MODEL.PIXEL_MEAN
self.img_align = (cfg.BACKBONE.COARSEST_STRIDE,) * 2
args = {'path': dataset.source,
'dataset_getter': dataset.getter,
'classes': dataset.classes,
'shuffle': kwargs.get('shuffle', True),
'batch_size': kwargs.get('batch_size', 1),
'num_workers': kwargs.get('num_workers', 1)}
super(DataLoader, self).__init__(worker, **args)
def run(self):
"""Main loop."""
logging.info('Prefetch batches...')
prev_inputs = self.worker_queue.get_n(
self.prefetch_count * self.batch_size)
next_inputs = []
while True:
# Use cached buffer for next N inputs.
if len(next_inputs) == 0:
next_inputs = prev_inputs
if 'aspect_ratio' in next_inputs[0]:
# Inputs are sorted for aspect ratio grouping.
next_inputs.sort(key=lambda d: d['aspect_ratio'][0] > 1)
prev_inputs = []
# Collect the next batch.
outputs = collections.defaultdict(list)
for _ in range(self.batch_size):
inputs = next_inputs.pop(0)
for k, v in inputs.items():
outputs[k].extend(v)
prev_inputs += self.worker_queue.get_n(1)
# Stack batch data.
if self.contiguous:
outputs['img'] = blob_vstack(
outputs['img'], fill_value=self.img_mean,
align=self.img_align)
# Send batch data to consumer.
self.batch_queue.put(outputs)
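# A minimal consumption sketch (hedged: ``dataset`` is whatever
# ``build_dataset`` accepts, and ``SomeTrainWorker`` stands in for one of the
# worker classes registered in the pipelines module below):
#
#   loader = DataLoader(dataset, worker=SomeTrainWorker,
#                       batch_size=2, shuffle=True)
#   batch = next(loader)   # dict with 'img', 'gt_boxes', 'im_info', ...
#   batch['img']           # stacked into one array when contiguous=True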
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Data loading pipelines."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import multiprocessing
import cv2
import numpy as np
from seetadet.core.config import cfg
from seetadet.data import transforms
from seetadet.data.build import LOADERS
from seetadet.data.build import build_anchor_sampler
from seetadet.data.datasets import AnnotatedDatum
from seetadet.data.loader import DataLoader
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import filter_empty_boxes
class WorkerBase(multiprocessing.Process):
"""Base class of data worker."""
def __init__(self):
super(WorkerBase, self).__init__(daemon=True)
self.seed = cfg.RNG_SEED
self.reader_queue = None
self.worker_queue = None
def get_outputs(self, inputs):
"""Return the processed outputs."""
return inputs
def run(self):
"""Main prefetch loop."""
# Disable the opencv threading.
cv2.setNumThreads(1)
# Fix the process-local random seed.
np.random.seed(self.seed)
inputs = []
while True:
# Use cached buffer for next 4 inputs.
while len(inputs) < 4:
inputs.append(self.reader_queue.get())
outputs = self.get_outputs(inputs)
self.worker_queue.put(outputs)
class DetTrainWorker(WorkerBase):
"""Generic train pipeline for detection."""
def __init__(self, **kwargs):
super(DetTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(cfg.TRAIN.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
img, boxes = self.resize(img, boxes)
img, boxes = self.flip(img, boxes)
img, boxes = self.crop(img, boxes)
boxes = clip_boxes(boxes, img.shape)
boxes = boxes[filter_empty_boxes(boxes)]
if len(boxes) == 0:
return None
img = self.distort(img)
im_scale = self.resize.im_scale
aspect_ratio = float(img.shape[0]) / float(img.shape[1])
outputs = {'img': [img],
'gt_boxes': [boxes],
'im_info': [img.shape[:2] + (im_scale,)],
'aspect_ratio': [aspect_ratio]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class MaskTrainWorker(WorkerBase):
"""Generic train pipeline for instance segmentation."""
def __init__(self, **kwargs):
super(MaskTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.parse_segms = transforms.ParseSegms()
self.resize = transforms.RandomResize(
scales=cfg.TRAIN.SCALES,
scales_range=cfg.TRAIN.SCALES_RANGE,
max_size=cfg.TRAIN.MAX_SIZE)
self.flip = transforms.RandomFlip()
self.crop = transforms.RandomCrop(cfg.TRAIN.CROP_SIZE)
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.recompute_boxes = cfg.TRAIN.CROP_SIZE > 0
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
segms = self.parse_segms(datum)
img, boxes, segms = self.resize(img, boxes, segms)
img, boxes, segms = self.flip(img, boxes, segms)
img, boxes, segms = self.crop(img, boxes, segms)
if self.recompute_boxes:
boxes[:, :4] = segms.get_boxes()
else:
boxes = clip_boxes(boxes, img.shape)
keep = filter_empty_boxes(boxes)
boxes, segms = boxes[keep], segms[keep]
if len(boxes) == 0:
return None
img = self.distort(img)
im_scale = self.resize.im_scale
aspect_ratio = float(img.shape[0]) / float(img.shape[1])
outputs = {'img': [img],
'gt_boxes': [boxes],
'gt_segms': [segms],
'im_info': [img.shape[:2] + (im_scale,)],
'scale_jitter': [self.resize.scale_jitter],
'aspect_ratio': [aspect_ratio]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class SSDTrainWorker(WorkerBase):
"""Generic train pipeline for SSD detection."""
def __init__(self, **kwargs):
super(SSDTrainWorker, self).__init__()
self.parse_boxes = transforms.ParseBoxes()
self.paste = transforms.RandomPaste()
self.crop = transforms.RandomBBoxCrop()
self.resize = transforms.ResizeWarp(cfg.TRAIN.SCALES[0])
self.flip = transforms.RandomFlip()
self.distort = transforms.ColorJitter(cfg.TRAIN.COLOR_JITTER)
self.anchor_sampler = build_anchor_sampler()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, boxes = datum.img, self.parse_boxes(datum)
boxes /= [(img.shape[1], img.shape[0]) * 2 + (1,)]
img, boxes = self.paste(img, boxes)
img, boxes = self.crop(img, boxes)
if len(boxes) == 0:
return None
img = self.resize(img)
boxes[:, :4] *= img.shape[0]
img, boxes = self.flip(img, boxes)
img = self.distort(img)
outputs = {'img': [img],
'gt_boxes': [boxes],
'im_info': [img.shape[:2]]}
if self.anchor_sampler is not None:
data = self.anchor_sampler.sample(boxes)
for k, v in data.items():
outputs[k] = [v]
return outputs
class DetTestWorker(WorkerBase):
"""Generic test pipeline for detection."""
def __init__(self, **kwargs):
super(DetTestWorker, self).__init__()
def get_outputs(self, inputs):
datum = AnnotatedDatum(inputs.pop(0))
img, objects = datum.img, datum.objects
outputs = {'img': [img], 'objects': [objects],
'img_meta': [{'id': datum.id,
'height': datum.height,
'width': datum.width}]}
return outputs
LOADERS.register('det_train', DataLoader, worker=DetTrainWorker)
LOADERS.register('mask_train', DataLoader, worker=MaskTrainWorker)
LOADERS.register('ssd_train', DataLoader, worker=SSDTrainWorker)
LOADERS.register('det_test', DataLoader, worker=DetTestWorker)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Structures."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.data.structures.mask import PolygonMasks
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask structure."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from copy import deepcopy
import numpy as np
from seetadet.utils.polygon import crop_polygons
from seetadet.utils.polygon import flip_polygons
from seetadet.utils.mask import mask_from
class PolygonMasks(object):
"""Polygon masks."""
    def __init__(self, shape=None):
        self.data = []
        self.shape = list(shape) if shape is not None else None
def new_masks(self, data, copy=False):
"""Return a new masks object."""
ret = PolygonMasks(self.shape)
ret.data = deepcopy(data) if copy else data
return ret
def apply_flip(self):
"""Apply flip transform."""
for i, mask in enumerate(self.data):
if mask is None:
continue
self.data[i] = flip_polygons(mask, self.shape[1])
return self
def apply_resize(self, size=None, scale=None):
"""Apply resize transform."""
if size is None:
if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
self.shape[0] = int(self.shape[0] * scale[0] + .5)
self.shape[1] = int(self.shape[1] * scale[1] + .5)
else:
if not isinstance(size, (tuple, list)):
size = (size, size)
scale = (size[0] * 1. / self.shape[0],
size[1] * 1. / self.shape[1])
self.shape = list(size)
for mask in self.data:
if mask is None:
continue
for p in mask:
p[0::2] *= scale[1]
p[1::2] *= scale[0]
return self
def apply_crop(self, crop_box):
"""Apply crop transform."""
self.shape = [crop_box[3] - crop_box[1],
crop_box[2] - crop_box[0]]
for i, mask in enumerate(self.data):
if mask is None:
continue
self.data[i] = crop_polygons(mask, crop_box)
def crop_and_resize(self, boxes, mask_size):
"""Return the resized ROI masks."""
return [mask_from(self.data[i], mask_size, boxes[i])
for i in range(len(self.data))]
def get_boxes(self):
"""Return the bounding boxes of masks."""
boxes = np.zeros((len(self.data), 4), 'float32')
for i, mask in enumerate(self.data):
            if mask is None or len(mask) == 0:
continue
xymin = np.array([float('inf'), float('inf')], 'float32')
xymax = np.zeros((2,), 'float32')
for p in mask:
coords = p.reshape((-1, 2)).astype('float32')
xymin = np.minimum(xymin, coords.min(0))
xymax = np.maximum(xymax, coords.max(0))
boxes[i, :2], boxes[i, 2:] = xymin, xymax
return boxes
def append(self, mask):
"""Append a mask."""
        # ``None`` marks an object without polygons; the transforms skip it.
        assert mask is None or isinstance(mask, list)
self.data.append(mask)
return self
def extend(self, masks):
"""Append a set of masks."""
for mask in masks:
self.append(mask)
return self
def __getitem__(self, item):
if isinstance(item, slice):
return self.new_masks(self.data[item])
elif isinstance(item, np.ndarray):
return self.new_masks([self.data[i] for i in item.tolist()])
return self.new_masks([self.data[item]])
def __iadd__(self, masks):
if isinstance(masks, PolygonMasks):
self.data += masks.data
return self
return self.extend(masks)
def __len__(self):
return len(self.data)
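# A minimal usage sketch (hypothetical polygon coordinates; each mask is a
# list of flattened x, y polygons):
#
#   masks = PolygonMasks(shape=(480, 640))  # (height, width)
#   masks.append([np.array([10., 10., 100., 10., 100., 80.])])
#   masks.apply_flip()             # mirror x-coordinates about the width
#   masks.apply_resize(scale=0.5)  # shape becomes (240, 320)
#   masks.get_boxes()              # (N, 4) tight boxes around the polygons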
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.bbox import filter_empty_boxes
class ProposalTargets(object):
"""Generate ground-truth targets for proposals."""
def __init__(self):
super(ProposalTargets, self).__init__()
self.num_classes = len(cfg.MODEL.CLASSES)
self.num_rois = cfg.FAST_RCNN.BATCH_SIZE
self.num_fg_rois = round(cfg.FAST_RCNN.POSITIVE_FRACTION * self.num_rois)
self.bbox_reg_weights = cfg.FAST_RCNN.BBOX_REG_WEIGHTS
self.bbox_reg_cls_agnostic = cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC
self.mask_size = (cfg.MASK_RCNN.POOLER_RESOLUTION * 2,) * 2
self.lvl_min, self.lvl_max = cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL
self.assigner = MaxIoUAssigner(pos_iou_thr=cfg.FAST_RCNN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.FAST_RCNN.NEGATIVE_OVERLAP,
match_low_quality=False)
self.defaults = {'rois': np.array([[-1, 0, 0, 1, 1]], 'float32'),
'labels': np.array([-1], 'int64'),
'bbox_targets': np.zeros((1, 4), 'float32'),
'mask_targets': np.full((1,) + self.mask_size, -1, 'float32')}
def sample_rois(self, rois, gt_boxes):
"""Match and sample positive and negative RoIs."""
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(rois[:, 1:5], gt_boxes)
fg_inds = np.where(labels > 0)[0]
bg_inds = np.where(labels == 0)[0]
# Include ground-truth boxes as foreground regions.
batch_inds = np.full((gt_boxes.shape[0], 1), rois[0, 0])
gt_inds = np.arange(len(rois), len(rois) + len(batch_inds))
fg_inds = np.concatenate((fg_inds, gt_inds))
rois = np.vstack((rois, np.hstack((batch_inds, gt_boxes[:, :4]))))
# Sample foreground regions without replacement.
num_fg_rois = int(min(self.num_fg_rois, fg_inds.size))
fg_inds = npr.choice(fg_inds, num_fg_rois, False)
# Sample background regions without replacement.
num_bg_rois = self.num_rois - num_fg_rois
num_bg_rois = min(num_bg_rois, bg_inds.size)
if bg_inds.size > 0:
bg_inds = npr.choice(bg_inds, num_bg_rois, False)
# Take values via sampled indices.
keep_inds = np.append(fg_inds, bg_inds)
rois = rois[keep_inds]
overlaps = bbox_overlaps(rois[:, 1:5], gt_boxes[:, :4])
gt_assignments = overlaps.argmax(axis=1)
labels = gt_boxes[gt_assignments, 4].astype('int64')
# Reassign background regions.
labels[num_fg_rois:] = 0
return rois, labels, gt_assignments
def distribute_blobs(self, blobs, lvls):
"""Distribute blobs on given levels."""
outputs = collections.defaultdict(list)
lvl_inds = [np.where(lvls == (i + self.lvl_min))[0]
for i in range(self.lvl_max - self.lvl_min + 1)]
for inds in lvl_inds:
for key, blob in blobs.items():
outputs[key].append(blob[inds] if len(inds) > 0
else self.defaults[key])
return outputs
def get_bbox_targets(self, rois, boxes):
return bbox_transform(rois, boxes, weights=self.bbox_reg_weights)
def get_mask_targets(self, rois, segms, inds):
targets = np.full((len(rois),) + self.mask_size, -1, 'float32')
masks = segms[inds].crop_and_resize(rois[inds], self.mask_size)
for i, j in enumerate(inds):
if masks[i] is not None:
targets[j] = masks[i]
return targets
def compute(self, **inputs):
"""Compute proposal targets."""
blobs = collections.defaultdict(list)
all_rois = inputs['rois']
batch_inds = all_rois[:, 0].astype('int32')
# Compute targets per image.
for i, gt_boxes in enumerate(inputs['gt_boxes']):
# Select proposals of this image.
rois = all_rois[np.where(batch_inds == i)[0]]
# Filter empty RoIs.
rois[:, 1:5] = clip_boxes(rois[:, 1:5], inputs['im_info'][i][:2])
rois = rois[filter_empty_boxes(rois[:, 1:5])]
# Sample a batch of RoIs for training.
rois, labels, gt_assignments = self.sample_rois(rois, gt_boxes)
# Fill blobs.
blobs['rois'].append(rois)
blobs['labels'].append(labels)
blobs['bbox_targets'].append(self.get_bbox_targets(
rois[:, 1:5], gt_boxes[gt_assignments, :4]))
if 'gt_segms' in inputs:
fg_inds = np.where(labels > 0)[0]
segms = inputs['gt_segms'][i][gt_assignments]
targets = self.get_mask_targets(rois[:, 1:5], segms, fg_inds)
blobs['mask_targets'].append(targets)
# Concat to get the contiguous blobs.
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Distribute blobs by the level of all ROIs.
lvls = distribute_boxes(blobs['rois'][:, 1:], self.lvl_min, self.lvl_max)
blobs = self.distribute_blobs(blobs, lvls)
# Add the targets using foreground ROIs only.
for lvl in range(self.lvl_max - self.lvl_min + 1):
inds = np.where(blobs['labels'][lvl] > 0)[0]
if len(inds) > 0:
blobs['fg_rois'].append(blobs['rois'][lvl][inds])
blobs['mask_labels'].append(blobs['labels'][lvl][inds] - 1)
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = blobs['mask_targets'][lvl][inds]
else:
blobs['fg_rois'].append(self.defaults['rois'])
blobs['mask_labels'].append(np.array([0], 'int64'))
if 'mask_targets' in blobs:
blobs['mask_targets'][lvl] = self.defaults['mask_targets']
# Concat to get contiguous blobs along the levels.
rois, fg_rois = blobs['rois'], blobs['fg_rois']
blobs = dict((k, np.concatenate(blobs[k])) for k in blobs.keys())
# Compute class-specific strides.
bbox_strides = np.arange(len(blobs['rois'])) * (self.num_classes - 1)
mask_strides = np.arange(len(blobs['fg_rois'])) * (self.num_classes - 1)
# Select the foreground RoIs for bbox targets.
fg_inds = np.where(blobs['labels'] > 0)[0]
if len(fg_inds) == 0:
            # Sample one proposal at random so the index tensors below are never empty.
fg_inds = npr.randint(len(blobs['labels']), size=[1])
outputs = {
'rois': [to_tensor(rois[i]) for i in range(len(rois))],
'fg_rois': [to_tensor(fg_rois[i]) for i in range(len(fg_rois))],
'labels': to_tensor(blobs['labels']), 'proposals': np.concatenate(rois),
'bbox_inds': to_tensor(fg_inds if self.bbox_reg_cls_agnostic else
(bbox_strides[fg_inds] + (blobs['labels'][fg_inds] - 1))),
'mask_inds': to_tensor(mask_strides + blobs['mask_labels']),
'bbox_targets': to_tensor(blobs['bbox_targets'][fg_inds]),
'bbox_anchors': to_tensor(blobs['rois'][fg_inds, 1:]),
}
if 'mask_targets' in blobs:
outputs['mask_targets'] = to_tensor(blobs['mask_targets'])
return outputs
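# For reference, the sampling budget above (hedged: illustrative config
# values, e.g. FAST_RCNN.BATCH_SIZE=128 and POSITIVE_FRACTION=0.25):
# sample_rois() keeps at most round(0.25 * 128) = 32 foreground RoIs per
# image and fills the remaining 96 slots with sampled backgrounds, which
# are then relabeled 0.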
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('retinanet')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RETINANET.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RETINANET.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
        # Return only the foreground and ignored indices rather than dense
        # labels; everything else is implicitly background.
        # (~100x faster when there are ~200k background indices)
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels < 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care.
labels = np.zeros((num_images, num_anchors), 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
ignore_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
ignore_inds = self.generator.narrow_anchors(shapes, ignore_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, ignore_inds] = -1
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register(['faster_rcnn', 'mask_rcnn', 'cascade_rcnn'])
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.RPN.POSITIVE_OVERLAP,
neg_iou_thr=cfg.RPN.NEGATIVE_OVERLAP)
max_size = max(cfg.TRAIN.MAX_SIZE, max(cfg.TRAIN.SCALES))
if cfg.BACKBONE.COARSEST_STRIDE > 0:
stride = float(cfg.BACKBONE.COARSEST_STRIDE)
max_size = int(np.ceil(max_size / stride) * stride)
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
fg_inds = np.where(labels > 0)[0]
bg_inds = np.where(labels == 0)[0]
        # Cap negatives at 8x the batch size to keep later sampling cheap.
num_bg = cfg.RPN.BATCH_SIZE * 8
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Select foreground and background indices.
return {'fg_inds': fg_inds, 'bg_inds': bg_inds}
def compute(self, **inputs):
"""Compute anchor targets."""
shapes = [x[:2] for x in inputs['grid_info']]
num_anchors = self.generator.num_anchors(shapes)
blobs = collections.defaultdict(list)
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = inputs['fg_inds'][i]
bg_inds = inputs['bg_inds'][i]
# Narrow anchors to match the feature layout.
bg_inds = self.generator.narrow_anchors(shapes, bg_inds)
fg_inds, anchors = self.generator.narrow_anchors(shapes, fg_inds, True)
num_fg = int(cfg.RPN.POSITIVE_FRACTION * cfg.RPN.BATCH_SIZE)
if len(fg_inds) > num_fg:
keep = npr.choice(np.arange(len(fg_inds)), num_fg, False)
fg_inds, anchors = fg_inds[keep], anchors[keep]
# Sample negative labels if we have too many.
num_bg = cfg.RPN.BATCH_SIZE - len(fg_inds)
if len(bg_inds) > num_bg:
bg_inds = npr.choice(bg_inds, num_bg, False)
# Compute bbox targets.
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4])
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute sparse indices.
fg_inds += i * num_anchors
bg_inds += i * num_anchors
blobs['cls_inds'] += [fg_inds, bg_inds]
blobs['bbox_inds'] += [fg_inds]
blobs['labels'] += [np.ones_like(fg_inds, 'float32'),
np.zeros_like(bg_inds, 'float32')]
return {
'labels': to_tensor(np.hstack(blobs['labels'])),
'cls_inds': to_tensor(np.hstack(blobs['cls_inds'])),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
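# For reference, the RPN budget above (hedged: illustrative config values,
# e.g. RPN.BATCH_SIZE=256 and POSITIVE_FRACTION=0.5): sample() pre-caps
# negatives at 8 * 256 = 2048 per image, then compute() keeps at most
# int(0.5 * 256) = 128 foregrounds and tops the batch up to 256 with
# sampled backgrounds.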
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Generate targets for SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.build import ANCHOR_SAMPLERS
from seetadet.data.anchors.ssd import AnchorGenerator
from seetadet.data.assigners import MaxIoUAssigner
from seetadet.ops.normalization import to_tensor
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import bbox_transform
@ANCHOR_SAMPLERS.register('ssd')
class AnchorTargets(object):
"""Generate ground-truth targets for anchors."""
def __init__(self):
super(AnchorTargets, self).__init__()
self.generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.assigner = MaxIoUAssigner(
pos_iou_thr=cfg.SSD.POSITIVE_OVERLAP,
neg_iou_thr=cfg.SSD.NEGATIVE_OVERLAP,
gt_max_assign_all=False)
self.neg_pos_ratio = (1.0 / cfg.SSD.POSITIVE_FRACTION) - 1.0
max_size = cfg.ANCHOR_GENERATOR.STRIDES[-1]
self.generator.reset_grid(max_size)
def sample(self, gt_boxes):
"""Sample positive and negative anchors."""
anchors = self.generator.grid_anchors
# Assign ground-truth according to the IoU.
labels = self.assigner.assign(anchors, gt_boxes)
# Select positive and non-positive indices.
return {'fg_inds': np.where(labels > 0)[0],
'bg_inds': np.where(labels <= 0)[0]}
def compute(self, **inputs):
"""Compute anchor targets."""
num_images = len(inputs['gt_boxes'])
num_anchors = self.generator.grid_anchors.shape[0]
cls_score = inputs['cls_score'].numpy().astype('float32')
blobs = collections.defaultdict(list)
# "1" is positive, "0" is negative, "-1" is don't care
labels = np.full((num_images, num_anchors,), -1, 'int64')
for i, gt_boxes in enumerate(inputs['gt_boxes']):
fg_inds = pos_inds = inputs['fg_inds'][i]
neg_inds = inputs['bg_inds'][i]
            # Mine hard negatives as background.
num_pos, num_neg = len(pos_inds), len(neg_inds)
num_bg = min(int(num_pos * self.neg_pos_ratio), num_neg)
neg_score = cls_score[i, neg_inds, 0]
bg_inds = neg_inds[np.argsort(neg_score)][:num_bg]
# Compute bbox targets.
anchors = self.generator.grid_anchors[fg_inds]
gt_assignments = bbox_overlaps(anchors, gt_boxes).argmax(axis=1)
bbox_targets = bbox_transform(anchors, gt_boxes[gt_assignments, :4],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
blobs['bbox_anchors'].append(anchors)
blobs['bbox_targets'].append(bbox_targets)
# Compute label assignments.
labels[i, bg_inds] = 0
labels[i, fg_inds] = gt_boxes[gt_assignments, 4]
# Compute sparse indices.
fg_inds += i * num_anchors
blobs['bbox_inds'].extend([fg_inds])
return {
'labels': to_tensor(labels),
'bbox_inds': to_tensor(np.hstack(blobs['bbox_inds'])),
'bbox_targets': to_tensor(np.vstack(blobs['bbox_targets'])),
'bbox_anchors': to_tensor(np.vstack(blobs['bbox_anchors'])),
}
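# For reference, the hard-negative mining above (hedged: an illustrative
# SSD.POSITIVE_FRACTION of 0.25): neg_pos_ratio = 1 / 0.25 - 1 = 3, so up
# to three negatives are kept per positive, chosen as the anchors with the
# lowest background scores (the hardest negatives).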
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from seetadet.core.config import cfg
from seetadet.data.structures import PolygonMasks
from seetadet.utils.bbox import bbox_overlaps
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import flip_boxes
from seetadet.utils.image import im_resize
from seetadet.utils.image import color_jitter
class Transform(object):
"""Base transform type."""
def init_params(self, params=None):
for k, v in (params or {}).items():
if k != 'self' and not k.startswith('_'):
setattr(self, k, v)
def filter_outputs(self, *outputs):
outputs = [x for x in outputs if x is not None]
return outputs if len(outputs) > 1 else outputs[0]
class ParseBoxes(Transform):
"""Parse the ground-truth boxes."""
def __init__(self):
super(ParseBoxes, self).__init__()
self.classes = cfg.MODEL.CLASSES
self.num_classes = len(self.classes)
self.class_indices = dict(zip(self.classes, range(self.num_classes)))
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
height, width = datum.height, datum.width
objects = list(filter(lambda obj: self.use_diff or
not obj.get('difficult', 0), datum.objects))
boxes = np.empty((len(objects), 5), 'float32')
for i, obj in enumerate(objects):
boxes[i, :] = [max(0, obj['bbox'][0]),
max(0, obj['bbox'][1]),
min(obj['bbox'][2], width),
min(obj['bbox'][3], height),
self.class_indices[obj['name']]]
return boxes
class ParseSegms(Transform):
"""Parse the ground-truth segmentations."""
def __init__(self):
super(ParseSegms, self).__init__()
self.use_diff = cfg.TRAIN.USE_DIFF
def __call__(self, datum):
masks = PolygonMasks((datum.height, datum.width))
objects = filter(lambda obj: self.use_diff or
not obj.get('difficult', 0), datum.objects)
masks += [obj.get('polygons', None) for obj in objects]
return masks
class RandomFlip(Transform):
"""Flip the image randomly."""
def __init__(self, prob=0.5):
super(RandomFlip, self).__init__()
self.prob = prob
self.is_flipped = False
def __call__(self, img, boxes=None, segms=None):
self.is_flipped = npr.rand() < self.prob
img = img[:, ::-1] if self.is_flipped else img
if self.is_flipped and boxes is not None:
boxes = flip_boxes(boxes, img.shape[1])
if self.is_flipped and segms is not None:
segms = segms.apply_flip()
return self.filter_outputs(img, boxes, segms)
class ResizeWarp(Transform):
"""Resize the image to a square size."""
def __init__(self, size):
super(ResizeWarp, self).__init__()
self.size = size
self.im_scale = (1.0, 1.0)
def __call__(self, img, boxes=None):
self.im_scale = (float(self.size) / float(img.shape[0]),
float(self.size) / float(img.shape[1]))
img = im_resize(img, size=self.size)
if boxes is not None:
boxes[:, (0, 2)] = boxes[:, (0, 2)] * self.im_scale[1]
boxes[:, (1, 3)] = boxes[:, (1, 3)] * self.im_scale[0]
return self.filter_outputs(img, boxes)
class RandomResize(Transform):
"""Resize the image randomly."""
def __init__(self, scales=(640,), scales_range=(1.0, 1.0), max_size=1066):
super(RandomResize, self).__init__()
self.scales = scales
self.scales_range = scales_range
self.max_size = max_size
self.im_scale = 1.0
self.scale_jitter = 1.0
def __call__(self, img, boxes=None, segms=None):
im_shape = img.shape
target_size = npr.choice(self.scales)
# Scale along the shortest side.
max_size = max(self.max_size, target_size)
im_size_min = np.min(im_shape[:2])
im_size_max = np.max(im_shape[:2])
self.im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than *MAX_SIZE*.
if np.round(self.im_scale * im_size_max) > max_size:
self.im_scale = float(max_size) / float(im_size_max)
# Apply random scaling to get a range of dynamic scales.
self.scale_jitter = npr.uniform(*self.scales_range)
self.im_scale *= self.scale_jitter
img = im_resize(img, scale=self.im_scale)
if boxes is not None:
boxes[:, :4] *= self.im_scale
if segms is not None:
segms.apply_resize(scale=self.im_scale)
return self.filter_outputs(img, boxes, segms)
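# A worked example of the scaling rule above (hypothetical inputs, with
# scales=(640,), scales_range=(1.0, 1.0) and max_size=1066):
#   - a 500x800 image: im_scale = 640 / 500 = 1.28, and 1.28 * 800 = 1024
#     <= 1066, so the output is about 640x1024;
#   - a 480x1280 image: 640 / 480 would give ~1707 on the long side, so the
#     scale is clamped to 1066 / 1280 ~= 0.83, yielding about 400x1066.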
class RandomPaste(Transform):
    """Randomly paste the image onto a larger, mean-filled canvas."""
def __init__(self, prob=0.5):
self.ratio = 1. / cfg.TRAIN.SCALES_RANGE[0]
self.prob = prob if self.ratio > 1 else 0
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes):
if npr.rand() > self.prob:
return img, boxes
im_shape = list(img.shape)
h, w = im_shape[:2]
ratio = npr.uniform(1., self.ratio)
out_h, out_w = int(h * ratio), int(w * ratio)
y1 = int(np.floor(npr.uniform(0., out_h - h)))
x1 = int(np.floor(npr.uniform(0., out_w - w)))
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, img.dtype)
out_img[:] = self.pixel_mean
out_img[y1:y1 + h, x1:x1 + w, :] = img
out_boxes = boxes.astype(boxes.dtype, copy=True)
out_boxes[:, (0, 2)] = (boxes[:, (0, 2)] * w + x1) / out_w
out_boxes[:, (1, 3)] = (boxes[:, (1, 3)] * h + y1) / out_h
return out_img, out_boxes
class RandomCrop(Transform):
"""Crop the image randomly."""
def __init__(self, crop_size=512):
super(RandomCrop, self).__init__()
self.crop_size = crop_size
self.pixel_mean = cfg.MODEL.PIXEL_MEAN
def __call__(self, img, boxes=None, segms=None):
if self.crop_size <= 0:
return self.filter_outputs(img, boxes, segms)
im_shape = list(img.shape)
h, w = im_shape[:2]
out_h, out_w = (self.crop_size,) * 2
y = npr.randint(max(h - out_h, 0) + 1)
x = npr.randint(max(w - out_w, 0) + 1)
im_shape[:2] = (out_h, out_w)
out_img = np.empty(im_shape, img.dtype)
out_img[:] = self.pixel_mean
out_img[:h, :w] = img[y:y + out_h, x:x + out_w]
img = out_img
if boxes is not None:
boxes[:, (0, 2)] -= x
boxes[:, (1, 3)] -= y
if segms is not None:
segms.apply_crop((x, y, x + out_w, y + out_h))
return self.filter_outputs(img, boxes, segms)
class ColorJitter(Transform):
    """Distort the brightness, contrast and saturation of the image."""
def __init__(self, prob=0.5):
super(ColorJitter, self).__init__()
self.prob = prob
self.brightness_range = (0.875, 1.125)
self.contrast_range = (0.5, 1.5)
self.saturation_range = (0.5, 1.5)
def __call__(self, img):
brightness = contrast = saturation = None
if npr.rand() < self.prob:
brightness = self.brightness_range
if npr.rand() < self.prob:
contrast = self.contrast_range
if npr.rand() < self.prob:
saturation = self.saturation_range
return color_jitter(img, brightness=brightness,
contrast=contrast, saturation=saturation)
class RandomBBoxCrop(Transform):
"""Crop image by sampling a region restricted by bounding boxes."""
def __init__(self, scales_range=(0.3, 1.0), aspect_ratios_range=(0.5, 2.0),
overlaps=(0.0, 0.1, 0.3, 0.5, 0.7, 0.9)):
self.samplers = [{}]
for ov in overlaps:
self.samplers.append({
'scales_range': scales_range,
'aspect_ratios_range': aspect_ratios_range,
'overlaps_range': (ov, 1.0), 'max_trials': 10})
@staticmethod
def generate_sample(param):
scales_range = param.get('scales_range', (1.0, 1.0))
aspect_ratios_range = param.get('aspect_ratios_range', (1.0, 1.0))
scale = npr.uniform(scales_range[0], scales_range[1])
min_aspect_ratio = max(aspect_ratios_range[0], scale**2)
max_aspect_ratio = min(aspect_ratios_range[1], 1. / (scale**2))
aspect_ratio = npr.uniform(min_aspect_ratio, max_aspect_ratio)
bbox_w = scale * (aspect_ratio ** 0.5)
bbox_h = scale / (aspect_ratio ** 0.5)
w_off = npr.uniform(0., 1. - bbox_w)
h_off = npr.uniform(0., 1. - bbox_h)
return np.array([w_off, h_off, w_off + bbox_w, h_off + bbox_h])
@staticmethod
def check_center(sample_box, boxes):
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= sample_box[0]) & (x_ctr <= sample_box[2]) &
(y_ctr >= sample_box[1]) & (y_ctr <= sample_box[3]))[0]
return len(keep) > 0
@staticmethod
def check_overlap(sample_box, boxes, param):
ov_range = param.get('overlaps_range', (0.0, 1.0))
if ov_range[0] == 0.0 and ov_range[1] == 1.0:
return True
ovmax = bbox_overlaps(sample_box[None, :], boxes[:, :4]).max()
if ovmax < ov_range[0] or ovmax > ov_range[1]:
return False
return True
def generate_samples(self, boxes):
crop_boxes = []
for sampler in self.samplers:
for _ in range(sampler.get('max_trials', 1)):
crop_box = self.generate_sample(sampler)
if not self.check_overlap(crop_box, boxes, sampler):
continue
if not self.check_center(crop_box, boxes):
continue
crop_boxes.append(crop_box)
break
return crop_boxes
@classmethod
def crop(cls, img, crop_box, boxes=None):
h, w = img.shape[:2]
w_offset = int(crop_box[0] * w)
h_offset = int(crop_box[1] * h)
crop_w = int((crop_box[2] - crop_box[0]) * w)
crop_h = int((crop_box[3] - crop_box[1]) * h)
img = img[h_offset:h_offset + crop_h, w_offset:w_offset + crop_w]
if boxes is not None:
x_ctr = (boxes[:, 2] + boxes[:, 0]) / 2.0
y_ctr = (boxes[:, 3] + boxes[:, 1]) / 2.0
keep = np.where((x_ctr >= crop_box[0]) & (x_ctr <= crop_box[2]) &
(y_ctr >= crop_box[1]) & (y_ctr <= crop_box[3]))[0]
boxes = boxes[keep]
boxes[:, (0, 2)] = boxes[:, (0, 2)] * w - w_offset
boxes[:, (1, 3)] = boxes[:, (1, 3)] * h - h_offset
boxes = clip_boxes(boxes, (crop_h, crop_w))
boxes[:, (0, 2)] /= crop_w
boxes[:, (1, 3)] /= crop_h
return img, boxes
def __call__(self, img, boxes):
crop_boxes = self.generate_samples(boxes)
if len(crop_boxes) > 0:
crop_box = crop_boxes[npr.randint(len(crop_boxes))]
img, boxes = self.crop(img, crop_box, boxes)
return img, boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models import backbones
from seetadet.models import decoders
from seetadet.models import dense_heads
from seetadet.models import detectors
from seetadet.models import necks
from seetadet.models import roi_heads
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Backbones."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.backbones import mobilenet_v2
from seetadet.models.backbones import mobilenet_v3
from seetadet.models.backbones import resnet
from seetadet.models.backbones import vgg
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV2 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
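# A few worked values (divisor=8): make_divisible(24) == 24,
# make_divisible(12) == 16 and make_divisible(20) == 24 (ties round up);
# the 0.9 guard bumps the result up one step whenever rounding down would
# lose more than 10% of the requested width.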
class InvertedResidual(nn.Module):
    """Inverted residual block."""
def __init__(self, dim_in, dim_out, kernel_size=3, stride=1, expand_ratio=6):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV2(nn.Module):
"""MobileNetV2 class."""
def __init__(self, depths, dims, strides, expand_ratios, width_mult=1.0):
super(MobileNetV2, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='ReLU6')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks = dims[0], []
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:-1])):
for j in range(depth):
stride = strides[i] if j == 0 else 1
blocks.append(InvertedResidual(
dim_in, dim, stride=stride,
expand_ratio=expand_ratios[i]))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, dims[-1], 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(dims[-1])
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v2', MobileNetV2,
dims=(32,) + (16, 24, 32, 64, 96, 160, 320) + (1280,),
depths=(1, 2, 3, 4, 3, 3, 1),
strides=(1, 2, 2, 2, 1, 2, 1),
expand_ratios=(1, 6, 6, 6, 6, 6, 6))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""MobileNetV3 backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.conv import ConvNorm2d
def make_divisible(v, divisor=8):
"""Return the divisible value."""
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
if new_v < 0.9 * v:
new_v += divisor
return new_v
class SqueezeExcite(nn.Module):
"""Squeeze-and-Excitation block."""
def __init__(self, dim_in, dim):
super(SqueezeExcite, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 1)
self.conv2 = nn.Conv2d(dim, dim_in, 1)
self.activation1 = nn.ReLU(True)
self.activation2 = nn.Hardsigmoid(True)
def forward(self, x):
scale = x.mean((2, 3), keepdim=True)
scale = self.activation1(self.conv1(scale))
scale = self.activation2(self.conv2(scale))
return x * scale
class InvertedResidual(nn.Module):
    """Inverted residual block."""
def __init__(
self,
dim_in,
dim_out,
kernel_size=3,
stride=1,
expand_ratio=3,
squeeze_ratio=1,
activation_type='ReLU',
):
super(InvertedResidual, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type=activation_type)
self.has_endpoint = stride == 2
self.apply_shortcut = stride == 1 and dim_in == dim_out
self.dim = dim = int(round(dim_in * expand_ratio))
self.conv1 = (conv_module(dim_in, dim, 1)
if expand_ratio > 1 else nn.Identity())
self.conv2 = conv_module(dim, dim, kernel_size, stride, groups=dim)
self.se = (SqueezeExcite(dim, make_divisible(dim * squeeze_ratio))
if squeeze_ratio < 1 else nn.Identity())
self.conv3 = conv_module(dim, dim_out, 1, activation_type='')
def forward(self, x):
shortcut = x
x = self.conv1(x)
if self.has_endpoint:
self.endpoint = x
x = self.conv2(x)
x = self.se(x)
x = self.conv3(x)
if self.apply_shortcut:
return x.add_(shortcut)
return x
class MobileNetV3(nn.Module):
"""MobileNetV3 class."""
def __init__(self, depths, dims, kernel_sizes, strides,
expand_ratios, squeeze_ratios, width_mult=1.0):
super(MobileNetV3, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.BACKBONE.NORM,
activation_type='Hardswish')
dims = list(map(lambda x: make_divisible(x * width_mult), dims))
self.conv1 = conv_module(3, dims[0], 3, 2)
dim_in, blocks, coarsest_stride = dims[0], [], 2
self.out_indices, self.out_dims = [], []
for i, (depth, dim) in enumerate(zip(depths, dims[1:])):
coarsest_stride *= strides[i]
layer_expand_ratios = expand_ratios[i]
if not isinstance(layer_expand_ratios, (tuple, list)):
layer_expand_ratios = [layer_expand_ratios]
layer_expand_ratios = list(layer_expand_ratios)
layer_expand_ratios += ([layer_expand_ratios[-1]] *
(depth - len(layer_expand_ratios)))
for j in range(depth):
blocks.append(InvertedResidual(
dim_in, dim,
kernel_size=kernel_sizes[i],
stride=strides[i] if j == 0 else 1,
expand_ratio=layer_expand_ratios[j],
squeeze_ratio=squeeze_ratios[i],
activation_type='Hardswish'
if coarsest_stride >= 16 else 'ReLU'))
if blocks[-1].has_endpoint:
self.out_indices.append(len(blocks) - 1)
self.out_dims.append(blocks[-1].dim)
dim_in = dim
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.conv2 = conv_module(dim_in, blocks[-1].dim, 1)
self.blocks = blocks + [self.conv2]
self.out_dims.append(blocks[-1].dim)
self.out_indices.append(len(self.blocks) - 1)
def forward(self, x):
x = self.conv1(x)
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(blk.__dict__.pop('endpoint', x))
return outputs
BACKBONES.register(
'mobilenet_v3_large', MobileNetV3,
dims=(16,) + (16, 24, 40, 80, 112, 160),
depths=(1, 2, 3, 4, 2, 3),
kernel_sizes=(3, 3, 5, 3, 3, 5),
strides=(1, 2, 2, 2, 1, 2),
expand_ratios=(1, (4, 3), 3, (6, 2.5, 2.3, 2.3), 6, 6),
squeeze_ratios=(1, 1, 0.25, 1, 0.25, 0.25))
BACKBONES.register(
'mobilenet_v3_small', MobileNetV3,
dims=(16,) + (16, 24, 40, 48, 96),
depths=(1, 2, 3, 2, 3),
kernel_sizes=(3, 3, 5, 5, 5),
strides=(2, 2, 2, 1, 2),
expand_ratios=(1, (4.5, 88. / 24), (4, 6, 6), 3, 6),
squeeze_ratios=(0.25, 1, 0.25, 0.25, 0.25))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ResNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.engine.utils import freeze_module
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
class BasicBlock(nn.Module):
"""The basic resnet block."""
expansion = 1
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = nn.Conv2d(dim_in, dim, 3, stride, padding=1, bias=False)
self.bn1 = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.conv2 = nn.Conv2d(dim, dim, 3, padding=1, bias=False)
self.bn2 = build_norm(dim, cfg.BACKBONE.NORM)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.bn2(self.conv2(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class Bottleneck(nn.Module):
"""The bottleneck resnet block."""
expansion = 4
groups, width_per_group = 1, 64
def __init__(self, dim_in, dim, stride=1, downsample=None):
super(Bottleneck, self).__init__()
width = int(dim * (self.width_per_group / 64.)) * self.groups
self.conv1 = nn.Conv2d(dim_in, width, 1, bias=False)
self.bn1 = build_norm(width, cfg.BACKBONE.NORM)
self.conv2 = nn.Conv2d(width, width, 3, stride, padding=1, bias=False)
self.bn2 = build_norm(width, cfg.BACKBONE.NORM)
self.conv3 = nn.Conv2d(width, dim * self.expansion, 1, bias=False)
self.bn3 = build_norm(dim * self.expansion, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
shortcut = x
x = self.relu(self.bn1(self.conv1(x)))
x = self.relu(self.bn2(self.conv2(x)))
x = self.bn3(self.conv3(x))
if self.downsample is not None:
shortcut = self.downsample(shortcut)
return self.relu(x.add_(shortcut))
class ResNet(nn.Module):
"""ResNet class."""
def __init__(self, block, depths, stride_in_1x1=False):
super(ResNet, self).__init__()
dim_in, dims, blocks = 64, [64, 128, 256, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)]
self.out_dims = [dim_in] + [v * block.expansion for v in dims]
self.conv1 = nn.Conv2d(3, dim_in, 7, 2, padding=3, bias=False)
self.bn1 = build_norm(dim_in, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.maxpool = nn.MaxPool2d(3, 2, padding=1)
for i, depth, dim in zip(range(4), depths, dims):
downsample, stride = None, 1 if i == 0 else 2
if stride != 1 or dim_in != dim * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(dim_in, dim * block.expansion, 1, stride, bias=False),
build_norm(dim * block.expansion, cfg.BACKBONE.NORM))
blocks.append(block(dim_in, dim, stride, downsample))
if isinstance(blocks[-1], Bottleneck) and stride_in_1x1:
blocks[-1].conv1.stride = (stride, stride)
blocks[-1].conv2.stride = (1, 1)
dim_in = dim * block.expansion
for _ in range(depth - 1):
blocks.append(block(dim_in, dim))
setattr(self, 'layer%d' % (i + 1), nn.Sequential(*blocks[-depth:]))
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
num_freeze_stages = cfg.BACKBONE.FREEZE_AT
if num_freeze_stages > 0:
self.conv1.apply(freeze_module)
self.bn1.apply(freeze_module)
for i in range(num_freeze_stages - 1, 0, -1):
getattr(self, 'layer%d' % i).apply(freeze_module)
def forward(self, x):
x = self.relu(self.bn1(self.conv1(x)))
x = self.maxpool(x)
outputs = [None]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class ResNetV1a(ResNet):
"""ResNet with stride in bottleneck 1x1 convolution."""
def __init__(self, block, depths):
super(ResNetV1a, self).__init__(block, depths, stride_in_1x1=True)
BACKBONES.register('resnet18', ResNet, block=BasicBlock, depths=[2, 2, 2, 2])
BACKBONES.register('resnet34', ResNet, block=BasicBlock, depths=[3, 4, 6, 3])
BACKBONES.register('resnet50', ResNet, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101', ResNet, block=Bottleneck, depths=[3, 4, 23, 3])
BACKBONES.register('resnet50_v1a', ResNetV1a, block=Bottleneck, depths=[3, 4, 6, 3])
BACKBONES.register('resnet101_v1a', ResNetV1a, block=Bottleneck, depths=[3, 4, 23, 3])
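The endpoint bookkeeping above is worth spelling out: `out_indices` marks the last block of each stage via a running sum of the depths, and `forward` prepends a `None` placeholder so the outputs list stays aligned with `out_dims`, whose first entry is the unexposed stem width. A standalone check for resnet50:

```python
# Pure-Python check of ResNet's endpoint bookkeeping (resnet50: depths 3-4-6-3).
import itertools

depths = [3, 4, 6, 3]
out_indices = [v - 1 for v in itertools.accumulate(depths)]
print(out_indices)  # [2, 6, 12, 15]: the last block of each of the 4 stages

expansion, dims = 4, [64, 128, 256, 512]  # Bottleneck.expansion == 4
print([64] + [d * expansion for d in dims])  # out_dims: [64, 256, 512, 1024, 2048]
```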
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""VGGNet backbone."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import itertools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import BACKBONES
from seetadet.ops.build import build_norm
from seetadet.ops.normalization import L2Norm
class VGGBlock(nn.Module):
"""The VGG block."""
def __init__(self, dim_in, dim, downsample=None):
super(VGGBlock, self).__init__()
self.conv = nn.Conv2d(dim_in, dim, 3, padding=1,
bias=not cfg.BACKBONE.NORM)
self.bn = build_norm(dim, cfg.BACKBONE.NORM)
self.relu = nn.ReLU(True)
self.downsample = downsample
def forward(self, x):
if self.downsample is not None:
x = self.downsample(x)
return self.relu(self.bn(self.conv(x)))
class VGG(nn.Module):
"""VGGNet."""
def __init__(self, depths):
super(VGG, self).__init__()
dim_in, dims, blocks = 3, [64, 128, 256, 512, 512], []
self.out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
self.out_dims = dims[1:]
for i, (depth, dim) in enumerate(zip(depths, dims)):
downsample = nn.MaxPool2d(2, 2, ceil_mode=True) if i > 0 else None
blocks.append(VGGBlock(dim_in, dim, downsample))
for _ in range(depth - 1):
blocks.append(VGGBlock(dim, dim))
setattr(self, 'layer%d' % i, nn.Sequential(*blocks[-depth:]))
dim_in = dim
self.blocks = blocks
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
def forward(self, x):
outputs = []
for i, blk in enumerate(self.blocks):
x = blk(x)
if i in self.out_indices:
outputs.append(x)
return outputs
class VGGFCN(VGG):
"""Fully convolutional VGGNet in SSD."""
def __init__(self, depths):
super(VGGFCN, self).__init__(depths)
dim_in, out_index = self.out_dims[-1], self.out_indices[-1]
self.blocks.append(nn.Sequential(
nn.MaxPool2d(3, 1, padding=1),  # pool5: 3x3, stride 1 per the SSD design
nn.Conv2d(dim_in, 1024, 3, padding=6, dilation=6),
nn.ReLU(True)))
self.blocks.append(nn.Sequential(nn.Conv2d(1024, 1024, 1), nn.ReLU(True)))
self.layer4.add_module(str(len(self.layer4)), self.blocks[-2])
self.layer4.add_module(str(len(self.layer4)), self.blocks[-1])
self.out_dims = [self.out_dims[-2], 1024] # conv4_3, fc7
self.out_indices = [self.out_indices[-2], out_index + 2] # 9, 14
self.norm = L2Norm(dim_in, init=20.0)
def forward(self, x):
outputs = super(VGGFCN, self).forward(x)
outputs[0] = self.norm(outputs[0])
return outputs
BACKBONES.register('vgg16', VGG, depths=(2, 2, 3, 3, 3))
BACKBONES.register('vgg16_fcn', VGGFCN, depths=(2, 2, 3, 3, 3))
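The `# 9, 14` comment in `VGGFCN` follows from the same accumulation: with depths `(2, 2, 3, 3, 3)` the base VGG exposes blocks 3, 6, 9 and 12 (conv2_2 through conv5_3), and the two appended fc6/fc7 blocks land at indices 13 and 14. A standalone check:

```python
# Pure-Python check of the VGG endpoint indices (matches the "# 9, 14" comment).
import itertools

depths = (2, 2, 3, 3, 3)  # vgg16
out_indices = [v - 1 for v in itertools.accumulate(depths)][1:]
print(out_indices)  # [3, 6, 9, 12]: conv2_2, conv3_3, conv4_3, conv5_3

# VGGFCN keeps conv4_3 plus the appended fc7 block two slots after conv5_3:
print([out_indices[-2], out_indices[-1] + 2])  # [9, 14]
```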
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for models."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.core.engine.utils import get_device
BACKBONES = Registry('backbones')
NECKS = Registry('necks')
DETECTORS = Registry('detectors')
def build_backbone():
"""Build the backbone."""
backbone_types = cfg.BACKBONE.TYPE.split('.')
backbone = BACKBONES.get(backbone_types[0])()
backbone_dims = backbone.out_dims
neck = nn.Identity()
if len(backbone_types) > 1:
neck = NECKS.get(backbone_types[1])(backbone_dims)
else:
neck.out_dims = backbone_dims
return backbone, neck
def build_detector(device=None, weights=None, training=False):
"""Create a detector instance.
Parameters
----------
device : int, optional
The index of compute device.
weights : str, optional
The path of weight file.
training : bool, optional, default=False
Whether to return a detector for training.
"""
# Validate the registry entry before instantiating it.
detector_cls = DETECTORS.get(cfg.MODEL.TYPE)
if detector_cls is None:
raise ValueError('Unknown detector: ' + cfg.MODEL.TYPE)
model = detector_cls()
if weights is not None:
model.load_weights(weights, strict=True)
if device is not None:
model.to(device=get_device(device))
if not training:
model.eval()
model.optimize_for_inference()
return model
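A hedged end-to-end sketch of this builder; the YAML path and weights path are placeholders, and a yacs-style `cfg.merge_from_file` is assumed for loading the config:

```python
# A hedged usage sketch; paths are placeholders, merge_from_file is assumed.
from seetadet.core.config import cfg
from seetadet.models.build import build_detector

cfg.merge_from_file('configs/faster_rcnn/<MODEL_YAML>')  # hypothetical path
model = build_detector(device=0, weights='/path/to/model.pkl', training=False)
# Inference path: weights loaded strictly, module moved to the device,
# eval() applied, then optimize_for_inference() fuses conv/norm pairs.
```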
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
class RetinaNetDecoder(nn.Module):
"""Decode predictions from retinanet."""
def __init__(self):
super(RetinaNetDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS,
scales_per_octave=3)
self.pre_nms_topk = cfg.RETINANET.PRE_NMS_TOPK
self.score_thresh = float(cfg.TEST.SCORE_THRESH)
def forward(self, inputs):
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RetinaNetDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
pre_nms_topk=self.pre_nms_topk,
score_thresh=self.score_thresh,
)
autograd.Function.register(
'RetinaNetDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_topk': kwargs.get('pre_nms_topk', 1000),
'score_thresh': kwargs.get('score_thresh', 0.05),
'check_device': False,
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN decoder."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import autograd
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.anchors.rpn import AnchorGenerator
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import filter_empty_boxes
from seetadet.utils.nms import gpu_nms
class RPNDecoder(nn.Module):
"""Generate proposal regions from RPN."""
def __init__(self):
super(RPNDecoder, self).__init__()
self.anchor_generator = AnchorGenerator(
strides=cfg.ANCHOR_GENERATOR.STRIDES,
sizes=cfg.ANCHOR_GENERATOR.SIZES,
aspect_ratios=cfg.ANCHOR_GENERATOR.ASPECT_RATIOS)
self.min_level = cfg.FAST_RCNN.MIN_LEVEL
self.max_level = cfg.FAST_RCNN.MAX_LEVEL
self.pre_nms_topk = {True: cfg.RPN.PRE_NMS_TOPK_TRAIN,
False: cfg.RPN.PRE_NMS_TOPK_TEST}
self.post_nms_topk = {True: cfg.RPN.POST_NMS_TOPK_TRAIN,
False: cfg.RPN.POST_NMS_TOPK_TEST}
self.nms_thresh = float(cfg.RPN.NMS_THRESH)
def decode_proposals(self, scores, deltas, anchors, im_info):
# Select top-K anchors.
pre_nms_topk = self.pre_nms_topk[self.training]
if pre_nms_topk <= 0 or pre_nms_topk >= len(scores):
order = np.argsort(-scores.squeeze())
else:
inds = np.argpartition(-scores.squeeze(), pre_nms_topk)[:pre_nms_topk]
order = np.argsort(-scores[inds].squeeze())
order = inds[order]
scores, deltas, anchors = scores[order], deltas[order], anchors[order]
# Convert anchors into proposals.
proposals = bbox_transform_inv(anchors, deltas)
proposals = clip_boxes(proposals, im_info[:2])
keep = filter_empty_boxes(proposals)
if len(proposals) != len(keep):
proposals, scores = proposals[keep], scores[keep]
# Apply NMS.
proposals = np.hstack((proposals, scores))
keep = gpu_nms(proposals, self.nms_thresh)
return proposals[keep, :].astype('float32', copy=False)
def forward_train(self, inputs):
shapes = [x[:2] for x in inputs['grid_info']]
anchors = self.anchor_generator.get_anchors(shapes)
cls_score = inputs['cls_score'].numpy()
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1).numpy()
all_rois, batch_size = [], cls_score.shape[0]
lvl_slices, lvl_start = [], 0
post_nms_topk = self.post_nms_topk[self.training]
for shape in shapes:
num_anchors = self.anchor_generator.num_anchors([shape])
lvl_slices.append(slice(lvl_start, lvl_start + num_anchors))
lvl_start = lvl_start + num_anchors
for batch_ind in range(batch_size):
scores = cls_score[batch_ind].reshape((-1, 1))
deltas = bbox_pred[batch_ind]
im_info = inputs['im_info'][batch_ind]
all_proposals = []
for lvl_slice in lvl_slices:
all_proposals.append(self.decode_proposals(
scores[lvl_slice], deltas[lvl_slice],
anchors[lvl_slice], im_info))
proposals = np.concatenate(all_proposals)
proposals, scores = proposals[:, :4], proposals[:, -1]
if post_nms_topk > 0:
keep = np.argsort(-scores)[:post_nms_topk]
proposals = proposals[keep, :]
batch_inds = np.full((proposals.shape[0], 1), batch_ind, 'float32')
all_rois.append(np.hstack((batch_inds, proposals)))
return np.concatenate(all_rois)
def forward(self, inputs):
if self.training:
return self.forward_train(inputs)
input_tags = ['cls_score', 'bbox_pred', 'im_info', 'grid_info']
return autograd.Function.apply(
'RPNDecoder',
inputs['cls_score'].device,
inputs=[inputs[k] for k in input_tags],
outputs=[None] * (self.max_level - self.min_level + 1),
strides=self.anchor_generator.strides,
ratios=self.anchor_generator.aspect_ratios[0],
scales=self.anchor_generator.scales[0],
min_level=self.min_level,
max_level=self.max_level,
pre_nms_topk=self.pre_nms_topk[False],
post_nms_topk=self.post_nms_topk[False],
nms_thresh=self.nms_thresh,
)
autograd.Function.register(
'RPNDecoder', lambda **kwargs: {
'strides': kwargs.get('strides', []),
'ratios': kwargs.get('ratios', []),
'scales': kwargs.get('scales', []),
'pre_nms_topk': kwargs.get('pre_nms_topk', 1000),
'post_nms_topk': kwargs.get('post_nms_topk', 1000),
'nms_thresh': kwargs.get('nms_thresh', 0.7),
'min_level': kwargs.get('min_level', 2),
'max_level': kwargs.get('max_level', 5),
'check_device': False,
})
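Two numeric pieces of `decode_proposals` are worth isolating. The pre-NMS selection uses `np.argpartition` to take the top-K scores in O(N) and fully sorts only the K survivors, and `bbox_transform_inv` applies `(dx, dy, dw, dh)` deltas to anchors. A self-contained numpy sketch of both (the standard formulation, not the seetadet utilities; `+1`/offset conventions vary between codebases):

```python
# Standalone numpy sketch: pre-NMS top-K selection and box-delta decoding.
import numpy as np

scores, topk = np.random.rand(10000).astype('float32'), 1000
inds = np.argpartition(-scores, topk)[:topk]  # O(N): unordered top-K indices
order = inds[np.argsort(-scores[inds])]       # sort only the K survivors
assert np.allclose(np.sort(scores[order]), np.sort(scores)[-topk:])

def decode_deltas(anchors, deltas):
    """Apply (dx, dy, dw, dh) deltas to (x1, y1, x2, y2) anchors."""
    w = anchors[:, 2] - anchors[:, 0] + 1
    h = anchors[:, 3] - anchors[:, 1] + 1
    cx, cy = anchors[:, 0] + 0.5 * w, anchors[:, 1] + 0.5 * h
    px, py = deltas[:, 0] * w + cx, deltas[:, 1] * h + cy
    pw, ph = np.exp(deltas[:, 2]) * w, np.exp(deltas[:, 3]) * h
    return np.stack([px - 0.5 * pw, py - 0.5 * ph,
                     px + 0.5 * pw, py + 0.5 * ph], axis=1)
```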
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import math
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.retinanet import AnchorTargets
from seetadet.ops.build import build_activation
from seetadet.ops.build import build_loss
from seetadet.ops.build import build_norm
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.fusion import fuse_conv_bn
class RetinaNetHead(nn.Module):
"""RetinaNet head."""
def __init__(self, in_dims):
super(RetinaNetHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, dim_in=in_dims[0], dim_out=in_dims[0],
kernel_size=3, conv_type=cfg.RETINANET.CONV)
norm_module = functools.partial(build_norm, norm_type=cfg.RETINANET.NORM)
self.conv_module = conv_module
self.dim_cls = len(cfg.MODEL.CLASSES) - 1
self.cls_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.bbox_conv = nn.ModuleList(
conv_module() for _ in range(cfg.RETINANET.NUM_CONV))
self.cls_norm = nn.ModuleList()
self.bbox_norm = nn.ModuleList()
for _ in range(len(self.cls_conv)):
self.cls_norm.append(nn.ModuleList())
self.bbox_norm.append(nn.ModuleList())
for _ in range(len(in_dims)):
self.cls_norm[-1].append(norm_module(in_dims[0]))
self.bbox_norm[-1].append(norm_module(in_dims[0]))
self.targets = AnchorTargets()
num_anchors = self.targets.generator.num_cell_anchors(0)
self.cls_score = conv_module(dim_out=self.dim_cls * num_anchors)
self.bbox_pred = conv_module(dim_out=4 * num_anchors)
self.activation = build_activation(cfg.RETINANET.ACTIVATION, inplace=True)
self.cls_loss = build_loss('sigmoid_focal')
self.bbox_loss = build_loss(cfg.RETINANET.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
# Bias prior initialization for focal loss.
for name, param in self.cls_score.named_parameters():
if name.endswith('bias'):
nn.init.constant_(param, -math.log((1 - 0.01) / 0.01))
def optimize_for_inference(self):
"""Optimize modules for inference."""
if hasattr(self.cls_norm[0][0], 'momentum'):
cls_conv = nn.ModuleList()
bbox_conv = nn.ModuleList()
for i in range(len(self.cls_norm)):
cls_conv.append(nn.ModuleList())
bbox_conv.append(nn.ModuleList())
cls_state = self.cls_conv[i].state_dict()
bbox_state = self.bbox_conv[i].state_dict()
for j in range(len(self.cls_norm[i])):
cls_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.cls_norm[i][j].weight.device)))
bbox_conv[i].append(self.conv_module()._apply(
lambda t: t.to(self.bbox_norm[i][j].weight.device)))
cls_conv[i][j].load_state_dict(cls_state)
bbox_conv[i][j].load_state_dict(bbox_state)
fuse_conv_bn(cls_conv[i][j][-1], self.cls_norm[i][j])
fuse_conv_bn(bbox_conv[i][j][-1], self.bbox_norm[i][j])
self._modules['cls_conv'] = cls_conv
self._modules['bbox_conv'] = bbox_conv
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for j, feature in enumerate(features):
cls_input, box_input = feature, feature
for i in range(len(self.cls_conv)):
if isinstance(self.cls_conv[i], nn.ModuleList):
cls_input = self.cls_conv[i][j](cls_input)
box_input = self.bbox_conv[i][j](box_input)
else:
cls_input = self.cls_conv[i](cls_input)
box_input = self.bbox_conv[i](box_input)
cls_input = self.activation(self.cls_norm[i][j](cls_input))
box_input = self.activation(self.bbox_norm[i][j](box_input))
cls_score.append(self.cls_score(cls_input).reshape_((0, self.dim_cls, -1)))
bbox_pred.append(self.bbox_pred(box_input).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 2) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.RETINANET.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
if self.training:
targets = self.targets.compute(**inputs)
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = outputs['cls_score'].permute(0, 2, 1)
cls_score = nn.functional.sigmoid(cls_score, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
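The constant in `reset_parameters` is the focal-loss prior from the RetinaNet paper: initializing the classification bias to `-log((1 - π)/π)` with π = 0.01 makes every anchor start with roughly a 1% foreground probability, so the initial loss is not swamped by the vast majority of easy negatives. A quick numeric check:

```python
# Verify the focal-loss bias prior: sigmoid(-log((1 - pi) / pi)) == pi.
import math

pi = 0.01
bias = -math.log((1 - pi) / pi)
print(bias)                       # ~-4.595, the value written into cls_score bias
print(1 / (1 + math.exp(-bias)))  # ~0.01, the initial foreground probability
```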
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RPN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.rpn import AnchorTargets
from seetadet.ops.build import build_loss
class RPNHead(nn.Module):
"""RPN head."""
def __init__(self, in_dims):
super(RPNHead, self).__init__()
self.targets = AnchorTargets()
dim, num_anchors = in_dims[0], self.targets.generator.num_cell_anchors(0)
self.output_conv = nn.ModuleList(nn.Conv2d(
dim, dim, 3, padding=1) for _ in range(cfg.RPN.NUM_CONV))
self.cls_score = nn.Conv2d(dim, num_anchors, 1)
self.bbox_pred = nn.Conv2d(dim, num_anchors * 4, 1)
self.activation = nn.ReLU(inplace=True)
self.cls_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.bbox_loss = build_loss(cfg.RPN.BBOX_REG_LOSS_TYPE, beta=0.1)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.01)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for x in features:
for conv in self.output_conv:
x = self.activation(conv(x))
cls_score.append(self.cls_score(x).reshape_((0, -1)))
bbox_pred.append(self.bbox_pred(x).reshape_((0, 4, -1)))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 2) if len(features) > 1 else bbox_pred[0]
return {'rpn_cls_score': cls_score, 'rpn_bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
bbox_pred = inputs['bbox_pred'].permute(0, 2, 1)
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_score = inputs['cls_score'].flatten(0, 1)[targets['cls_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = cfg.RPN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.RPN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'rpn_cls_loss': cls_loss, 'rpn_bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
outputs['rpn_bbox_pred'] = outputs['rpn_bbox_pred'].float()
outputs['rpn_cls_score'] = outputs['rpn_cls_score'].float()
if self.training:
targets = self.targets.compute(**inputs)
rpn_cls_score = outputs.pop('rpn_cls_score')
outputs['rpn_cls_score'] = rpn_cls_score.data
logits = {'cls_score': rpn_cls_score,
'bbox_pred': outputs['rpn_bbox_pred']}
outputs.update(self.get_losses(logits, targets))
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.data.targets.ssd import AnchorTargets
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
class SSDHead(nn.Module):
"""SSD head."""
def __init__(self, in_dims):
super(SSDHead, self).__init__()
self.targets = AnchorTargets()
self.cls_score = nn.ModuleList()
self.bbox_pred = nn.ModuleList()
self.num_classes = len(cfg.MODEL.CLASSES)
conv_module = nn.Conv2d
if cfg.FPN.CONV == 'SepConv2d':
conv_module = functools.partial(ConvNorm2d, conv_type='SepConv2d')
conv_module = functools.partial(conv_module, kernel_size=3, padding=1)
for i, dim in enumerate(in_dims):
num_anchors = self.targets.generator.num_cell_anchors(i)
self.cls_score.append(conv_module(dim, num_anchors * self.num_classes))
self.bbox_pred.append(conv_module(dim, num_anchors * 4))
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='sum')
self.bbox_loss = build_loss(cfg.SSD.BBOX_REG_LOSS_TYPE)
def get_outputs(self, inputs):
"""Return the outputs."""
features = list(inputs['features'])
cls_score, bbox_pred = [], []
for i, x in enumerate(features):
cls_score.append(self.cls_score[i](x).permute(0, 2, 3, 1).flatten_(1))
bbox_pred.append(self.bbox_pred[i](x).permute(0, 2, 3, 1).flatten_(1))
cls_score = torch.cat(cls_score, 1) if len(features) > 1 else cls_score[0]
bbox_pred = torch.cat(bbox_pred, 1) if len(features) > 1 else bbox_pred[0]
cls_score = cls_score.reshape_((0, -1, self.num_classes))
bbox_pred = bbox_pred.reshape_((0, -1, 4))
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
"""Return the losses."""
cls_score = inputs['cls_score'].flatten_(0, 1)
bbox_pred = inputs['bbox_pred'].flatten_(0, 1)
bbox_pred = bbox_pred[targets['bbox_inds']]
cls_loss = self.cls_loss(cls_score, targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'],
targets['bbox_anchors'])
normalizer = targets['bbox_inds'].size(0)
cls_loss_weight = 1.0 / normalizer
bbox_loss_weight = cfg.SSD.BBOX_REG_LOSS_WEIGHT / normalizer
cls_loss = cls_loss.mul_(cls_loss_weight)
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs):
outputs = self.get_outputs(inputs)
cls_score = outputs['cls_score']
if self.training:
cls_score_data = nn.functional.softmax(cls_score.data, dim=2)
targets = self.targets.compute(cls_score=cls_score_data, **inputs)
logits = {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
return self.get_losses(logits, targets)
else:
cls_score = nn.functional.softmax(cls_score, dim=2, inplace=True)
return {'cls_score': cls_score.float(),
'bbox_pred': outputs['bbox_pred'].float()}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Detectors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.detectors.detector import Detector
from seetadet.models.detectors.rcnn import CascadeRCNN
from seetadet.models.detectors.rcnn import FasterRCNN
from seetadet.models.detectors.rcnn import MaskRCNN
from seetadet.models.detectors.retinanet import RetinaNet
from seetadet.models.detectors.ssd import SSD
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Base detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import build_backbone
from seetadet.ops.fusion import get_fusion
from seetadet.ops.normalization import ToTensor
from seetadet.utils import logging
class Detector(nn.Module):
"""Class to build and compute the detection pipelines."""
def __init__(self):
super(Detector, self).__init__()
self.to_tensor = ToTensor()
self.backbone, self.neck = build_backbone()
self.backbone_dims = self.neck.out_dims
def get_inputs(self, inputs):
"""Return the detection inputs.
Parameters
----------
inputs : dict
The inputs, containing at least the 'img' blob.
"""
inputs['img'] = self.to_tensor(inputs['img'], normalize=True)
return inputs
def get_features(self, inputs):
"""Return the detection features.
Parameters
----------
inputs : dict
The inputs.
"""
return self.neck(self.backbone(inputs['img']))
def get_outputs(self, inputs):
"""Return the detection outputs.
Parameters
----------
inputs : dict
The inputs.
"""
return inputs
def forward(self, inputs):
"""Define the computation performed at every call.
Parameters
----------
inputs : dict
The inputs.
"""
return self.get_outputs(inputs)
def load_weights(self, weights, strict=False):
"""Load the state dict of this detector.
Parameters
----------
weights : str
The path of the weights file.
"""
return self.load_state_dict(torch.load(weights), strict=strict)
def optimize_for_inference(self):
"""Optimize the graph for the inference."""
# Set precision.
precision = cfg.MODEL.PRECISION.lower()
self.half() if precision == 'float16' else self.float()
logging.info('Set precision: ' + precision)
# Fuse modules.
fusion_memo, last_module = set(), None
for module in self.modules():
if module is self:
continue
if hasattr(module, 'optimize_for_inference'):
module.optimize_for_inference()
fusion_memo.add(module.__class__.__name__)
continue
key, fn = get_fusion(last_module, module)
if fn is not None:
fusion_memo.add(key)
fn(last_module, module)
last_module = module
for key in fusion_memo:
logging.info('Fuse modules: ' + key)
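The conv/norm fusion performed here is the standard BatchNorm folding: the affine normalization is absorbed into the preceding convolution's weights and bias, removing the norm layer at inference time. A minimal numpy sketch of the algebra (not the `seetadet.ops.fusion` implementation):

```python
# Minimal numpy sketch of BatchNorm folding: bn(conv(x)) == conv'(x).
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Return (w', b') with conv(x, w') + b' == gamma*(conv(x, w)+b-mean)/std + beta."""
    scale = gamma / np.sqrt(var + eps)        # per output channel
    w_fused = w * scale.reshape(-1, 1, 1, 1)  # rescale each output filter
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused

w, b = np.random.randn(8, 3, 3, 3), np.zeros(8)  # (out, in, kh, kw) weights
gamma, beta = np.ones(8), np.zeros(8)
mean, var = np.random.randn(8), np.abs(np.random.randn(8)) + 0.1
w_fused, b_fused = fold_bn(w, b, gamma, beta, mean, var)
```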
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""R-CNN detectors."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
import numpy as np
from seetadet.core.config import cfg
from seetadet.data.targets.rcnn import ProposalTargets
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.rpn import RPNDecoder
from seetadet.models.dense_heads.rpn import RPNHead
from seetadet.models.detectors.detector import Detector
from seetadet.models.roi_heads.fast_rcnn import FastRCNNHead
from seetadet.models.roi_heads.mask_rcnn import MaskRCNNHead
from seetadet.utils.bbox import bbox_transform_inv
@DETECTORS.register('faster_rcnn')
class FasterRCNN(Detector):
"""Faster R-CNN detector."""
def __init__(self):
super(FasterRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets['rois']
outputs.update(self.bbox_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
return outputs
@DETECTORS.register('mask_rcnn')
class MaskRCNN(Detector):
"""Mask R-CNN detector."""
def __init__(self):
super(MaskRCNN, self).__init__()
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_head = FastRCNNHead(self.backbone_dims)
self.mask_head = MaskRCNNHead(self.backbone_dims)
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs, outputs = self.get_inputs(inputs), {}
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs.update(self.rpn_head(inputs))
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
targets = self.proposal_targets.compute(**inputs)
inputs['rois'] = targets.pop('rois')
outputs.update(self.bbox_head(inputs, targets))
inputs['rois'] = targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, targets))
else:
outputs.update(self.bbox_head(inputs))
self.outputs = {'features': inputs['features']}
return outputs
@DETECTORS.register('cascade_rcnn')
class CascadeRCNN(Detector):
"""Cascade R-CNN detector."""
def __init__(self):
super(CascadeRCNN, self).__init__()
self.cascade_ious = cfg.CASCADE_RCNN.POSITIVE_OVERLAP
self.bbox_reg_weights = cfg.CASCADE_RCNN.BBOX_REG_WEIGHTS
self.rpn_head = RPNHead(self.backbone_dims)
self.bbox_heads = nn.ModuleList(FastRCNNHead(self.backbone_dims)
for _ in range(len(self.cascade_ious)))
if cfg.CASCADE_RCNN.MASK_ON:
self.mask_head = MaskRCNNHead(self.backbone_dims)
else:
self.mask_head = None
self.rpn_decoder = RPNDecoder()
self.proposal_targets = ProposalTargets()
def get_outputs(self, inputs):
"""Return the detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.rpn_head(inputs)
inputs['rois'] = self.rpn_decoder({
'cls_score': outputs.pop('rpn_cls_score'),
'bbox_pred': outputs.pop('rpn_bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
if self.training:
assigner = self.proposal_targets.assigner
outputs['cls_loss'], outputs['bbox_loss'] = [], []
mask_targets = {}
for i, bbox_head in enumerate(self.bbox_heads):
assigner.pos_iou_thr = assigner.neg_iou_thr = self.cascade_ious[i]
self.proposal_targets.bbox_reg_weights = self.bbox_reg_weights[i]
targets = self.proposal_targets.compute(**inputs)
if self.mask_head is not None and 'gt_segms' in inputs:
inputs.pop('gt_segms')
for k in ('fg_rois', 'mask_inds', 'mask_targets'):
mask_targets[k] = targets.pop(k)
proposals, inputs['rois'] = targets['proposals'], targets['rois']
outputs_i = bbox_head(inputs, targets)
outputs['cls_loss'].append(outputs_i['cls_loss'])
outputs['bbox_loss'].append(outputs_i['bbox_loss'])
if i < len(self.bbox_heads) - 1:
boxes = bbox_transform_inv(
proposals[:, 1:5], outputs_i['bbox_pred'].numpy(),
weights=self.bbox_reg_weights[i])
inputs['rois'] = np.hstack((proposals[:, :1], boxes))
if self.mask_head is not None:
inputs['rois'] = mask_targets.pop('fg_rois')
outputs.update(self.mask_head(inputs, mask_targets))
else:
outputs.update(self.bbox_heads[0](inputs))
self.outputs = {'features': inputs['features'], 'rois': inputs['rois']}
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.decoders.retinanet import RetinaNetDecoder
from seetadet.models.dense_heads.retinanet import RetinaNetHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('retinanet')
class RetinaNet(Detector):
"""RetinaNet detector."""
def __init__(self):
super(RetinaNet, self).__init__()
self.bbox_head = RetinaNetHead(self.backbone_dims)
self.bbox_decoder = RetinaNetDecoder()
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
inputs['grid_info'] = inputs.pop(
'grid_info', [x.shape[-2:] for x in inputs['features']])
outputs = self.bbox_head(inputs)
if not self.training:
outputs['dets'] = self.bbox_decoder({
'cls_score': outputs.pop('cls_score'),
'bbox_pred': outputs.pop('bbox_pred'),
'im_info': inputs['im_info'],
'grid_info': inputs['grid_info']})
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD detector."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.models.build import DETECTORS
from seetadet.models.dense_heads.ssd import SSDHead
from seetadet.models.detectors.detector import Detector
@DETECTORS.register('ssd')
class SSD(Detector):
"""SSD detector."""
def __init__(self):
super(SSD, self).__init__()
self.bbox_head = SSDHead(self.backbone_dims)
def get_outputs(self, inputs):
"""Compute detection outputs."""
inputs = self.get_inputs(inputs)
inputs['features'] = self.get_features(inputs)
outputs = self.bbox_head(inputs)
return outputs
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Necks."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Modules
from seetadet.models.necks import bifpn
from seetadet.models.necks import fpn
from seetadet.models.necks import ssd
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""BiFPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.build import build_activation
from seetadet.ops.conv import ConvNorm2d
class FuseOp(nn.Module):
"""Operator to fuse input features."""
def __init__(self, num_inputs):
super(FuseOp, self).__init__()
self.fuse_type = cfg.FPN.FUSE_TYPE
if self.fuse_type == 'weighted':
self.weight = nn.Parameter(torch.ones(num_inputs))
def forward(self, *inputs):
if self.fuse_type == 'weighted':
weights = nn.functional.softmax(self.weight, dim=0).split(1)
outputs = inputs[0] * weights[0]
for x, w in zip(inputs[1:], weights[1:]):
outputs += x * w
else:
outputs = inputs[0]
for x in inputs[1:]:
outputs += x
return outputs
class Block(nn.Module):
"""BiFPN block."""
def __init__(self, in_dims=None):
super(Block, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.output_conv1, self.output_fuse1 = nn.ModuleList(), nn.ModuleList()
self.output_conv2, self.output_fuse2 = nn.ModuleList(), nn.ModuleList()
for lvl in range(self.min_lvl, self.max_lvl):
self.output_conv1 += [conv_module(self.dim, self.dim, 3)]
self.output_conv2 += [conv_module(self.dim, self.dim, 3)]
self.output_fuse1 += [FuseOp(2)]
self.output_fuse2 += [FuseOp(3 if lvl < self.max_lvl - 1 else 2)]
self.activation = build_activation(cfg.FPN.ACTIVATION, inplace=True)
def forward(self, laterals1, laterals2=None):
outputs = [laterals1[-1]]
for i in range(len(laterals1) - 1, 0, -1):
x1, x2 = outputs[0], laterals1[i - 1]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else x2.shape[2:]
x1 = nn.functional.interpolate(x1, size, scale)
y = self.output_fuse1[i - 1](x1, x2)
outputs.insert(0, self.output_conv1[i - 1](self.activation(y)))
if laterals2 is None:
laterals2 = laterals1[1:]
else:
laterals2 += laterals1[self.highest_lvl - self.min_lvl + 1:]
for i in range(1, len(outputs)):
x1, x2 = outputs[i - 1], laterals2[i - 1]
x1 = nn.functional.max_pool2d(x1, 3, 2, padding=1)
if i < len(outputs) - 1:
y = self.output_fuse2[i - 1](x1, x2, outputs[i])
else:
y = self.output_fuse2[i - 1](x1, x2)
outputs[i] = self.output_conv2[i - 1](self.activation(y))
return outputs
@NECKS.register('bifpn')
class BiFPN(nn.Module):
"""BiFPN to enhance input features."""
def __init__(self, in_dims=None):
super(BiFPN, self).__init__()
conv_module = functools.partial(ConvNorm2d, norm_type=cfg.FPN.NORM)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv1 = nn.ModuleList()
self.lateral_conv2 = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl]:
self.lateral_conv1 += [conv_module(dim, self.dim, 1)]
for dim in in_dims[self.min_lvl:self.highest_lvl]:
self.lateral_conv2 += [conv_module(dim, self.dim, 1)]
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.lateral_conv1 += [conv_module(dim, self.dim, 1)
if lvl == self.highest_lvl + 1 else nn.Identity()]
self.blocks = nn.ModuleList(Block(in_dims) for _ in range(cfg.FPN.NUM_BLOCKS))
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl]
laterals1 = [conv(x) for conv, x in zip(self.lateral_conv1, features)]
laterals2 = [conv(x) for conv, x in zip(self.lateral_conv2, features[1:])]
x = features[-1]
for i in range(len(laterals1), len(self.out_dims)):
x = self.lateral_conv1[i](x)
x = nn.functional.max_pool2d(x, 3, 2, padding=1)
laterals1.append(x)
for i, blk in enumerate(self.blocks):
laterals1 = blk(laterals1, laterals2 if i == 0 else None)
return laterals1
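`FuseOp`'s `'weighted'` branch is the softmax-normalized feature fusion from the BiFPN (EfficientDet) design: each input map receives a learned weight, normalized to be positive and sum to one. A standalone numpy equivalent:

```python
# Standalone numpy equivalent of FuseOp's 'weighted' fusion branch.
import numpy as np

def weighted_fuse(inputs, logits):
    """Fuse equally-shaped maps with softmax-normalized learned weights."""
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return sum(x * wi for x, wi in zip(inputs, w))

maps = [np.random.randn(1, 64, 32, 32) for _ in range(3)]
fused = weighted_fuse(maps, np.zeros(3))  # zero logits -> a plain average
```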
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""FPN neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.conv import ConvNorm2d
@NECKS.register('fpn')
class FPN(nn.Module):
"""FPN to enhance input features."""
def __init__(self, in_dims):
super(FPN, self).__init__()
lateral_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM)
output_conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FPN.NORM, conv_type=cfg.FPN.CONV)
self.dim = cfg.FPN.DIM
self.min_lvl = cfg.FPN.MIN_LEVEL
self.max_lvl = cfg.FPN.MAX_LEVEL
self.fuse_lvl = cfg.FPN.FUSE_LEVEL
self.highest_lvl = min(self.max_lvl, len(in_dims))
self.coarsest_stride = cfg.BACKBONE.COARSEST_STRIDE
self.out_dims = [self.dim] * (self.max_lvl - self.min_lvl + 1)
self.lateral_conv = nn.ModuleList()
self.output_conv = nn.ModuleList()
for dim in in_dims[self.min_lvl - 1:self.highest_lvl]:
self.lateral_conv += [lateral_conv_module(dim, self.dim, 1)]
self.output_conv += [output_conv_module(self.dim, self.dim, 3)]
if 'rcnn' not in cfg.MODEL.TYPE:
for lvl in range(self.highest_lvl + 1, self.max_lvl + 1):
dim = in_dims[-1] if lvl == self.highest_lvl + 1 else self.dim
self.output_conv += [output_conv_module(dim, self.dim, 3, stride=2)]
def forward(self, features):
features = features[self.min_lvl - 1:self.highest_lvl]
laterals = [conv(x) for conv, x in zip(self.lateral_conv, features)]
for i in range(self.fuse_lvl - self.min_lvl, 0, -1):
y, x = laterals[i - 1], laterals[i]
scale = 2 if self.coarsest_stride > 1 else None
size = None if self.coarsest_stride > 1 else y.shape[2:]
y += nn.functional.interpolate(x, size, scale)
outputs = [conv(x) for conv, x in zip(self.output_conv, laterals)]
if len(self.output_conv) <= len(self.lateral_conv):
for _ in range(len(outputs), len(self.out_dims)):
outputs.append(nn.functional.max_pool2d(outputs[-1], 1, stride=2))
else:
outputs.append(self.output_conv[len(outputs)](features[-1]))
for i in range(len(outputs), len(self.out_dims)):
outputs.append(self.output_conv[i](nn.functional.relu(outputs[-1])))
return outputs
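One top-down merge step in `forward` upsamples the coarser lateral by 2x (or to an explicit size when strides are irregular) and adds it into the finer one. A hedged numpy sketch of a single step with nearest-neighbor upsampling:

```python
# Numpy sketch of one FPN top-down merge step (nearest 2x upsample + add).
import numpy as np

def upsample2x_nearest(x):
    """Nearest-neighbor 2x upsampling over the last two axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

p5 = np.random.randn(1, 256, 16, 16)
c4_lateral = np.random.randn(1, 256, 32, 32)
p4 = c4_lateral + upsample2x_nearest(p5)  # finer lateral += upsampled coarser map
```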
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD neck."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.models.build import NECKS
from seetadet.ops.conv import ConvNorm2d
class SSDNeck(nn.Module):
"""Feature Pyramid Network."""
def __init__(self, in_dims, out_dims, kernel_sizes, strides, paddings):
super(SSDNeck, self).__init__()
self.out_dims = list(in_dims[-2:]) + list(out_dims)
dim_in, self.blocks = in_dims[-1], nn.ModuleList()
conv_module = functools.partial(
ConvNorm2d, conv_type=cfg.FPN.CONV,
norm_type=cfg.FPN.NORM, activation_type=cfg.FPN.ACTIVATION)
for dim, kernel_size, stride, padding in zip(
out_dims, kernel_sizes, strides, paddings):
self.blocks.append(conv_module(dim_in, dim // 2, 1))
self.blocks.append(conv_module(dim // 2, dim, kernel_size, stride, padding))
dim_in = dim
def forward(self, features):
x, outputs = features[-1], features[-2:]
for i, blk in enumerate(self.blocks):
x = blk(x)
if i % 2 > 0:
outputs.append(x)
return outputs
NECKS.register(
'ssd300', SSDNeck,
out_dims=(512, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 1, 1),
paddings=(1, 1, 0, 0))
NECKS.register(
'ssd512', SSDNeck,
out_dims=(512, 256, 256, 256, 256),
kernel_sizes=(3, 3, 3, 3, 4),
strides=(2, 2, 2, 2, 1),
paddings=(1, 1, 1, 1, 1))
NECKS.register(
'ssdlite', SSDNeck,
out_dims=(512, 256, 256, 128),
kernel_sizes=(3, 3, 3, 3),
strides=(2, 2, 2, 2),
paddings=(1, 1, 1, 1))
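Since `out_dims` prepends the last two backbone endpoints to the extra-layer widths, the `ssd300` neck on top of `vgg16_fcn` (conv4_3 at 512 channels, fc7 at 1024) exposes six prediction sources:

```python
# out_dims bookkeeping for ssd300 over vgg16_fcn (conv4_3=512, fc7=1024).
in_dims = [512, 1024]
out_dims = list(in_dims[-2:]) + [512, 256, 256, 256]
print(out_dims)  # [512, 1024, 512, 256, 256, 256]: six prediction sources
```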
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Fast R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.build import build_loss
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class FastRCNNHead(nn.Module):
"""Fast R-CNN head."""
def __init__(self, in_dims):
super(FastRCNNHead, self).__init__()
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.FAST_RCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
self.output_fc = nn.ModuleList()
for i in range(cfg.FAST_RCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else cfg.FAST_RCNN.CONV_HEAD_DIM
self.output_conv += [conv_module(dim, cfg.FAST_RCNN.CONV_HEAD_DIM)]
for i in range(cfg.FAST_RCNN.NUM_FC):
dim = in_dims[0] * cfg.FAST_RCNN.POOLER_RESOLUTION ** 2
dim = dim if i == 0 else cfg.FAST_RCNN.FC_HEAD_DIM
self.output_fc += [nn.Sequential(nn.Linear(dim, cfg.FAST_RCNN.FC_HEAD_DIM),
nn.ReLU(inplace=True))]
self.cls_score = nn.Linear(cfg.FAST_RCNN.FC_HEAD_DIM, len(cfg.MODEL.CLASSES))
num_classes = 1 if cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC else len(cfg.MODEL.CLASSES) - 1
self.bbox_pred = nn.Linear(cfg.FAST_RCNN.FC_HEAD_DIM, num_classes * 4)
self.pooler = RoIPooler(
pooler_type=cfg.FAST_RCNN.POOLER_TYPE,
resolution=cfg.FAST_RCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.FAST_RCNN.POOLER_SAMPLING_RATIO)
self.cls_loss = nn.CrossEntropyLoss(ignore_index=-1, reduction='mean')
self.bbox_loss = build_loss(cfg.FAST_RCNN.BBOX_REG_LOSS_TYPE)
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.cls_score.weight, std=0.01)
nn.init.normal_(self.bbox_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
x = x.flatten_(1)
for layer in self.output_fc:
x = layer(x)
cls_score, bbox_pred = self.cls_score(x), self.bbox_pred(x)
return {'cls_score': cls_score, 'bbox_pred': bbox_pred}
def get_losses(self, inputs, targets):
bbox_pred = inputs['bbox_pred'].reshape_((0, -1, 4))
bbox_pred = bbox_pred.flatten_(0, 1)[targets['bbox_inds']]
cls_loss = self.cls_loss(inputs['cls_score'], targets['labels'])
bbox_loss = self.bbox_loss(bbox_pred, targets['bbox_targets'])
normalizer = cfg.FAST_RCNN.BATCH_SIZE * cfg.TRAIN.IMS_PER_BATCH
bbox_loss_weight = cfg.FAST_RCNN.BBOX_REG_LOSS_WEIGHT / normalizer
bbox_loss = bbox_loss.mul_(bbox_loss_weight)
return {'cls_loss': cls_loss, 'bbox_loss': bbox_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
outputs = self.get_losses(logits, targets)
outputs['bbox_pred'] = logits['bbox_pred'].data
return outputs
else:
outputs['cls_score'] = nn.functional.softmax(
outputs['cls_score'], dim=1, inplace=True)
return {'rois': torch.cat(inputs['rois']),
'cls_score': outputs['cls_score'].float(),
'bbox_pred': outputs['bbox_pred'].float()}
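`spatial_scales` maps each pyramid level to the inverse of its stride, so the RoI pooler samples each level at the right resolution; with the common Fast R-CNN levels 2 through 5 this gives 1/4 down to 1/32:

```python
# spatial_scales for FPN levels 2..5 (the common Fast R-CNN defaults).
min_level, max_level = 2, 5
print([1. / (2 ** lvl) for lvl in range(min_level, max_level + 1)])
# [0.25, 0.125, 0.0625, 0.03125]
```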
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask R-CNN head."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.ops.conv import ConvNorm2d
from seetadet.ops.vision import RoIPooler
class MaskRCNNHead(nn.Module):
"""Mask R-CNN head."""
def __init__(self, in_dims):
super(MaskRCNNHead, self).__init__()
self.dim = cfg.MASK_RCNN.CONV_HEAD_DIM
conv_module = functools.partial(
ConvNorm2d, norm_type=cfg.MASK_RCNN.NORM,
kernel_size=3, activation_type='ReLU')
self.output_conv = nn.ModuleList()
for i in range(cfg.MASK_RCNN.NUM_CONV):
dim = in_dims[0] if i == 0 else self.dim
self.output_conv += [conv_module(dim, self.dim)]
self.output_conv += [nn.Sequential(
nn.ConvTranspose2d(self.dim, self.dim, 2, 2),
nn.ReLU(True))]
self.mask_pred = nn.Conv2d(self.dim, len(cfg.MODEL.CLASSES) - 1, 1)
self.pooler = RoIPooler(
pooler_type=cfg.MASK_RCNN.POOLER_TYPE,
resolution=cfg.MASK_RCNN.POOLER_RESOLUTION,
sampling_ratio=cfg.MASK_RCNN.POOLER_SAMPLING_RATIO)
self.mask_loss = nn.BCEWithLogitsLoss(reduction='mean')
self.spatial_scales = [1. / (2 ** lvl) for lvl in range(
cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL + 1)]
self.reset_parameters()
def reset_parameters(self):
nn.init.normal_(self.mask_pred.weight, std=0.001)
def get_outputs(self, inputs):
x = torch.cat([self.pooler(
inputs['features'][i], inputs['rois'][i],
spatial_scale=spatial_scale) for i, spatial_scale
in enumerate(self.spatial_scales)])
for layer in self.output_conv:
x = layer(x)
return {'mask_pred': self.mask_pred(x)}
def get_losses(self, inputs, targets):
mask_pred = inputs['mask_pred']
mask_pred = mask_pred.flatten_(0, 1)[targets['mask_inds']]
mask_loss = self.mask_loss(mask_pred, targets['mask_targets'])
return {'mask_loss': mask_loss}
def forward(self, inputs, targets=None):
outputs = self.get_outputs(inputs)
if self.training:
logits = {'mask_pred': outputs['mask_pred'].float()}
return self.get_losses(logits, targets)
else:
outputs['mask_pred'] = nn.functional.sigmoid(
outputs['mask_pred'], inplace=True).float()
return {'mask_pred': outputs['mask_pred']}
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.modules import rcnn
from seetadet.modules import retinanet
from seetadet.modules import ssd
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, see
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import types
import codewithgpu
from dragon.vm import torch
from seetadet.core.config import cfg
from seetadet.core.registry import Registry
from seetadet.utils.profiler import Timer
INFERENCE_MODULES = Registry('inference_modules')
def build_inference(model):
"""Build the inference module."""
return INFERENCE_MODULES.get(cfg.MODEL.TYPE)(model)
class InferenceModule(codewithgpu.InferenceModule):
"""Inference module."""
def __init__(self, model):
super(InferenceModule, self).__init__(model)
self.timers = collections.defaultdict(Timer)
def get_time_diffs(self):
"""Return the time differences."""
return dict((k, v.average_time)
for k, v in self.timers.items())
def trace(self, name, func, example_inputs=None):
"""Trace the function and bound to model."""
if not hasattr(self.model, name):
setattr(self.model, name, torch.jit.trace(
func=types.MethodType(func, self.model),
example_inputs=example_inputs))
return getattr(self.model, name)
@staticmethod
def register(model_type, **kwargs):
"""Register a inference module."""
def decorated(func):
return INFERENCE_MODULES.register(model_type, func, **kwargs)
return decorated
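# Example usage (a minimal sketch; 'my_detector' and MyDetectorInference
# are hypothetical names, not part of the shipped registry):
#
#   @InferenceModule.register('my_detector')
#   class MyDetectorInference(InferenceModule):
#       def get_results(self, imgs):
#           return [{'boxes': []} for _ in imgs]
#
#   module = build_inference(model)  # Resolved via cfg.MODEL.TYPE.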
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RCNN modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.bbox import distribute_boxes
from seetadet.utils.bbox import filter_empty_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register(['faster_rcnn', 'mask_rcnn', 'cascade_rcnn'])
class RCNNInference(InferenceModule):
"""RCNN inference module."""
def __init__(self, model):
super(RCNNInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img, im_info, grid_info:
self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
img_boxes, proposals = self.forward_bbox(imgs)
if getattr(self.model, 'mask_head', None) is None:
return [{'boxes': boxes} for boxes in img_boxes]
proposals = np.concatenate(sum(proposals, []))
mask_pred = self.forward_mask(proposals)
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
img_masks = [[] for _ in range(ims_per_batch)]
batch_inds = proposals[:, :1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
masks, labels = mask_pred[inds], proposals[inds, 5]
num_classes = len(img_boxes[index])
for _ in range(num_classes - len(img_masks[index])):
img_masks[index].append([])
for j in range(1, num_classes):
img_masks[index][j].append(masks[np.where(labels == (j - 1))[0]])
if (i + 1) % num_scales == 0:
v = img_masks[index][j]
img_masks[index][j] = np.vstack(v) if len(v) > 1 else v[0]
return [{'boxes': boxes, 'masks': masks}
for boxes, masks in zip(img_boxes, img_masks)]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.TEST.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_info, grid_info = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
outputs = self.forward_model(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
cls_score, bbox_pred = self.forward_cascade(outputs)
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], [], []) for _ in range(ims_per_batch)]
batch_inds = outputs['rois'][:, :1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
boxes = bbox_pred[inds] / im_info[i, 2]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(cls_score[inds])
results[index][1].append(boxes)
results[index][2].append(batch_inds[inds])
results = [[np.vstack(x) for x in y] for y in results]
self.timers['im_detect'].toc(n=ims_per_batch)
img_boxes, img_proposals = [], []
for scores, boxes, batch_inds in results:
with self.timers['misc'].tic_and_toc():
cls_boxes, cls_proposals = get_cls_results(
scores, boxes, batch_inds, im_info)
img_boxes.append(cls_boxes)
img_proposals.append(cls_proposals)
return img_boxes, img_proposals
@torch.no_grad()
def forward_mask(self, proposals):
"""Run mask inference."""
lvl_min, lvl_max = cfg.FAST_RCNN.MIN_LEVEL, cfg.FAST_RCNN.MAX_LEVEL
lvls = distribute_boxes(proposals[:, 1:5], lvl_min, lvl_max)
roi_inds = [np.where(lvls == (i + lvl_min))[0]
for i in range(lvl_max - lvl_min + 1)]
rois, labels = [], []
for inds in roi_inds:
rois.append(proposals[inds, :5] if len(inds) > 0 else
np.array([[-1, 0, 0, 1, 1]], 'float32'))
labels.append(proposals[inds, 5].astype('int64')
if len(inds) > 0 else np.array([-1], 'int64'))
self.timers['im_detect_mask'].tic()
inputs = {'features': self.model.outputs['features'],
'rois': [self.model.to_tensor(x) for x in rois]}
mask_pred = self.model.mask_head(inputs)['mask_pred']
num_rois, num_classes = mask_pred.shape[:2]
labels = np.concatenate(labels)
fg_inds = np.where(labels >= 0)[0]
strides = np.arange(num_rois) * num_classes
mask_inds = self.model.to_tensor(strides[fg_inds] + labels[fg_inds])
mask_pred = mask_pred.flatten_(0, 1)[mask_inds].numpy()
mask_pred = mask_pred[np.concatenate(roi_inds).argsort()].copy()
self.timers['im_detect_mask'].toc()
return mask_pred
@torch.no_grad()
def forward_cascade(self, outputs):
"""Run cascade inference."""
if not hasattr(self.model, 'bbox_heads'):
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=cfg.FAST_RCNN.BBOX_REG_WEIGHTS)
return outputs['cls_score'], bbox_pred
num_stages = len(self.model.bbox_heads)
batch_inds = outputs['rois'][:, :1]
cls_score = outputs['cls_score'].copy()
lvl_slices = np.cumsum([0] + list(x.size(0) for x in self.model.outputs['rois']))
lvl_slices = [slice(lvl_slices[i], lvl_slices[i + 1])
for i in range(len(lvl_slices) - 1)]
inputs = {'features': self.model.outputs['features']}
for i in range(num_stages):
if i > 0:
outputs = self.model.bbox_heads[i](inputs)
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
cls_score += outputs['cls_score']
bbox_pred = bbox_transform_inv(
outputs['rois'][:, 1:5], outputs['bbox_pred'],
weights=self.model.bbox_reg_weights[i])
if i < num_stages - 1:
proposals = np.hstack((batch_inds, bbox_pred))
rois = [proposals[lvl_slice] for lvl_slice in lvl_slices]
inputs['rois'] = [self.model.to_tensor(x) for x in rois]
cls_score *= 1.0 / num_stages
return cls_score, bbox_pred
def get_cls_results(all_scores, all_boxes, batch_inds, im_info):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
empty_proposals = np.zeros((0, 6), 'float32')
cls_boxes, cls_proposals = [[]], []
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores = all_scores[inds, j]
if cfg.FAST_RCNN.BBOX_REG_CLS_AGNOSTIC:
boxes = all_boxes[inds]
else:
boxes = all_boxes[inds, (j - 1) * 4:j * 4]
keep = filter_empty_boxes(boxes)
if len(keep) == 0:
cls_boxes.append(empty_boxes)
cls_proposals.append(empty_proposals)
continue
scores, boxes = scores[keep], boxes[keep]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
batch_inds_keep = batch_inds[inds][keep]
cls_boxes.append(dets[keep, :])
cls_proposals.append(np.hstack((
batch_inds_keep,
cls_boxes[-1][:, :4] * im_info[batch_inds_keep, 2],
np.ones((len(keep), 1)) * (j - 1))).astype('float32'))
return cls_boxes, cls_proposals
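# Note on the layouts used above: 'cls_proposals' rows are packed as
# (batch_index, x1, y1, x2, y2, foreground_class_index) in the network
# input scale, while 'cls_boxes' rows are (x1, y1, x2, y2, score) in the
# original image scale.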
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""RetinaNet modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register('retinanet')
class RetinaNetInference(InferenceModule):
"""RetinaNet inference module."""
def __init__(self, model):
super(RetinaNetInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img, im_info, grid_info:
self.forward({'img': img, 'im_info': im_info,
'grid_info': grid_info}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
results = self.forward_bbox(imgs)
img_boxes = []
for dets in results:
with self.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(dets)
img_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in img_boxes]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_shapes, im_scales = [], [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, max_size=cfg.TEST.MAX_SIZE)
im_batch += scaled_imgs
im_scales += scales
im_shapes += [x.shape[:2] for x in scaled_imgs]
im_batch = blob_vstack(
im_batch, fill_value=cfg.MODEL.PIXEL_MEAN,
size=(cfg.TEST.CROP_SIZE,) * 2,
align=(cfg.BACKBONE.COARSEST_STRIDE,) * 2)
im_shapes = np.array(im_shapes)
im_scales = np.array(im_scales).reshape((len(im_batch), -1))
im_info = np.hstack([im_shapes, im_scales]).astype('float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([im_batch.shape[1:3]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = grid_shapes.astype('int64')
return im_batch, im_info, grid_info
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_info, grid_info = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch),
'im_info': torch.from_numpy(im_info),
'grid_info': torch.from_numpy(grid_info)}
outputs = self.forward_model(inputs['img'], inputs['im_info'],
inputs['grid_info'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [[] for _ in range(ims_per_batch)]
batch_inds = outputs['dets'][:, 0:1].astype('int32')
for i in range(ims_per_batch * num_scales):
index = i // num_scales
inds = np.where(batch_inds == i)[0]
results[index].append(outputs['dets'][inds, 1:])
for index in range(ims_per_batch):
try:
results[index] = np.vstack(results[index])
except ValueError:
results[index] = results[index][0]
self.timers['im_detect'].toc(n=ims_per_batch)
return results
def get_cls_results(all_dets):
"""Return the categorical results."""
empty_boxes = np.zeros((0, 5), 'float32')
cls_boxes = [[]]
labels = all_dets[:, 5].astype('int32')
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(labels == j)[0]
if len(inds) == 0:
cls_boxes.append(empty_boxes)
continue
dets = all_dets[inds, :5].astype('float32')
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""SSD modules."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.modules.build import InferenceModule
from seetadet.utils.bbox import bbox_transform_inv
from seetadet.utils.bbox import clip_boxes
from seetadet.utils.blob import blob_vstack
from seetadet.utils.image import im_rescale
from seetadet.utils.nms import nms
@InferenceModule.register('ssd')
class SSDInference(InferenceModule):
"""SSD inference module."""
def __init__(self, model):
super(SSDInference, self).__init__(model)
self.forward_model = self.trace(
'forward_eval', lambda self, img:
self.forward({'img': img}))
@torch.no_grad()
def get_results(self, imgs):
"""Return the inference results."""
results = self.forward_bbox(imgs)
im_boxes = []
for scores, boxes in results:
with self.timers['misc'].tic_and_toc():
cls_boxes = get_cls_results(scores, boxes)
im_boxes.append(cls_boxes)
return [{'boxes': boxes} for boxes in im_boxes]
@torch.no_grad()
def forward_data(self, imgs):
"""Return the inference data."""
im_batch, im_scales = [], []
for img in imgs:
scaled_imgs, scales = im_rescale(
img, scales=cfg.TEST.SCALES, keep_ratio=False)
im_batch += scaled_imgs
im_scales += scales
im_batch = blob_vstack(im_batch, fill_value=cfg.MODEL.PIXEL_MEAN)
return im_batch, im_scales
@torch.no_grad()
def forward_bbox(self, imgs):
"""Run bbox inference."""
im_batch, im_scales = self.forward_data(imgs)
self.timers['im_detect'].tic()
inputs = {'img': torch.from_numpy(im_batch)}
outputs = self.forward_model(inputs['img'])
outputs = dict((k, outputs[k].numpy()) for k in outputs.keys())
anchors = self.model.bbox_head.targets.generator.grid_anchors
ims_per_batch, num_scales = len(imgs), len(cfg.TEST.SCALES)
results = [([], []) for _ in range(ims_per_batch)]
for i in range(ims_per_batch * num_scales):
index = i // num_scales
boxes = bbox_transform_inv(
anchors, outputs['bbox_pred'][i],
weights=cfg.SSD.BBOX_REG_WEIGHTS)
boxes[:, 0::2] /= im_scales[i][1]
boxes[:, 1::2] /= im_scales[i][0]
boxes = clip_boxes(boxes, imgs[index].shape)
results[index][0].append(outputs['cls_score'][i])
results[index][1].append(boxes)
results = [[np.vstack(x) for x in y] for y in results]
self.timers['im_detect'].toc(n=ims_per_batch)
return results
def get_cls_results(all_scores, all_boxes):
"""Return the categorical results."""
cls_boxes = [[]]
for j in range(1, len(cfg.MODEL.CLASSES)):
inds = np.where(all_scores[:, j] > cfg.TEST.SCORE_THRESH)[0]
scores, boxes = all_scores[inds, j], all_boxes[inds]
inds = np.argsort(-scores)[:cfg.SSD.PRE_NMS_TOPK]
scores, boxes = scores[inds], boxes[inds]
dets = np.hstack((boxes, scores[:, np.newaxis]))
dets = dets.astype('float32', copy=False)
keep = nms(dets, cfg.TEST.NMS_THRESH)
cls_boxes.append(dets[keep, :])
return cls_boxes
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operators."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from seetadet.core.engine.utils import load_library as _load_library
_load_library(os.path.join(os.path.dirname(__file__), '_C'))
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Build for ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.loss import GIoULoss
from seetadet.ops.loss import L1Loss
from seetadet.ops.loss import SmoothL1Loss
from seetadet.ops.loss import SigmoidFocalLoss
from seetadet.ops.normalization import FrozenBatchNorm2d
from seetadet.ops.normalization import TransposedLayerNorm
def build_loss(loss_type, reduction='sum', **kwargs):
    """Build the loss module."""
if isinstance(loss_type, str):
loss_type = loss_type.lower()
if loss_type != 'smooth_l1':
kwargs.pop('beta', None)
loss_type = {
'l1': L1Loss,
'smooth_l1': SmoothL1Loss,
'giou': GIoULoss,
            'cross_entropy': nn.CrossEntropyLoss,
'sigmoid_focal': SigmoidFocalLoss,
}[loss_type]
return loss_type(reduction=reduction, **kwargs)
def build_norm(dim, norm_type):
"""Build the normalization module."""
if isinstance(norm_type, str):
if len(norm_type) == 0:
return nn.Identity()
norm_type = {
'BN': nn.BatchNorm2d,
'FrozenBN': FrozenBatchNorm2d,
'SyncBN': nn.SyncBatchNorm,
'LN': TransposedLayerNorm,
'GN': lambda c: nn.GroupNorm(32, c),
'Affine': lambda c: FrozenBatchNorm2d(c, affine=True),
}[norm_type]
return norm_type(dim)
def build_activation(activation_type, inplace=False):
"""Build the activation module."""
if isinstance(activation_type, str):
if len(activation_type) == 0:
return nn.Identity()
activation_type = getattr(nn, activation_type)
activation = activation_type()
activation.inplace = inplace
return activation
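# Example usage (a minimal sketch of the three builders; the argument
# values are illustrative, not defaults taken from any config):
#
#   loss = build_loss('smooth_l1', reduction='mean', beta=1.0)
#   norm = build_norm(256, 'FrozenBN')  # -> FrozenBatchNorm2d(256)
#   act = build_activation('ReLU', inplace=True)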
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Convolution ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.torch import nn
from seetadet.ops.build import build_norm
class ConvNorm2d(nn.Sequential):
"""2d convolution followed by norm."""
def __init__(
self,
dim_in,
dim_out,
kernel_size,
stride=1,
padding=None,
dilation=1,
groups=1,
bias=True,
conv_type='Conv2d',
norm_type='',
activation_type='',
inplace=True,
):
super(ConvNorm2d, self).__init__()
if padding is None:
padding = kernel_size // 2
if conv_type == 'Conv2d':
layers = [nn.Conv2d(dim_in, dim_out,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=groups,
bias=bias and (not norm_type))]
elif conv_type == 'SepConv2d':
layers = [nn.Conv2d(dim_in, dim_in,
kernel_size=kernel_size,
stride=stride,
padding=padding,
dilation=dilation,
groups=dim_in,
bias=False),
nn.Conv2d(dim_in, dim_out,
kernel_size=1,
bias=bias and (not norm_type))]
else:
raise ValueError('Unknown conv type: ' + conv_type)
if norm_type:
layers += [build_norm(dim_out, norm_type)]
if activation_type:
layers += [getattr(nn, activation_type)()]
layers[-1].inplace = inplace
for i, layer in enumerate(layers):
self.add_module(str(i), layer)
self.reset_parameters()
def reset_parameters(self):
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(
m.weight, mode='fan_out', nonlinearity='relu')
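# Example usage (a minimal sketch): a 3x3 conv with frozen BN and ReLU.
# The conv bias is dropped automatically because a norm layer follows.
#
#   block = ConvNorm2d(256, 256, kernel_size=3,
#                      norm_type='FrozenBN', activation_type='ReLU')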
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Operator fusions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torch
from seetadet.core.registry import Registry
# Pass to fuse adjacent modules.
FUSIONS = Registry('fusions')
@FUSIONS.register([
'Conv2d+BatchNorm2d',
'Conv2d+FrozenBatchNorm2d',
'Conv2d+SyncBatchNorm',
'ConvTranspose2d+BatchNorm2d',
'ConvTranspose2d+FrozenBatchNorm2d',
'ConvTranspose2d+SyncBatchNorm',
'DepthwiseConv2d+BatchNorm2d',
'DepthwiseConv2d+FrozenBatchNorm2d',
'DepthwiseConv2d+SyncBatchNorm'])
def fuse_conv_bn(conv, bn):
"""Fuse Conv and BatchNorm."""
with torch.no_grad():
m = bn.running_mean
if conv.bias is not None:
m.sub_(conv.bias.float())
else:
delattr(conv, 'bias')
bn.forward = lambda x: x
t = bn.weight.div((bn.running_var + bn.eps).sqrt_())
conv._parameters['bias'] = bn.bias.sub(t * m)
t_conv_shape = [1, conv.out_channels] if conv.transposed else [0, 1]
t_conv_shape += [1] * len(conv.kernel_size)
if conv.weight.dtype == 'float16' and t.dtype == 'float32':
conv.bias.half_()
weight = conv.weight.float()
weight.mul_(t.reshape_(t_conv_shape)).half_()
conv.weight.copy_(weight)
else:
conv.weight.mul_(t.reshape_(t_conv_shape))
def get_fusion(*modules):
"""Return the fusion pass between modules."""
key = '+'.join(m.__class__.__name__ for m in modules)
return key, FUSIONS.try_get(key)
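# Example usage (a minimal sketch): look up and apply the fusion pass
# for an adjacent conv/norm pair, if one is registered.
#
#   key, fusion = get_fusion(conv, bn)  # e.g. 'Conv2d+BatchNorm2d'
#   if fusion is not None:
#       fusion(conv, bn)  # Folds BN stats into the conv weight/bias.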
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Loss ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import dragon
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
class GIoULoss(nn.Module):
"""GIoU loss."""
def __init__(self, reduction='sum', delta_weights=None):
super(GIoULoss, self).__init__()
self.reduction = reduction
self.delta_weights = delta_weights
def transform_inv(self, boxes, deltas):
widths = boxes[:, 2:3] - boxes[:, 0:1]
heights = boxes[:, 3:4] - boxes[:, 1:2]
ctr_x = boxes[:, 0:1] + 0.5 * widths
ctr_y = boxes[:, 1:2] + 0.5 * heights
dx, dy, dw, dh = torch.chunk(deltas, chunks=4, dim=1)
if self.delta_weights is not None:
wx, wy, ww, wh = self.delta_weights
dx, dy, dw, dh = dx / wx, dy / wy, dw / ww, dh / wh
pred_ctr_x = dx * widths + ctr_x
pred_ctr_y = dy * heights + ctr_y
pred_w = torch.exp(dw) * widths
pred_h = torch.exp(dh) * heights
x1 = pred_ctr_x - 0.5 * pred_w
y1 = pred_ctr_y - 0.5 * pred_h
x2 = pred_ctr_x + 0.5 * pred_w
y2 = pred_ctr_y + 0.5 * pred_h
return x1, y1, x2, y2
def forward_impl(self, input, target, anchor):
x1, y1, x2, y2 = self.transform_inv(anchor, input)
x1_, y1_, x2_, y2_ = self.transform_inv(anchor, target)
# Compute the independent area.
pred_area = (x2 - x1) * (y2 - y1)
target_area = (x2_ - x1_) * (y2_ - y1_)
# Compute the intersecting area.
x1_inter = torch.maximum(x1, x1_)
y1_inter = torch.maximum(y1, y1_)
x2_inter = torch.minimum(x2, x2_)
y2_inter = torch.minimum(y2, y2_)
w_inter = torch.clamp(x2_inter - x1_inter, min=0)
h_inter = torch.clamp(y2_inter - y1_inter, min=0)
area_inter = w_inter * h_inter
# Compute the enclosing area.
x1_enc = torch.minimum(x1, x1_)
y1_enc = torch.minimum(y1, y1_)
x2_enc = torch.maximum(x2, x2_)
y2_enc = torch.maximum(y2, y2_)
area_enc = (x2_enc - x1_enc) * (y2_enc - y1_enc) + 1.
# Compute the differentiable IoU metric.
area_union = pred_area + target_area - area_inter
iou = area_inter / (area_union + 1.)
iou_metric = iou - (area_enc - area_union) / area_enc
# Compute the reduced loss.
if self.reduction == 'sum':
return (1 - iou_metric).sum()
else:
return (1 - iou_metric).mean()
def forward(self, *inputs, **kwargs):
with dragon.variable_scope('IoULossVariable'):
return self.forward_impl(*inputs, **kwargs)
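# The loss above follows the GIoU definition: with union U = |A| + |B|
# - |A ∩ B| and smallest enclosing box C,
#
#   GIoU(A, B) = IoU(A, B) - (|C| - U) / |C|,  loss = 1 - GIoU.
#
# The '+ 1.' terms in forward_impl are smoothing constants added by this
# implementation, not part of the canonical formula.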
class L1Loss(nn.L1Loss):
"""L1 loss."""
def forward(self, input, target, *args):
return super(L1Loss, self).forward(input, target)
class SigmoidFocalLoss(nn.SigmoidFocalLoss):
"""Sigmoid focal loss."""
def __init__(self, reduction='sum'):
super(SigmoidFocalLoss, self).__init__(
alpha=cfg.MODEL.FOCAL_LOSS_ALPHA,
gamma=cfg.MODEL.FOCAL_LOSS_GAMMA,
start_index=1, # Foreground index
reduction=reduction)
class SmoothL1Loss(nn.SmoothL1Loss):
"""Smoothed l1 loss."""
def forward(self, input, target, *args):
return nn.functional.smooth_l1_loss(
input, target, beta=self.beta,
reduction=self.reduction)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Normalization ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
from dragon.vm import torch
from dragon.vm.torch import nn
from seetadet.core.config import cfg
from seetadet.core.engine.utils import get_device
class FrozenBatchNorm2d(nn.Module):
"""BatchNorm2d where statistics or affine parameters are fixed."""
def __init__(self, num_features, eps=1e-5, affine=False, inplace=True):
super(FrozenBatchNorm2d, self).__init__()
self.num_features = num_features
self.eps = eps
self.affine = affine
self.inplace = inplace and (not affine)
if self.affine:
self.weight = torch.nn.Parameter(torch.ones(num_features))
self.bias = torch.nn.Parameter(torch.zeros(num_features))
else:
self.register_buffer('weight', torch.ones(num_features))
self.register_buffer('bias', torch.zeros(num_features))
self.register_buffer('running_mean', torch.zeros(num_features))
self.register_buffer('running_var', torch.ones(num_features) - eps)
def extra_repr(self):
affine_str = '{num_features}, eps={eps}, affine={affine}' \
.format(**self.__dict__)
inplace_str = ', inplace' if self.inplace else ''
return affine_str + inplace_str
def forward(self, input):
return nn.functional.affine(
input, self.weight, self.bias,
dim=1, out=input if self.inplace else None)
def _load_from_state_dict(
self,
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
):
super(FrozenBatchNorm2d, self)._load_from_state_dict(
state_dict,
prefix,
strict,
missing_keys,
unexpected_keys,
error_msgs,
)
# Fuse the running stats into weight and bias.
# Note that this behavior will break the original stats
# into zero means and one stds.
with torch.no_grad():
self.running_var.float_().add_(self.eps).sqrt_()
self.weight.float_().div_(self.running_var)
self.bias.float_().sub_(self.running_mean.float_() * self.weight)
self.running_mean.zero_()
self.running_var.one_().sub_(self.eps)
class TransposedLayerNorm(nn.LayerNorm):
"""LayerNorm with pre-transposed spatial axes."""
def forward(self, input):
return nn.functional.layer_norm(
input.permute(0, 2, 3, 1), self.normalized_shape,
self.weight, self.bias, self.eps).permute(0, 3, 1, 2)
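# TransposedLayerNorm normalizes over the channel axis of NCHW inputs:
# it permutes to NHWC, applies LayerNorm on the trailing dimension, and
# permutes the result back to NCHW.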
class L2Norm(nn.Module):
"""Parameterized L2 normalize."""
def __init__(self, num_features, init=20., eps=1e-5):
super(L2Norm, self).__init__()
self.eps = eps
self.weight = nn.Parameter(torch.Tensor(num_features).fill_(init))
def forward(self, input):
out = nn.functional.normalize(input, p=2, dim=1, eps=self.eps)
return nn.functional.affine(out, self.weight, dim=1)
class ToTensor(nn.Module):
"""Convert input to tensor."""
def __init__(self):
super(ToTensor, self).__init__()
self.device = torch.device('cpu')
self.tensor = torch.ones(1)
self.normalize = functools.partial(
nn.functional.channel_norm,
mean=cfg.MODEL.PIXEL_MEAN,
std=cfg.MODEL.PIXEL_STD,
dim=1, dims=(0, 3, 1, 2),
dtype=cfg.MODEL.PRECISION.lower())
def _apply(self, fn):
fn(self.tensor)
def forward(self, input, normalize=False):
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
input = input.to(self.tensor.device)
if normalize and not input.is_floating_point():
input = self.normalize(input)
return input
def to_tensor(input, to_device=True):
"""Convert input to tensor."""
if input is None:
return input
if not isinstance(input, torch.Tensor):
input = torch.from_numpy(input)
if to_device:
input = input.to(device=get_device(cfg.GPU_ID))
return input
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""ONNX exporters."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm.onnx.core import helper
from dragon.vm.onnx.core.exporters import utils as export_util
@export_util.register('RetinaNetDecoder')
def retinanet_decoder_exporter(op_def, context):
node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RetinaNetDecoder')
for arg in op_def.arg:
if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints)
elif arg.name == 'ratios':
helper.add_attribute(node, 'ratios', arg.floats)
elif arg.name == 'scales':
helper.add_attribute(node, 'scales', arg.floats)
elif arg.name == 'pre_nms_topk':
helper.add_attribute(node, 'pre_nms_topk', arg.i)
elif arg.name == 'score_thresh':
helper.add_attribute(node, 'score_thresh', arg.f)
return node, const_tensors
@export_util.register('RPNDecoder')
def rpn_decoder_exporter(op_def, context):
node, const_tensors = export_util.translate(**locals())
node.op_type = 'ATen' # Currently not supported in ai.onnx.
helper.add_attribute(node, 'op_type', 'RPNDecoder')
for arg in op_def.arg:
if arg.name == 'strides':
helper.add_attribute(node, 'strides', arg.ints)
elif arg.name == 'ratios':
helper.add_attribute(node, 'ratios', arg.floats)
elif arg.name == 'scales':
helper.add_attribute(node, 'scales', arg.floats)
elif arg.name == 'pre_nms_topk':
helper.add_attribute(node, 'pre_nms_topk', arg.i)
elif arg.name == 'post_nms_topk':
helper.add_attribute(node, 'post_nms_topk', arg.i)
elif arg.name == 'nms_thresh':
helper.add_attribute(node, 'nms_thresh', arg.f)
elif arg.name == 'min_level':
helper.add_attribute(node, 'min_level', arg.i)
elif arg.name == 'max_level':
helper.add_attribute(node, 'max_level', arg.i)
return node, const_tensors
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Vision ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from dragon.vm import torchvision
from dragon.vm.torch import nn
from dragon.vm.torch import autograd
class RoIPooler(nn.Module):
"""Resample RoI features into a fixed resolution."""
def __init__(self, pooler_type='RoIAlign', resolution=7, sampling_ratio=0):
super(RoIPooler, self).__init__()
if not isinstance(resolution, (tuple, list)):
resolution = (resolution, resolution)
self.pooler_type = pooler_type
self.resolution = resolution
self.sampling_ratio = sampling_ratio
def forward(self, input, boxes, spatial_scale=1.0):
if self.pooler_type == 'RoIPool':
return torchvision.ops.roi_pool(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale)
elif self.pooler_type == 'RoIAlign':
return torchvision.ops.roi_align(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio,
aligned=False)
elif self.pooler_type == 'RoIAlignV2':
return torchvision.ops.roi_align(
input, boxes,
output_size=self.resolution,
spatial_scale=spatial_scale,
sampling_ratio=self.sampling_ratio,
aligned=True)
else:
raise NotImplementedError
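# Example usage (a minimal sketch): pool stride-4 FPN features into 7x7
# RoI features; 'rois' follows the (batch_index, x1, y1, x2, y2) layout.
#
#   pooler = RoIPooler('RoIAlignV2', resolution=7, sampling_ratio=2)
#   roi_feats = pooler(features, rois, spatial_scale=1. / 4)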
class NonMaxSuppression(object):
"""Filter out boxes that have high IoU with selected ones."""
@staticmethod
def apply(input, iou_threshold=0.5):
return autograd.Function.apply(
'NonMaxSuppression', input.device, [input],
iou_threshold=float(iou_threshold))
autograd.Function.register(
'NonMaxSuppression', lambda **kwargs: {
'iou_threshold': kwargs.get('iou_threshold', 0.5),
})
class PasteMask(object):
"""Paste a set of masks on an image."""
@staticmethod
def apply(masks, boxes, output_size, mask_threshold=0.5):
if not isinstance(output_size, (tuple, list)):
output_size = (output_size, output_size)
return autograd.Function.apply(
'PasteMask', masks.device, [masks, boxes],
mask_threshold=float(mask_threshold),
num_sizes=len(output_size), sizes=output_size)
autograd.Function.register(
'PasteMask', lambda **kwargs: {
'mask_threshold': kwargs.get('mask_threshold', 0.5),
'sizes_desc': 'int64',
})
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox.helper import clip_boxes
from seetadet.utils.bbox.helper import clip_tiled_boxes
from seetadet.utils.bbox.helper import distribute_boxes
from seetadet.utils.bbox.helper import filter_empty_boxes
from seetadet.utils.bbox.helper import flip_boxes
from seetadet.utils.bbox.metrics import bbox_overlaps
from seetadet.utils.bbox.metrics import bbox_centerness
from seetadet.utils.bbox.transforms import bbox_transform
from seetadet.utils.bbox.transforms import bbox_transform_inv
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for bounding box."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def clip_boxes(boxes, im_shape):
"""Clip the boxes."""
xmax, ymax = im_shape[1], im_shape[0]
boxes[:, (0, 2)] = np.maximum(np.minimum(boxes[:, (0, 2)], xmax), 0)
boxes[:, (1, 3)] = np.maximum(np.minimum(boxes[:, (1, 3)], ymax), 0)
return boxes
def clip_tiled_boxes(boxes, im_shape):
"""Clip the tiled boxes."""
xmax, ymax = im_shape[1], im_shape[0]
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], xmax), 0)
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], ymax), 0)
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], xmax), 0)
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], ymax), 0)
return boxes
def flip_boxes(boxes, width):
"""Flip the boxes horizontally."""
boxes_flipped = boxes.copy()
boxes_flipped[:, 0] = width - boxes[:, 2]
boxes_flipped[:, 2] = width - boxes[:, 0]
return boxes_flipped
def filter_empty_boxes(boxes):
"""Return the indices of non-empty boxes."""
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
return np.where((ws > 0) & (hs > 0))[0]
def distribute_boxes(boxes, lvl_min, lvl_max):
"""Return the fpn level of boxes."""
if len(boxes) == 0:
return []
ws = boxes[:, 2] - boxes[:, 0]
hs = boxes[:, 3] - boxes[:, 1]
s = np.sqrt(ws * hs)
    s0 = 224  # Canonical scale (ImageNet crop size, as in the FPN paper).
    lvl0 = 4  # Target level for boxes of scale s0.
lvls = np.floor(lvl0 + np.log2(s / s0 + 1e-6))
return np.clip(lvls, lvl_min, lvl_max)
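# Example (a minimal sketch of the level assignment, assuming NumPy is
# imported as 'np'): a 224x224 box has scale s = 224, so it maps to
# floor(4 + log2(224 / 224)) = level 4, clipped to [lvl_min, lvl_max].
#
#   >>> distribute_boxes(np.array([[0., 0., 224., 224.]]), 2, 5)
#   array([4.])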
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.bbox import cython_bbox
import numpy as np
def bbox_overlaps(boxes1, boxes2):
"""Return the overlaps between two group of boxes."""
    # Use the explicit 64-bit dtype; the 'np.float' alias was removed
    # from NumPy 1.24 onwards.
    boxes1 = np.ascontiguousarray(boxes1, dtype=np.float64)
    boxes2 = np.ascontiguousarray(boxes2, dtype=np.float64)
return cython_bbox.bbox_overlaps(boxes1, boxes2)
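# bbox_overlaps computes, for every (box1, box2) pair, the Jaccard
# overlap IoU = |A ∩ B| / |A ∪ B|; the Cython kernel is used because
# this pairwise loop is a hotspot during target assignment.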
def bbox_centerness(boxes1, boxes2):
"""Return centerness between two group of boxes."""
ctr_x = (boxes1[:, 2] + boxes1[:, 0]) / 2
ctr_y = (boxes1[:, 3] + boxes1[:, 1]) / 2
l = ctr_x - boxes2[:, 0]
t = ctr_y - boxes2[:, 1]
r = boxes2[:, 2] - ctr_x
b = boxes2[:, 3] - ctr_y
centerness = ((np.minimum(l, r) / np.maximum(l, r)) *
(np.minimum(t, b) / np.maximum(t, b)))
min_dist = np.stack([l, t, r, b], axis=1).min(axis=1)
keep_inds = np.where(min_dist > 0.01)[0]
discard_inds = np.where(min_dist <= 0.01)[0]
centerness[keep_inds] = np.sqrt(centerness[keep_inds])
centerness[discard_inds] = -1
return centerness, keep_inds, discard_inds
def boxes_area(boxes):
"""Return the area of boxes."""
return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Bounding-Box transforms."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
_DEFAULT_SCALE_CLIP = np.log(1000.0 / 16.0)
def bbox_transform(src_boxes, tgt_boxes, weights=(1., 1., 1., 1.)):
"""Return the bbox transformation deltas."""
src_widths = src_boxes[:, 2] - src_boxes[:, 0]
src_heights = src_boxes[:, 3] - src_boxes[:, 1]
src_ctr_x = src_boxes[:, 0] + 0.5 * src_widths
src_ctr_y = src_boxes[:, 1] + 0.5 * src_heights
tgt_widths = tgt_boxes[:, 2] - tgt_boxes[:, 0]
tgt_heights = tgt_boxes[:, 3] - tgt_boxes[:, 1]
tgt_ctr_x = tgt_boxes[:, 0] + 0.5 * tgt_widths
tgt_ctr_y = tgt_boxes[:, 1] + 0.5 * tgt_heights
(wx, wy, ww, wh), deltas = weights, []
deltas += [wx * (tgt_ctr_x - src_ctr_x) / src_widths]
deltas += [wy * (tgt_ctr_y - src_ctr_y) / src_heights]
deltas += [ww * np.log(tgt_widths / src_widths)]
deltas += [wh * np.log(tgt_heights / src_heights)]
return np.vstack(deltas).transpose()
def bbox_transform_inv(boxes, deltas, weights=(1., 1., 1., 1.)):
"""Return the boxes transformed from deltas."""
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0]
heights = boxes[:, 3] - boxes[:, 1]
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
wx, wy, ww, wh = weights
dx = deltas[:, 0::4] / wx
dy = deltas[:, 1::4] / wy
dw = deltas[:, 2::4] / ww
dh = deltas[:, 3::4] / wh
dw = np.minimum(dw, _DEFAULT_SCALE_CLIP)
dh = np.minimum(dh, _DEFAULT_SCALE_CLIP)
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, deltas.dtype)
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
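# Example (a minimal sketch of the round trip, with unit weights and
# NumPy imported as 'np'): deltas produced by bbox_transform map the
# source box back onto the target box.
#
#   >>> src = np.array([[0., 0., 10., 10.]])
#   >>> tgt = np.array([[5., 5., 15., 15.]])
#   >>> deltas = bbox_transform(src, tgt)  # [[0.5, 0.5, 0., 0.]]
#   >>> bbox_transform_inv(src, deltas)    # -> [[5., 5., 15., 15.]]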
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Blob utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def blob_vstack(arrays, fill_value=None, dtype=None, size=None, align=None):
"""Stack arrays in sequence vertically."""
if fill_value is None:
return np.vstack(arrays)
# Compute the max stack shape.
max_shape = np.max(np.stack([arr.shape for arr in arrays]), 0)
if size is not None and min(size) > 0:
max_shape[:len(size)] = size
if align is not None and min(align) > 0:
align_size = np.ceil(max_shape[:len(align)] / align)
max_shape[:len(align)] = align_size.astype('int64') * align
# Fill output with the given value.
output_dtype = dtype or arrays[0].dtype
output_shape = [len(arrays)] + list(max_shape)
output = np.empty(output_shape, output_dtype)
output[:] = fill_value
# Copy arrays.
for i, arr in enumerate(arrays):
copy_slices = (slice(0, d) for d in arr.shape)
output[(i,) + tuple(copy_slices)] = arr
return output
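# Example (a minimal sketch, assuming NumPy is imported as 'np'): stack
# two HWC images of different sizes into one padded batch whose spatial
# dims are aligned to multiples of 32 pixels.
#
#   >>> a = np.zeros((600, 800, 3), 'uint8')
#   >>> b = np.zeros((480, 640, 3), 'uint8')
#   >>> blob_vstack([a, b], fill_value=(103, 116, 123),
#   ...             align=(32, 32)).shape
#   (2, 608, 800, 3)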
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Image utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import PIL.Image
import PIL.ImageEnhance
def im_resize(img, size=None, scale=None, mode='linear'):
"""Resize image by the scale or size."""
if size is None:
if not isinstance(scale, (tuple, list)):
scale = (scale, scale)
h, w = img.shape[:2]
size = int(h * scale[0] + .5), int(w * scale[1] + .5)
else:
if not isinstance(size, (tuple, list)):
size = (size, size)
mode = {'linear': PIL.Image.BILINEAR,
'nearest': PIL.Image.NEAREST}[mode]
img = PIL.Image.fromarray(img)
return np.array(img.resize(size[::-1], mode))
def im_rescale(img, scales, max_size=0, keep_ratio=True):
"""Rescale image to match the detecting scales."""
im_shape = img.shape
img_list, img_scales = [], []
if keep_ratio:
size_min = np.min(im_shape[:2])
size_max = np.max(im_shape[:2])
for target_size in scales:
im_scale = float(target_size) / float(size_min)
target_size_max = max_size if max_size > 0 else target_size
if np.round(im_scale * size_max) > target_size_max:
im_scale = float(target_size_max) / float(size_max)
img_list.append(im_resize(img, scale=im_scale))
img_scales.append((im_scale, im_scale))
else:
for target_size in scales:
h_scale = float(target_size) / im_shape[0]
w_scale = float(target_size) / im_shape[1]
img_list.append(im_resize(img, size=target_size))
img_scales.append((h_scale, w_scale))
return img_list, img_scales
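# Example (a minimal sketch of keep_ratio rescaling): a 480x640 image
# with scales=(600,) and max_size=1000 is scaled by 600 / 480 = 1.25,
# since round(1.25 * 640) = 800 <= 1000 keeps the longer side in bounds.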
def color_jitter(img, brightness=None, contrast=None, saturation=None):
    """Distort the color of an image."""
    def add_transform(transforms, transform_type, jitter_range):
        if jitter_range is not None:
            if not isinstance(jitter_range, (tuple, list)):
                jitter_range = (1. - jitter_range, 1. + jitter_range)
            transforms.append((transform_type, jitter_range))
transforms = []
contrast_first = np.random.rand() < 0.5
add_transform(transforms, PIL.ImageEnhance.Brightness, brightness)
if contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
add_transform(transforms, PIL.ImageEnhance.Color, saturation)
if not contrast_first:
add_transform(transforms, PIL.ImageEnhance.Contrast, contrast)
for transform, jitter_range in transforms:
if isinstance(img, np.ndarray):
img = PIL.Image.fromarray(img)
img = transform(img)
img = img.enhance(np.random.uniform(*jitter_range))
return np.asarray(img)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Logging utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import inspect
import logging as _logging
import os
import sys as _sys
import threading
_logger = None
_logger_lock = threading.Lock()
def get_logger():
global _logger
# Use double-checked locking to avoid taking lock unnecessarily.
if _logger:
return _logger
_logger_lock.acquire()
try:
if _logger:
return _logger
logger = _logging.getLogger('seetadet')
logger.setLevel('INFO')
logger.propagate = False
logger._is_root = True
if True:
# Determine whether we are in an interactive environment.
_interactive = False
try:
# This is only defined in interactive shells.
if _sys.ps1:
_interactive = True
except AttributeError:
# Even now, we may be in an interactive shell with `python -i`.
_interactive = _sys.flags.interactive
# If we are in an interactive environment (like Jupyter), set loglevel
# to INFO and pipe the output to stdout.
if _interactive:
logger.setLevel('INFO')
_logging_target = _sys.stdout
else:
_logging_target = _sys.stderr
# Add the output handler.
_handler = _logging.StreamHandler(_logging_target)
_handler.setFormatter(_logging.Formatter('%(levelname)s %(message)s'))
logger.addHandler(_handler)
_logger = logger
return _logger
finally:
_logger_lock.release()
def _detailed_msg(msg):
file, lineno = inspect.stack()[:3][2][1:3]
return "{}:{}] {}".format(os.path.split(file)[-1], lineno, msg)
def log(level, msg, *args, **kwargs):
get_logger().log(level, _detailed_msg(msg), *args, **kwargs)
def debug(msg, *args, **kwargs):
if is_root():
get_logger().debug(_detailed_msg(msg), *args, **kwargs)
def error(msg, *args, **kwargs):
get_logger().error(_detailed_msg(msg), *args, **kwargs)
assert 0
def fatal(msg, *args, **kwargs):
get_logger().fatal(_detailed_msg(msg), *args, **kwargs)
assert 0
def info(msg, *args, **kwargs):
if is_root():
get_logger().info(_detailed_msg(msg), *args, **kwargs)
def warning(msg, *args, **kwargs):
if is_root():
get_logger().warning(_detailed_msg(msg), *args, **kwargs)
def get_verbosity():
"""Return how much logging output will be produced."""
return get_logger().getEffectiveLevel()
def set_verbosity(v):
"""Set the threshold for what messages will be logged."""
get_logger().setLevel(v)
def set_formatter(fmt=None, datefmt=None):
"""Set the formatter."""
handler = _logging.StreamHandler(_sys.stderr)
handler.setFormatter(_logging.Formatter(fmt, datefmt))
logger = get_logger()
logger.removeHandler(logger.handlers[0])
logger.addHandler(handler)
def set_root(is_root=True):
"""Set logger to the root."""
get_logger()._is_root = is_root
def is_root():
"""Return logger is the root."""
return get_logger()._is_root
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.mask.helper import encode_masks
from seetadet.utils.mask.helper import mask_from
from seetadet.utils.mask.helper import mask_to_polygons
from seetadet.utils.mask.helper import paste_masks
from seetadet.utils.mask.metrics import mask_overlap
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for mask."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import copy
import cv2
import numpy as np
from pycocotools.mask import decode
from pycocotools.mask import encode
from pycocotools.mask import merge
from pycocotools.mask import frPyObjects
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import PasteMask
from seetadet.utils.image import im_resize
def mask_from_buffer(buffer, size, box=None):
"""Return a binary mask from the buffer."""
if not isinstance(size, (tuple, list)):
size = (size, size)
rles = [{'counts': buffer, 'size': size}]
mask = decode(rles)
if mask.shape[2] != 1:
raise ValueError('Mask contains {} instances. '
'Merge them before compressing.'
.format(mask.shape[2]))
mask = mask[:, :, 0]
if box is not None:
box = np.round(box).astype('int64')
mask = mask[box[1]:box[3], box[0]:box[2]]
return mask
def mask_from_polygons(polygons, size, box=None):
"""Return a binary mask from the polygons."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
polygons = copy.deepcopy(polygons)
w, h = box[2] - box[0], box[3] - box[1]
ratio_h = size[0] / max(h, 0.1)
ratio_w = size[1] / max(w, 0.1)
for p in polygons:
p[0::2] = p[0::2] - box[0]
p[1::2] = p[1::2] - box[1]
if ratio_h == ratio_w:
for p in polygons:
p *= ratio_h
else:
for p in polygons:
p[0::2] *= ratio_w
p[1::2] *= ratio_h
rles = frPyObjects(polygons, size[0], size[1])
return decode(merge(rles))
def mask_from_bitmap(bitmap, size, box=None):
"""Return a binary mask from the bitmap."""
if not isinstance(size, (tuple, list)):
size = (size, size)
if box is not None:
box = np.round(box).astype('int64')
bitmap = bitmap[box[1]:box[3], box[0]:box[2]]
return im_resize(bitmap, size, mode='nearest')
def mask_from(segm, size, box=None):
"""Return a binary mask from the segmentation object."""
if segm is None:
return None
elif isinstance(segm, list):
return mask_from_polygons(segm, size, box)
elif isinstance(segm, np.ndarray):
return mask_from_bitmap(segm, size, box)
elif isinstance(segm, bytes):
return mask_from_buffer(segm, size, box)
else:
        raise TypeError('Unknown segmentation type: ' + str(type(segm)))
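# Example (a minimal sketch, assuming NumPy is imported as 'np'):
# rasterize a polygon instance into a 28x28 mask grid, cropped to its
# bounding box as the mask head expects.
#
#   >>> polygons = [np.array([0., 0., 20., 0., 20., 20., 0., 20.])]
#   >>> mask_from(polygons, 28, box=np.array([0., 0., 20., 20.])).shape
#   (28, 28)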
def mask_to_polygons(mask):
"""Convert a binary mask to a set of polygons."""
mask = np.ascontiguousarray(mask)
res = cv2.findContours(mask, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
hierarchy = res[-1]
if hierarchy is None:
return []
contours = res[-2]
polygons = [x.flatten() for x in contours]
polygons = [x + 0.5 for x in polygons if len(x) >= 6]
return polygons
def encode_masks(masks):
"""Encode a set of masks to RLEs."""
rles = encode(np.asfortranarray(masks))
for rle in rles:
rle['counts'] = rle['counts'].decode()
return rles
def paste_masks(masks, boxes, img_size, threshold=0.5, channels_last=True):
"""Paste a set of masks on an image by resample."""
masks, boxes = to_tensor(masks), to_tensor(boxes[:, :4])
img_masks = PasteMask.apply(masks, boxes, img_size, threshold)
img_masks = img_masks.numpy().copy()
return img_masks.transpose((1, 2, 0)) if channels_last else img_masks
def paste_masks_old(masks, boxes, img_size, thresh=0.5):
"""Paste a set of masks on an image by resize."""
def scale_boxes(boxes, scale_factor=1.):
"""Scale the boxes."""
w = (boxes[:, 2] - boxes[:, 0]) * 0.5 * scale_factor
h = (boxes[:, 3] - boxes[:, 1]) * 0.5 * scale_factor
x_ctr = (boxes[:, 2] + boxes[:, 0]) * 0.5
y_ctr = (boxes[:, 3] + boxes[:, 1]) * 0.5
boxes_scaled = np.zeros(boxes.shape)
boxes_scaled[:, 0], boxes_scaled[:, 1] = x_ctr - w, y_ctr - h
boxes_scaled[:, 2], boxes_scaled[:, 3] = x_ctr + w, y_ctr + h
return boxes_scaled
num_boxes = boxes.shape[0]
assert masks.shape[0] == num_boxes
img_shape = list(img_size) + [num_boxes]
output = np.zeros(img_shape, 'uint8')
size = masks[0].shape[0]
scale_factor = (size + 2.) / size
boxes = scale_boxes(boxes, scale_factor).astype(np.int32)
padded_mask = np.zeros((size + 2, size + 2), 'float32')
for i in range(num_boxes):
box, mask = boxes[i, :4], masks[i]
padded_mask[1:-1, 1:-1] = mask[:, :]
w = max(box[2] - box[0], 1)
h = max(box[3] - box[1], 1)
mask = cv2.resize(padded_mask, (w, h))
mask = np.array(mask > thresh, 'uint8')
x1, y1 = max(box[0], 0), max(box[1], 0)
x2, y2 = min(box[2], img_size[1]), min(box[3], img_size[0])
mask = mask[y1 - box[1]:y2 - box[1], x1 - box[0]:x2 - box[0]]
output[y1:y2, x1:x2, i] = mask
return output
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Mask metrics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def mask_overlap(box1, box2, mask1, mask2):
"""Compute the overlap of two masks."""
x1 = max(box1[0], box2[0])
y1 = max(box1[1], box2[1])
x2 = min(box1[2], box2[2])
y2 = min(box1[3], box2[3])
if x1 > x2 or y1 > y2:
return 0
w = x2 - x1
h = y2 - y1
# Get masks in the intersection part.
start_ya = y1 - box1[1]
start_xa = x1 - box1[0]
inter_mask_a = mask1[start_ya: start_ya + h, start_xa:start_xa + w]
start_yb = y1 - box2[1]
start_xb = x1 - box2[0]
inter_mask_b = mask2[start_yb: start_yb + h, start_xb:start_xb + w]
assert inter_mask_a.shape == inter_mask_b.shape
inter = np.logical_and(inter_mask_b, inter_mask_a).sum()
union = mask1.sum() + mask2.sum() - inter
if union < 1.:
return 0.
return float(inter) / float(union)
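# Example (a minimal sketch, assuming NumPy is imported as 'np'): two
# 2x2 all-ones masks on identical boxes intersect completely, so the
# overlap is 4 / (4 + 4 - 4) = 1.0.
#
#   >>> m = np.ones((2, 2), 'uint8')
#   >>> mask_overlap([0, 0, 2, 2], [0, 0, 2, 2], m, m)
#   1.0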
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Non-Maximum Suppression utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.nms.helper import gpu_nms
from seetadet.utils.nms.helper import nms
from seetadet.utils.nms.helper import soft_nms
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions of Non-Maximum Suppression."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.ops.normalization import to_tensor
from seetadet.ops.vision import NonMaxSuppression
try:
from seetadet.utils.nms.cython_nms import cpu_nms
from seetadet.utils.nms.cython_nms import cpu_soft_nms
except ImportError:
    cpu_nms = cpu_soft_nms = print  # Sentinel checked before use below.
def gpu_nms(dets, thresh):
"""Filter out the dets using GPU - NMS."""
if dets.shape[0] == 0:
return []
scores = dets[:, 4]
order = scores.argsort()[::-1]
sorted_dets = to_tensor(dets[order, :])
keep = NonMaxSuppression.apply(sorted_dets, iou_threshold=thresh)
return order[keep.numpy()]
def nms(dets, thresh):
"""Filter out the dets using NMS."""
if dets.shape[0] == 0:
return []
if cpu_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
return cpu_nms(dets, thresh)
def soft_nms(dets, thresh, method='linear', sigma=0.5, score_thresh=0.001):
"""Filter out the dets using Soft - NMS."""
if dets.shape[0] == 0:
return []
if cpu_soft_nms is print:
raise ImportError('Failed to load <cython_nms> library.')
methods = {'hard': 0, 'linear': 1, 'gaussian': 2}
if method not in methods:
raise ValueError('Unknown soft nms method: ' + method)
return cpu_soft_nms(dets, thresh, methods[method], sigma, score_thresh)
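# Example (a minimal sketch, assuming NumPy is imported as 'np'): two
# heavily-overlapping detections; with an IoU threshold of 0.5 only the
# higher-scoring one survives.
#
#   >>> dets = np.array([[0., 0., 10., 10., 0.9],
#   ...                  [1., 1., 10., 10., 0.8]], 'float32')
#   >>> nms(dets, 0.5)  # -> keeps index 0 only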
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Polygon utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.polygon.helper import crop_polygons
from seetadet.utils.polygon.helper import flip_polygons
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Helper functions for polygon."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import shapely.geometry as geometry
def flip_polygons(polygons, width):
"""Flip the polygons horizontally."""
for i, p in enumerate(polygons):
p_flipped = p.copy()
p_flipped[0::2] = width - p[0::2]
polygons[i] = p_flipped
return polygons
def crop_polygons(polygons, crop_box):
"""Crop the polygons."""
x, y = crop_box[:2]
crop_box = geometry.box(*crop_box).buffer(0.0)
crop_polygons = []
for p in polygons:
p = p.reshape((-1, 2))
p = geometry.Polygon(p).buffer(0.0)
if not p.is_valid:
continue
cropped = p.intersection(crop_box)
if cropped.is_empty:
continue
cropped = getattr(cropped, 'geoms', [cropped])
for new_p in cropped:
if not isinstance(new_p, geometry.Polygon) or not new_p.is_valid:
continue
coords = np.asarray(new_p.exterior.coords)[:-1]
coords[:, 0] -= x
coords[:, 1] -= y
crop_polygons.append(coords.flatten())
return crop_polygons
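A small sketch of the two helpers, using a hypothetical square (polygons are flattened `[x0, y0, x1, y1, ...]` arrays):

```python
import numpy as np

square = np.array([0., 0., 4., 0., 4., 4., 0., 4.])
flipped = flip_polygons([square.copy()], width=10)   # x -> width - x
cropped = crop_polygons([square], crop_box=(2, 2, 6, 6))
print(flipped[0])  # x coordinates become 10, 6, 6, 10
print(cropped[0])  # the (2,2)-(4,4) corner, shifted to the crop origin
```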
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Profiler utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.profiler.stats import SmoothedValue
from seetadet.utils.profiler.timer import Timer
from seetadet.utils.profiler.timer import get_progress
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Trackable statistics."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import numpy as np
class SmoothedValue(object):
"""Track values and provide smoothed report."""
def __init__(self, window_size=None):
self.deque = collections.deque(maxlen=window_size)
self.total = 0.0
self.count = 0
def update(self, value):
self.deque.append(value)
self.count += 1
self.total += value
def mean(self):
return np.mean(self.deque)
def median(self):
return np.median(self.deque)
def average(self):
return self.total / self.count
class ExponentialMovingAverage(object):
"""Track values and provide EMA report."""
def __init__(self, decay=0.9):
self.value = None
self.decay = decay
self.total = 0.0
self.count = 0
def update(self, value):
if self.value is None:
self.value = value
else:
self.value = (self.decay * self.value +
(1.0 - self.decay) * value)
self.total += value
self.count += 1
def global_average(self):
return self.total / self.count
def running_average(self):
return float(self.value)
def __float__(self):
return self.running_average()
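For example, the two trackers can smooth a noisy training loss (the values below are hypothetical):

```python
smoothed = SmoothedValue(window_size=20)
ema = ExponentialMovingAverage(decay=0.9)
for loss in [2.0, 1.5, 1.2, 1.1, 1.05]:
    smoothed.update(loss)
    ema.update(loss)
print(smoothed.median(), smoothed.average())  # windowed median, global mean
print(float(ema), ema.global_average())       # running EMA, global mean
```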
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Timing functions."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import contextlib
import datetime
import time
class Timer(object):
"""Simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
def add_diff(self, diff, n=1, average=True):
self.total_time += diff
self.calls += n
self.average_time = self.total_time / self.calls
return self.average_time if average else diff
@contextlib.contextmanager
def tic_and_toc(self, n=1, average=True):
try:
yield self.tic()
finally:
self.toc(n, average)
def tic(self):
self.start_time = time.time()
return self
def toc(self, n=1, average=True):
self.diff = time.time() - self.start_time
return self.add_diff(self.diff, n, average)
def get_progress(timer, step, max_steps):
"""Return the progress information."""
eta_seconds = timer.average_time * (max_steps - step)
eta = str(datetime.timedelta(seconds=int(eta_seconds)))
progress = (step + 1.) / max_steps
return ('< PROGRESS: {:.2%} | SPEED: {:.3f}s / iter | ETA: {} >'
.format(progress, timer.average_time, eta))
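A usage sketch of `Timer` and `get_progress` with a stand-in workload:

```python
import time

timer = Timer()
max_steps = 5
for step in range(max_steps):
    with timer.tic_and_toc():
        time.sleep(0.01)  # stand-in for one training iteration
    print(get_progress(timer, step, max_steps))
```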
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Visualization utilities."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from seetadet.utils.vis.colormap import colormap
from seetadet.utils.vis.visualizer import Visualizer
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Colormap for really neat visualizations."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def colormap(rgb=False):
color_list = np.array([
0.000, 0.447, 0.741,
0.850, 0.325, 0.098,
0.929, 0.694, 0.125,
0.494, 0.184, 0.556,
0.466, 0.674, 0.188,
0.301, 0.745, 0.933,
0.635, 0.078, 0.184,
0.300, 0.300, 0.300,
0.600, 0.600, 0.600,
1.000, 0.000, 0.000,
1.000, 0.500, 0.000,
0.749, 0.749, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 1.000,
0.667, 0.000, 1.000,
0.333, 0.333, 0.000,
0.333, 0.667, 0.000,
0.333, 1.000, 0.000,
0.667, 0.333, 0.000,
0.667, 0.667, 0.000,
0.667, 1.000, 0.000,
1.000, 0.333, 0.000,
1.000, 0.667, 0.000,
1.000, 1.000, 0.000,
0.000, 0.333, 0.500,
0.000, 0.667, 0.500,
0.000, 1.000, 0.500,
0.333, 0.000, 0.500,
0.333, 0.333, 0.500,
0.333, 0.667, 0.500,
0.333, 1.000, 0.500,
0.667, 0.000, 0.500,
0.667, 0.333, 0.500,
0.667, 0.667, 0.500,
0.667, 1.000, 0.500,
1.000, 0.000, 0.500,
1.000, 0.333, 0.500,
1.000, 0.667, 0.500,
1.000, 1.000, 0.500,
0.000, 0.333, 1.000,
0.000, 0.667, 1.000,
0.000, 1.000, 1.000,
0.333, 0.000, 1.000,
0.333, 0.333, 1.000,
0.333, 0.667, 1.000,
0.333, 1.000, 1.000,
0.667, 0.000, 1.000,
0.667, 0.333, 1.000,
0.667, 0.667, 1.000,
0.667, 1.000, 1.000,
1.000, 0.000, 1.000,
1.000, 0.333, 1.000,
1.000, 0.667, 1.000,
0.167, 0.000, 0.000,
0.333, 0.000, 0.000,
0.500, 0.000, 0.000,
0.667, 0.000, 0.000,
0.833, 0.000, 0.000,
1.000, 0.000, 0.000,
0.000, 0.167, 0.000,
0.000, 0.333, 0.000,
0.000, 0.500, 0.000,
0.000, 0.667, 0.000,
0.000, 0.833, 0.000,
0.000, 1.000, 0.000,
0.000, 0.000, 0.167,
0.000, 0.000, 0.333,
0.000, 0.000, 0.500,
0.000, 0.000, 0.667,
0.000, 0.000, 0.833,
0.000, 0.000, 1.000,
0.000, 0.000, 0.000,
0.143, 0.143, 0.143,
0.286, 0.286, 0.286,
0.429, 0.429, 0.429,
0.571, 0.571, 0.571,
0.714, 0.714, 0.714,
0.857, 0.857, 0.857,
1.000, 1.000, 1.000]).astype(np.float32)
color_list = color_list.reshape((-1, 3)) * 255
if not rgb:
color_list = color_list[:, ::-1]
return color_list
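For instance, picking a distinct color per instance index (rows are BGR by default, matching OpenCV):

```python
colors = colormap()                 # (num_colors, 3), values in [0, 255]
for i in range(5):
    print(colors[i % len(colors)])  # cycle through the palette
```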
# ------------------------------------------------------------
# Copyright (c) Facebook, Inc. and its affiliates.
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# Codes are based on:
#
# <https://github.com/facebookresearch/detectron2/blob/main/detectron2/utils/visualizer.py>
#
# ------------------------------------------------------------
"""Visualizer."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import matplotlib.backends.backend_agg
import matplotlib.colors
import matplotlib.figure
import matplotlib.patches
import matplotlib.pyplot
import numpy as np
from seetadet.utils.mask import mask_from
from seetadet.utils.mask import mask_to_polygons
from seetadet.utils.mask import paste_masks
from seetadet.utils.vis.colormap import colormap
_SMALL_OBJECT_AREA_THRESH = 1000
class VisImage(object):
"""VisImage."""
def __init__(self, img, scale=1.0):
self.img = img
self.scale = scale
self.shape = (h, w) = img.shape[:2]
self.font_size = max(np.sqrt(h * w) // 90, 10 // scale)
self._setup_figure(img)
def _setup_figure(self, img):
fig = matplotlib.figure.Figure(frameon=False)
self.dpi = fig.get_dpi()
fig.set_size_inches((self.shape[1] * self.scale + 1e-2) / self.dpi,
(self.shape[0] * self.scale + 1e-2) / self.dpi)
self.canvas = matplotlib.backends.backend_agg.FigureCanvasAgg(fig)
ax = fig.add_axes([0.0, 0.0, 1.0, 1.0])
ax.axis('off')
self.fig = fig
self.ax = ax
self.ax.imshow(img)
def save(self, filepath):
cv2.imwrite(filepath, self.get_image())
def get_image(self, rgb=False):
canvas = self.canvas
s, (width, height) = canvas.print_to_buffer()
buffer = np.frombuffer(s, dtype='uint8')
img_rgba = buffer.reshape(height, width, 4)
img_rgb, _ = np.split(img_rgba, [3], axis=2)
img_rgb = img_rgb.astype('uint8', copy=False)
return img_rgb if rgb else img_rgb[:, :, ::-1]
class Visualizer(object):
""""Visualizer."""
def __init__(self, class_names=None, score_thresh=0.7):
self.class_names = class_names
self.score_thresh = score_thresh
self.colormap = colormap(rgb=True) / 255.
self.output = None
def _convert_from_dict_format(self, objects):
boxes, masks, labels = [], [], []
for obj in objects:
score = obj.get('score', 1.0)
name = obj.get('class', 'object')
if score < self.score_thresh:
continue
boxes.append(list(obj['bbox']) + [score])
labels.append('{} {:.0f}%'.format(name, score * 100))
if 'segmentation' in obj:
masks.append(mask_from(obj['segmentation']['counts'].encode(),
obj['segmentation']['size']))
boxes = np.array(boxes, 'float32') if len(boxes) > 0 else boxes
masks = np.stack(masks) if len(masks) > 0 else masks
return boxes, masks, labels
def _convert_from_cls_format(self, cls_boxes=None, cls_masks=None):
boxes, masks, labels = [], [], []
for i, name in enumerate(self.class_names):
if name == '__background__':
continue
if cls_boxes is not None and len(cls_boxes[i]) > 0:
boxes.append(cls_boxes[i])
scores = cls_boxes[i][:, -1].tolist()
labels += ['{} {:.0f}%'.format(name, s * 100) for s in scores]
if cls_masks is not None and len(cls_masks[i]):
masks.append(cls_masks[i])
boxes = np.concatenate(boxes) if len(boxes) > 0 else boxes
masks = np.concatenate(masks) if len(masks) > 0 else masks
return boxes, masks, labels
def overlay_instances(self, boxes, masks, labels):
"""Overlay instances."""
if boxes is None or len(boxes) == 0:
return self.output
# Filter instances.
keep = np.where(boxes[:, -1] > self.score_thresh)[0]
if len(keep) == 0:
return self.output
boxes, labels = boxes[keep], [labels[i] for i in keep]
masks = masks[keep] if len(masks) > 0 else []
# Paste masks.
if len(masks) > 0 and masks.shape[-2:] != self.output.shape[:2]:
masks = paste_masks(masks, boxes, self.output.shape[:2],
channels_last=False)
# Display in largest to smallest order to reduce occlusion.
if boxes.shape[1] == 5:
areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
elif boxes.shape[1] == 6:
areas = boxes[:, 2] * boxes[:, 3]
else:
raise ValueError('Expected box4d or box5d.')
keep = np.argsort(-areas)
boxes, labels = boxes[keep], [labels[i] for i in keep]
masks = masks[keep] if len(masks) > 0 else []
colors = self.colormap[np.arange(len(boxes)) % len(self.colormap)]
for i, box in enumerate(boxes):
if boxes.shape[1] == 5:
self.draw_box(box, edge_color=colors[i])
self.draw_box_label(box, labels[i])
if len(masks) > 0:
polygons = mask_to_polygons(masks[i])
for p in polygons:
self.draw_polygon(p.reshape((-1, 2)), color=colors[i])
return self.output
def draw_instances(self, img, boxes, masks):
"""Draw instances."""
self.output = VisImage(img[:, :, ::-1])
assert len(boxes) == len(self.class_names)
boxes, masks, labels = self._convert_from_cls_format(boxes, masks)
self.overlay_instances(boxes, masks, labels)
return self.output
def draw_objects(self, img, objects):
"""Draw objects."""
self.output = VisImage(img[:, :, ::-1])
boxes, masks, labels = self._convert_from_dict_format(objects)
self.overlay_instances(boxes, masks, labels)
return self.output
def draw_box(self, box, alpha=0.5, edge_color='g', line_style='-'):
"""Draw box."""
x0, y0, x1, y1 = box[:4]
width, height = x1 - x0, y1 - y0
line_width = max(self.output.font_size / 4, 1)
self.output.ax.add_patch(
matplotlib.patches.Rectangle(
(x0, y0),
width,
height,
fill=False,
edgecolor=edge_color,
linewidth=line_width * self.output.scale,
alpha=alpha,
linestyle=line_style))
return self.output
def draw_box_label(self, box, label):
"""Draw box label."""
x0, y0, x1, y1 = box[:4]
text_pos = (x0, y0)
instance_area = (y1 - y0) * (x1 - x0)
if (instance_area < _SMALL_OBJECT_AREA_THRESH * self.output.scale
or y1 - y0 < 40 * self.output.scale):
if y1 >= self.output.shape[0] - 5:
text_pos = (x1, y0)
else:
text_pos = (x0, y1)
height_ratio = (y1 - y0) / np.sqrt(self.output.shape[0] * self.output.shape[1])
font_size = (np.clip((height_ratio - 0.02) / 0.08 + 1, 1.2, 2)
* 0.5 * self.output.font_size)
self.draw_text(label, text_pos, font_size=font_size)
return self.output
def draw_text(
self,
text,
position,
font_size=None,
color='w',
horizontal_alignment='left',
rotation=0,
):
"""Draw text."""
if not font_size:
font_size = self.output.font_size
color = np.maximum(list(matplotlib.colors.to_rgb(color)), 0.2)
color[np.argmax(color)] = max(0.8, np.max(color))
x, y = position
self.output.ax.text(
x,
y,
text,
size=font_size * self.output.scale,
family='sans-serif',
bbox={'facecolor': 'black', 'alpha': 0.8,
'pad': 0, 'edgecolor': 'none'},
verticalalignment='top',
horizontalalignment=horizontal_alignment,
color=color,
zorder=10,
rotation=rotation)
return self.output
def draw_polygon(self, segment, color, edge_color=None, alpha=0.5):
"""Draw polygon."""
edge_color = edge_color or color
edge_color = matplotlib.colors.to_rgb(edge_color) + (1,)
polygon = matplotlib.patches.Polygon(
segment,
fill=True,
facecolor=matplotlib.colors.to_rgb(color) + (alpha,),
edgecolor=edge_color,
linewidth=max(self.output.font_size // 15 * self.output.scale, 1))
self.output.ax.add_patch(polygon)
return self.output
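A hedged end-to-end sketch of `Visualizer.draw_objects` on a blank BGR image (the class names and detections below are hypothetical):

```python
import numpy as np

img = np.zeros((480, 640, 3), dtype='uint8')  # BGR, as read by cv2.imread
objects = [{'bbox': [50, 60, 200, 220], 'score': 0.95, 'class': 'person'},
           {'bbox': [300, 100, 420, 260], 'score': 0.88, 'class': 'dog'}]
vis = Visualizer(class_names=['__background__', 'person', 'dog'],
                 score_thresh=0.7)
output = vis.draw_objects(img, objects)  # returns a VisImage
output.save('vis.png')                   # writes a BGR image to disk
```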
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
version = '0.1.0a0'
git_version = 'None'
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Python setup script."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import shutil
import subprocess
import sys
import setuptools
import setuptools.command.build_py
import setuptools.command.install
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser()
parser.add_argument('--version', default=None)
args, unknown = parser.parse_known_args()
sys.argv = [sys.argv[0]] + unknown
args.git_version = None
args.long_description = ''
if args.version is None and os.path.exists('version.txt'):
with open('version.txt', 'r') as f:
args.version = f.read().strip()
if os.path.exists('.git'):
try:
git_version = subprocess.check_output(
['git', 'rev-parse', 'HEAD'], cwd='./')
args.git_version = git_version.decode('ascii').strip()
except (OSError, subprocess.CalledProcessError):
pass
if os.path.exists('README.md'):
with open(os.path.join('README.md'), encoding='utf-8') as f:
args.long_description = f.read()
return args
def build_extensions(parallel=4):
"""Prepare the package files."""
# Compile cxx sources.
py_exec = sys.executable
if subprocess.call(
'cd csrc/cxx && '
'{} setup.py build_ext -b ../../ -f --no-python-abi-suffix=0 -j {} && '
'{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0:
raise RuntimeError('Failed to build the cxx sources.')
# Compile pyx sources.
if subprocess.call(
'cd csrc/pyx && '
'{} setup.py build_ext -b ../../ -f --cython-c-in-temp -j {} && '
'{} setup.py clean'.format(py_exec, parallel, py_exec), shell=True,
) > 0:
raise RuntimeError('Failed to build the pyx sources.')
def clean_builds():
"""Clean the builds."""
for path in ['build', 'seeta_det.egg-info']:
if os.path.exists(path):
shutil.rmtree(path)
def find_packages(top):
"""Return the python sources installed to package."""
packages = []
for root, _, _ in os.walk(top):
if os.path.exists(os.path.join(root, '__init__.py')):
packages.append(root)
return packages
def find_package_data(top):
"""Return the external data installed to package."""
headers, libraries = [], []
if sys.platform == 'win32':
dylib_suffix = '.pyd'
elif sys.platform == 'darwin':
dylib_suffix = '.dylib'
else:
dylib_suffix = '.so'
for root, _, files in os.walk(top):
root = root[len(top + '/'):]
for file in files:
if file.endswith(dylib_suffix):
libraries.append(os.path.join(root, file))
return headers + libraries
class BuildPyCommand(setuptools.command.build_py.build_py):
"""Enhanced 'build_py' command."""
def build_packages(self):
with open('seetadet/version.py', 'w') as f:
f.write("from __future__ import absolute_import\n"
"from __future__ import division\n"
"from __future__ import print_function\n\n"
"version = '{}'\n"
"git_version = '{}'\n".format(args.version, args.git_version))
super(BuildPyCommand, self).build_packages()
def build_package_data(self):
parallel = 4
for k in ('build', 'install'):
v = self.get_finalized_command(k).parallel
parallel = max(parallel, (int(v) if v else v) or 1)
build_extensions(parallel=parallel)
self.package_data = {'seetadet': find_package_data('seetadet')}
super(BuildPyCommand, self).build_package_data()
class InstallCommand(setuptools.command.install.install):
"""Enhanced 'install' command."""
user_options = setuptools.command.install.install.user_options
user_options += [('parallel=', 'j', "number of parallel build jobs")]
def initialize_options(self):
self.parallel = None
super(InstallCommand, self).initialize_options()
self.old_and_unmanageable = True
args = parse_args()
setuptools.setup(
name='seeta-det',
version=args.version,
description='SeetaDet: A platform implementing popular object detection algorithms.',
long_description=args.long_description,
long_description_content_type='text/markdown',
url='https://github.seetatech.com/seetaresearch/seetadet',
author='SeetaTech',
license='BSD 2-Clause',
packages=find_packages('seetadet'),
cmdclass={'build_py': BuildPyCommand, 'install': InstallCommand},
install_requires=['opencv-python',
'Pillow>=7.1',
'pyyaml',
'prettytable',
'matplotlib',
'codewithgpu',
'shapely',
'Cython',
'pycocotools>=2.0.2'],
classifiers=['Development Status :: 5 - Production/Stable',
'Intended Audience :: Developers',
'Intended Audience :: Education',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: BSD License',
'Programming Language :: C++',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3 :: Only',
'Topic :: Scientific/Engineering',
'Topic :: Scientific/Engineering :: Mathematics',
'Topic :: Scientific/Engineering :: Artificial Intelligence'],
)
clean_builds()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Train a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import dragon
import numpy
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import train_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Train a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--tensorboard',
action='store_true',
help='write metrics to tensorboard or not')
return parser.parse_args()
if __name__ == '__main__':
args = parse_args()
coordinator = Coordinator(args.cfg_file, exp_dir=args.exp_dir)
checkpoint, start_iter = coordinator.get_checkpoint()
cfg.TRAIN.WEIGHTS = checkpoint or cfg.TRAIN.WEIGHTS
# Setup the distributed environment.
world_rank = dragon.distributed.get_rank()
world_size = dragon.distributed.get_world_size()
if cfg.NUM_GPUS != world_size:
raise ValueError(
'Expected to start {} processes, got {}.'
.format(cfg.NUM_GPUS, world_size))
# Setup the logging modules.
logging.set_root(world_rank == 0)
# Select the GPU depending on the rank of process.
cfg.GPU_ID = [i for i in range(cfg.NUM_GPUS)][world_rank]
# Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED + world_rank)
dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset.
dataset_size = build_dataset(cfg.TRAIN.DATASET).size
logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset_size))
# Run training.
logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.path_at('checkpoints')))
with dragon.distributed.new_group(
ranks=[i for i in range(cfg.NUM_GPUS)],
verbose=True).as_default():
train_engine.run_train(
coordinator, start_iter,
enable_tensorboard=args.tensorboard)
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Export a detection network into the onnx model."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import dragon.vm.torch as torch
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.models.build import build_detector
from seetadet.ops import onnx as _ # noqa
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Export a detection network into the onnx model')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
type=int,
default=0,
help='index of GPU to use')
parser.add_argument(
'--iter',
type=int,
default=None,
help='checkpoint of given step')
parser.add_argument(
'--input_shape',
nargs='+',
type=int,
default=(1, 512, 512, 3),
help='input image shape')
parser.add_argument(
'--opset',
type=int,
default=None,
help='opset version to export')
parser.add_argument(
'--check_model',
type=bool,
default=True,
help='check the exported model or not')
return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for exporting."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
else:
checkpoint, _ = coordinator.get_checkpoint(args.iter)
weights_list.append(checkpoint)
return weights_list[0]
def get_dummy_inputs(args):
"""Return the dummy inputs for exporting."""
n, h, w, c = args.input_shape
im_batch = torch.zeros(n, h, w, c, dtype='uint8')
im_info = torch.tensor([[h, w, 1., 1.] for _ in range(n)], dtype='float32')
strides = [2 ** x for x in range(cfg.FPN.MIN_LEVEL, cfg.FPN.MAX_LEVEL + 1)]
strides = np.array(strides)[:, None]
grid_shapes = np.stack([[h, w]] * len(strides))
grid_shapes = (grid_shapes - 1) // strides + 1
grid_info = torch.tensor(grid_shapes, dtype='int64')
return {'img': im_batch, 'im_info': im_info, 'grid_info': grid_info}
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Run exporting.
weights = find_weights(args, coordinator)
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = args.model_dir or coordinator.path_at('exports')
logging.info('Exports will be saved to ' + output_dir)
detector = build_detector(args.gpu, weights)
inputs = get_dummy_inputs(args)
torch.onnx.export(
model=detector,
args=inputs,
f=os.path.join(output_dir, weights_name + '.onnx'),
verbose=True,
opset_version=args.opset,
enable_onnx_checker=args.check_model,
)
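A hypothetical sketch of running the exported model with onnxruntime (not a SeetaDet dependency). The input names follow `get_dummy_inputs` above; the FPN strides shown are assumptions and must match the exported config:

```python
import numpy as np
import onnxruntime  # shown for illustration only

sess = onnxruntime.InferenceSession('model.onnx',
                                    providers=['CPUExecutionProvider'])
n, h, w, c = 1, 512, 512, 3
strides = np.array([[8], [16], [32], [64], [128]])  # hypothetical FPN strides
grid_shapes = (np.stack([[h, w]] * len(strides)) - 1) // strides + 1
outputs = sess.run(None, {
    'img': np.zeros((n, h, w, c), dtype='uint8'),
    'im_info': np.array([[h, w, 1., 1.]], dtype='float32'),
    'grid_info': grid_shapes.astype('int64')})
```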
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Serve a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import collections
import os
import multiprocessing as mp
import time
import codewithgpu
import numpy as np
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import test_engine
from seetadet.utils import logging
from seetadet.utils import profiler
from seetadet.utils.mask import encode_masks
from seetadet.utils.mask import paste_masks
from seetadet.utils.vis import Visualizer
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Serve a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--iter',
type=int,
default=None,
help='iteration of checkpoint')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--score_thresh',
type=float,
default=0.7,
help='score threshold for inference')
parser.add_argument(
'--batch_timeout',
type=float,
default=1,
help='timeout to wait for a batch')
parser.add_argument(
'--queue_size',
type=int,
default=512,
help='size of the memory queue')
parser.add_argument(
'--gpu',
nargs='+',
type=int,
default=None,
help='index of GPUs to use')
parser.add_argument(
'--deterministic',
action='store_true',
help='set cudnn deterministic or not')
parser.add_argument(
'--app',
default='gradio',
help='application framework')
parser.add_argument(
'--processes',
type=int,
default=1,
help='number of flask processes')
parser.add_argument(
'--port',
type=int,
default=5050,
help='listening port')
return parser.parse_args()
class ServingCommand(codewithgpu.ServingCommand):
"""Command to run serving."""
def __init__(self, output_queue, score_thresh=0.7, perf_every=100):
super(ServingCommand, self).__init__(app_library='flask')
self.output_queue = output_queue
self.output_dict = mp.Manager().dict()
self.score_thresh = score_thresh
self.perf_every = perf_every
self.classes = cfg.MODEL.CLASSES
self.max_dets = cfg.TEST.DETECTIONS_PER_IM
def make_objects(self, outputs):
"""Main the detection objects."""
boxes = outputs.pop('boxes')
masks = outputs.pop('masks', None)
objects = []
for j, name in enumerate(self.classes):
if name == '__background__':
continue
inds = np.where(boxes[j][:, 4] > self.score_thresh)[0]
if len(inds) == 0:
continue
for box in boxes[j][inds]:
objects.append({'bbox': box[:4].astype(float).tolist(),
'score': float(box[4]), 'class': name})
if masks is not None:
rles = encode_masks(paste_masks(
masks[j][inds], boxes[j][inds], outputs['im_shape'][:2]))
for i, rle in enumerate(rles[::-1]):
objects[-(i + 1)]['segmentation'] = rle
return objects
def run(self):
"""Main loop to make the serving outputs."""
count, timers = 0, collections.defaultdict(profiler.Timer)
while True:
count += 1
img_id, time_diffs, outputs = self.output_queue.get()
outputs = test_engine.filter_outputs(outputs, self.max_dets)
for name, diff in time_diffs.items():
timers[name].add_diff(diff)
self.output_dict[img_id] = self.make_objects(outputs)
if count % self.perf_every == 0:
logging.info('im_detect: {:d} [{:.3f}s + {:.3f}s]'
.format(count, timers['im_detect'].average_time,
timers['misc'].average_time))
def find_weights(args, coordinator):
"""Return the weights for serving."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if file.endswith('.pkl'):
weights_list.append(os.path.join(args.model_dir, file))
else:
checkpoint, _ = coordinator.get_checkpoint(args.iter)
weights_list.append(checkpoint)
return weights_list[0]
def build_flask_app(queues, command):
"""Build the flask application."""
import flask
app = flask.Flask('seetadet.serve')
logging._logging.getLogger('werkzeug').setLevel('ERROR')
debug_objects = os.environ.get('FLASK_DEBUG', False)
@app.route("/upload", methods=['POST'])
def upload():
img_id, img = command.get_image()
queues[img_id % len(queues)].put((img_id, img))
return flask.jsonify({'image_id': img_id})
@app.route("/get", methods=['POST'])
def get():
def try_get(retry_time=0.005):
try:
req = flask.request.get_json(force=True)
img_id = req['image_id']
except (KeyError, TypeError):
err_msg, img_id = 'Not found "image_id" in data.', ''
flask.abort(flask.Response(err_msg))
while img_id not in command.output_dict:
time.sleep(retry_time)
return img_id, command.output_dict.pop(img_id)
img_id, objects = try_get(retry_time=0.005)
msg = 'ImageId = %d, #Detects = %d' % (img_id, len(objects))
if debug_objects:
msg += (('\n * ' if len(objects) > 0 else '') +
('\n * '.join(str(obj) for obj in objects)))
logging.info(msg)
return flask.jsonify({'objects': objects})
return app
def build_gradio_app(queues, command):
"""Build the gradio application."""
import cv2
import gradio
visualizer = Visualizer(class_names=command.classes, score_thresh=0.0)
def upload_and_get(img_path):
with command.example_id.get_lock():
command.example_id.value += 1
img_id = command.example_id.value
img = cv2.imread(img_path)
queues[img_id % len(queues)].put((img_id, img))
while img_id not in command.output_dict:
time.sleep(0.005)
objects = command.output_dict.pop(img_id)
logging.info('ImageId = %d, #Detects = %d' % (img_id, len(objects)))
vis_img = visualizer.draw_objects(img, objects).get_image(rgb=True)
objects_list = [(i, obj['class'], round(obj['score'], 3),
str(np.round(obj['bbox'], 2).tolist()))
for i, obj in enumerate(objects)]
return vis_img, objects_list
app = gradio.Interface(
fn=upload_and_get,
inputs=gradio.Image(type='filepath', label='Image', show_label=False),
outputs=[gradio.Image(label='Visualization'),
gradio.Dataframe(headers=['Id', 'Category', 'Score', 'BBox'],
label='Objects')],
examples=['../data/images/' + x for x in os.listdir('../data/images')],
css=".h-60 {height: auto}", allow_flagging='never')
app.temp_dirs.add('../data/images')
return app
if __name__ == '__main__':
logging.set_formatter("%(asctime)s %(levelname)s %(message)s")
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
logging.info('Using config:\n' + str(cfg))
# Build actors.
weights = find_weights(args, coordinator)
devices = args.gpu if args.gpu else [cfg.GPU_ID]
num_devices = len(devices)
queues = [mp.Queue(args.queue_size) for _ in range(num_devices + 1)]
commands = [test_engine.InferenceCommand(
queues[i], queues[-1], kwargs={
'cfg': cfg,
'device': devices[i],
'weights': weights,
'deterministic': args.deterministic,
'batch_timeout': args.batch_timeout,
'verbose': i == 0,
}) for i in range(num_devices)]
commands += [ServingCommand(queues[-1])]
actors = [mp.Process(target=command.run) for command in commands]
for actor in actors:
actor.start()
# Build app.
if args.app == 'flask':
app = build_flask_app(queues[:-1], commands[-1])
app.run(port=args.port, threaded=args.processes == 1,
processes=args.processes)
elif args.app == 'gradio':
app = build_gradio_app(queues[:-1], commands[-1])
app.queue(concurrency_count=args.processes)
app.launch(server_port=args.port)
else:
raise ValueError('Unsupported application framework: ' + args.app)
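A hypothetical client for the flask app. The `/get` contract follows the route above; the exact `/upload` payload is defined by codewithgpu's `ServingCommand.get_image` and may differ from the raw-bytes POST assumed here:

```python
import requests  # hypothetical client, not part of the server

# Assumption: /upload accepts the encoded image bytes in the request body.
with open('demo.jpg', 'rb') as f:
    resp = requests.post('http://localhost:5050/upload', data=f.read())
img_id = resp.json()['image_id']
# Grounded in the route above: POST {'image_id': ...}, receive the objects.
result = requests.post('http://localhost:5050/get', json={'image_id': img_id})
print(result.json()['objects'])
```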
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Test a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import multiprocessing
import os
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import test_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Test a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default='',
help='experiment dir')
parser.add_argument(
'--model_dir',
default='',
help='model dir')
parser.add_argument(
'--gpu',
nargs='+',
type=int,
default=None,
help='index of GPUs to use')
parser.add_argument(
'--iter',
nargs='+',
type=int,
default=None,
help='iteration step of checkpoints')
parser.add_argument(
'--last',
type=int,
default=1,
help='last N checkpoints')
parser.add_argument(
'--read_every',
type=int,
default=100,
help='read every-n images for testing')
parser.add_argument(
'--vis',
type=float,
default=0,
help='score threshold for visualization')
parser.add_argument(
'--precision',
default='',
help='compute precision for inference')
parser.add_argument(
'--deterministic',
action='store_true',
help='set cudnn deterministic or not')
return parser.parse_args()
def find_weights(args, coordinator):
"""Return the weights for testing."""
weights_list = []
if args.model_dir:
for file in os.listdir(args.model_dir):
if not file.endswith('.pkl'):
continue
weights_list.append(os.path.join(args.model_dir, file))
return weights_list
if args.iter is not None:
for iter_step in args.iter:
checkpoint, _ = coordinator.get_checkpoint(iter_step, wait=True)
weights_list.append(checkpoint)
return weights_list
for i in range(1, args.last + 1):
checkpoint, _ = coordinator.get_checkpoint(last_idx=i)
if checkpoint is None:
break
weights_list.append(checkpoint)
return weights_list
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir or args.model_dir)
cfg.MODEL.PRECISION = args.precision or cfg.MODEL.PRECISION
logging.info('Using config:\n' + str(cfg))
# Inspect dataset.
dataset_size = build_dataset(cfg.TEST.DATASET).size
logging.info('Dataset({}): {} images will be used to test.'
.format(cfg.TEST.DATASET, dataset_size))
# Run testing.
for weights in find_weights(args, coordinator):
weights_name = os.path.splitext(os.path.basename(weights))[0]
output_dir = coordinator.path_at('results/' + weights_name)
logging.info('Results will be saved to ' + output_dir)
vis_output_dir = None
if args.vis > 0:
vis_output_dir = coordinator.path_at('visualizations/' + weights_name)
logging.info('Visualizations will be saved to ' + vis_output_dir)
process = multiprocessing.Process(
target=test_engine.run_test,
kwargs={'test_cfg': cfg,
'weights': weights,
'output_dir': output_dir,
'devices': args.gpu,
'deterministic': args.deterministic,
'read_every': args.read_every,
'vis_thresh': args.vis,
'vis_output_dir': vis_output_dir})
process.start()
process.join()
# ------------------------------------------------------------
# Copyright (c) 2017-present, SeetaTech, Co.,Ltd.
#
# Licensed under the BSD 2-Clause License.
# You should have received a copy of the BSD 2-Clause License
# along with the software. If not, See,
#
# <https://opensource.org/licenses/BSD-2-Clause>
#
# ------------------------------------------------------------
"""Train a detection network."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
import os
import sys
import dragon
import numpy
from seetadet.core.config import cfg
from seetadet.core.coordinator import Coordinator
from seetadet.core.engine import train_engine
from seetadet.data.build import build_dataset
from seetadet.utils import logging
def parse_args():
"""Parse arguments."""
parser = argparse.ArgumentParser(
description='Train a detection network')
parser.add_argument(
'--cfg',
dest='cfg_file',
default=None,
help='config file')
parser.add_argument(
'--exp_dir',
default=None,
help='experiment dir')
parser.add_argument(
'--tensorboard',
action='store_true',
help='write metrics to tensorboard or not')
return parser.parse_args()
def run_distributed(args, coordinator):
"""Run distributed training."""
import subprocess
cmd = 'mpirun --allow-run-as-root -n {} --bind-to none '.format(cfg.NUM_GPUS)
cmd += '{} {}'.format(sys.executable, 'distributed/train.py')
cmd += ' --cfg {}'.format(os.path.abspath(args.cfg_file))
cmd += ' --exp_dir {}'.format(coordinator.exp_dir)
cmd += ' --tensorboard' if args.tensorboard else ''
return subprocess.call(cmd, shell=True)
if __name__ == '__main__':
args = parse_args()
logging.info('Called with args:\n' + str(args))
coordinator = Coordinator(args.cfg_file, args.exp_dir)
checkpoint, start_iter = coordinator.get_checkpoint()
cfg.TRAIN.WEIGHTS = checkpoint or cfg.TRAIN.WEIGHTS
logging.info('Using config:\n' + str(cfg))
if cfg.NUM_GPUS > 1:
# Run a distributed task.
run_distributed(args, coordinator)
else:
# Fix the random seed for reproducibility.
numpy.random.seed(cfg.RNG_SEED)
dragon.random.set_seed(cfg.RNG_SEED)
# Inspect the dataset.
dataset_size = build_dataset(cfg.TRAIN.DATASET).size
logging.info('Dataset({}): {} images will be used to train.'
.format(cfg.TRAIN.DATASET, dataset_size))
# Run training.
logging.info('Checkpoints will be saved to `{:s}`'
.format(coordinator.path_at('checkpoints')))
train_engine.run_train(coordinator, start_iter,
enable_tensorboard=args.tensorboard)