-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Code of CVPR 2019 Paper: Region Proposal by Guided Anchoring (#594)
* add two stage w/o neck and w/ upperneck * add rpn r50 c4 * update c4 configs * fix * config update * update config * minor update * mask rcnn support c4 train and test * lr fix * cascade support upper_neck * add cascade c4 config * update config * update * update res_layer to new interface * refactoring * c4 configs update * refactoring * update rpn_c4 config * rename upper_neck as shared_head * update * update configs * update * update c4 configs * update according to commits * update * add ga rpn * test bug fix * test bug fix with loc_filter_thr is large * update configs * update configs * add ga_retinanet * ga test bug fix * update configs * update * init masked conv * update * update masked conv * update * support no ga_sampler * update * update * test with masked_conv * update comment * fix flake errors * fix flake 8 errors * refactor bounded iou loss * refactor ga_retina_head * update configs * refactor masked conv * fix flake8 error * refactor guided_anchor_head and ga_rpn_head * update configs * use_sigmoid_cls -> cls_sigmoid_loss; use_focal_loss -> cls_focal_loss * refactoring * cls_sigmoid_loss -> use_sigmoid_cls * fix flake8 error * add some docs * rename normalize to norm_cfg * update configs * add readme * update ga_faster config * update readme * update readme * rename configs as r50_caffe * merge master * refactor guided anchor target * update readme * update approx mas iou assigner * refactor guided anchor target * update docstring * refactor ga heads * fix flake8 error * update readme * update model url * update comments * refactor get anchors * update docstring * not use_loc_filter during training * add R-101 results * update to support build loss api * fix flake8 error * update readme with x-101 performances * update readme * add a link in project readme * refactor code about ga shape inside flags * update * update * add x101 config files * add ga_rpn r101 config * update some comments * add comments * add comments * update comments * fix flake8 error
- Loading branch information
1 parent
3cb84ac
commit 89022a2
Showing
32 changed files
with
2,983 additions
and
21 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,42 @@ | ||
# Region Proposal by Guided Anchoring | ||
|
||
## Introduction | ||
|
||
We provide config files to reproduce the results in the CVPR 2019 paper for [Region Proposal by Guided Anchoring](https://arxiv.org/abs/1901.03278). | ||
|
||
``` | ||
@inproceedings{wang2019region, | ||
title={Region Proposal by Guided Anchoring}, | ||
author={Jiaqi Wang and Kai Chen and Shuo Yang and Chen Change Loy and Dahua Lin}, | ||
booktitle={IEEE Conference on Computer Vision and Pattern Recognition}, | ||
year={2019} | ||
} | ||
``` | ||
|
||
## Results and Models | ||
|
||
The results on COCO 2017 val is shown in the below table. (results on test-dev are usually slightly higher than val). | ||
|
||
| Method | Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | AR 1000 | Download | | ||
| :----: | :-------------: | :-----: | :-----: | :------: | :-----------------: | :------------: | :-----: | :-------------------------------------------------------------------------------------------------------------------------------------------: | | ||
| GA-RPN | R-50-FPN | caffe | 1x | 5.0 | 0.55 | 13.3 | 68.5 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/guided_anchoring/ga_rpn_r50_caffe_fpn_1x_20190513-95e91886.pth) | | ||
| GA-RPN | R-101-FPN | caffe | 1x | - | - | - | 69.6 | - | | ||
| GA-RPN | X-101-32x4d-FPN | pytorch | 1x | - | - | - | 70.0 | - | | ||
| GA-RPN | X-101-64x4d-FPN | pytorch | 1x | - | - | - | 70.5 | - | | ||
|
||
|
||
| Method | Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download | | ||
| :------------: | :-------------: | :-----: | :-----: | :------: | :-----------------: | :------------: | :----: | :-------------------------------------------------------------------------------------------------------------------------------------------------: | | ||
| GA-Fast RCNN | R-50-FPN | caffe | 1x | 3.3 | 0.23 | 14.9 | 39.5 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/guided_anchoring/ga_fast_r50_caffe_fpn_1x_20190513-c5af9f8b.pth) | | ||
| GA-Faster RCNN | R-50-FPN | caffe | 1x | 5.1 | 0.64 | 9.6 | 39.9 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/guided_anchoring/ga_faster_r50_caffe_fpn_1x_20190513-a52b31fa.pth) | | ||
| GA-Faster RCNN | R-101-FPN | caffe | 1x | - | - | - | 41.5 | - | | ||
| GA-Faster RCNN | X-101-32x4d-FPN | pytorch | 1x | - | - | - | 42.9 | - | | ||
| GA-Faster RCNN | X-101-64x4d-FPN | pytorch | 1x | - | - | - | 43.9 | - | | ||
| GA-RetinaNet | R-50-FPN | caffe | 1x | 3.2 | 0.50 | 10.7 | 37.0 | [model](https://s3.ap-northeast-2.amazonaws.com/open-mmlab/mmdetection/models/guided_anchoring/ga_retinanet_r50_caffe_fpn_1x_20190513-29905101.pth) | | ||
| GA-RetinaNet | R-101-FPN | caffe | 1x | - | - | - | 38.9 | - | | ||
| GA-RetinaNet | X-101-32x4d-FPN | pytorch | 1x | - | - | - | 40.3 | - | | ||
| GA-RetinaNet | X-101-64x4d-FPN | pytorch | 1x | - | - | - | 40.8 | - | | ||
|
||
|
||
|
||
- In the Guided Anchoring paper, `score_thr` is set to 0.001 in Fast/Faster RCNN and 0.05 in RetinaNet for both baselines and Guided Anchoring. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
# model settings | ||
model = dict( | ||
type='FastRCNN', | ||
pretrained='open-mmlab://resnet50_caffe', | ||
backbone=dict( | ||
type='ResNet', | ||
depth=50, | ||
num_stages=4, | ||
out_indices=(0, 1, 2, 3), | ||
frozen_stages=1, | ||
norm_cfg=dict(type='BN', requires_grad=False), | ||
norm_eval=True, | ||
style='caffe'), | ||
neck=dict( | ||
type='FPN', | ||
in_channels=[256, 512, 1024, 2048], | ||
out_channels=256, | ||
num_outs=5), | ||
bbox_roi_extractor=dict( | ||
type='SingleRoIExtractor', | ||
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2), | ||
out_channels=256, | ||
featmap_strides=[4, 8, 16, 32]), | ||
bbox_head=dict( | ||
type='SharedFCBBoxHead', | ||
num_fcs=2, | ||
in_channels=256, | ||
fc_out_channels=1024, | ||
roi_feat_size=7, | ||
num_classes=81, | ||
target_means=[0., 0., 0., 0.], | ||
target_stds=[0.05, 0.05, 0.1, 0.1], | ||
reg_class_agnostic=False, | ||
loss_cls=dict( | ||
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), | ||
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))) | ||
# model training and testing settings | ||
train_cfg = dict( | ||
rcnn=dict( | ||
assigner=dict( | ||
type='MaxIoUAssigner', | ||
pos_iou_thr=0.6, | ||
neg_iou_thr=0.6, | ||
min_pos_iou=0.6, | ||
ignore_iof_thr=-1), | ||
sampler=dict( | ||
type='RandomSampler', | ||
num=256, | ||
pos_fraction=0.25, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=True), | ||
pos_weight=-1, | ||
debug=False)) | ||
test_cfg = dict( | ||
rcnn=dict( | ||
score_thr=1e-3, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)) | ||
# dataset settings | ||
dataset_type = 'CocoDataset' | ||
data_root = 'data/coco/' | ||
img_norm_cfg = dict( | ||
mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) | ||
data = dict( | ||
imgs_per_gpu=2, | ||
workers_per_gpu=2, | ||
train=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_train2017.json', | ||
img_prefix=data_root + 'train2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
num_max_proposals=300, | ||
proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_train2017.pkl', | ||
flip_ratio=0.5, | ||
with_mask=False, | ||
with_crowd=True, | ||
with_label=True), | ||
val=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
num_max_proposals=300, | ||
proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_val2017.pkl', | ||
flip_ratio=0, | ||
with_mask=False, | ||
with_crowd=True, | ||
with_label=True), | ||
test=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
num_max_proposals=300, | ||
proposal_file=data_root + 'proposals/ga_rpn_r50_fpn_1x_val2017.pkl', | ||
flip_ratio=0, | ||
with_mask=False, | ||
with_label=False, | ||
test_mode=True)) | ||
# optimizer | ||
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) | ||
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) | ||
# learning policy | ||
lr_config = dict( | ||
policy='step', | ||
warmup='linear', | ||
warmup_iters=500, | ||
warmup_ratio=1.0 / 3, | ||
step=[8, 11]) | ||
checkpoint_config = dict(interval=1) | ||
# yapf:disable | ||
log_config = dict( | ||
interval=50, | ||
hooks=[ | ||
dict(type='TextLoggerHook'), | ||
# dict(type='TensorboardLoggerHook') | ||
]) | ||
# yapf:enable | ||
# runtime settings | ||
total_epochs = 12 | ||
dist_params = dict(backend='nccl') | ||
log_level = 'INFO' | ||
work_dir = './work_dirs/ga_fast_rcnn_r50_caffe_fpn_1x' | ||
load_from = None | ||
resume_from = None | ||
workflow = [('train', 1)] |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,193 @@ | ||
# model settings | ||
model = dict( | ||
type='FasterRCNN', | ||
pretrained='open-mmlab://resnet50_caffe', | ||
backbone=dict( | ||
type='ResNet', | ||
depth=50, | ||
num_stages=4, | ||
out_indices=(0, 1, 2, 3), | ||
frozen_stages=1, | ||
norm_cfg=dict(type='BN', requires_grad=False), | ||
norm_eval=True, | ||
style='caffe'), | ||
neck=dict( | ||
type='FPN', | ||
in_channels=[256, 512, 1024, 2048], | ||
out_channels=256, | ||
num_outs=5), | ||
rpn_head=dict( | ||
type='GARPNHead', | ||
in_channels=256, | ||
feat_channels=256, | ||
octave_base_scale=8, | ||
scales_per_octave=3, | ||
octave_ratios=[0.5, 1.0, 2.0], | ||
anchor_strides=[4, 8, 16, 32, 64], | ||
anchor_base_sizes=None, | ||
anchoring_means=[.0, .0, .0, .0], | ||
anchoring_stds=[0.07, 0.07, 0.14, 0.14], | ||
target_means=(.0, .0, .0, .0), | ||
target_stds=[0.07, 0.07, 0.11, 0.11], | ||
loc_filter_thr=0.01, | ||
loss_loc=dict( | ||
type='FocalLoss', | ||
use_sigmoid=True, | ||
gamma=2.0, | ||
alpha=0.25, | ||
loss_weight=1.0), | ||
loss_shape=dict( | ||
type='IoULoss', style='bounded', beta=0.2, loss_weight=1.0), | ||
loss_cls=dict( | ||
type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0), | ||
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0)), | ||
bbox_roi_extractor=dict( | ||
type='SingleRoIExtractor', | ||
roi_layer=dict(type='RoIAlign', out_size=7, sample_num=2), | ||
out_channels=256, | ||
featmap_strides=[4, 8, 16, 32]), | ||
bbox_head=dict( | ||
type='SharedFCBBoxHead', | ||
num_fcs=2, | ||
in_channels=256, | ||
fc_out_channels=1024, | ||
roi_feat_size=7, | ||
num_classes=81, | ||
target_means=[0., 0., 0., 0.], | ||
target_stds=[0.05, 0.05, 0.1, 0.1], | ||
reg_class_agnostic=False, | ||
loss_cls=dict( | ||
type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), | ||
loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))) | ||
# model training and testing settings | ||
train_cfg = dict( | ||
rpn=dict( | ||
ga_assigner=dict( | ||
type='ApproxMaxIoUAssigner', | ||
pos_iou_thr=0.7, | ||
neg_iou_thr=0.3, | ||
min_pos_iou=0.3, | ||
ignore_iof_thr=-1), | ||
ga_sampler=dict( | ||
type='RandomSampler', | ||
num=256, | ||
pos_fraction=0.5, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=False), | ||
assigner=dict( | ||
type='MaxIoUAssigner', | ||
pos_iou_thr=0.7, | ||
neg_iou_thr=0.3, | ||
min_pos_iou=0.3, | ||
ignore_iof_thr=-1), | ||
sampler=dict( | ||
type='RandomSampler', | ||
num=256, | ||
pos_fraction=0.5, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=False), | ||
allowed_border=-1, | ||
pos_weight=-1, | ||
center_ratio=0.2, | ||
ignore_ratio=0.5, | ||
debug=False), | ||
rpn_proposal=dict( | ||
nms_across_levels=False, | ||
nms_pre=2000, | ||
nms_post=2000, | ||
max_num=300, | ||
nms_thr=0.7, | ||
min_bbox_size=0), | ||
rcnn=dict( | ||
assigner=dict( | ||
type='MaxIoUAssigner', | ||
pos_iou_thr=0.6, | ||
neg_iou_thr=0.6, | ||
min_pos_iou=0.6, | ||
ignore_iof_thr=-1), | ||
sampler=dict( | ||
type='RandomSampler', | ||
num=256, | ||
pos_fraction=0.25, | ||
neg_pos_ub=-1, | ||
add_gt_as_proposals=True), | ||
pos_weight=-1, | ||
debug=False)) | ||
test_cfg = dict( | ||
rpn=dict( | ||
nms_across_levels=False, | ||
nms_pre=1000, | ||
nms_post=1000, | ||
max_num=300, | ||
nms_thr=0.7, | ||
min_bbox_size=0), | ||
rcnn=dict( | ||
score_thr=1e-3, nms=dict(type='nms', iou_thr=0.5), max_per_img=100)) | ||
# dataset settings | ||
dataset_type = 'CocoDataset' | ||
data_root = 'data/coco/' | ||
img_norm_cfg = dict( | ||
mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False) | ||
data = dict( | ||
imgs_per_gpu=2, | ||
workers_per_gpu=2, | ||
train=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_train2017.json', | ||
img_prefix=data_root + 'train2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
flip_ratio=0.5, | ||
with_mask=False, | ||
with_crowd=True, | ||
with_label=True), | ||
val=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
flip_ratio=0, | ||
with_mask=False, | ||
with_crowd=True, | ||
with_label=True), | ||
test=dict( | ||
type=dataset_type, | ||
ann_file=data_root + 'annotations/instances_val2017.json', | ||
img_prefix=data_root + 'val2017/', | ||
img_scale=(1333, 800), | ||
img_norm_cfg=img_norm_cfg, | ||
size_divisor=32, | ||
flip_ratio=0, | ||
with_mask=False, | ||
with_label=False, | ||
test_mode=True)) | ||
# optimizer | ||
optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001) | ||
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) | ||
# learning policy | ||
lr_config = dict( | ||
policy='step', | ||
warmup='linear', | ||
warmup_iters=500, | ||
warmup_ratio=1.0 / 3, | ||
step=[8, 11]) | ||
checkpoint_config = dict(interval=1) | ||
# yapf:disable | ||
log_config = dict( | ||
interval=50, | ||
hooks=[ | ||
dict(type='TextLoggerHook'), | ||
# dict(type='TensorboardLoggerHook') | ||
]) | ||
# yapf:enable | ||
# runtime settings | ||
total_epochs = 12 | ||
dist_params = dict(backend='nccl') | ||
log_level = 'INFO' | ||
work_dir = './work_dirs/ga_faster_rcnn_r50_caffe_fpn_1x' | ||
load_from = None | ||
resume_from = None | ||
workflow = [('train', 1)] |
Oops, something went wrong.