Device: a single RTX 2080 Ti with CUDA 10.2 and Python 3.8
pip install torch==1.10.1+cu102 torchvision==0.11.2+cu102 torchaudio==0.10.1 -f
python install
# python develop
pip install mmcv-full==1.4.0 -f
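After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of this repository: the helper names `installed_version`, `matches`, and `find_mismatches` are hypothetical, and the pins are simply copied from the commands above. A pin without a local tag (such as `torchaudio==0.10.1`) is assumed to accept any build suffix like `+cu102`.

```python
from importlib import metadata

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "1.10.1+cu102",
    "torchvision": "0.11.2+cu102",
    "torchaudio": "0.10.1",
    "mmcv-full": "1.4.0",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(wanted, found):
    """Exact match when the pin has a local tag, else compare the public part."""
    if found is None:
        return False
    if "+" in wanted:
        return found == wanted
    return found.split("+")[0] == wanted


def find_mismatches(expected, lookup=installed_version):
    """Map each package that does not satisfy its pin to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if not matches(wanted, found):
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found}")
```

Running the script inside the activated environment prints one line per package that is missing or at a different version, and prints nothing when everything matches.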
Following Swin Transformer for Object Detection, we use apex for mixed precision training by default. To install apex, run:
git clone
cd apex
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Go to your environment folder (e.g. venv).
Modify line 97 of venv/lib/python3.8/site-packages/apex/amp/ as follows:
- if cached_x.grad_fn.next_functions[1][0].variable is not x:
+ if cached_x.grad_fn.next_functions[0][0].variable is not x:
raise RuntimeError("x and cache[x] both require grad, but x is not "
"cache[x]'s parent. This is likely an error.")
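If you prefer not to edit the installed file by hand, the same one-line change can be applied with a small script. This is a hedged sketch, not a tool shipped with apex or this repository: `patch_source` and `patch_file` are hypothetical helpers, and the path to the apex file must be supplied by you, since it depends on your environment.

```python
from pathlib import Path

# The expression before and after the manual edit described above.
OLD = "cached_x.grad_fn.next_functions[1][0].variable is not x"
NEW = "cached_x.grad_fn.next_functions[0][0].variable is not x"


def patch_source(source):
    """Return (patched_source, changed) after swapping the indexed lookup once."""
    if OLD not in source:
        return source, False
    return source.replace(OLD, NEW, 1), True


def patch_file(path):
    """Patch the file at `path` in place; return True if it was modified."""
    file_path = Path(path)
    patched, changed = patch_source(file_path.read_text())
    if changed:
        file_path.write_text(patched)
    return changed
```

Calling `patch_file` a second time on the same file returns `False`, because the old expression is no longer present.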
Put the pretrained .pth model in the ckpt folder (AI-CUP/ckpt).
Unzip the competition models and configs folder and rename it to work_dirs (AI-CUP/work_dirs).
competition models
competition segmentation config
competition detection config
Put the data and annotations in the data folder (AI-CUP/data).
The annotations cover both detection and segmentation:
`-- OBJ_Train_Datasets
    |-- Test_Images
    |   `-- test_images...
    |-- Train_Annotations
    |   |-- json_segmentation_annotations...
    |   `-- xml_detection_annotations...
    `-- Train_Images
        `-- train_images...
then run python
The coco and custom folders should then appear in the directory:
`-- OBJ_Train_Datasets
    |-- same_with_above...
    |-- coco
    |   |-- STAS_final.json
    |   |-- STAS_test.json
    |   |-- STAS_train.json
    |   `-- STAS_val.json
    `-- custom
        |-- STAS_final.pkl
        |-- STAS_test.pkl
        |-- STAS_train.pkl
        `-- STAS_val.pkl
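To give a feel for what the detection part of this conversion involves, here is a rough sketch that turns XML boxes into COCO-style annotation dicts. It is not the repository's conversion script. It assumes the XML files follow the Pascal VOC layout (`<object>` elements with a `<bndbox>` holding `xmin`, `ymin`, `xmax`, `ymax`), which the source does not confirm, and `voc_objects_to_coco` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def voc_objects_to_coco(xml_text, image_id, first_ann_id=0, category_id=0):
    """Convert VOC-style <object> boxes into COCO-style annotation dicts.

    Assumes a Pascal VOC layout; adjust the tag names if the files differ.
    """
    root = ET.fromstring(xml_text)
    annotations = []
    for offset, obj in enumerate(root.iter("object")):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        width, height = xmax - xmin, ymax - ymin
        annotations.append({
            "image_id": image_id,
            "id": first_ann_id + offset,
            "category_id": category_id,
            "bbox": [xmin, ymin, width, height],  # COCO stores [x, y, w, h]
            "area": width * height,  # box area, since no polygon is available
            "iscrowd": 0,
        })
    return annotations
```

The only real transformation is from corner coordinates to the `[x, y, width, height]` layout that COCO uses for `bbox`.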
{
    'images': [
        {
            'file_name': '00000395.jpg',
            'height': 942,
            'width': 1716,
            'id': 00000395
        },
        ...
    ],
    'annotations': [
        {
            'segmentation': [[192.81, ...]],
            'area': 1035.749,
            'iscrowd': 0,
            'image_id': 00000395,
            'bbox': [192.81, 224.8, 74.73, 33.43],
            'category_id': 0,
            'id': 5555
        },
        ...
    ],
    'categories': [
        {'id': 0, 'name': 'stas'},
    ]
}
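In the example above, `area` (1035.749) is much smaller than the box area 74.73 × 33.43 ≈ 2498.2, which suggests it is the area of the segmentation polygon rather than of the box. The sketch below shows one way such values could be derived from a flat COCO polygon using the shoelace formula. It is an illustration under that assumption, not the repository's code, and `polygon_bbox_and_area` is a hypothetical name.

```python
def polygon_bbox_and_area(flat_polygon):
    """Return ([x, y, w, h], area) for a flat [x0, y0, x1, y1, ...] polygon."""
    xs = flat_polygon[0::2]
    ys = flat_polygon[1::2]
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("expected at least three (x, y) pairs")

    # Shoelace formula: sum the cross products of consecutive vertices.
    twice_area = 0.0
    for i in range(len(xs)):
        j = (i + 1) % len(xs)  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]

    bbox = [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)]
    return bbox, abs(twice_area) / 2.0
```

For a right triangle with legs 4 and 3, this returns a box of size 4 × 3 and an area of 6.0, half the box area.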
[
    {
        'filename': '00000395.jpg',
        'width': 1716,
        'height': 942,
        'ann': {
            'bboxes': <np.ndarray, float32> (n, 4),
            'labels': <np.ndarray, int64> (n, )
        }
    },
    ...
]
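The relationship between the two formats can be sketched as follows. This is an illustration only: `coco_to_custom` is a hypothetical helper, plain lists stand in for the `np.ndarray` fields, and it assumes the custom format stores boxes as `[x1, y1, x2, y2]`, as MMDetection's middle format does. The source does not state the box order used in the .pkl files.

```python
def coco_to_custom(image, annotations):
    """Build one custom-format record from a COCO image and annotation list."""
    bboxes, labels = [], []
    for ann in annotations:
        if ann["image_id"] != image["id"]:
            continue  # annotation belongs to another image
        x, y, w, h = ann["bbox"]  # COCO layout: top-left corner plus size
        bboxes.append([x, y, x + w, y + h])  # assumed layout: two corners
        labels.append(ann["category_id"])
    return {
        "filename": image["file_name"],
        "width": image["width"],
        "height": image["height"],
        # In the real files these would be float32 and int64 arrays.
        "ann": {"bboxes": bboxes, "labels": labels},
    }
```

Grouping by `image_id` is what turns COCO's flat annotation list into one record per image.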
Please use only a single GPU for training.
# First, train on the semantic segmentation annotations
# The original pretrained model must be placed in the ckpt folder first
python -m torch.distributed.launch tools/
--gpus 1 --deterministic --seed 123
--work-dir work_dirs/swin_coco
# Second, fine-tune the segmentation-trained model on the object detection annotations
# You need to complete the previous training or download the competition model
python -m torch.distributed.launch tools/
--gpus 1 --deterministic --seed 123
--work-dir work_dirs/swin_custom_fine
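Both commands pass `--deterministic --seed 123`. As a rough illustration of what such flags usually do in PyTorch training scripts, the sketch below seeds the common random number generators and optionally switches cuDNN to deterministic mode. It is an assumption about typical behavior, not this repository's implementation, and `seed_everything` is a hypothetical name. NumPy and PyTorch are imported only if available.

```python
import random


def seed_everything(seed, deterministic=False):
    """Seed Python, NumPy, and PyTorch RNGs; optionally force deterministic cuDNN."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; nothing more to do

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    if deterministic:
        # Trade some speed for repeatable cuDNN kernel selection.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


seed_everything(123, deterministic=True)
```

Even with these settings, some GPU operations can remain nondeterministic, so repeated runs may still differ slightly.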
Please use only a single GPU for inference.
# You need to complete all previous training or download the competition model
python tools/
--out result.json
--show --show-dir ckpt
Original CBNet: see CBNet: A Novel Composite Backbone Network Architecture for Object Detection.
Original CBNetV2 GitHub: see VDIGPKU CBNetV2.
If you use our code/model, please consider citing our paper CBNetV2: A Composite Backbone Network Architecture for Object Detection.
title={CBNetV2: A Composite Backbone Network Architecture for Object Detection},
author={Tingting Liang and Xiaojie Chu and Yudong Liu and Yongtao Wang and Zhi Tang and Wei Chu and Jingdong Chen and Haibin Ling},
journal={arXiv preprint arXiv:2107.00420},