
train error #46

Open
nuist-xinyu opened this issue May 30, 2019 · 56 comments

@nuist-xinyu

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef<long>, at::optional<std::vector<long, std::allocator<long> > > const&, long, at::optional<std::vector<CUDAStreamInternals*, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
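
Note: "invalid device ordinal" means the data is being scattered to GPU indices that do not exist on this machine; the default config lists eight entries in 'chunk_sizes', i.e. eight GPUs. A minimal sketch to check how many GPUs PyTorch actually sees:

import torch
print(torch.cuda.device_count())  # 'chunk_sizes' needs one entry per visible GPU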

@nuist-xinyu
Author

@Duankaiwen Thank you so much!

@Duankaiwen
Owner

Can I see your full log?

@nuist-xinyu
Author

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
No cache file found...
loading annotations into memory...
Done (t=25.19s)
creating index...
index created!
118287it [01:18, 1509.02it/s]
loading annotations into memory...
Done (t=20.50s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.08s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=20.29s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.57s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
No cache file found...
loading annotations into memory...
Done (t=1.26s)
creating index...
index created!
5000it [00:03, 1478.28it/s]
loading annotations into memory...
Done (t=0.61s)
creating index...
index created!
system config...
{'batch_size': 48,
'cache_dir': 'cache',
'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7fac46fc3870>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7fac46fc38b8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
start prefetching data...
shuffling indices...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef<long>, at::optional<std::vector<long, std::allocator<long> > > const&, long, at::optional<std::vector<CUDAStreamInternals*, std::allocator<CUDAStreamInternals*> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: <unknown function> + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #13: THPFunction_apply(_object*, _object*) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

@Duankaiwen
Owner

How many GPUs do you have?

@nuist-xinyu
Author

I have put the val set into the training set as you said, but this error occurred.
Thank you for helping me @Duankaiwen

@nuist-xinyu
Author

16G

@Duankaiwen
Owner

How many GPUs, not GPU memory

@nuist-xinyu
Author

Sorry, sorry, 8 GB

@nuist-xinyu
Author

Only one, a 2070

@Duankaiwen
Owner

Duankaiwen commented May 30, 2019

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]
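
For a single GPU, the relevant fields in config/CenterNet-104.json would then look like this (a sketch showing only the two changed fields; the length of 'chunk_sizes' equals the number of GPUs, and its entries must sum to 'batch_size'):

    "batch_size": 3,
    "chunk_sizes": [3],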

@nuist-xinyu
Author

thank you

@nuist-xinyu
Author

Best wishes to you.
I have tried it.

@nuist-xinyu
Author

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: given chunk sizes don't sum up to the tensor's size (sum(chunk_sizes) == 16, but expected 2) (scatter at torch/csrc/cuda/comm.cpp:135)
frame #0: <unknown function> + 0xc42a0b (0x7f94eb489a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x38a5cb (0x7f94eabd15cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f94eafafa2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
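
The message spells out the mismatch: 'batch_size' was changed to 2 but 'chunk_sizes' was left as [2,2,2,2,2,2,2,2], which sums to 16, as the config posted further down confirms. Roughly, the invariant the scatter code checks is:

assert sum(chunk_sizes) == batch_size  # e.g. sum([2]) == 2 for a single GPU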

@nuist-xinyu
Author

Ah, I am going crazy now.
It still doesn't work. God, help me.

@Duankaiwen
Owner

Can I see your config/CenterNet-104.json?

@nuist-xinyu
Author

{
"system": {
    "dataset": "MSCOCO",
    "batch_size": 48,
    "sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [6,6,6,6,6,6,6,6],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

@nuist-xinyu
Author

I showed you the original; I have also modified it, but it still doesn't work.
Do you know Chinese?

@Duankaiwen
Owner

Can I see your own config/CenterNet-104.json?

@nuist-xinyu
Author

I don't have my own config; this is the one I downloaded.

@nuist-xinyu
Author

This is what I used for training.

@Duankaiwen
Owner

You said you have modified config/CenterNet-104.json, and I want to know what the modified file looks like. The log shows that there are some errors in config/CenterNet-104.json, so I need to know the details of the file to help you.

@nuist-xinyu
Author

{
"system": {
    "dataset": "MSCOCO",
    "batch_size": 2,
    "sampling_function": "kp_detection",

    "train_split": "trainval",
    "val_split": "minival",

    "learning_rate": 0.00025,
    "decay_rate": 10,

    "val_iter": 500,

    "opt_algo": "adam",
    "prefetch_size": 6,

    "max_iter": 480000,
    "stepsize": 450000,
    "snapshot": 5000,

    "chunk_sizes": [2,2,2,2,2,2,2,2],

    "data_dir": "./data"
},

"db": {
    "rand_scale_min": 0.6,
    "rand_scale_max": 1.4,
    "rand_scale_step": 0.1,
    "rand_scales": null,

    "rand_crop": true,
    "rand_color": true,

    "border": 128,
    "gaussian_bump": true,

    "input_size": [511, 511],
    "output_sizes": [[128, 128]],

    "test_scales": [1],

    "top_k": 70,
    "categories": 80,
    "kp_categories": 1,
    "ae_threshold": 0.5,
    "nms_threshold": 0.5,

    "max_per_image": 100
}

}

@nuist-xinyu
Author

This is the file after I modified it.

@Duankaiwen
Owner

Modify 'chunk_sizes' to [2], not [2,2,2,2,2,2,2,2]

@nuist-xinyu
Author

OK, OK, thank you!

@nuist-xinyu
Author

You are really enthusiastic, thank you.

@nuist-xinyu
Author

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=24.58s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=19.16s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.12s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.61s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
system config...
{'batch_size': 2,
'cache_dir': 'cache',
'chunk_sizes': [2],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f75971ee900>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7f75971ee948>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward
    preds = self.model(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward
    return self.module(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward
    return self._train(*xs, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train
    inter = self.pre(image)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward
    conv = self.conv(x)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

@nuist-xinyu
Author

Hello, I followed what you said, but it still doesn't work.

@Duankaiwen
Owner

What's your version of CUDA?

@nuist-xinyu
Author

cuda 10

@Duankaiwen
Owner

Duankaiwen commented May 30, 2019

The version may be too high; try CUDA 8.0 or CUDA 9.0. See this: sangwoomo/instagan#4
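
To confirm which versions are actually in use, a minimal sketch:

import torch
print(torch.__version__)          # the tracebacks above come from a mid-2018 (0.4.x era) build
print(torch.version.cuda)         # CUDA version this PyTorch was compiled against
print(torch.cuda.is_available())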

@nuist-xinyu
Author

Hello author, I am bothering you again. I really appreciate your help yesterday. I changed CUDA to 9, but I still can't train; I can test, though.

@nuist-xinyu
Author

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=22.31s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=19.15s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=18.04s)
creating index...
index created!
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=23.53s)
creating index...
index created!
loading from cache file: cache/coco_minival2014.pkl
loading annotations into memory...
Done (t=0.78s)
creating index...
index created!
system config...
{'batch_size': 1,
'cache_dir': 'cache',
'chunk_sizes': [1],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7fd20efd0870>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7fd20efd08b8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [511, 511],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 118287
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 68, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 20, in forward
    preds = self.model(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 32, in forward
    return self.module(*xs, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 289, in forward
    return self._train(*xs, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/kp.py", line 193, in _train
    inter = self.pre(image)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/CenterNet-master/models/py_utils/utils.py", line 14, in forward
    conv = self.conv(x)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp:663

@nuist-xinyu
Author

loading parameters at iteration: 480000
building neural network...
module_file: models.CenterNet-104
total parameters: 210062960
loading parameters...
loading model from cache/nnet/CenterNet-104/CenterNet-104_480000.pkl
locating kps: 0%| | 0/5000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead.
warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
locating kps: 72%|██████████████████ | 3624/5000 [34:05<14:17, 1.60it/s]

This is the test run.

@Duankaiwen
Owner

How about now?

@nuist-xinyu
Author

RuntimeError: Expected object of type CUDAByteType but found type CUDAFloatType for argument #0 'result' (checked_cast_tensor at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/ATen/Utils.h:30)
frame #0: + 0xf5cb33 (0x7f961bb32b33 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #1: at::CUDAFloatType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x26 (0x7f961bb361c6 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #2: torch::autograd::VariableType::s_gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x19c (0x7f96351b7f2c in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #3: at::Type::gt_out(at::Tensor&, at::Tensor const&, at::Tensor const&) const + 0x118 (0x7f961bc8fdd8 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #4: pool_backward(at::Tensor, at::Tensor) + 0x435 (0x7f95ed4c3315 in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #5: + 0x112cb (0x7f95ed4cd2cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #6: + 0x1149e (0x7f95ed4cd49e in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #7: + 0x11e1c (0x7f95ed4cde1c in /home/xinyu/anaconda3/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/right_pool.cpython-36m-x86_64-linux-gnu.so)
frame #8: _PyCFunction_FastCallDict + 0x154 (0x555e9c1c2744 in python)
frame #9: + 0x19842c (0x555e9c24942c in python)
frame #10: _PyEval_EvalFrameDefault + 0x30a (0x555e9c26e38a in python)
frame #11: PyEval_EvalCodeEx + 0x329 (0x555e9c244289 in python)
frame #12: + 0x194094 (0x555e9c245094 in python)
frame #13: PyObject_Call + 0x3e (0x555e9c1c254e in python)
frame #14: _PyEval_EvalFrameDefault + 0x19ec (0x555e9c26fa6c in python)
frame #15: + 0x1918e4 (0x555e9c2428e4 in python)
frame #16: _PyFunction_FastCallDict + 0x1bc (0x555e9c243c4c in python)
frame #17: _PyObject_FastCallDict + 0x26f (0x555e9c1c2b0f in python)
frame #18: _PyObject_Call_Prepend + 0x63 (0x555e9c1c76a3 in python)
frame #19: PyObject_Call + 0x3e (0x555e9c1c254e in python)
frame #20: torch::autograd::PyFunction::apply(std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&) + 0x199 (0x7f9635197579 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #21: torch::autograd::Engine::evaluate_function(torch::autograd::FunctionTask&) + 0x1d1e (0x7f963518254e in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #22: torch::autograd::Engine::thread_main(torch::autograd::GraphTask*) + 0xe7 (0x7f9635182f17 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #23: torch::autograd::Engine::thread_init(int) + 0x72 (0x7f963517f822 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #24: torch::autograd::python::PythonEngine::thread_init(int) + 0x2a (0x7f96351ad8aa in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #25: + 0xb8678 (0x7f96187ea678 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/lib/../../../../libstdc++.so.6)
frame #26: + 0x76ba (0x7f96454606ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #27: clone + 0x6d (0x7f964519641d in /lib/x86_64-linux-gnu/libc.so.6)

This error occurs and it still cannot run.

@Duankaiwen Duankaiwen reopened this Jun 3, 2019
@hheavenknowss

Hello, I have the same issue, and I have two GPUs. Do I need to change the config like above? If so, please tell me how. I have been troubled by this for 2 weeks; I'd be very appreciative if I could fix it.

@Duankaiwen
Owner

please show your log

@hheavenknowss

please show your log

Thank you for replying this fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

@hheavenknowss

please show your log

Thank you for replying this fast, but something suddenly went wrong with my environment. I'll post it later. Thanks again.

loading all datasets...
using 4 threads
loading from cache file: cache/coco_hepaticvessel_001.pkl
No cache file found...
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
49it [00:00, 36524.06it/s]
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
loading from cache file: cache/coco_hepaticvessel_001.pkl
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
system config...
{'batch_size': 48,
'cache_dir': 'cache',
'chunk_sizes': [6, 6, 6, 6, 6, 6, 6, 6],
'config_dir': 'config',
'data_dir': './data',
'data_rng': <mtrand.RandomState object at 0x7f5fd2cc5ab0>,
'dataset': 'MSCOCO',
'decay_rate': 10,
'display': 5,
'learning_rate': 0.00025,
'max_iter': 480000,
'nnet_rng': <mtrand.RandomState object at 0x7f5fd2cc5af8>,
'opt_algo': 'adam',
'prefetch_size': 6,
'pretrain': None,
'result_dir': 'results',
'sampling_function': 'kp_detection',
'snapshot': 5000,
'snapshot_name': 'CenterNet-104',
'stepsize': 450000,
'test_split': 'testdev',
'train_split': 'trainval',
'val_iter': 500,
'val_split': 'minival',
'weight_decay': False,
'weight_decay_rate': 1e-05,
'weight_decay_type': 'l2'}
db config...
{'ae_threshold': 0.5,
'border': 128,
'categories': 80,
'data_aug': True,
'gaussian_bump': True,
'gaussian_iou': 0.7,
'gaussian_radius': -1,
'input_size': [512, 512],
'kp_categories': 1,
'lighting': True,
'max_per_image': 100,
'merge_bbox': False,
'nms_algorithm': 'exp_soft_nms',
'nms_kernel': 3,
'nms_threshold': 0.5,
'output_sizes': [[128, 128]],
'rand_color': True,
'rand_crop': True,
'rand_pushes': False,
'rand_samples': False,
'rand_scale_max': 1.4,
'rand_scale_min': 0.6,
'rand_scale_step': 0.1,
'rand_scales': array([0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2, 1.3]),
'special_crop': False,
'test_scales': [1],
'top_k': 70,
'weight_exp': 8}
len of db: 49
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
start prefetching data...
shuffling indices...
building model...
module_file: models.CenterNet-104
start prefetching data...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
shuffling indices...
total parameters: 210062960
setting learning rate to: 0.00025
training start...
0%| | 0/480000 [00:00<?, ?it/s]shuffling indices...
shuffling indices...

Traceback (most recent call last):
  File "train.py", line 203, in <module>
    train(training_dbs, validation_db, args.start_iter)
  File "train.py", line 138, in train
    training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
  File "/dfsdata/pengxf2_data/CenterNet-master/nnet/py_factory.py", line 82, in train
    loss_kp = self.network(xs, ys)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
    inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
    return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
    inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
    return scatter_map(inputs)
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
    return list(zip(*map(scatter_map, obj)))
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
    return list(map(list, zip(*map(scatter_map, obj))))
  File "/dfsdata/pengxf2_data/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
    return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
    outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
  File "/usr/local/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
    return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: Device index must be -1 or non-negative, got -1687419088 (Device at /pytorch/torch/lib/tmp_install/include/ATen/Device.h:47)
frame #0: <unknown function> + 0xc4964b (0x7f5f6517b64b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: <unknown function> + 0x39120b (0x7f5f648c320b in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

frame #12: THPFunction_apply(_object*, _object*) + 0x38f (0x7f5f64ca166f in /usr/local/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)

Exception in thread Thread-4:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 383, in _recv
raise EOFError
EOFError

Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "train.py", line 51, in pin_memory
data = data_queue.get()
File "/usr/local/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/usr/local/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 151, in rebuild_storage_fd
fd = df.detach()
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 57, in detach
with _resource_sharer.get_connection(self._id) as conn:
File "/usr/local/lib/python3.6/multiprocessing/resource_sharer.py", line 87, in get_connection
c = Client(address, authkey=process.current_process().authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 493, in Client
answer_challenge(c, authkey)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 732, in answer_challenge
message = connection.recv_bytes(256) # reject large message
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
buf = self._recv(4)
File "/usr/local/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer

@Duankaiwen
Owner

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]

@hheavenknowss

Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2]

I'll try it, thank you. I've also tried batch_size 8 and chunk_sizes [4,4] and it worked. I wonder if most of these issues are about the batch_size and chunk_sizes settings?

@Duankaiwen
Owner

Yes

@hheavenknowss

Yes

Thank you for your answer and patience

@Duankaiwen
Owner

No problem

@nuist-xinyu
Author

loading all datasets...
using 4 threads
loading from cache file: cache/coco_trainval2014.pkl
loading annotations into memory...
Done (t=2406.72s)
creating index...
index created!
Traceback (most recent call last):
  File "train.py", line 193, in <module>
    training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
  File "train.py", line 193, in <listcomp>
    training_dbs = [datasets[dataset](configs["db"], train_split) for _ in range(threads)]
  File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 69, in __init__
    self._load_coco_data()
  File "/home/zq/辛宇/CenterNet-master/db/coco.py", line 85, in _load_coco_data
    data = json.load(f)
  File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/zq/anaconda3/envs/CenterNet/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

I'm sorry to disturb you again. When I run this code on a 1080 (CUDA 9 and torch 0.4.1), this happens. How can I solve it?

@Duankaiwen
Owner

Duankaiwen commented Jun 4, 2019

Try this:
cd /data/coco/PythonAPI
make

@nuist-xinyu
Author

Thank you for answering my question late at night. I did what you said, but this happened.
python setup.py build_ext --inplace
running build_ext
skipping 'pycocotools/_mask.c' Cython extension (up-to-date)
building 'pycocotools._mask' extension
creating build
creating build/common
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/pycocotools
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c ../common/maskApi.c -o build/temp.linux-x86_64-3.6/../common/maskApi.o -Wno-cpp -Wno-unused-function -std=c99
../common/maskApi.c: In function ‘rleToBbox’:
../common/maskApi.c:141:31: warning: ‘xp’ may be used uninitialized in this function [-Wmaybe-uninitialized]
if(j%2==0) xp=x; else if(xp<x) { ys=0; ye=h-1; }
^
gcc -pthread -B /home/zq/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/zq/anaconda3/lib/python3.6/site-packages/numpy/core/include -I../common -I/home/zq/anaconda3/include/python3.6m -c pycocotools/_mask.c -o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -Wno-cpp -Wno-unused-function -std=c99
creating build/lib.linux-x86_64-3.6
creating build/lib.linux-x86_64-3.6/pycocotools
gcc -pthread -shared -B /home/zq/anaconda3/compiler_compat -L/home/zq/anaconda3/lib -Wl,-rpath=/home/zq/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/../common/maskApi.o build/temp.linux-x86_64-3.6/pycocotools/_mask.o -o build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.6/pycocotools/_mask.cpython-36m-x86_64-linux-gnu.so -> pycocotools
rm -rf build

@nuist-xinyu
Author

This error occurred when I ran the program.

zq@zq-G1-SNIPER-B7:~/辛宇/CenterNet-master$ python train.py CornerNet
Traceback (most recent call last):
  File "train.py", line 18, in <module>
    from nnet.py_factory import NetworkFactory
  File "/home/zq/辛宇/CenterNet-master/nnet/py_factory.py", line 8, in <module>
    from models.py_utils.data_parallel import DataParallel
  File "/home/zq/辛宇/CenterNet-master/models/py_utils/__init__.py", line 6, in <module>
    from ._cpools import TopPool, BottomPool, LeftPool, RightPool
  File "/home/zq/辛宇/CenterNet-master/models/py_utils/_cpools/__init__.py", line 8, in <module>
    import top_pool, bottom_pool, left_pool, right_pool
ImportError: /home/zq/.local/lib/python3.6/site-packages/cpools-0.0.0-py3.6-linux-x86_64.egg/top_pool.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN5torch4barfEPKcz

@Duankaiwen
Owner

What's your torch version?

@nuist-xinyu
Author

On this computer:

import torch
print(torch.__version__)
0.5.0a0+ce8e8fe

@lolongcovas

Please check here.
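
The undefined symbol (_ZN5torch4barfEPKcz demangles to torch::barf(char const*, ...)) suggests the compiled cpools extension was built against a different torch than the source-built 0.5.0a0 now installed. Rebuilding the corner pooling layers against the installed torch is the usual fix; a sketch, assuming the standard CenterNet-master layout:

cd CenterNet-master/models/py_utils/_cpools
python setup.py install --user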

@Orange-Ocean-hh

Traceback (most recent call last):
File "train.py", line 43, in prefetch_data
data, ind = sample_data(db, ind, data_aug=data_aug)
File "/data/data/fxy/CenterNet-master/sample/coco.py", line 199, in sample_data
return globals()[system_configs.sampling_function](db, k_ind, data_aug, debug)
File "/data/data/fxy/CenterNet-master/sample/coco.py", line 99, in kp_detection
image, detections = random_crop(image, detections, rand_scales, input_size, border=border)
File "/data/data/fxy/CenterNet-master/sample/utils.py", line 57, in random_crop
image_height, image_width = image.shape[0:2]
AttributeError: 'NoneType' object has no attribute 'shape'
Hello, sorry to disturb. How can I fix this error? I'm not sure what the problem with 'shape' is.
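
This error means 'image' is None by the time random_crop reads image.shape, which usually happens because cv2.imread silently returns None for a missing or unreadable image file. A quick check, as a sketch assuming 'image_file' is the path that sample/coco.py builds for each image:

import cv2
image = cv2.imread(image_file)
if image is None:
    raise FileNotFoundError("missing or unreadable image: " + image_file)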

@WuChannn

@Duankaiwen Hello Kaiwen, could you please show where to specify the IDs of the GPUs to use? Or will the code use all the GPUs automatically? Thank you.

@Duankaiwen
Owner

@WuChannn Specifying the GPU ids is not supported, but you can specify the 'chunk_sizes' and the 'batch_size' in config/CenterNet-xxx.json, where the length of 'chunk_sizes' denotes the number of GPUs you will use and each item in 'chunk_sizes' denotes the batch size for that GPU. The sum of 'chunk_sizes' should be equal to 'batch_size'.
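
For example, earlier in this thread a two-GPU setup worked with batch_size 8 split as [4, 4]:

    "batch_size": 8,
    "chunk_sizes": [4, 4],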

@WuChannn

WuChannn commented Aug 11, 2020 via email
