train error #46
Comments
@Duankaiwen Thank you so much! |
Can I see your full log? |
loading all datasets... |
How many GPUs do you have? |
I have put the val set into the training set as you said, but this error occurred. |
16 GB |
How many GPUs, not how much GPU memory? |
Sorry, sorry, 8 GB. |
Only one, a 2070. |
Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2] |
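(For anyone following along, a minimal sketch of that change in code. It assumes 'batch_size' and 'chunk_sizes' sit under the top-level "system" block, as in the repo's sample configs; the snippet itself is illustrative and not part of the repo.)

import json

# Patch config/CenterNet-104.json for a single-GPU machine.
cfg_path = "config/CenterNet-104.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["system"]["batch_size"] = 3     # drop to 2 if the GPU still runs out of memory
cfg["system"]["chunk_sizes"] = [3]  # one entry per GPU; entries must sum to batch_size

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)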
Thank you. |
Best wishes to you. |
Traceback (most recent call last): |
Ah, I am going crazy now. |
Can I see your config/CenterNet-104.json? |
{
} |
I showed you the original; I also modified it, but it still doesn't work. |
Can I see your own config/CenterNet-104.json? |
I don't have my own config; this is the one I downloaded. |
This is what I used for training. |
You said you modified config/CenterNet-104.json, and I want to know what the modified file looks like. The log shows that there are some errors in config/CenterNet-104.json. I need to know the details of config/CenterNet-104.json to help you. |
{
} |
This is what it looks like after I modified it. |
Modify 'chunk_sizes' to [2], not [2,2,2,2,2,2,2,2] |
OK, OK, thank you! |
You are really enthusiastic, thank you. |
loading all datasets... Traceback (most recent call last): |
Hello, I followed what you said, but it still doesn't work. |
What's your version of CUDA? |
CUDA 10 |
The version may be too high; try CUDA 8.0 or CUDA 9.0. See this: sangwoomo/instagan#4 |
Hello author, I am bothering you again. I really appreciate your help yesterday. I changed CUDA to 9, but I still can't train; I can test, though. |
loading all datasets... Traceback (most recent call last): |
loading parameters at iteration: 480000 (this is the test run) |
How about now? |
RuntimeError: Expected object of type CUDAByteType but found type CUDAFloatType for argument #0 'result' (checked_cast_tensor at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/ATen/Utils.h:30). This error occurs and it still cannot run. |
Hello, I have the same issue, and I have two GPUs. Do I need to change the config like above? If so, please tell me how. I have been troubled by this for 2 weeks; I'd be very appreciative if I could fix it. |
Please show your log. |
Thank you for replying this fast, but something suddenly went wrong with my environment; I'll post it later. Thanks again. |
loading all datasets... Traceback (most recent call last): Exception in thread Thread-4: Exception in thread Thread-3: |
Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2] |
I'll try it, thank you. I've tried batch_size 8 and chunk_sizes [4,4] and it worked. I wonder if most of these issues are about how batch_size and chunk_sizes are set? |
Yes |
Thank you for your answer and patience |
No problem |
loading all datasets... I'm sorry to disturb you again. When I run this code on a 1080 (CUDA 9 and torch 0.4.1), this happens. How can I solve it? |
Try this: |
Thank you for answering my question late at night. I did what you said, but this happened. |
This error occurred when I ran the program: zq@zq-G1-SNIPER-B7:~/辛宇/CenterNet-master$ python train.py CornerNet |
What's your torch version? |
In this computer: |
Please check here. |
Traceback (most recent call last): |
@Duankaiwen Hello Kaiwen, could you please show where to specify the IDs of the GPUs to use, or will the code use all the GPUs automatically? Thank you. |
@WuChannn Specifying the GPU IDs is not supported, but you can specify 'chunk_sizes' and 'batch_size' in config/CenterNet-xxx.json, where the length of 'chunk_sizes' denotes the number of GPUs you will use and each item in 'chunk_sizes' denotes the batch size for that GPU. The sum(chunk_sizes) should be equal to 'batch_size'.
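(A small sketch of that rule, using the batch_size 8 / chunk_sizes [4, 4] combination reported to work earlier in this thread; the variable names below are illustrative, not repo code.)

# Two GPUs, 4 images per GPU.
batch_size  = 8
chunk_sizes = [4, 4]

assert sum(chunk_sizes) == batch_size   # the chunks must add up to the batch size
num_gpus = len(chunk_sizes)             # one chunk per GPU that will be used
print(num_gpus)                         # -> 2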
Ok, got it. Thanks. |
Traceback (most recent call last):
File "train.py", line 203, in
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals, std::allocator<CUDAStreamInternals> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: THPFunction_apply(_object, _object) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
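(The final error, "CUDA error (10): invalid device ordinal", is raised when the scatter step is asked to copy a chunk to a GPU index that does not exist on the machine, which is what happens when 'chunk_sizes' lists more entries than there are GPUs. A quick check, sketched here and not part of the repo:)

import torch

# Number of GPUs PyTorch can actually see on this machine.
print(torch.cuda.device_count())

# 'chunk_sizes' in config/CenterNet-104.json must not have more entries than
# this number (and its sum must still equal 'batch_size').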