train error #46
Comments
@Duankaiwen Thank you so much! |
Can I see your full log? |
loading all datasets... |
How many GPUs do you have? |
I have put the val set into the training set as you said, but this error occurred. |
16 GB |
How many GPUs, not how much GPU memory? |
Sorry, sorry, 8 GB. |
Only one, a 2070. |
Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2] |
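(For anyone following along, a minimal sketch of that change in code. It assumes 'batch_size' and 'chunk_sizes' sit under the top-level "system" block, as in the repo's sample configs; the snippet itself is illustrative and not part of the repo.)

import json

# Patch config/CenterNet-104.json for a single-GPU machine.
cfg_path = "config/CenterNet-104.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["system"]["batch_size"] = 3     # drop to 2 if the GPU still runs out of memory
cfg["system"]["chunk_sizes"] = [3]  # one entry per GPU; entries must sum to batch_size

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)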
Thank you. |
Best wishes to you. |
Traceback (most recent call last): |
Ah, I am going crazy now. |
Can I see your config/CenterNet-104.json? |
{
} |
I showed you the original; I also modified it, but it still doesn't work. |
Can I see your own config/CenterNet-104.json? |
I don't have my own config; this is the one I downloaded. |
This is what I used for training. |
You said you modified config/CenterNet-104.json, and I want to know what the modified file looks like. The log shows that there are some errors in config/CenterNet-104.json. I need to know the details of config/CenterNet-104.json to help you. |
{
} |
This is what it looks like after I modified it. |
Modify 'chunk_sizes' to [2], not [2,2,2,2,2,2,2,2] |
OK, OK, thank you! |
You are really enthusiastic, thank you. |
loading all datasets... Traceback (most recent call last): |
Hello, I followed what you said, but it still doesn't work. |
What's your version of CUDA? |
CUDA 10 |
The version may be too high; try CUDA 8.0 or CUDA 9.0. See this: sangwoomo/instagan#4 |
Hello author, I am bothering you again. I really appreciate your help yesterday. I changed CUDA to 9, but I still can't train; I can test, though. |
loading all datasets... Traceback (most recent call last): |
loading parameters at iteration: 480000 (this is the test run) |
How about now? |
RuntimeError: Expected object of type CUDAByteType but found type CUDAFloatType for argument #0 'result' (checked_cast_tensor at /opt/conda/conda-bld/pytorch_1525909934016/work/aten/src/ATen/Utils.h:30). This error occurs and it still cannot run. |
Hello, I have the same issue, and I have two GPUs. Do I need to change the config like above? If so, please tell me how. I have been troubled by this for 2 weeks; I'd be very appreciative if I could fix it. |
Please show your log. |
Thank you for replying this fast, but something suddenly went wrong with my environment; I'll post it later. Thanks again. |
loading all datasets... Traceback (most recent call last): Exception in thread Thread-4: Exception in thread Thread-3: |
Modify 'batch_size' to 3 and 'chunk_sizes' to [3] in config/CenterNet-104.json. If out of memory, then modify 'batch_size' to 2 and 'chunk_sizes' to [2] |
I'll try it, thank you. I've tried batch_size 8 and chunk_sizes [4,4] and it worked. I wonder if most of these issues are about how batch_size and chunk_sizes are set? |
Yes |
Thank you for your answer and patience |
No problem |
loading all datasets... I'm sorry to disturb you again. When I run this code on a 1080 (CUDA 9 and torch 0.4.1), this happens. How can I solve it? |
Try this: |
Thank you for answering my question late at night. I did what you said, but this happened. |
This error occurred when I ran the program: zq@zq-G1-SNIPER-B7:~/辛宇/CenterNet-master$ python train.py CornerNet |
What's your torch version? |
In this computer: |
Please check here. |
Traceback (most recent call last): |
@Duankaiwen Hello Kaiwen, could you please show where to specify the IDs of the GPUs to use, or will the code use all the GPUs automatically? Thank you. |
@WuChannn Specifying the GPU IDs is not supported, but you can specify 'chunk_sizes' and 'batch_size' in config/CenterNet-xxx.json, where the length of 'chunk_sizes' denotes the number of GPUs you will use and each item in 'chunk_sizes' denotes the batch size for that GPU. The sum(chunk_sizes) should be equal to 'batch_size'.
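(A small sketch of that rule, using the batch_size 8 / chunk_sizes [4, 4] combination reported to work earlier in this thread; the variable names below are illustrative, not repo code.)

# Two GPUs, 4 images per GPU.
batch_size  = 8
chunk_sizes = [4, 4]

assert sum(chunk_sizes) == batch_size   # the chunks must add up to the batch size
num_gpus = len(chunk_sizes)             # one chunk per GPU that will be used
print(num_gpus)                         # -> 2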
Ok, got it. Thanks. |
Traceback (most recent call last):
File "train.py", line 203, in
train(training_dbs, validation_db, args.start_iter)
File "train.py", line 138, in train
training_loss, focal_loss, pull_loss, push_loss, regr_loss = nnet.train(**training)
File "/home/xinyu/CenterNet-master/nnet/py_factory.py", line 82, in train
loss_kp = self.network(xs, ys)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, kwargs)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 66, in forward
inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids, self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/data_parallel.py", line 77, in scatter
return scatter_kwargs(inputs, kwargs, device_ids, dim=self.dim, chunk_sizes=self.chunk_sizes)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 30, in scatter_kwargs
inputs = scatter(inputs, target_gpus, dim, chunk_sizes) if inputs else []
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 25, in scatter
return scatter_map(inputs)
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 18, in scatter_map
return list(zip(*map(scatter_map, obj)))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 20, in scatter_map
return list(map(list, zip(*map(scatter_map, obj))))
File "/home/xinyu/CenterNet-master/models/py_utils/scatter_gather.py", line 15, in scatter_map
return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 87, in forward
outputs = comm.scatter(input, ctx.target_gpus, ctx.chunk_sizes, ctx.dim, streams)
File "/home/xinyu/anaconda3/lib/python3.6/site-packages/torch/cuda/comm.py", line 142, in scatter
return tuple(torch._C._scatter(tensor, devices, chunk_sizes, dim, streams))
RuntimeError: CUDA error (10): invalid device ordinal (check_status at /opt/conda/conda-bld/pytorch_1532502421238/work/aten/src/ATen/cuda/detail/CUDAHooks.cpp:36)
frame #0: torch::cuda::scatter(at::Tensor const&, at::ArrayRef, at::optional<std::vector<long, std::allocator > > const&, long, at::optional<std::vector<CUDAStreamInternals, std::allocator<CUDAStreamInternals> > > const&) + 0x4e1 (0x7fac77038871 in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #1: + 0xc42a0b (0x7fac77040a0b in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #2: + 0x38a5cb (0x7fac767885cb in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
frame #13: THPFunction_apply(_object, _object) + 0x38f (0x7fac76b66a2f in /home/xinyu/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so)
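(The final error, "CUDA error (10): invalid device ordinal", is raised when the scatter step is asked to copy a chunk to a GPU index that does not exist on the machine, which is what happens when 'chunk_sizes' lists more entries than there are GPUs. A quick check, sketched here and not part of the repo:)

import torch

# Number of GPUs PyTorch can actually see on this machine.
print(torch.cuda.device_count())

# 'chunk_sizes' in config/CenterNet-104.json must not have more entries than
# this number (and its sum must still equal 'batch_size').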