cuDNN Error while training -map #7153
I had a similar issue when training yolov4-tiny on a custom dataset of 4 classes as instructed in the README. I'm using Debian 10 Linux on a Tesla P100 on GCP with:
I've tried the solutions mentioned in #6836, but none of them worked. Training always crashed at the same iteration.
I updated CUDA to 11.2 with driver 460.27.04 and it still didn't work.
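(For context, the failing setup is a training run of roughly this shape, following the README's pattern; the data, cfg, and pretrained-weight file names below are placeholders, not the poster's actual files.)

```bash
# Hypothetical reproduction: train yolov4-tiny on a custom 4-class dataset
# with periodic mAP evaluation enabled via -map (file names are placeholders)
./darknet detector train data/obj.data cfg/yolov4-tiny-custom.cfg \
    yolov4-tiny.conv.29 -map
```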
I was able to fix the bug with a new VM instance installed with driver version 418.87.01, CUDA 10.1, and cuDNN 7.6.5. I've seen people either downgrade from CUDA 11.x to 10.x or upgrade from 9.x to 10.x. @niemiaszek, maybe try downgrading your CUDA.
So can we confirm that this issue occurs only with CUDA 11, and that it works well with CUDA 10? Also, check that you are using the cuDNN version corresponding to your CUDA version; there are separate cuDNN builds for CUDA 10 and for CUDA 11.
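(To double-check that pairing, the toolkit, driver, and cuDNN versions can be read off as follows; the header path is an assumption and varies by install, and cuDNN 8.x moved its version macros from cudnn.h to cudnn_version.h.)

```bash
# CUDA toolkit version
nvcc --version

# Driver version and the CUDA version it supports
nvidia-smi

# cuDNN version macros (cuDNN 8.x: cudnn_version.h; cuDNN 7.x: cudnn.h)
grep -h "define CUDNN_MAJOR\|define CUDNN_MINOR\|define CUDNN_PATCHLEVEL" \
    /usr/local/cuda/include/cudnn_version.h \
    /usr/local/cuda/include/cudnn.h 2>/dev/null
```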
Yes, I used compatible versions of cuDNN when upgrading/downgrading my CUDA. And for this specific issue (when calculating mAP with the -map flag enabled):
Yes, I think this issue occurs mainly with CUDA 11 + cuDNN 8.x. The same error is mentioned in #6836 with:
But I've also seen the same problem with older CUDA here:
Exactly the same error when training with:

@AlexeyAB Hi Alexey, I want to ask: is there any difference when calculating mAP between these devices? Device: GTX 1650, 1650 SUPER, 1660 SUPER
@AlexeyAB
I then also tried installing OpenCV 3.4.10 and ran make clean and make, but it did not help.
It also happened when I use -test.
Same error here. Nvidia RTX 30 series requires CUDA 11.
I may have found a workaround. I set CUDA_VISIBLE_DEVICES=0 and set max_batches=9000 in yolo.cfg. I'm not sure which setting allowed training to work with -map. Bash script:
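(A minimal sketch of that workaround as described, assuming max_batches=9000 has already been set in yolo.cfg; the paths and weight file are placeholders rather than the commenter's actual script.)

```bash
#!/bin/bash
# Pin training to GPU 0 and train with periodic mAP evaluation enabled
export CUDA_VISIBLE_DEVICES=0
./darknet detector train data/obj.data cfg/yolo.cfg yolov4.conv.137 -map
```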
I had the same problem; I just removed the -map flag.
Yes, try to use
Got me too: RTX 2070, CUDA 11.4, calculating mAP (mean average precision)... cuDNN Error: CUDNN_STATUS_BAD_PARAM. Edit: removing -map, hopefully that will work. Mildly annoying though :/
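(If -map is dropped, the mAP can still be computed afterwards with the separate map subcommand on a saved checkpoint; the paths below are placeholders.)

```bash
# Train without -map to avoid the crash during the periodic evaluation...
./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137

# ...then compute mAP separately on a saved checkpoint
./darknet detector map data/obj.data cfg/yolo-obj.cfg backup/yolo-obj_final.weights
```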
@AlexeyAB Hi Alexey! Firstly, thanks for all your work, amazing!
Our environment settings are:
Our GPU is a GeForce RTX 2080 Ti with:
As many other users have reported, the -map flag triggers this error for us too. Does anyone have any solution other than downgrading the CUDA version or manually iterating the training and validation (saving every 1000 iterations, evaluating mAP, and launching training again)? It is important for us to have automatic training with mAP evaluation. Many thanks for your attention and suggestions!
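(A rough sketch of that manual alternative: run training without -map, then evaluate every checkpoint darknet wrote to the backup folder; file names are placeholders and assume the default _1000/_2000/... checkpoint naming.)

```bash
# Evaluate mAP for each checkpoint saved during a run trained without -map
for w in backup/yolo-obj_*.weights; do
    echo "=== $w ==="
    ./darknet detector map data/obj.data cfg/yolo-obj.cfg "$w"
done
```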
@pablogago11 Stuck with the same issue. My specifications are CUDA 11.4 and cuDNN 8.2.4, Ubuntu 20.04.
Thanks to @khsily's suggestion, I suspect that this issue also has something to do with
Yes, this problem also occurred when I used the darknet yolov7-tiny.cfg. My configuration is CUDA 10.2 and cuDNN 8.0.5. I did not try to adjust cuDNN, but I adjusted batch=64 and subdivisions=64. Finally, the mAP calculation at 1000 iterations went through.
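(That change only touches two lines in the [net] section of the cfg; one way to apply it from the shell, with the cfg path as a placeholder and assuming GNU sed:)

```bash
# Set batch=64 and subdivisions=64 in the cfg's [net] section
sed -i -e 's/^batch=.*/batch=64/' \
       -e 's/^subdivisions=.*/subdivisions=64/' cfg/yolov7-tiny-custom.cfg
```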
I had the same problem. |
I had the same problem and fixed it by installing the correct CUDA and cuDNN versions and modifying the Makefile:

# GeForce RTX 4090
ARCH= -gencode arch=compute_89,code=[sm_89,compute_89]

and finally it works fine. Maybe this issue depends on the GPU architecture.
I was able to solve this issue by reducing the training input resolution in the .cfg file from 576 x 576 to 416 x 416.
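(The same kind of one-line cfg edit covers the resolution change; width and height must stay multiples of 32, and the cfg path is a placeholder:)

```bash
# Reduce the network input resolution from 576x576 to 416x416
sed -i -e 's/^width=.*/width=416/' \
       -e 's/^height=.*/height=416/' cfg/yolo-obj.cfg
```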
I got cuDNN Error: CUDNN_STATUS_BAD_PARAM at darknet/src/convolutional_kernels.cu, line 533. Same issue as in the pjreddie repo issue. There is an in-depth description of my setup