problem of validation set #316

Open
wenjun90 opened this issue May 19, 2020 · 26 comments

Comments

@wenjun90

Hi @zylo117,

image

I am training on my own dataset, with 2,700 training images and 300 validation images:
python train.py -c 2 -p mydataset --batch_size 8 --lr 2.5e-4 --num_epochs 200
--load_weights /path/to/your/weights/efficientdet-d0.pth

The anchor ratios and scales were optimized with the kmeans-anchor-ratios code.

The AP after 50 epochs:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.050
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.140
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.022
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.059
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.196
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.269
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.269

The AP after 150 epochs is still very bad:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.059
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.160
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.031
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.059
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.072
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.213
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.290
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.290

What do you think the problem is?

Thanks
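
A note on reading the tables above: assuming they come from the standard COCO evaluation, a value of -1.000 means there were no ground-truth boxes in that area range, so the metric is undefined rather than bad. The sketch below buckets box sizes using the usual COCO area limits (32² and 96² pixels). The function names are hypothetical, and the handling of boxes that fall exactly on a limit is simplified.

```python
from collections import Counter

# Usual COCO area limits, in pixels squared.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def coco_area_bucket(width, height):
    """Classify a box as 'small', 'medium' or 'large' by its pixel area."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def bucket_counts(boxes_wh):
    """Count how many (width, height) boxes fall into each area bucket."""
    return Counter(coco_area_bucket(w, h) for w, h in boxes_wh)


# Hypothetical ground-truth box sizes, for illustration only.
example_boxes = [(400, 100), (900, 300), (50, 50)]
print(bucket_counts(example_boxes))
```

Running this over the ground-truth boxes of a dataset shows quickly whether the small and medium rows can be expected to carry any signal at all.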

@zylo117
Owner

zylo117 commented May 19, 2020

the loss is higher than expected. either you have lots of classes or it's still underfitting.

try smaller network like d0 or increase lr but be aware of overfitting

@wenjun90
Author

Hi @zylo117

My task is detection with 4 classes. With efficientdet-d0 in the detectron2 framework I achieved mAP=41, but d1 and d2 did not perform really well in detectron2.

I want to try your framework. I have been training since yesterday: about 10 hours for 200 epochs. Your tutorial performs well, but I don't know what to modify for my own dataset.

Thanks

@zylo117
Owner

zylo117 commented May 19, 2020

try increasing lr

@zylo117
Owner

zylo117 commented May 19, 2020

hold on, did you load d0 weights into a d2 model? of course the result is bad. it's basically training from scratch
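
The mismatch is visible in the command from the first post, which passes `-c 2` together with `efficientdet-d0.pth`. A small pre-flight check along these lines could catch it before a long run. This is only a sketch: both helper functions are hypothetical and not part of train.py, and they assume the checkpoint keeps the `efficientdet-dN` naming used in this thread.

```python
import re


def coefficient_from_weights(path):
    """Return N from a name like 'efficientdet-dN.pth', or None if absent."""
    match = re.search(r"efficientdet-d(\d+)", path)
    return int(match.group(1)) if match else None


def weights_match_model(compound_coef, weights_path):
    """True when the checkpoint name and the -c value refer to the same size."""
    found = coefficient_from_weights(weights_path)
    return found is not None and found == compound_coef


# The command from the first post: -c 2 with d0 weights.
print(weights_match_model(2, "/path/to/your/weights/efficientdet-d0.pth"))  # False
# A consistent pairing: -c 2 with d2 weights.
print(weights_match_model(2, "weights/efficientdet-d2.pth"))  # True
```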

@wenjun90
Author

Hi @zylo117 ,
I have trained with lr 1e-2 and 1e-3, but the result is still not good.

@zylo117
Owner

zylo117 commented May 19, 2020

@wenjun90 #316 (comment)

@akb46mayu

Hi @wenjun90, did your learning rate drop? Can you share your learning rate plot with me (or just the numbers)? In all of my experiments on custom data the learning rate never drops, so I just wanted to look at yours.

@zylo117
Owner

zylo117 commented May 19, 2020

lr only drops on plateaus
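
To illustrate what "drops on plateaus" means, here is a minimal pure-Python sketch of a plateau-based schedule. It is not the code from train.py: the class name, `factor` and `patience` defaults are assumptions chosen for the example. PyTorch ships a scheduler of this kind as `torch.optim.lr_scheduler.ReduceLROnPlateau`.

```python
class PlateauLR:
    """Cut the learning rate when the loss stops improving for too long."""

    def __init__(self, lr, factor=0.1, patience=3):
        self.lr = lr
        self.factor = factor      # multiplier applied on each reduction
        self.patience = patience  # non-improving epochs tolerated
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return the learning rate to use next."""
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


sched = PlateauLR(lr=1e-3, patience=2)
# While the loss keeps improving, the learning rate stays constant.
improving = [sched.step(loss) for loss in [1.0, 0.9, 0.8, 0.7]]
# Once the loss stalls for more than `patience` epochs, the rate is reduced.
stalled = [sched.step(loss) for loss in [0.7, 0.7, 0.7]]
print(improving, stalled)
```

With a schedule like this, a learning rate that never changes simply means the monitored loss never stalled for long enough.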

@akb46mayu

Hi @zylo117 , i see. Thanks a lot!

@wenjun90
Author

Hi @zylo117 ,
In your framework, how can I get an anchor generator configuration like this?
AnchorGenerator',
'SIZES': [[32, 40.31747359663594, 50.79683366298238],
[64, 80.63494719327188, 101.59366732596476],
[128, 161.26989438654377, 203.18733465192952],
[256, 322.53978877308754, 406.37466930385904],
[512, 645.0795775461751, 812.7493386077181]],
'ASPECT_RATIOS': [[0.5, 1.0, 2.0]]

Thank you so much

@zylo117
Owner

zylo117 commented May 20, 2020

try these two
#308 (comment)
https://github.com/Cli98/anchor_computation_tool

but I haven't tested them yet, so use it at your own risk.
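
For reference, the `SIZES` values quoted in the question follow a simple pattern: each row is a base size (32, 64, ..., 512) multiplied by 2^(k/3) for k = 0, 1, 2, i.e. three scales per octave. The sketch below reproduces those numbers. The function name is hypothetical, and it only regenerates the quoted list; it says nothing about how either framework builds anchors internally.

```python
def anchor_sizes(base_sizes=(32, 64, 128, 256, 512), scales_per_octave=3):
    """Return one row per base size, each scaled by 2^(k / scales_per_octave)."""
    return [
        [base * 2 ** (k / scales_per_octave) for k in range(scales_per_octave)]
        for base in base_sizes
    ]


for row in anchor_sizes():
    print(row)
```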

@wenjun90
Author

wenjun90 commented May 28, 2020

Hi @zylo117
I tried the COCO dataset and got these results with the d0 backbone:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.311
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.487
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.330
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.119
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.361
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.493
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.268
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.406
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.434
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.153
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.524
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.650

The results are good on the COCO dataset and on your tutorial, but not on my own dataset. :(

@zylo117
Owner

zylo117 commented May 28, 2020

great, I think you have done so well on coco.
It would be great if you can share your hyperparameters or how you train d0.

@wenjun90
Author

wenjun90 commented May 28, 2020

Hi @zylo117,
I trained with this command:
python train.py -c 0 -p coco --head_only True --lr 1e-3 --batch_size 32 --load_weights weights/efficientdet-d0.pth --num_epochs 40

I set 40 epochs and trained overnight; each epoch takes about 30 minutes. This morning I stopped at epoch 14, and the evaluation gave the good metrics above. I also measured prediction speed: 21 FPS on a Tesla V100 GPU and 3.3 FPS on CPU.

For my own dataset, I really don't know how to set these hyperparameters. I hope to get your advice.
My dataset consists of text documents with an image size of 1600x2400.

My goal is to detect the text block regions in a document, like this:
https://miro.medium.com/max/1200/1*gAx3-sIpo09bPDCZ2fI_kw.png

Thank you very much!

@zylo117
Owner

zylo117 commented May 28, 2020

set --head_only False to train the rest of the layers

@wenjun90
Author

Thanks @zylo117. I set it to False and trained on my small dataset of 3,000 images. The result is better but still worse than faster rcnn: with faster rcnn I get AP 60, but EfficientDet-d0 only reaches 35.

@zylo117
Owner

zylo117 commented May 29, 2020

do you mean AP50 or AP50:95?
Are you comparing under the same metric?
There shouldn't be such a gap.
But then again, faster rcnn should be better than d0 if I remember correctly.

@wenjun90
Author

wenjun90 commented May 29, 2020

Hi @zylo117

These are my results with EfficientDet-d0 in your framework:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.343
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.487
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.395
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.343
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.283
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.465
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.471
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.471

These are my results with EfficientDet-d0 in the Detectron2 framework:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.403
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.643
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.456
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.053
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.404
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.355
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.618
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.642
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.643

These are my results with Faster RCNN in detectron2:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.609
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.802
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.703
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.026
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.610
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.490
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.758
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.762
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.050
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.763
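
To compare the three tables under the same metric, it helps to keep AP50:95 (averaged over the ten IoU thresholds 0.50, 0.55, ..., 0.95) apart from AP50 (IoU threshold 0.50 only). The sketch below copies the first two rows of each table above and computes the gap to Faster RCNN under each metric. The dictionary keys and function names are hypothetical, chosen only for this example.

```python
def coco_iou_thresholds():
    """The ten IoU thresholds averaged by the primary COCO AP metric."""
    return [round(0.50 + 0.05 * i, 2) for i in range(10)]


# First two rows of each table above.
results = {
    "effdet-d0 (this framework)": {"AP50:95": 0.343, "AP50": 0.487},
    "effdet-d0 (detectron2)": {"AP50:95": 0.403, "AP50": 0.643},
    "faster rcnn (detectron2)": {"AP50:95": 0.609, "AP50": 0.802},
}


def gap(metric, better, worse):
    """Difference between two models under one and the same metric."""
    return round(results[better][metric] - results[worse][metric], 3)


print(coco_iou_thresholds())
for metric in ("AP50:95", "AP50"):
    print(metric, gap(metric, "faster rcnn (detectron2)", "effdet-d0 (this framework)"))
```

Under both metrics the gap is large, so the "60 versus 35" comparison earlier in the thread is a like-for-like AP50:95 comparison.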

@zylo117
Owner

zylo117 commented May 29, 2020

do these two efficientdet implementations use the same anchors?
can you share the links?
And if you can, try D2 or D3, which should at least come close to fasterrcnn

@akb46mayu

Hi,
I think you can quickly try an overfitting experiment: train and validate on the same dataset to see whether the model can overfit it.
1. I think 40 epochs may not be enough. I trained on 10 images from scratch and found that it takes at least 300 epochs to get reasonable, though not perfect, detections.
2. Try a small number of images and see whether the current anchor settings help you detect smaller objects. If not, make the anchors smaller.
3. The image domains of COCO and your text images are totally different; your objects are obviously smaller than the average object size in COCO. The default anchor settings may not allow your GT boxes to be matched.
4. If you only want to detect text data, training from scratch may be another option, if you have enough GPUs and the time for 300 epochs. To my knowledge, given enough training epochs, training from scratch may work better than fine-tuning EfficientDet.
5. Another option is to force the input image size to 1920 or something larger while using effdet d0 or d1.
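
Points 2 and 3 can be checked offline before any training. The sketch below measures how well a ground-truth box shape is covered by a set of anchor shapes, using the IoU of two boxes that share the same center. Everything here is an assumption made for illustration: the helper names, the base sizes, the scales, and the example box are hypothetical, and only the ratio triplet is the default quoted elsewhere in this thread. A best IoU well below a typical matching threshold such as 0.5 (also an assumption) suggests that boxes of that shape will rarely be matched.

```python
from itertools import product


def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes sharing a center, so only width and height matter."""
    bw, bh = box_wh
    aw, ah = anchor_wh
    inter = min(bw, aw) * min(bh, ah)
    union = bw * bh + aw * ah - inter
    return inter / union


def build_anchor_shapes(base_sizes, scales, ratios):
    """All (width, height) shapes: base * scale, stretched by each ratio pair."""
    return [
        (b * s * rw, b * s * rh)
        for b, s, (rw, rh) in product(base_sizes, scales, ratios)
    ]


def best_match(box_wh, anchor_shapes):
    """Highest shape IoU between one box and any of the anchor shapes."""
    return max(shape_iou(box_wh, a) for a in anchor_shapes)


# Hypothetical anchor settings, for illustration only.
base_sizes = [32, 64, 128, 256, 512]
scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]
ratios = [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]
anchors = build_anchor_shapes(base_sizes, scales, ratios)

# A hypothetical wide, flat text line after resizing to the network input.
wide_text_line = (400, 20)
print(round(best_match(wide_text_line, anchors), 3))
```

Running the same check over the real ground-truth shapes, at the resolution the network actually sees, shows whether changing ratios or scales is likely to help.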

@wenjun90
Author

wenjun90 commented May 30, 2020

Hi @zylo117,
I tried d2, and the mAP went up to 44.5.

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.445
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.608
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.511
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.445
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.359
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.518
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.521
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.521

I changed the anchor ratios too. In general, my result with d2 is still below faster rcnn (mAP=60, my baseline), but one positive thing in this case is that the CPU prediction times correspond to those in the paper (0.3s for d0, 0.7s for d1 and 1.2s for d2).

Initially I set the anchor ratios to the default: [(1,1), (0.7,1.4),(1.4,0.7)].
Then I changed the anchor ratios to the optimal values computed by the kmean_computation anchor code,
and then I tried another set of ratios, [(1,1),(0.5,2),(2,0.5)].
The results are not very different across these three cases.
I tried changing lr from 1e-3 to 1e-4 and trained for 200 epochs, but the model starts overfitting at 50 epochs.
My dataset has 2,500 training images and 450 test images.
python train.py -c 2 --head_only False --batch_size 4 --num_workers 4 --lr 1e-3 --num_epochs 200 --project mydata --save_interval 1000 --load_weights weights/efficientdet-d2.pth

In the detectron2 framework I trained the efficientdet d0 and d1 backbones; there is not much difference in mAP, but the d1 prediction time is very long, at 3s on CPU.

Thank you for your help.
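
For anyone reproducing the timing figures, here is a minimal sketch of how per-image latency and FPS relate and how they might be measured. The helper names are hypothetical, and `fn` stands for whatever callable runs one prediction; warm-up runs are excluded from the average because the first calls are often slower.

```python
import time


def average_latency(fn, runs=10, warmup=2):
    """Average wall-clock seconds per call of fn, after a few warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def fps_from_latency(seconds_per_image):
    """Convert seconds per image into frames per second."""
    return 1.0 / seconds_per_image


# CPU latencies reported above for d0, d1 and d2.
for name, latency in [("d0", 0.3), ("d1", 0.7), ("d2", 1.2)]:
    print(name, round(fps_from_latency(latency), 2))
```

The 0.3s reported for d0 corresponds to roughly 3.3 FPS, which agrees with the CPU figure given earlier in the thread.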

@zylo117
Owner

zylo117 commented May 30, 2020

I think training effdet with a larger batch size can help improve the results

@wenjun90
Author

wenjun90 commented May 30, 2020

Hi @zylo117 ,
Here are my training results on my dataset with the d0 to d3 backbones after 20 epochs.

batch size = 4, learning rate = 1e-3, optimal anchor ratios = '[(1.3, 0.8), (2.1, 0.5), (3.1, 0.3)]'
python train.py -c 3 --head_only False --batch_size 4 --num_workers 4 --lr 1e-3 --num_epochs 20 --project data_cv --save_interval 1000 --load_weights weights/efficientdet-d3.pth

D0:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.136
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.275
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.125
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.136
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.131
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.366
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.366

D1:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.213
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.378
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.218
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.213
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.187
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.394
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.426
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.426

D2:

Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.208
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.389
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.209
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.208
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.182
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.342
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.368
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.368

D3:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.170
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.316
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.162
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.170
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.175
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.368
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.416
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.416

When I set batch size = 16 for d0, I get a better result at 40 epochs:
D0:
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.379
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.544
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.441
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.379
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.294
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.495
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.510
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.510

@wenjun90
Author

Hi @zylo117 ,
For information:
With 1 Tesla V100 GPU:
d0 maximum batch size: 16
d1 maximum batch size: 8 (if I set 16 --> out of memory)
d2 maximum batch size: 4 (if I set 8 --> out of memory)

Is this normal?

@zylo117
Owner

zylo117 commented May 31, 2020

Yes, if yours is the V100 32G. But I remember there is also a V100 16G; if it's that one, then no.

@satheeshkatipomu

Hi @wenjun90,

How are you getting the AP? Is it embedded in train.py, or are you stopping training and then running coco_eval.py on a checkpoint?
