problem of validation set #316

wenjun90 · 2020-05-19T08:36:36Z

Hi @zylo117,

I am training on my own dataset: 2700 training images and 300 validation images.
python train.py -c 2 -p mydataset --batch_size 8 --lr 2.5e-4 --num_epochs 200
--load_weights /path/to/your/weights/efficientdet-d0.pth

The anchor ratios and scales were optimized with the kmeans-anchor-ratios code.

What do you think the problem is?

Thanks

zylo117 · 2020-05-19T10:01:30Z

the loss is higher than expected. either you have lots of classes or it's still underfitting.

try smaller network like d0 or increase lr but be aware of overfitting

wenjun90 · 2020-05-19T10:18:34Z

Hi @zylo117

My problem is 4-class detection. With efficientdet-d0 in the detectron2 framework I achieved mAP=41, but in detectron2 d1 and d2 do not perform very well.

I want to try your framework, but I have been training since yesterday: 10 hours for 200 epochs. It performs well on your tutorial, but I don't know what to modify for my dataset.

Thanks

zylo117 · 2020-05-19T11:49:58Z

try increasing lr

zylo117 · 2020-05-19T11:51:04Z

hold on, did you load d0 weights into a d2 model? of course the result is bad. it's basically training from scratch
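A mismatch like this can be caught before training by comparing the `-c` value with the variant named in the checkpoint file. The helper below is a hypothetical sketch, not part of train.py. It only assumes that checkpoint file names follow the `efficientdet-dN.pth` pattern used in the commands in this thread.

```python
import re
from typing import Optional


def coefficient_from_weights(path: str) -> Optional[int]:
    """Return N from a file name like 'efficientdet-dN.pth', or None if absent."""
    match = re.search(r"efficientdet-d(\d+)", path)
    return int(match.group(1)) if match else None


def weights_match(compound_coef: int, weights_path: str) -> bool:
    """True when the checkpoint name agrees with the -c value (or cannot be parsed)."""
    found = coefficient_from_weights(weights_path)
    return found is None or found == compound_coef


# The combination from the first command: -c 2 with d0 weights.
mismatch_ok = weights_match(2, "/path/to/your/weights/efficientdet-d0.pth")
# The combination from the later d2 run: -c 2 with d2 weights.
match_ok = weights_match(2, "weights/efficientdet-d2.pth")
```

Here `mismatch_ok` is False and `match_ok` is True, so a check like this could warn before a long run starts.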

wenjun90 · 2020-05-19T12:55:19Z

Hi @zylo117 ,
I have trained with lr 1e-2 and 1e-3, but the result is still not good.

zylo117 · 2020-05-19T14:07:24Z

@wenjun90 #316 (comment)

akb46mayu · 2020-05-19T15:30:13Z

Hi @wenjun90 , did your learning rate drop? Can you share your learning rate plot with me (or just the number). For all of my experiments on customized data the learning rate never drops, so i just wanted to look at yours.

zylo117 · 2020-05-19T16:34:10Z

lr only drops on plateaus
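To illustrate what "drops on plateaus" means, here is a toy pure-Python scheduler loosely modelled on the behaviour of PyTorch's `ReduceLROnPlateau`. The class name, `factor` and `patience` values are illustrative assumptions, not the settings used by this repository.

```python
class PlateauLR:
    """Toy scheduler: multiply lr by `factor` once the loss has failed to
    improve for more than `patience` consecutive epochs."""

    def __init__(self, lr: float, factor: float = 0.1, patience: int = 3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:
            # New best loss: reset the counter, keep the current lr.
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            # Plateau detected: reduce lr and start counting again.
            self.lr *= self.factor
            self.bad_epochs = 0
        return self.lr


scheduler = PlateauLR(lr=1e-3)
improving = [scheduler.step(loss) for loss in [1.0, 0.9, 0.8, 0.7, 0.6]]
stalled = [scheduler.step(0.6) for _ in range(4)]
```

While the loss keeps improving, the learning rate stays at 1e-3. It only changes on the fourth stalled epoch, which is why a run whose loss keeps decreasing slowly may never show a drop.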

akb46mayu · 2020-05-19T16:44:34Z

Hi @zylo117 , i see. Thanks a lot!

wenjun90 · 2020-05-20T08:09:08Z

Hi @zylo117 ,
In your framework, how do I get an anchor generator config like this?
AnchorGenerator',
'SIZES': [[32, 40.31747359663594, 50.79683366298238],
[64, 80.63494719327188, 101.59366732596476],
[128, 161.26989438654377, 203.18733465192952],
[256, 322.53978877308754, 406.37466930385904],
[512, 645.0795775461751, 812.7493386077181]],
'ASPECT_RATIOS': [[0.5, 1.0, 2.0]]

Thank you so much

zylo117 · 2020-05-20T08:36:05Z

try these two
#308 (comment)
https://github.com/Cli98/anchor_computation_tool

but I haven't tested them yet, so use it at your own risk.
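For reference, the SIZES values quoted in the question are consistent with five base sizes (32 to 512), each multiplied by three octave scales 2^0, 2^(1/3) and 2^(2/3). The sketch below only reproduces that arithmetic. Whether either of the linked tools emits exactly this layout is not confirmed here, and the variable names are made up for illustration.

```python
# Base anchor size per pyramid level, taken from the first column of SIZES.
base_sizes = [32, 64, 128, 256, 512]

# Three scales per octave, a common choice in RetinaNet-style detectors.
octave_scales = [2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)]

# One row per pyramid level, one entry per octave scale.
sizes = [[base * scale for scale in octave_scales] for base in base_sizes]

for row in sizes:
    print(row)
```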

wenjun90 · 2020-05-28T07:55:43Z

It works well on the COCO dataset and on your tutorial, but not on my own dataset. :(

zylo117 · 2020-05-28T08:29:46Z

great, I think you have done so well on coco.
It would be great if you can share your hyperparameters or how you train d0.

wenjun90 · 2020-05-28T09:04:56Z

Hi @zylo117,
I trained with this command:
python train.py -c 0 -p coco --head_only True --lr 1e-3 --batch_size 32 --load_weights weights/efficientdet-d0.pth --num_epochs 40

I set 40 epochs and trained overnight; each epoch takes 30 minutes. This morning I stopped at epoch 14, and the evaluation gave very good results with these metrics. I also measured prediction speed: 21 FPS on a Tesla V100 GPU and 3.3 FPS on CPU.

With my dataset, I really don't know how to set these hyperparameters. I hope to receive your advice.
My dataset consists of text documents with an image size of 1600x2400.

My objective is to detect text block regions in a document, like this:
https://miro.medium.com/max/1200/1*gAx3-sIpo09bPDCZ2fI_kw.png

Thank you very much!

zylo117 · 2020-05-28T11:48:32Z

set head only False to train the rest of the layers

wenjun90 · 2020-05-29T07:24:08Z

Thanks @zylo117. I set it to False to train on my small dataset of 3000 images. The result is better but still lower than Faster R-CNN: with Faster R-CNN I get an AP of 60, but EfficientDet-d0 only reaches 35.

zylo117 · 2020-05-29T07:31:22Z

do you mean AP50 or AP50:95?
Are you comparing under the same metric?
There shouldn't be such a gap.
But then again, faster rcnn should be better than d0 if I remember correctly.
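The metric question matters because the same detection can count as correct under AP50 and as a miss at stricter thresholds. The snippet below is a simplified illustration with made-up boxes, not the full COCO evaluation. It only shows how one prediction fares at the ten IoU thresholds that AP50:95 averages over.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


ground_truth = (0, 0, 100, 100)
prediction = (20, 0, 120, 100)  # same size, shifted 20 px to the right

overlap = iou(ground_truth, prediction)  # 8000 / 12000, about 0.667

# IoU thresholds 0.50, 0.55, ..., 0.95 as used by AP50:95.
thresholds = [0.50 + 0.05 * i for i in range(10)]
hits = [overlap >= t for t in thresholds]
```

This prediction is a true positive for AP50, but it passes only 4 of the 10 thresholds, so numbers reported under the two metrics are not directly comparable.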

wenjun90 · 2020-05-29T09:36:03Z

Hi @zylo117

These are my results for EfficientDet-d0 with your framework:

These are my results for EfficientDet-d0 in the Detectron2 framework:

These are my results for Faster R-CNN in Detectron2:

zylo117 · 2020-05-29T14:52:42Z

do these two efficientdet use the same anchors?
can you share the links?
And as you can see, at least D2 or D3 can be close to fasterrcnn

akb46mayu · 2020-05-29T16:05:51Z

Hi,
I think you can quickly try an overfitting experiment: just train and validate on the same dataset to see whether the model can overfit it.
1 I think 40 epochs may not be enough. I trained on 10 images from scratch and found that it takes at least 300 epochs to get some reasonable detections, although not perfect.
2 Try a small number of images and see whether the current anchor settings can help you detect smaller objects. If not, you should make the anchors smaller.
3 The image domains of COCO and your text images are totally different; your objects are obviously smaller than the average object size in COCO. The default anchor settings may not allow GT boxes to be matched.
4 If you only want to detect text, training from scratch may be another option if you have enough GPU and the time for 300 epochs. To my knowledge, given enough training epochs, training from scratch may be better than fine-tuning for EfficientDet.
5 Another option is to force the input image size to 1920 or something larger while using effdet d0 or d1.
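Points 2 and 3 can be checked without training by measuring how many ground-truth boxes have at least one anchor shape with enough overlap. The sketch below is an optimistic check under stated assumptions: it compares shapes only (boxes treated as sharing a centre, so the anchor grid is ignored), and the anchors, ground-truth shapes and 0.5 threshold are all hypothetical.

```python
def shape_iou(box_wh, anchor_wh):
    """IoU of two boxes that share the same centre, so only width/height matter."""
    inter = min(box_wh[0], anchor_wh[0]) * min(box_wh[1], anchor_wh[1])
    union = box_wh[0] * box_wh[1] + anchor_wh[0] * anchor_wh[1] - inter
    return inter / union


def coverage(gt_shapes, anchor_shapes, threshold=0.5):
    """Fraction of ground-truth shapes whose best anchor reaches `threshold` IoU."""
    matched = sum(
        1
        for gt in gt_shapes
        if max(shape_iou(gt, anchor) for anchor in anchor_shapes) >= threshold
    )
    return matched / len(gt_shapes)


# Hypothetical anchors: base size 32 with three (w, h) ratio pairs.
anchors = [(32 * w, 32 * h) for w, h in [(1.0, 1.0), (1.4, 0.7), (0.7, 1.4)]]

# Hypothetical ground-truth (width, height) pairs: two compact boxes
# and one long, thin text line.
gt = [(30, 30), (45, 22), (200, 12)]

covered = coverage(gt, anchors)
```

With these made-up numbers two of the three boxes are covered, and the thin text line is not. A low value on real annotations would suggest changing the anchor scales or ratios before spending hours on training.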

wenjun90 · 2020-05-30T12:53:36Z

Hi @zylo117,
I tried d2 and the mAP reached 44.5.

I changed the anchor ratios too. In general, my result with d2 is still lower than Faster R-CNN (mAP=60, baseline), but one positive thing in this case is that the prediction time on CPU corresponds to the paper (0.3s d0, 0.7s d1 and 1.2s d2).

Initially I set the anchor ratios to the default: [(1,1), (0.7,1.4),(1.4,0.7)],
then I changed the anchor ratios to the optimal values from the kmean_computation anchor code,
then I set another set of ratios: [(1,1),(0.5,2),(2,0.5)].
The results are not very different across the three cases.
I tried changing lr from 1e-3 to 1e-4 and trained for 200 epochs, but it overfit at 50 epochs.
My dataset has 2500 training images and 450 test images.
python train.py -c 2 --head_only False --batch_size 4 --num_workers 4 --lr 1e-3 --num_epochs 200 --project mydata --save_interval 1000 --load_weights weights/efficientdet-d2.pth

In the detectron2 framework, I trained with the EfficientDet d0 and d1 backbones. The mAP does not differ much, but the d1 prediction time is very long: 3s on CPU.

Thank you for your help.
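To compare the ratio sets mentioned in this thread, it can help to convert each pair into an actual anchor shape. The sketch below assumes each pair is (width multiplier, height multiplier) applied to a base size, which is an assumption about the convention and not confirmed here. The base size of 32 is illustrative, and the k-means set is the one quoted later in the thread.

```python
ratio_sets = {
    "default": [(1.0, 1.0), (0.7, 1.4), (1.4, 0.7)],
    "manual": [(1.0, 1.0), (0.5, 2.0), (2.0, 0.5)],
    "kmeans": [(1.3, 0.8), (2.1, 0.5), (3.1, 0.3)],
}


def describe(pairs, base=32.0):
    """Return (width, height, width/height) for each pair.

    Assumes width = base * first element and height = base * second element.
    """
    return [(base * w, base * h, w / h) for w, h in pairs]


for name, pairs in ratio_sets.items():
    aspects = [round(aspect, 2) for _, _, aspect in describe(pairs)]
    print(name, aspects)
```

Under that assumption the k-means set contains only wide shapes (width/height of roughly 1.6, 4.2 and 10.3), while the default and manual sets are symmetric around a square anchor.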

zylo117 · 2020-05-30T13:53:52Z

I think training effdet with a larger batch size can help improve the result

wenjun90 · 2020-05-30T19:18:00Z

Hi @zylo117 ,
Here are my training results on my dataset with backbones d0 to d3 after 20 epochs.

batchsize = 4, learning rate = 1e-3, anchor ratio optimal = '[(1.3, 0.8), (2.1, 0.5), (3.1, 0.3)]'
python train.py -c 3 --head_only False --batch_size 4 --num_workers 4 --lr 1e-3 --num_epochs 20 --project data_cv --save_interval 1000 --load_weights weights/efficientdet-d3.pth

D1:

D2

wenjun90 · 2020-05-31T09:58:41Z

Hi @zylo117 ,
For information:
With 1 GPU (Tesla V100):
d0 maximum batch size: 16
d1 maximum batch size: 8; if I set 16 --> out of memory
d2 maximum batch size: 4; if I set 8 --> out of memory

Is this normal?

zylo117 · 2020-05-31T10:26:21Z

Yes, if you are v100 32G. But I remember there is a v100 16G, then no.

satheeshkatipomu · 2020-08-05T14:36:52Z

Hi @wenjun90,

How are you getting the AP? Is it embedded in train.py, or are you stopping training and then running coco_eval.py on a checkpoint?
