
Question about Training Settings #3

Open
LittleQuteSweetie opened this issue Jul 1, 2021 · 6 comments

@LittleQuteSweetie

Hello, thanks for sharing the code!
We are highly interested in this great work and are planning to run it on our custom dataset, so we would like to better understand the training settings. The paper says:

we train our proposed calibration network on two Nvidia GP100 GPU with batch size 120 and total epochs 120

In train_with_sacred.py, however, both the number of GPUs and the batch size appear to be doubled, yet the initial learning rate is not. Could this be a problem for re-implementation? We also find it would take about 100 hours to train the model from scratch on two Tesla V100 GPUs, and would like to know how long training took on your 4 GPUs. Thanks!
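
For reference, a common heuristic in this situation is the linear scaling rule (Goyal et al., 2017): when the effective batch size is doubled, the learning rate is scaled by the same factor. A minimal sketch of that rule; the base_lr value and the Adam optimizer below are illustrative assumptions, not values taken from this repository:

```python
# Minimal sketch of the linear LR scaling rule (Goyal et al., 2017).
# base_lr is an illustrative placeholder, not this repository's value.
import torch

base_lr = 3e-4           # LR assumed to be tuned for the paper's batch size
base_batch_size = 120    # batch size reported in the paper
actual_batch_size = 240  # doubled batch size seen in train_with_sacred.py

# Linear scaling: grow the LR in proportion to the batch size.
scaled_lr = base_lr * actual_batch_size / base_batch_size

model = torch.nn.Linear(10, 10)  # stand-in for the calibration network
optimizer = torch.optim.Adam(model.parameters(), lr=scaled_lr)
print(f"scaled lr: {scaled_lr:.2e}")  # 6.00e-04 for a 2x batch size
```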

@realshijy

Same question here. Is it possible to share the training log? Thanks!

@liyang159357

I suspect the paper was faked.

@ccfendouing

I also have the same question. Could you tell me the learning rate, batch size, and number of GPUs you used?

@chenpengxin

chenpengxin commented May 30, 2022

I tried with batch_size=240 and max_epoch=120 (120 epochs for the first stage and 50 for the others) on two Quadro GV100 GPUs. It took about 1 hour per epoch. Remember there are 5 weights to be trained, so the full run is 120 + 4*50 = 320 epochs, i.e. about 320 hours ≈ 13 days, to train the model!
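
To spell out that schedule (the stage epoch counts and the 1 hour/epoch figure come from the comment above; the stage labels and layout are an illustrative sketch, not this repository's training code):

```python
# Schematic of the 5-stage schedule described above; the stage labels are
# illustrative assumptions, not identifiers from this repository.
STAGES = [
    (120, "weight_1"),  # the first stage trains longest
    (50, "weight_2"),
    (50, "weight_3"),
    (50, "weight_4"),
    (50, "weight_5"),
]

HOURS_PER_EPOCH = 1  # measured on two Quadro GV100 GPUs (per the comment)
total_hours = sum(epochs for epochs, _ in STAGES) * HOURS_PER_EPOCH
print(f"total: {total_hours} h = {total_hours / 24:.1f} days")  # 320 h = 13.3 days
```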

@StiphyJay

> I tried with batch_size=240 and max_epoch=120 (120 epochs for the first stage and 50 for the others) on two Quadro GV100 GPUs. It took about 1 hour per epoch. Remember there are 5 weights to be trained, so the full run is 120 + 4*50 = 320 epochs, i.e. about 320 hours ≈ 13 days, to train the model!

Another problem: in your pretrained models, the epoch recorded in every checkpoint is 300. So did you really train for 120 epochs to get the results reported in the paper?

@liyang159357

liyang159357 commented Oct 9, 2023 via email
