lr_scheduler affects the actual learning rate #9
Comments
It's a feature.
Thank you for your reply. I have tried comparing Adam with AdaBound on an LSTM language model. I found that AdaBound indeed makes the learning process more stable; however, its learning speed and convergence rate are slower than Adam's. The initial parameters are shown in the screenshot. Should I try a higher lr next (currently 0.008), or do you have a better suggestion? Thank you. P.S. I also use an lr_scheduler and call scheduler.step(valid_loss) every 1/5 epoch.
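For reference, a minimal sketch of the setup described above, assuming the `adabound` package from this repository and PyTorch's `ReduceLROnPlateau`; `model`, the data loaders, and the `compute_loss`/`evaluate` helpers are placeholders, and the hyperparameter values are only illustrative.

```python
import adabound
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = adabound.AdaBound(model.parameters(), lr=0.008, final_lr=0.1)
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2)

eval_every = max(1, len(train_loader) // 5)  # roughly every 1/5 epoch
for step, batch in enumerate(train_loader):
    optimizer.zero_grad()
    loss = compute_loss(model, batch)        # placeholder helper
    loss.backward()
    optimizer.step()
    if (step + 1) % eval_every == 0:
        valid_loss = evaluate(model, valid_loader)  # placeholder helper
        scheduler.step(valid_loss)  # may lower param_group['lr'] on a plateau
```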
Indeed, there is no guarantee that AdaBound will be faster than Adam, and we never claim that. Regarding your specific situation, you could try a slower transition from Adam to SGD, i.e. a lower gamma value. Personally, I regard this optimizer as closer to SGD than to Adam. Its strength is that it can quickly achieve a relatively small loss, and we can then fine-tune it like SGD in the final stage. As we know, tuning SGD is not easy, so we shouldn't expect a perfect result to come easily either. Lastly, we did find that SGD is worse than Adam on some NLP tasks and are still investigating why. In that case, I am afraid AdaBound may not outperform Adam. :(
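For what it's worth, lowering gamma is a one-line change in the constructor; if I read the released package correctly the default is 1e-3, and the 1e-4 below is only an illustrative value.

```python
import adabound

# Smaller gamma -> the bounds tighten more slowly, i.e. a slower
# transition from Adam-like to SGD-like behaviour (default gamma is 1e-3).
optimizer = adabound.AdaBound(model.parameters(), lr=0.008,
                              final_lr=0.1, gamma=1e-4)
```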
However, the lr_scheduler may change param_group['lr'] during training, so the final_lr, lower_bound, and upper_bound will also be affected (see the sketch below).
Should I avoid using an lr_scheduler and let AdaBound adapt the parameters to transition from Adam to SGD on its own?
Thank you very much!
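To make the interaction concrete, here is a small self-contained sketch of how the bound computation scales with the current learning rate, paraphrased and simplified from the released optimizer; if the details differ, the point is only that both bounds are multiplied by group['lr'] / base_lr, so an external scheduler that lowers the lr also lowers final_lr and both bounds.

```python
def adabound_bounds(base_lr, current_lr, final_lr, gamma, step):
    """Bounds clamped onto the Adam-style step size at iteration `step`.

    Paraphrased/simplified: the bounds scale with current_lr / base_lr,
    so a scheduler that lowers the group's lr also lowers them.
    """
    scaled_final_lr = final_lr * current_lr / base_lr
    lower = scaled_final_lr * (1 - 1 / (gamma * step + 1))
    upper = scaled_final_lr * (1 + 1 / (gamma * step))
    return lower, upper

# Example: if the scheduler halves the lr, both bounds are halved too.
print(adabound_bounds(base_lr=0.008, current_lr=0.008, final_lr=0.1, gamma=1e-3, step=1000))
print(adabound_bounds(base_lr=0.008, current_lr=0.004, final_lr=0.1, gamma=1e-3, step=1000))
```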