This is not a sensible issue. Of course you can create problems that adaptive optimizers are not good at; there's no free lunch in this miserable world! This repo is for AdaBound, not a general discussion of adaptive optimizers.
Hi, LeanderK. Thanks for your comment. You may have misunderstood my purpose. There is indeed no free lunch in this world, so there is also no free lunch between exploration and exploitation for optimizers. Sometimes adaptive methods do bring good convergence speed in the early stages of training but end up with worse optimization results at the end. I did not mean to impugn AdaBound or ANY adaptive method; I just gave a suggestion: if you are going to train a NN, please first try SGD with fine-tuned hyper-parameters in order to save your EXPENSIVE GPU time.
The links:
"On the Convergence of Adam and Beyond"
"The Marginal Value of Adaptive Gradient Methods in Machine Learning"
I tested the three methods on a very simple problem and got the result shown above.
The code is shown here:
import torch
import torch.nn as nn
import torch.optim as optim
import matplotlib.pyplot as plt
import adabound

# NOTE: the class body and the three training loops were missing from the
# original snippet; the code below is a minimal reconstruction that fits the
# surrounding setup (a single linear layer driven to a constant target).
class Net(nn.Module):
    def __init__(self, dim):
        super(Net, self).__init__()
        self.fc = nn.Linear(dim, dim)

    def forward(self, x):
        return self.fc(x)

DIM = 30
epochs = 1000
xini = torch.ones(1, DIM) * 100   # fixed input, far from the target
opti = torch.zeros(1, DIM) * 100  # target: the zero vector

def run(optimizer_cls, lr):
    # one independent training run per optimizer, from the same random init
    torch.manual_seed(0)
    net = Net(DIM)
    objfun = nn.MSELoss()
    optimizer = optimizer_cls(net.parameters(), lr=lr)
    losses = []
    for epoch in range(epochs):
        optimizer.zero_grad()
        loss = objfun(net(xini), opti)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses

loss_adab = run(adabound.AdaBound, lr=0.01)
loss_adam = run(optim.Adam, lr=0.01)
loss_sgd = run(optim.SGD, lr=0.001)
plt.figure()
plt.plot(loss_adab, label='adabound')
plt.plot(loss_adam, label='adam')
plt.plot(loss_sgd, label='SGD')
plt.yscale('log')
plt.xlabel('epochs')
plt.ylabel('loss')  # the axis is already log-scaled via plt.yscale('log')
plt.legend()
plt.savefig('camp.png', dpi=600)
plt.show()