Hi,
I use transformers==4.28.1 and torch==2.1.0.
I ran the following command:
My reproduced result is 72, but the result reported in the paper is 84.
I also found that, compared to FT, LoRA converges more slowly and less stably. Could you provide more details on MeZO (LoRA), especially how many iterations it takes to converge with LoRA?
It seems that you are not using the correct hyperparameters. Please check Table 15 in our paper for the hyperparameters used in all the RoBERTa experiments. Also, all the reported results are aggregated via a grid search over the multiple hyperparameter combinations listed in Table 15.
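MeZO replaces backpropagation with SPSA-style zeroth-order estimates: two forward passes at θ + εz and θ − εz give a scalar projected gradient, and the parameters move along −z scaled by it. Whether such a run converges, and how fast, depends strongly on the learning rate and step budget, which is why results come from a grid search over hyperparameters. The toy sketch below illustrates that sensitivity. It is not the paper's code and does not use Table 15's values: it assumes a quadratic objective on an 8-dimensional vector standing in for the trainable (e.g. LoRA) parameters, and `mezo_step`, `train`, `grid_search`, and the grid values are hypothetical.

```python
import math
import random


def loss(theta, target):
    """Toy objective (assumption): squared distance to a fixed target vector."""
    return sum((t - g) ** 2 for t, g in zip(theta, target))


def mezo_step(theta, target, lr, eps, step_seed):
    """One SPSA-style zeroth-order step in the spirit of MeZO (hypothetical helper).

    MeZO regenerates z from the seed instead of storing it; this sketch
    keeps z in a list for readability.
    """
    rng = random.Random(step_seed)
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    plus = [t + eps * zi for t, zi in zip(theta, z)]
    minus = [t - eps * zi for t, zi in zip(theta, z)]
    # Two forward passes give a scalar estimate of the directional derivative along z.
    projected_grad = (loss(plus, target) - loss(minus, target)) / (2 * eps)
    return [t - lr * projected_grad * zi for t, zi in zip(theta, z)]


def train(lr, steps, eps=1e-3, dim=8, seed=0, blowup=1e6):
    """Run `steps` zeroth-order updates; return the final loss (inf if it diverges)."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    theta = [0.0] * dim
    for step in range(steps):
        theta = mezo_step(theta, target, lr, eps, step_seed=seed * 1_000_003 + step + 1)
        current = loss(theta, target)
        if not math.isfinite(current) or current > blowup:
            return math.inf  # score a diverged run as the worst possible result
    return loss(theta, target)


def grid_search(lrs, step_budgets):
    """Score every (lr, steps) pair; return the best pair and all scores."""
    results = {(lr, steps): train(lr, steps) for lr in lrs for steps in step_budgets}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = grid_search(lrs=(0.2, 0.05, 0.01), step_budgets=(50, 200))
    for (lr, steps), final in sorted(results.items()):
        print(f"lr={lr:<5} steps={steps:<4} final loss={final:.3e}")
    print("best:", best)
```

In this toy, a learning rate that is too large diverges and one that is too small barely moves within the step budget. On a quadratic the two-point estimate is exact for any ε, so only the learning rate and the number of steps matter here; on a real model ε is tuned as well, and candidates would be scored on a dev set rather than on training loss.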