
Cannot reproduce the results for RoBERTa-large on SNLI with MeZO(LoRA) #39

Open
Liu-M-H opened this issue Oct 15, 2024 · 3 comments

Liu-M-H commented Oct 15, 2024

Hi,
I'm using transformers==4.28.1 and torch==2.1.0.

I run the following command:

TASK=SNLI K=512 SEED=42 BS=64 LR=1e-4 EPS=1e-3 STEP=50000 MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh --apply_lora --lora_r 8 --lora_alpha 16

My reproduced result is 72, but the paper reports 84.

I also found that, compared to full fine-tuning (FT), LoRA converges more slowly and less stably. Could you provide more details on MeZO(LoRA), in particular how many iterations are required for convergence with LoRA?

@gaotianyu1350
Member

It seems that you are not using the correct hyperparameters. Please check Table 15 in our paper for the hyperparameters used in all the RoBERTa experiments. Also, all reported results are aggregated via a grid search over multiple hyperparameter combinations, as shown in Table 15.
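Since the reported numbers come from a grid search, reproducing them means sweeping hyperparameter combinations rather than running a single command. A minimal dry-run sketch of such a sweep is below; the LR/EPS/SEED values here are placeholders for illustration, not the actual Table 15 grid, so substitute the values from the paper:

```shell
# Hypothetical sweep: print (dry-run) one mezo.sh invocation per
# hyperparameter combination. The LR/EPS/SEED values are placeholders,
# NOT the Table 15 grid -- substitute the values from the paper.
# Remove the leading "echo" to actually launch the runs.
for LR in 1e-4 5e-5; do
  for EPS in 1e-3 1e-2; do
    for SEED in 42 13; do
      echo "TASK=SNLI K=512 SEED=$SEED BS=64 LR=$LR EPS=$EPS STEP=100000" \
           "MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh" \
           "--apply_lora --lora_r 8 --lora_alpha 16"
    done
  done
done
```

Tagging each run via EXTRA_TAG (or logging the printed commands) makes it easy to match results back to hyperparameter combinations when aggregating.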


Liu-M-H commented Oct 16, 2024

Thanks for your reply!
Would you mind telling me the minimum number of iterations required for the RoBERTa experiments with MeZO(LoRA)?

@gaotianyu1350
Member

All the RoBERTa experiments were run for 100K steps. Using fewer steps than that may lead to very different results.
