Cannot reproduce the results for Roberta-large on SNLI with MeZO(LORA) #39

Liu-M-H · 2024-10-15T05:30:53Z

Hi,
I am using transformers==4.28.1 and torch==2.1.0.

I run the following command:

TASK=SNLI K=512 SEED=42 BS=64 LR=1e-4 EPS=1e-3 STEP=50000 MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh --apply_lora --lora_r 8 --lora_alpha 16

My reproduced result is 72, but the paper reports 84.

I also found that, compared to FT, LoRA converges more slowly and is less stable. Could you provide more details on MeZO(LoRA), especially how many iterations are required for convergence when using LoRA?

gaotianyu1350 · 2024-10-16T14:39:01Z

It seems that you are not using the correct hyperparameters. Please check Table 15 in our paper for the hyperparameters used in all the RoBERTa experiments. Also, all the reported results are aggregated via grid search over multiple hyperparameter combinations, as listed in Table 15.

Liu-M-H · 2024-10-16T15:09:27Z

Thanks for your reply!
Would you mind telling me the minimum number of iterations required for RoBERTa experiments with MeZO(LoRA)?

gaotianyu1350 · 2024-10-18T14:38:02Z

All the RoBERTa experiments were run with 100K steps. Using fewer than that may lead to very different results.
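
For anyone rerunning this, a minimal dry-run sketch of a grid search at 100K steps might look like the following. It reuses the environment variables and flags from the command in the original post. The LR and EPS lists are placeholders chosen only for illustration and are not the values from Table 15, so they should be replaced with the grid given there. The function name print_grid is also just an illustrative helper.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints one mezo.sh command per (LR, EPS) pair.
# ASSUMPTION: the LR/EPS lists below are placeholders, NOT the Table 15 grid.
LRS="${LRS:-1e-4 5e-5}"
EPSS="${EPSS:-1e-3}"
STEP="${STEP:-100000}"   # 100K steps, as stated in the reply above

print_grid() {
  for lr in $LRS; do
    for eps in $EPSS; do
      # Same variables and flags as the command in the original post.
      echo "TASK=SNLI K=512 SEED=42 BS=64 LR=$lr EPS=$eps STEP=$STEP MODEL=roberta-large EXTRA_TAG=lora bash mezo.sh --apply_lora --lora_r 8 --lora_alpha 16"
    done
  done
}

print_grid
```

Each printed line can be inspected and then executed by hand or piped to a shell once the grid matches Table 15. Printing first keeps a typo in the grid from costing a full 100K-step run.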
