Why hasn't LOMO taken off? #47
Comments
Too many training details; it's not plug-and-play. =。=
Personally, I feel it needs more thorough and more convincing experimental results.
How were the results after fine-tuning? @Flywolfs
Gradient checkpointing also reduces the memory usage of forward activations, but the author does not seem to provide any explanation or comparison. We are all standing on the shoulders of giants.
We discussed gradient checkpointing as related work in Section 2 and compared against it in Table 1 of the LOMO paper.
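For context on why the two are complementary rather than competing: gradient checkpointing trades extra compute for activation memory, while LOMO targets the memory held by gradients and optimizer states. A minimal sketch of enabling checkpointing with Hugging Face `transformers` (the model name and settings here are illustrative, not taken from the paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; any causal LM supported by transformers works the same way.
model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Recompute activations during backward instead of storing them all in forward.
# This cuts activation memory at the cost of extra compute; it does not reduce
# the memory needed for gradients or optimizer states, which is what LOMO addresses.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is incompatible with checkpointing during training
```

The two techniques can be combined, which is why the paper treats checkpointing as related work rather than an alternative.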
Personally I feel full-parameter fine-tuning still gives better results than adapter methods like LoRA, so why hasn't LOMO taken off? I have already tried fine-tuning a 7B BLOOM with LOMO on two 24GB GPUs, and the whole workflow felt pretty smooth. Why can't I find many people using LOMO on any platform? It's strange.
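For anyone curious what "fine-tuning with LOMO" means mechanically: the core idea is fusing gradient computation with an SGD-style update, so the full set of gradients never has to live in memory at once. Below is a conceptual sketch in plain PyTorch; it is not the repo's actual `LOMO` class or API, and it uses `register_post_accumulate_grad_hook` (PyTorch 2.1+) purely to illustrate the fused-update idea, assuming a fixed learning rate and no gradient clipping:

```python
import torch
import torch.nn as nn

def attach_fused_sgd_hooks(model: nn.Module, lr: float = 1e-3) -> None:
    """Apply an SGD step to each parameter as soon as its gradient is ready,
    then free that gradient, so all gradients never coexist in memory."""
    def fused_step(param: torch.Tensor) -> None:
        # Updating .data keeps autograd out of the loop, mirroring the fused
        # "compute gradient, update, discard" idea described in the LOMO paper.
        param.data.add_(param.grad, alpha=-lr)
        param.grad = None  # release the gradient memory immediately

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(fused_step)


# Usage sketch: a plain forward + backward now also performs the update.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
attach_fused_sgd_hooks(model, lr=1e-2)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # parameters are updated inside the hooks as gradients arrive
```

The actual repo additionally handles details such as gradient clipping and loss scaling, which this sketch leaves out.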