Why hasn't LOMO taken off? #47
Comments
Too many training details; it's not plug-and-play. =。=
Personally, I feel it needs more thorough and more convincing experimental results.
How were the results after fine-tuning? @Flywolfs
Gradient checkpointing also reduces the memory usage of forward activations, but the author does not seem to provide any explanation or comparison. We are all standing on the shoulders of giants.
We discussed gradient checkpointing as related work in Section 2 and compared against it in Table 1 of the LOMO paper.
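For context on why the two are complementary rather than competing: gradient checkpointing trades extra compute for activation memory, while LOMO targets the memory held by gradients and optimizer states. A minimal sketch of enabling checkpointing with Hugging Face `transformers` (the model name and settings here are illustrative, not taken from the paper):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model; any causal LM supported by transformers works the same way.
model_name = "bigscience/bloom-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Recompute activations during backward instead of storing them all in forward.
# This cuts activation memory at the cost of extra compute; it does not reduce
# the memory needed for gradients or optimizer states, which is what LOMO addresses.
model.gradient_checkpointing_enable()
model.config.use_cache = False  # the KV cache is incompatible with checkpointing during training
```

The two techniques can be combined, which is why the paper treats checkpointing as related work rather than an alternative.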
Personally I feel full-parameter fine-tuning still gives better results than adapter methods like LoRA, so why hasn't LOMO taken off? I have already tried fine-tuning a 7B BLOOM with LOMO on two 24GB GPUs, and the whole workflow felt pretty smooth. Why can't I find many people using LOMO on any platform? It's strange.
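For anyone curious what "fine-tuning with LOMO" means mechanically: the core idea is fusing gradient computation with an SGD-style update, so the full set of gradients never has to live in memory at once. Below is a conceptual sketch in plain PyTorch; it is not the repo's actual `LOMO` class or API, and it uses `register_post_accumulate_grad_hook` (PyTorch 2.1+) purely to illustrate the fused-update idea, assuming a fixed learning rate and no gradient clipping:

```python
import torch
import torch.nn as nn

def attach_fused_sgd_hooks(model: nn.Module, lr: float = 1e-3) -> None:
    """Apply an SGD step to each parameter as soon as its gradient is ready,
    then free that gradient, so all gradients never coexist in memory."""
    def fused_step(param: torch.Tensor) -> None:
        # Updating .data keeps autograd out of the loop, mirroring the fused
        # "compute gradient, update, discard" idea described in the LOMO paper.
        param.data.add_(param.grad, alpha=-lr)
        param.grad = None  # release the gradient memory immediately

    for p in model.parameters():
        if p.requires_grad:
            p.register_post_accumulate_grad_hook(fused_step)


# Usage sketch: a plain forward + backward now also performs the update.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
attach_fused_sgd_hooks(model, lr=1e-2)

x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()  # parameters are updated inside the hooks as gradients arrive
```

The actual repo additionally handles details such as gradient clipping and loss scaling, which this sketch leaves out.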