Could you share more details about the english-chinese alignment dataset? #8

Open
nlp4whp opened this issue Apr 4, 2023 · 1 comment

nlp4whp commented Apr 4, 2023

For example, did you first "pre-fine-tune" on the english-chinese alignment dataset so that LLaMA adapts better to Chinese, and then do instruction fine-tuning with LoRA?

Also, what was your approach to building the english-chinese alignment dataset, and why is the Chinese-English aligned data formatted as `en to cn`?

Finally, thank you for your open-source work.

@muzhi1991

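I have the same question: did the author pre-fine-tune directly on the english-chinese alignment dataset first? How many devices were used, what was the batch size, and how many epochs did it train for?
Have you compared the results against instruction fine-tuning directly on alpaca-chinese-dataset?
Thanks to the author!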
