How to train a vertical-domain model #122

Open
lyw02 opened this issue Jan 22, 2025 · 1 comment

Comments

lyw02 commented Jan 22, 2025

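For example, I have a dataset that describes data structures in natural language, and the model's task is to reconstruct the data structure from the description.
Do I need to build the tokenizer and the pretraining dataset from scratch? And should the tokenizer and the pretraining dataset be built entirely from my own dataset? Any guidance would be appreciated.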

@jingyaogong
Owner

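The tokenizer does not need to be rebuilt for any dataset.

minimind2 will provide a new dataset format that can be used to build your own vertical-domain tasks.

It will be released soon, within the next few days.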
