During inference there is no padding, so compute time depends on the input token length? Whereas during training, it's the time for the full (padded) token length? Is my understanding correct?
☑️ During inference, no padding is needed, so compute time depends on the input token length. ☑️ During training, it's the time for the full (padded) token length.

A few points worth adding:

1. During inference, without kv_cache, inference time grows quadratically with token length; with kv_cache, inference time is "almost" independent of token length. ("Almost", because memory read/write overhead also grows with length.)
2. During training, sequences are indeed padded to the maximum max-len length before computation. On top of that, training adds time for backpropagation, optimizer updates, loss computation, and so on, so for the same token length, training takes n times as long as inference.
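To make point 1 concrete, here is a minimal single-head sketch of what a KV cache saves (toy dimension, random weights, no batch or multi-head handling; this is an illustration, not the project's actual attention code). With the cache, each decode step only computes q/k/v for the new token and attends over the stored keys; without it, the whole prefix is re-projected and the full attention matrix recomputed every step.

```python
import torch

torch.manual_seed(0)
d = 8  # toy head dimension, chosen only for illustration

# Random projections standing in for a trained attention layer.
Wq, Wk, Wv = (torch.randn(d, d) for _ in range(3))

def decode_step(x_new, k_cache, v_cache):
    """One decode step with a KV cache: only the new token's q/k/v
    are computed; past keys/values are reused from the cache."""
    q = x_new @ Wq                               # (1, d) query for the new token only
    k_cache = torch.cat([k_cache, x_new @ Wk])   # append new key
    v_cache = torch.cat([v_cache, x_new @ Wv])   # append new value
    # Attention over the cache: cost grows linearly with cache length t.
    attn = torch.softmax(q @ k_cache.T / d**0.5, dim=-1)  # (1, t)
    return attn @ v_cache, k_cache, v_cache      # (1, d)

def full_recompute(xs):
    """Without a cache: re-project the whole prefix and recompute the
    full t x t attention matrix at every step."""
    q, k, v = xs @ Wq, xs @ Wk, xs @ Wv
    attn = torch.softmax(q @ k.T / d**0.5, dim=-1)  # (t, t)
    return (attn @ v)[-1:]  # only the last position is new; its row attends to everything

# The two paths agree on the newest token's output.
xs = torch.randn(4, d)
k_cache = v_cache = torch.empty(0, d)
for t in range(4):
    out_cached, k_cache, v_cache = decode_step(xs[t:t+1], k_cache, v_cache)
out_full = full_recompute(xs)
print(torch.allclose(out_cached, out_full, atol=1e-5))  # True
```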
With kv_cache, inference time is "almost" independent of token length? After 1, 2, 3 produce 4, that 4 still has to do self-attention with 1, 2, 3, and then 1, 2, 3, 4 produce 5, and so on. The self-attention still depends on the token length, so doesn't the inference time also depend on the token length? Thanks @jingyaogong
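A back-of-the-envelope count of the multiply-accumulates in one attention head makes the question's point concrete (illustrative head width d; layer constants and the MLP blocks are ignored, so this is a sketch of asymptotics only):

```python
# With a cache, step t computes one (1, t) score row; without it,
# the whole (t, t) score matrix is recomputed from scratch.
d = 64  # illustrative head dimension, not a value from the project
for t in (128, 512, 2048):
    per_step_cached = t * d            # q(1,d) @ K(t,d)^T
    per_step_uncached = t * t * d      # Q(t,d) @ K(t,d)^T
    total_cached = sum(i * d for i in range(1, t + 1))        # ~ t^2 * d / 2
    total_uncached = sum(i * i * d for i in range(1, t + 1))  # ~ t^3 * d / 3
    print(f"t={t:5d}  step: {per_step_cached:>10,} vs {per_step_uncached:>14,}"
          f"  total: {total_cached:>14,} vs {total_uncached:>18,}")
```

Under these assumptions, the cached per-step cost does still grow linearly with t, as the question notes; what the cache removes is the quadratic-per-step recomputation of the prefix.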