
During training, input tokens are padded to the maximum length. Does that mean attention is computed at full-token time complexity? #118

Open
lightning0016 opened this issue Jan 20, 2025 · 2 comments

Comments

@lightning0016

During inference, no padding is needed, so computation time depends on the input token length? Whereas during training, it is the full-token time? Is my understanding correct?

@jingyaogong
Owner

☑️ During inference, no padding is needed, so computation time depends on the input token length.
☑️ During training, it is the full-token time.

A few points worth adding:
1. During inference, without kv_cache, inference time grows quadratically with token length; with kv_cache, per-step inference time is "almost" independent of token length. ("Almost", because as the sequence grows, memory read/write overhead also has to be considered.) See the sketch after this list.
2. During training, computation is indeed done on tokens padded to the full max-len. However, compared with inference, training additionally incurs time for backpropagation, optimizer updates, loss computation, etc., so training on the same number of tokens takes n times the inference time.
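
A minimal sketch of the kv_cache idea, assuming a toy single-head attention with projections omitted; the names (`attend`, `K_cache`, `V_cache`) and dimensions are illustrative, not MiniMind's actual code. The point is that each decode step computes Q/K/V only for the new token and appends to the cache, instead of re-encoding the whole prefix:

```python
import torch

d = 64  # toy head dimension (illustrative)

def attend(q, K, V):
    # q: (1, d), K/V: (t, d) -> standard scaled dot-product attention
    scores = (q @ K.T) / d ** 0.5          # (1, t): this term grows with t
    return torch.softmax(scores, dim=-1) @ V

# Without kv_cache: every step re-encodes the whole prefix -> O(t^2) work per step.
# With kv_cache: only the new token's K/V are computed and appended -> O(t) per step.
K_cache, V_cache = torch.empty(0, d), torch.empty(0, d)
for step in range(8):                       # generate 8 tokens
    x = torch.randn(1, d)                   # new token's hidden state
    K_cache = torch.cat([K_cache, x])       # projections omitted for brevity
    V_cache = torch.cat([V_cache, x])
    out = attend(x, K_cache, V_cache)       # attends over step+1 cached entries
```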

@cqcracked

With kv_cache, inference time is "almost" independent of token length? After 1, 2, 3 produce 4, this 4 still has to do self-attention with 1, 2, 3, and then 1, 2, 3, 4 produce 5, and so on. The self-attention still depends on token length, so inference time should also depend on token length? Thanks.
@jingyaogong
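
A rough back-of-envelope sketch of the costs under discussion (attention score term only, single head; the numbers and names are illustrative assumptions, not measurements). The cache does not make attention free: each new token still attends over the whole prefix, so step t costs O(t·d) and a full decode of n tokens is O(n²·d) in total. What the cache removes is the O(t²·d) re-encoding of the prefix at every step:

```python
# Total score-computation FLOPs for generating n tokens, one head of width d.
n, d = 1024, 64
with_cache = sum(t * d for t in range(1, n + 1))         # ~n^2 * d / 2 total
without_cache = sum(t * t * d for t in range(1, n + 1))  # ~n^3 * d / 3 total
print(f"with cache:    ~{with_cache:.2e} score-FLOPs")
print(f"without cache: ~{without_cache:.2e} score-FLOPs")
```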
