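In the CTS vstore_global case, when the number of threads is less than 32, a 0 gets written to 0x90000000, so part of the output memory does not match the golden data. This happens because the hardware processes the store with an all-F mask, so all 32 threads handle data. The last few threads have no data, so their computed address offset is 0, and they repeatedly write 0 to 0x90000000.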
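To make the failure mechanism concrete, here is a small Python model of one 32-lane warp. It is a hypothetical sketch, not the hardware or the CTS test. The names `run_store` and `BASE` and the stored values are illustrative. It also assumes two things: lanes past global_size hold a zero offset and zero data, and lanes are applied in order 0..31, so the last write to 0x90000000 wins.

```python
# Hypothetical model of the vstore_global failure: one 32-lane warp stores
# a value for work item i to BASE + 4*i, but the mask is all-F (0xFFFFFFFF)
# even when global_size < 32.
WARP_SIZE = 32
BASE = 0x90000000


def run_store(global_size, mask):
    """Return {address: value} after one masked vector store."""
    memory = {}
    for lane in range(WARP_SIZE):
        if not (mask >> lane) & 1:
            continue  # lane disabled by the mask: it issues no store
        if lane < global_size:
            # real work item; values are i + 1 so that a stray 0 is visible
            offset, value = 4 * lane, lane + 1
        else:
            # assumed state of a lane with no work item: offset 0, data 0
            offset, value = 0, 0
        memory[BASE + offset] = value  # later lanes overwrite earlier ones
    return memory


golden = {BASE + 4 * i: i + 1 for i in range(20)}
buggy = run_store(20, 0xFFFFFFFF)  # 20 work items, all-F mask
print(hex(BASE), buggy[BASE], "expected", golden[BASE])
print(buggy == golden)
```

Only the word at 0x90000000 differs from the golden data. Every other address still holds the value its own work item stored.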
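Possible fixes: use vblt to compare each thread ID with the global size and generate the correct mask. Alternatively, fix it in hardware: the initial mask value should be derived from global_size - num_thread * N.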
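Both proposed fixes should produce the same mask. The Python sketch below uses hypothetical helper names and computes the mask both ways. The first is a per-lane comparison of the global thread ID with global_size. This is the comparison the issue proposes doing with vblt, but the actual vblt instruction sequence is not modeled. The second is a closed form from remaining = global_size - num_thread * N. It assumes num_thread is the warp width (32) and N is the warp's index, since the issue does not define either.

```python
WARP_SIZE = 32
FULL_MASK = (1 << WARP_SIZE) - 1  # the all-F mask, 0xFFFFFFFF


def mask_by_compare(global_size, warp_index):
    """Software approach: each lane enables itself only if its global
    thread ID is below global_size (the per-lane test vblt would perform)."""
    mask = 0
    for lane in range(WARP_SIZE):
        thread_id = warp_index * WARP_SIZE + lane
        if thread_id < global_size:
            mask |= 1 << lane
    return mask


def mask_from_remaining(global_size, warp_index):
    """Hardware approach: derive the initial mask from
    remaining = global_size - num_thread * N, assuming num_thread is the
    warp width and N is the warp index."""
    remaining = global_size - WARP_SIZE * warp_index
    if remaining >= WARP_SIZE:
        return FULL_MASK  # full warp: keep the all-F mask
    if remaining <= 0:
        return 0  # no work items left for this warp
    return (1 << remaining) - 1  # low `remaining` lanes enabled


# The two approaches agree for full and partial warps.
for gs in (1, 20, 32, 45, 64):
    for n in range(3):
        assert mask_by_compare(gs, n) == mask_from_remaining(gs, n)

print(hex(mask_from_remaining(20, 0)))  # 0xfffff
print(hex(mask_from_remaining(45, 1)))  # 0x1fff
```

With either mask, lanes past global_size never issue a store, so 0x90000000 keeps the value written by thread 0.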
This needs further discussion before we decide which approach to take.