Splitting tokenization part to reduce time #432
yusufcakmakk started this conversation in Ideas
Replies: 1 comment
-
This can be achieved by modifying the code here.
-
Hi all,
I have a suggestion for the tokenization part here. Currently, when we start the process with a different block size, all tokens are computed again from scratch and only then grouped. Instead, the tokenization step could run once, and the tokenized output could be reused and grouped according to each block size.
What do you think about this improvement?
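As a rough illustration of the split, here is a minimal sketch assuming the Hugging Face `datasets` and `transformers` APIs; the function names (`tokenize_function`, `group_texts`) and the sample dataset are only illustrative, not the repo's actual code:

```python
# Sketch: tokenize once, then group into blocks separately per block size.
# Assumes `datasets` and `transformers` are installed; names are illustrative.
from itertools import chain

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
raw_datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

def tokenize_function(examples):
    # Expensive step, but it no longer depends on block_size,
    # so it runs (and is cached) only once.
    return tokenizer(examples["text"])

tokenized_datasets = raw_datasets.map(
    tokenize_function,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)

def group_texts(examples, block_size):
    # Concatenate all token sequences, then split into fixed-size blocks.
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    result = {
        k: [v[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, v in concatenated.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

# Only this cheap grouping step is repeated per block size;
# the tokenized dataset above is reused across runs.
for block_size in (512, 1024):
    lm_dataset = tokenized_datasets.map(
        lambda ex: group_texts(ex, block_size),
        batched=True,
    )
```

With this split, changing the block size only re-runs the grouping map, while the tokenized dataset is picked up from the cache.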