Use _AVG_CHAR_PER_TOKEN in mode=prefix
bhavnicksm committed Dec 24, 2024
1 parent 370966d commit 3aae73f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion src/chonkie/refinery/overlap.py
@@ -137,7 +137,7 @@ def _prefix_overlap_token_exact(self, chunk: Chunk) -> Optional[Context]:
return None

# Take 6x context_size characters to ensure enough tokens
- char_window = min(len(chunk.text), self.context_size * 6)
+ char_window = min(len(chunk.text), int(self.context_size * self._AVG_CHAR_PER_TOKEN))
text_portion = chunk.text[-char_window:]

# Get exact token boundaries
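
For readers skimming the hunk, here is a minimal standalone sketch of the window calculation before and after this commit. The helper name `char_window_for` and the value 4.5 are assumptions made for illustration only; the diff does not show the actual value of `_AVG_CHAR_PER_TOKEN` or where it is defined.

```python
def char_window_for(text: str, context_size: int, avg_char_per_token: float) -> int:
    """Return how many trailing characters to slice before exact tokenization.

    Hypothetical helper mirroring the changed line; not part of chonkie.
    """
    # int() truncates toward zero, as in the int(...) call added by the commit.
    estimate = int(context_size * avg_char_per_token)
    # Never request more characters than the text contains.
    return min(len(text), estimate)


text = "x" * 1000

# Before the commit: a hard-coded 6 characters per token.
old_window = min(len(text), 128 * 6)

# After the commit: a configurable average (4.5 is an assumed value).
new_window = char_window_for(text, 128, 4.5)

# The trailing slice that would then be tokenized exactly.
tail = text[-new_window:]
```

With these assumed numbers the slice shrinks from 768 to 576 characters, so less text is handed to the tokenizer, at the cost of relying on the average being a reasonable estimate for the input.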