The code chunking iterator works in terms of byte offsets. But the `mask_comments` function (and probably `mask_sub_scopes` too) constructs a new string based on differences of byte offsets.
That means that if a comment or string contains non-ASCII characters, the "masked-out" string will have more characters than the original source, since each multi-byte character is replaced by multiple single-byte characters (spaces).
I can't say if this causes problems such as wrong indices in the matches...
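To make the byte/character difference concrete, here is a minimal sketch. `mask_bytes` is a hypothetical stand-in written for this illustration, not the project's actual `mask_comments`; it assumes the masked range is given as byte offsets that fall on character boundaries.

```rust
/// Hypothetical stand-in for the masking step (not the real `mask_comments`):
/// copies `src`, but writes one ASCII space per *byte* in `start..end`.
/// `start` and `end` are byte offsets and must lie on char boundaries,
/// otherwise the slicing below panics.
fn mask_bytes(src: &str, start: usize, end: usize) -> String {
    let mut out = String::with_capacity(src.len());
    out.push_str(&src[..start]);
    // One space per byte, so a 2-byte character becomes 2 spaces.
    out.push_str(&" ".repeat(end - start));
    out.push_str(&src[end..]);
    out
}
```

For a source such as `"let x = 1; // h\u{e9}llo w\u{f6}rld\nlet y = 2;"`, masking the comment this way keeps `len()` (bytes) unchanged while `chars().count()` grows by two, so byte offsets into the masked string stay valid but character counts do not match the original.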
A related problem is that if a character literal contains an escape sequence, the "masked-out" version of the code is not valid Rust anymore, since e.g. `'\''` gets masked to `'  '` (with two spaces). Again, I can't say if that causes an actual problem.
Good catch.
I think it makes sense to keep byte offsets into the file internally, so replacing a multi-byte char with the same number of spaces is the right thing to do from the perspective of masking.
I'm not sure what is best to do about character literals with an escape sequence. I assume the parser balks at multi-space char literals (I haven't tested). Maybe we should replace them with a one-space char literal and an extra space after the closing quote?
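A rough sketch of that idea, assuming the masker is handed the whole literal including its quotes. `mask_char_literal` is a hypothetical helper name made up for this example, not something that exists in the codebase.

```rust
/// Hypothetical helper: masks a complete char literal (quotes included),
/// e.g. `'\''`, into `' '` followed by padding spaces, so the result is a
/// valid one-space char literal and the byte length stays the same.
fn mask_char_literal(literal: &str) -> String {
    // A char literal is at least 3 bytes: opening quote, content, closing quote.
    assert!(literal.len() >= 3 && literal.starts_with('\'') && literal.ends_with('\''));
    let mut out = String::from("' '");
    // Pad after the closing quote to preserve the original byte length.
    out.push_str(&" ".repeat(literal.len() - 3));
    out
}
```

With this, the four-byte literal `'\''` would become `' '` plus one trailing space, so later byte offsets are unaffected and the masked code still contains a well-formed char literal.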