fix: Correct error message for invalid bytes in multiline strings and comments #21459
40d541c to 1cb5cbe
Looking at this again while rebasing, I realized handling invalid bytes inside comments was pretty straightforward. I put the associated changes in a separate commit so it can be reverted in case I took the wrong approach.
2a699a1 to 5fa6cf3
This doesn't need to be split into an error and a note like the previous implementation was,
I thought the error/note implementation was helpful because the note shows exactly where the (potentially invisible) invalid byte is in the token. Do you think the gain in readability is worth the tradeoff in losing this information? If so I'm happy to try to rework this soon (hopefully later today/tomorrow).
You could have the error point at the invalid byte. I don't think the start of the token (so
e119df4 to e3f0259
A further improvement would be adding an unterminated string/char literal
error for string and char literals where the invalid byte is EOF/NL.
e3f0259 to ea9020a
comments, and character literals
ea9020a to 7e166a2
That's a good idea. I implemented this locally. Would it be better to push those changes to this branch, or to open a follow-up PR once this is merged?
I think a follow-up would be better.
This change provides a correct error message when an invalid byte is found inside a multiline string line, a comment, or a doc comment.
Inside multiline strings
Taking the example from #20900:
Before:
After:
Doc comments
/// Some <TAB>comment
Before:
After:
Comments
// Some <TAB>comment
Before:
After:
I'm not 100% confident in the set of invalid bytes I added in `lowerAstErrors`, as it's just based on those listed in the `multiline_string_literal_line` case of `Tokenizer.next()`. I think this part of the PR in particular could benefit from a close second check. :)
I initially considered handling the similar problem with invalid bytes inside comments (mentioned here), but there isn't a `.comment` variant of `Token.Tag` (only `.doc_comment` and `.container_doc_comment`) to report the expected token as. Another variant of `Ast.Error.Tag` could be added to handle this case, but I wanted to check before doing so. See followup comment.
One minor nit/bikeshed question: The rendered error message for this case reads `error: expected 'a string literal', found invalid bytes`. Since this error is triggered by the first single invalid byte encountered by the tokenizer, should this message use "byte" instead?
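To make the byte set under discussion easier to review, here is a minimal sketch of a scan that returns the offset of the first invalid byte in a line, which is the position the error could point at. `firstInvalidByte` is a hypothetical helper and not part of the compiler, and the byte ranges are an assumption based on my reading of the `multiline_string_literal_line` case, so they should be checked against `Tokenizer.next()`.

```zig
const std = @import("std");

/// Hypothetical helper (not part of the compiler): returns the offset of the
/// first byte in `line` that is assumed to be invalid, or null if there is none.
/// Assumed set: NUL, 0x01-0x09 (includes TAB), 0x0b-0x0c, 0x0e-0x1f, and 0x7f.
/// '\n' and '\r' are left out because they belong to line-ending handling.
fn firstInvalidByte(line: []const u8) ?usize {
    for (line, 0..) |byte, i| {
        switch (byte) {
            0x00...0x09, 0x0b...0x0c, 0x0e...0x1f, 0x7f => return i,
            else => {},
        }
    }
    return null;
}

test "a tab inside a comment is reported at its own offset" {
    // Offset 8 is the tab itself, not the start of the `//` token.
    try std.testing.expectEqual(@as(?usize, 8), firstInvalidByte("// Some \tcomment"));
    try std.testing.expectEqual(@as(?usize, null), firstInvalidByte("// Some comment"));
}
```

Running this with `zig test` exercises the `// Some <TAB>comment` example from above. The indexed `for` syntax assumes Zig 0.11 or later.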