fix: Correct error message for invalid bytes in multiline strings and comments #21459

WillLillis · 2024-09-20T03:37:45Z

This change provides a correct error message when an invalid byte is found inside a multiline string line, a comment, or a doc comment.

Inside multiline strings

Taking the example from #20900:

const foo =
    \\const S = struct {
    \\<TAB>// hello
    \\}
;

Before:

src/main.zig:4:1: error: expected ';' after statement
    \\ // hello
^

After:

src/main.zig:4:5: error: expected 'a string literal', found invalid bytes
    \\ // hello
    ^~~~~~~~~~~
src/main.zig:4:7: note: invalid byte: '\t'
    \\ // hello
      ^~~~~~~~~~~

Doc comments

/// Some <TAB>comment

Before:

src/main.zig:1:1: error: expected type expression, found 'invalid token'
/// Some  comment
^~~~~~~~~~~~~~~~~

After:

src/main.zig:1:1: error: expected 'a document comment', found invalid bytes
/// Some  comment
^~~~~~~~~~~~~~~~~
src/main.zig:1:10: note: invalid byte: '\t'
/// Some  comment
         ^~~~~~~~~~~~~~~~~

Comments

// Some <TAB>comment

Before:

src/main.zig:1:1: error: expected type expression, found 'invalid token'
// Some  comment
^~~~~~~~~~~~~~~~~

After:

src/main.zig:1:1: error: expected 'a comment', found invalid bytes
// Some  comment
^~~~~~~~~~~~~~~~~
src/main.zig:1:9: note: invalid byte: '\t'
// Some  comment
        ^~~~~~~~~~~~~~~~~

I'm not 100% confident in the set of invalid bytes I added in lowerAstErrors, as it's just based off of those listed in the multiline_string_literal_line case of Tokenizer.next(). I think this part of the PR in particular could benefit from a close second check. :)
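To make the check under discussion concrete, here is a minimal sketch, not the code from this PR. The helper name `firstInvalidByte` is hypothetical, and the byte ranges are an assumption modeled on the description above; they should be compared against the multiline_string_literal_line case of Tokenizer.next() before being relied on.

```zig
const std = @import("std");

/// Hypothetical helper (not part of std.zig): returns the index of the
/// first byte in `line` that this sketch treats as invalid, or null.
/// ASSUMPTION: the ranges below only approximate the tokenizer's set.
fn firstInvalidByte(line: []const u8) ?usize {
    for (line, 0..) |byte, i| {
        switch (byte) {
            // Control characters other than '\n' and '\r', plus DEL.
            0x00...0x09, 0x0b...0x0c, 0x0e...0x1f, 0x7f => return i,
            else => {},
        }
    }
    return null;
}

test "tab inside a multiline string line is flagged" {
    // Four spaces, the `\\` prefix, then a TAB, as in the example from #20900.
    const line = "    \\\\\t// hello";
    try std.testing.expectEqual(@as(?usize, 6), firstInvalidByte(line));
    try std.testing.expectEqual(@as(?usize, null), firstInvalidByte("    \\\\// hello"));
}
```

The zero-based index 6 corresponds to column 7 in the "After" output for the multiline string example.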

I initially thought that the similar problem with invalid bytes inside comments (mentioned here) could be handled the same way, but there isn't a .comment variant of Token.Tag (only .doc_comment and .container_doc_comment) to report the expected token as. Another variant could be added to Ast.Error.Tag to handle this case, but I wanted to check before doing so. See follow-up comment.

One minor nit/bikeshed question: the rendered error message for this case reads "error: expected 'a string literal', found invalid bytes". Since this error is triggered by the first invalid byte the tokenizer encounters, should the message say "byte" instead?

WillLillis · 2025-01-27T06:47:16Z

Looking at this again while rebasing, I realized handling invalid bytes inside comments was pretty straightforward. I put the associated changes in a separate commit so it can be reverted in case I took the wrong approach.

Vexu · 2025-01-27T13:43:29Z

This doesn't need to be split into an error and a note like the previous implementation was, "comment/string contains invalid byte '<byte>'" would be more readable IMO.

WillLillis · 2025-01-27T15:03:48Z

> This doesn't need to be split into an error and a note like the previous implementation was, "comment/string contains invalid byte '<byte>'" would be more readable IMO.

I thought the error/note implementation was helpful because the note shows exactly where the (potentially invisible) invalid byte is in the token. Do you think the gain in readability is worth losing this information? If so, I'm happy to try to rework this soon (hopefully later today or tomorrow).

Vexu · 2025-01-27T15:11:33Z

You could have the error point at the invalid byte. I don't think the start of the token (so //, \\ or ") is that necessary since it's always on the same line.
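A minimal sketch of what pointing the error at the invalid byte could look like, assuming the tokenizer can supply the byte offset of the offending byte. The `Location` type and the `locate` helper are hypothetical names for illustration and are not taken from the compiler sources.

```zig
const std = @import("std");

/// Hypothetical location type for this sketch; both fields are 1-based,
/// matching the `file:line:column` form used in the messages above.
const Location = struct { line: usize, column: usize };

/// Hypothetical helper: converts a byte offset into a line/column pair by
/// counting newlines before it. Columns are counted in bytes.
fn locate(source: []const u8, offset: usize) Location {
    var loc: Location = .{ .line = 1, .column = 1 };
    for (source[0..offset]) |byte| {
        if (byte == '\n') {
            loc.line += 1;
            loc.column = 1;
        } else {
            loc.column += 1;
        }
    }
    return loc;
}

test "error location points at the tab, not the start of the token" {
    const source = "// Some \tcomment\n";
    const offset = std.mem.indexOfScalar(u8, source, '\t').?;
    const loc = locate(source, offset);
    try std.testing.expectEqual(@as(usize, 1), loc.line);
    try std.testing.expectEqual(@as(usize, 9), loc.column);
}
```

With this approach a single error at 1:9 carries the same position information that the separate note provided.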

Vexu

A further improvement would be adding an unterminated string/char literal error for string and char literals where the invalid byte is EOF/NL.
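One way to read this suggestion, as a hedged sketch only: classify the offending byte before choosing the message. The `LiteralError` enum and the `classify` function are hypothetical and do not reflect how the follow-up work is actually implemented.

```zig
const std = @import("std");

/// Hypothetical classification used only in this sketch.
const LiteralError = enum { unterminated, invalid_byte };

/// `byte` is the offending byte inside a string or char literal,
/// or null when the tokenizer hit end of file.
fn classify(byte: ?u8) LiteralError {
    const b = byte orelse return LiteralError.unterminated;
    if (b == '\n') return LiteralError.unterminated;
    return LiteralError.invalid_byte;
}

test "EOF and newline mean the literal was never closed" {
    try std.testing.expectEqual(LiteralError.unterminated, classify(null));
    try std.testing.expectEqual(LiteralError.unterminated, classify('\n'));
    try std.testing.expectEqual(LiteralError.invalid_byte, classify('\t'));
}
```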

WillLillis · 2025-02-05T02:41:15Z

> A further improvement would be adding an unterminated string/char literal error for string and char literals where the invalid byte is EOF/NL.

That's a good idea. I implemented this locally. Would it be better to push those changes to this branch, or to open a follow-up PR once this is merged?

Vexu · 2025-02-05T09:11:07Z

I think a follow-up would be better.

WillLillis force-pushed the invalid_byte branch from 40d541c to 1cb5cbe on January 27, 2025 06:45

WillLillis changed the title from "fix: Correct error message for invalid bytes in multiline strings" to "fix: Correct error message for invalid bytes in multiline strings and comments" on Jan 27, 2025

WillLillis force-pushed the invalid_byte branch 3 times, most recently from 2a699a1 to 5fa6cf3, on January 27, 2025 11:54

Vexu reviewed on Jan 29, 2025 (review threads on lib/std/zig/Ast.zig and lib/std/zig/AstGen.zig, both outdated and resolved)

WillLillis force-pushed the invalid_byte branch from e119df4 to e3f0259 on January 29, 2025 22:49

Vexu approved these changes on Feb 4, 2025 (review thread on lib/std/zig/AstGen.zig, outdated and resolved)

WillLillis force-pushed the invalid_byte branch from e3f0259 to ea9020a on February 5, 2025 01:48

Commit 7e166a2: fix: Correct error message for invalid bytes in strings, comments, and character literals

WillLillis force-pushed the invalid_byte branch from ea9020a to 7e166a2 on February 5, 2025 02:19

Vexu merged commit cf059ee into ziglang:master on Feb 5, 2025 (10 checks passed)

WillLillis deleted the invalid_byte branch on February 5, 2025 09:14

jiacai2050 mentioned this pull request on Feb 5, 2025: Add an explanation of how TAB and CR are handled in Zig strings, zigcc/zig-course#208 (closed)

WillLillis mentioned this pull request on Feb 6, 2025: improve error message for unterminated string and character literals, #22783 (open)
