Validate literals at a later stage #12449

Vexu · 2022-08-14T20:24:59Z

Most people would recognize 0x213g as a malformed hex literal and '' as an empty char literal. Currently, however, Zig validates these kinds of errors in the tokenizer, which gives out terrible, vague errors like expected expression, found 'invalid bytes' and prevents formatting.

It would be better to have the tokenizer validate only the things that actually prevent tokens from being valid, such as newlines in string literals (which should also get a proper error message instead of the same invalid bytes one), and leave everything else to AstGen.
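A minimal sketch of the proposed split, written in Python for brevity rather than taken from the Zig source. The names `Token`, `tokenize_one`, and `validate`, as well as the error wording, are hypothetical. The tokenizer only decides where a token ends, and a later pass (the role the issue assigns to AstGen) reports the specific problem. Escape sequences in char literals are ignored to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    tag: str   # "number_literal", "char_literal", or "invalid"
    text: str


def tokenize_one(source: str) -> Token:
    """Lenient tokenizer for one literal (hypothetical, not Zig's real one).

    It only finds where the token ends. The sole hard failure here is a
    newline inside a char literal, because then the token boundary is unknown.
    """
    if source[:1].isdigit():
        end = 0
        # Take every alphanumeric character, including ones like 'g' that
        # no number base allows; they are judged in the later stage.
        while end < len(source) and (source[end].isalnum() or source[end] == "_"):
            end += 1
        return Token("number_literal", source[:end])
    if source[:1] == "'":
        end = 1
        while end < len(source) and source[end] != "'":
            if source[end] == "\n":
                return Token("invalid", source[:end])
            end += 1
        return Token("char_literal", source[: end + 1])
    return Token("invalid", source[:1])


def validate(token: Token) -> Optional[str]:
    """Later-stage check that can produce a specific error message."""
    if token.tag == "number_literal" and token.text.startswith("0x"):
        for ch in token.text[2:]:
            if ch != "_" and ch not in "0123456789abcdefABCDEF":
                return f"invalid digit '{ch}' in hex literal"
    if token.tag == "char_literal" and token.text == "''":
        return "empty character literal"
    return None


print(validate(tokenize_one("0x213g")))
print(validate(tokenize_one("''")))
```

Because `0x213g` and `''` still come out as ordinary tokens, a parser or formatter can keep working on the file, and only the newline case ends up as an invalid token.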


moosichu · 2023-02-17T09:49:44Z

I'm keen to take a look at this once #12661 has been merged. I'm currently blocked waiting on it, since the work might otherwise conflict with it.

Closes ziglang#12449 and ziglang#13809. Generate .invalid tokens only in severe cases (illegal line break or null). This allows us to continue parsing a lot more often, allowing for more and better error messages. The numeric literals mentioned in ziglang#12449 already had this treatment; this commit applies it to char literals and identifiers. In error messages, count Unicode codepoints to line up the source highlight. Render tabs as four spaces, Zig's default indentation.
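The highlight alignment described in that commit message can be illustrated roughly as follows. This is a Python sketch under stated assumptions, not the code from the pull request: the function name `highlight` is hypothetical, the error position is assumed to be a byte offset into the UTF-8 line, and that offset is assumed to fall on a codepoint boundary.

```python
def highlight(line: str, byte_offset: int, tab_width: int = 4) -> str:
    """Return the rendered line plus a caret line pointing at byte_offset.

    byte_offset is an offset into the UTF-8 encoding of `line` and is
    assumed to sit on a codepoint boundary.
    """
    prefix = line.encode("utf-8")[:byte_offset].decode("utf-8")
    # Tabs are rendered as tab_width spaces so the caret line can match.
    rendered = line.replace("\t", " " * tab_width)
    # One column per codepoint, except tabs, which occupy tab_width columns.
    column = sum(tab_width if ch == "\t" else 1 for ch in prefix)
    return rendered + "\n" + " " * column + "^"


# The empty char literal starts at byte 6: tab (1) + "é" (2) + " = " (3).
print(highlight("\té = ''", 6))
```

Counting bytes instead of codepoints would push the caret one column too far to the right here, since "é" takes two bytes in UTF-8. Characters that occupy two terminal columns are not handled by this sketch.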

Vexu added the enhancement, frontend, and error message labels Aug 14, 2022

Vexu added this to the 0.11.0 milestone Aug 14, 2022

Vexu mentioned this issue Aug 30, 2022

Cases of retokenization invalidly assume that the tokenizer is stateless leading to crashes in various edge cases #12674

Closed

moosichu mentioned this issue Aug 30, 2022

Scan from line start when finding tag in tokenizer #12692

Closed

Vexu mentioned this issue Aug 31, 2022

Validate number literals in AstGen #12699

Merged

moosichu mentioned this issue Feb 17, 2023

Correctly handle carriage return characters according to the spec #12661

Merged

andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023

tau-dev linked a pull request Jul 12, 2024 that will close this issue

Accept more illegal characters in the lexer, report them in AstGen #20596

Open
