Validate literals at a later stage #12449

Open
Vexu opened this issue Aug 14, 2022 · 1 comment · May be fixed by #20596
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase. error message This issue points out an error message that is unhelpful and should be improved. frontend Tokenization, parsing, AstGen, Sema, and Liveness.
Milestone

Comments

@Vexu
Member

Vexu commented Aug 14, 2022

Most people would recognize 0x213g as a malformed hex literal, and the same goes for '' being an empty char literal. Currently, however, Zig validates these kinds of errors in the tokenizer, giving out terrible, vague errors like expected expression, found 'invalid bytes' and preventing formatting.

It would instead be better to have the tokenizer only validate things that actually prevent the tokens from being valid such as newlines in string literals (which should also give a proper error message instead of the same invalid bytes one) and leave everything else to AstGen.
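To make the proposed split concrete, here is a minimal sketch in Python (not Zig, and not the compiler's actual code). The names `Token`, `tokenize`, and `validate` are hypothetical, escape sequences are ignored, and the error wording is made up for illustration. The tokenizer only gives up when it cannot find the end of a token; digit checks and the empty char literal check happen in a later pass that can name the exact problem.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    tag: str    # "number_literal", "char_literal", or "invalid"
    text: str
    start: int


def tokenize(source: str) -> List[Token]:
    """Lenient tokenizer: only a missing closing quote on the line is 'invalid'."""
    tokens = []
    i = 0
    while i < len(source):
        c = source[i]
        if c.isdigit():
            # Accept any run of alphanumerics/underscores; digits are checked later.
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            tokens.append(Token("number_literal", source[i:j], i))
            i = j
        elif c == "'":
            j = i + 1
            while j < len(source) and source[j] not in "'\n":
                j += 1
            if j < len(source) and source[j] == "'":
                tokens.append(Token("char_literal", source[i:j + 1], i))
                i = j + 1
            else:
                # No closing quote before the line ends: the token boundary is unknown.
                tokens.append(Token("invalid", source[i:j], i))
                i = j
        else:
            i += 1  # Whitespace and everything else is skipped in this sketch.
    return tokens


def validate(token: Token) -> Optional[str]:
    """Later-stage check: returns a specific message, or None if the literal is fine."""
    if token.tag == "char_literal":
        return "empty character literal" if token.text == "''" else None
    if token.tag == "number_literal":
        text = token.text.lower()
        if text.startswith("0x"):
            digits, allowed, name = text[2:], "0123456789abcdef_", "hex"
        else:
            digits, allowed, name = text, "0123456789_", "decimal"
        for ch in digits:
            if ch not in allowed:
                return f"invalid digit '{ch}' in {name} literal"
        return None
    return "invalid token"


for tok in tokenize("0x213g ''"):
    print(tok.tag, repr(tok.text), validate(tok))
```

Because both malformed literals still come out as ordinary tokens, a formatter could keep working on the file, and the later pass has enough context to report which digit or literal is wrong.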

@Vexu Vexu added enhancement Solving this issue will likely involve adding new logic or components to the codebase. frontend Tokenization, parsing, AstGen, Sema, and Liveness. error message This issue points out an error message that is unhelpful and should be improved. labels Aug 14, 2022
@Vexu Vexu added this to the 0.11.0 milestone Aug 14, 2022
@moosichu
Contributor

I'm keen to take a look at this once #12661 has been merged. I'm currently blocked on that, since otherwise the work might conflict with it.

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
tau-dev added a commit to tau-dev/zig that referenced this issue Jul 12, 2024
Closes ziglang#12449 and ziglang#13809.
Generate .invalid tokens only in severe cases (illegal line break or
null). This allows us to continue parsing a lot more often, allowing for
more and better error messages.

The numeric literals mentioned in ziglang#12449 already had this treatment,
this commit applies it to char literals and identifiers.

In error messages, count Unicode codepoints to line up the source
highlight. Render tabs as four spaces, Zig's default indentation.
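The last paragraph of the commit message describes how the source highlight is aligned. A rough Python sketch of that idea follows, assuming the error position is a byte offset into a UTF-8 line. `caret_line` and `TAB_WIDTH` are hypothetical names, and this is not the code from the commit.

```python
TAB_WIDTH = 4  # tabs are rendered as four spaces


def caret_line(line: bytes, byte_offset: int):
    """Return the rendered source line and a caret line pointing at byte_offset."""
    # Decode the prefix so that multi-byte codepoints count once, not once per byte.
    prefix = line[:byte_offset].decode("utf-8", errors="replace")
    column = sum(TAB_WIDTH if ch == "\t" else 1 for ch in prefix)
    rendered = line.decode("utf-8", errors="replace").replace("\t", " " * TAB_WIDTH)
    return rendered, " " * column + "^"


source = '\tconst s = "é" + 0x213g;'.encode("utf-8")
rendered, caret = caret_line(source, source.index(b"0x213g"))
print(rendered)
print(caret)
```

Using the raw byte offset as the column would put the caret one position too far right here, because `é` occupies two bytes in UTF-8. Wide or combining characters would still be misaligned in this sketch.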
tau-dev added a commit to tau-dev/zig that referenced this issue Jul 13, 2024
tau-dev added a commit to tau-dev/zig that referenced this issue Jul 14, 2024
3 participants