-
-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Correctly handle carriage return characters according to the spec #12661
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a test, you'll have to use string literals in test/compile_errors.zig
.
b289c7b
to
fd960f9
Compare
582b344
to
4af4034
Compare
I've added a couple of tests that should hopefully do the trick, one to check the case that should work and the other to test the case that should error. |
cd43521
to
8fc2537
Compare
65f116d
to
965779a
Compare
965779a
to
4ee22ca
Compare
7889cfc
to
18dade0
Compare
18dade0
to
2bc3a48
Compare
2bc3a48
to
eb3fc7d
Compare
CI checks are failing |
Yeah - I haven't had the time to address those test issues for the the past while, but should hopefully be able to at the end of this week if that's ok :) Sorry for the delay on this one! |
eb3fc7d
to
601398d
Compare
Things should be fixed now! Ended up being something really simple, but only just had the time to get to it now. |
Are there any further changes desired here? Just double checking as I might have some time available to act on feedback this month if needed. |
601398d
to
4e09283
Compare
Just resolved the merge conflicts that I saw had cropped-up (hopefully 🤞) |
4e09283
to
2ed3ce2
Compare
This resolves a crash that can occur for invalid bytes like carriage returns that are valid characters when not parsed from within literals. There are potentially other edge cases this could resolve as well, as the calling code for this function didn't account for any potential 'pending_invalid_tokens' that could be queued up by the tokenizer from within another state.
Follow the guidance of ziglang#38: > However CR directly before NL is interpreted as only a newline and not part of the multiline string. zig fmt will delete the CR. Zig fmt already had code for deleting carriage returns, but would still crash - now it no longer does so. Carriage returns encountered before line-feeds are now appropriately removed on program compilation as well.
Previous commit was much less strict about this, this more closely matches the desired spec of only allow CR characters in a CRLF pair, but not otherwise.
Missed this comment from ziglang/zig-spec#83: > CR used as whitespace, whether directly preceding NL or stray, is still unambiguously whitespace. It is accepted by the grammar and replaced by the canonical whitespace by zig fmt.
2ed3ce2
to
219d4e5
Compare
Taking guidance from: ziglang/zig-spec#38, carriage returns are now allowed preceding line-feeds when found in doc-comments or multiline strings. The CRLF is also only interpreted as a newline in this case. However - they are otherwise rejected when found in any other context except when used as plain whitespace.
This resolves #11414 and resolves #12674. It replaces #12692 which is a part of this PR.
Ideally the error reporting should be improved by #12449 sometime in the future, but I think I might take a look at that once this has been merged.