Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update float syntax rules and formatting #756

Merged
merged 4 commits into from
Dec 18, 2024

Conversation

DaelonSuzuka
Copy link
Collaborator

Per @AlfishSoftware's feedback or #746, this PR removes the float literal rule I copied from the MagicPython grammar and (hopefully!) fixes the existing float rules.

@AlfishSoftware
Copy link

AlfishSoftware commented Nov 21, 2024

The best idea would be to take the Godot source as a reference, but sadly it's difficult to interpret the tokenizer code into a regex. But looking at the error messages, it disallows multiple adjacent _ (though it weirdly allows trailing _; probably an oversight on their end).

You should probably replace all [0-9][0-9_]* with \d+(?:_\d+)* -- or \d+(?:_\d+)*_? if you want to be super accurate and allow even a trailing underscore. Also I think line 270 might also match no mantissa digits, like just a . or .e1.

I tried to come up with one taking GDShader as a base and modifying it. I think these regexes work for float literals:

  • \d+(?:_\d+)*_?[eE][-+]?\d+(?:_\d+)*_? (no dot, exponent mandatory)
  • (?:(?:\d+(?:_\d+)*_?)?[.]\d+(?:_\d+)*_?|\d+(?:_\d+)*_?[.])(?:[eE][-+]?\d+(?:_\d+)*_?)?
    (with dot: digits are mandatory in at least one of both sides, exponent is optional)

You can inspect and test them here (both are joined with a |): https://regex101.com/r/6VUN0B/1

Note that I didn't include word boundaries, as I suppose you already have them around all identifiers and keywords.


If you use this, remember to do a similar replacement also on the integer literals (binary, decimal and hex) to disallow consecutive _ on them as well:

  • Binary: 0b[01]+(?:_[01]+)*_?
  • Decimal: \d+(?:_\d+)*_?
  • Hex: 0x\h+(?:_\h+)*_? (\h is valid in Oniguruma regex, equivalent to [0-9a-fA-F])

@DaelonSuzuka
Copy link
Collaborator Author

@AlfishSoftware Great info, thanks!

Hex: 0x\h+(?:\h+)*? (\h is valid in Oniguruma regex, equivalent to [0-9a-fA-F])

Very cool, I've always tried to stick to very basic regex to hopefully avoid implementation specific details, but you've convinced me to actually read the Oniguruma docs.

Also I think line 270 might also match no mantissa digits, like just a . or .e1.

Good catch, it does!

image

The best idea would be to take the Godot source as a reference
<snip>
disallows multiple adjacent _

I'm conflicted about this because, imo, the goal of the textmate grammar is not to strictly enforce the language/interpreter rules. The exact language rules aren't formally specified, and that's a moving target, and I'm not abandoning support to Godot 3-specific syntax. Therefore, the grammar's target must be the union of all the valid syntax of all versions of Godot 3, all versions of Godot 4, plus some margin to allow for mistakes.

It's also intended to be used in conjunction with the Godot LSP, which will yell about consecutive underscores anyways:

image

@AlfishSoftware
Copy link

AlfishSoftware commented Nov 22, 2024

I'm not abandoning support to Godot 3-specific syntax.

Ah, you're completely right. I didn't test it in Godot 3 at all, just assumed it was the same. Plus, there's no harm in allowing adjacent underscores as it doesn't conflict with any other syntax. The _ in Godot 3 seems to be allowed even after ., e, b and x, so those regexes must be right (except for that dot-only case).

I've always tried to stick to very basic regex to hopefully avoid implementation specific details

You're right, that's actually a good idea, as there's stuff like porting to GitHub syntax, etc. IIRC, there's not that much difference in Oniguruma; what stands out to me is mostly \h and \G (which it think can be useful in sub-patterns, if I understand correctly).

@AlfishSoftware
Copy link

AlfishSoftware commented Nov 22, 2024

There's no harm in allowing it (and the LSP will raise an error anyway), but just in case you missed it, I think it's also matching cases with just _ after e without a number. E.g.: .0e_, 0.e_, 0.0e_, 0e_

@DaelonSuzuka DaelonSuzuka merged commit aee83dd into godotengine:master Dec 18, 2024
4 checks passed
@DaelonSuzuka DaelonSuzuka deleted the fix-sci-notation-rules branch January 4, 2025 02:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants