-
-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow name-char as first character of unquoted-literal #990
Conversation
@@ -45,7 +45,7 @@ | |||
"src": "{|2006-01-02T15:04:06| :datetime}" | |||
}, | |||
{ | |||
"src": "{|2006-01-02T15:04:06| :datetime year=numeric month=|2-digit|}" | |||
"src": "{|2006-01-02T15:04:06| :datetime year=numeric month=2-digit}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change the parsers now need a pretty big lookahead.
Before you it was enough to look at the first character:
|
=> quoted string
0
-9
or '-' > try to get a number literal
starting char => literal
anything else => Error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does that change? Anywhere that literal is valid, single-character lookahead still suffices.
|
→ quoted-literal$
→ variable:
→ function*
→ key#
or/
→ (markup)- name-char → unquoted-literal
- anything else → error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @gibson042.
It is trivial to check the first character. A number literal can only start with one of 11 characters: - or 0..9. All of these are contained in name-char
:
name-char = name-start / DIGIT / "-" / "."
/ %xB7 / %x300-36F / %x203F-2040
So the first character is enough to send you down the path.
Of course, once you start down the path to any of these outcomes, you could end up with something bogus
|
||
```abnf | ||
literal = quoted-literal / unquoted-literal | ||
quoted-literal = "|" *(quoted-char / escaped-char) "|" | ||
unquoted-literal = name / number-literal | ||
number-literal = ["-"] (%x30 / (%x31-39 *DIGIT)) ["." 1*DIGIT] [%i"e" ["-" / "+"] 1*DIGIT] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yet another step towards making everything a string :-(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that comment is a confusion about thinking that quoted literal mean string, when they are completely separate.
The main syntax shouldn't talk about what format literal numbers are, or what format literal dates are, or what format literal units are; that's up to the functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
number-literal
rule is still needed for number functions, hence its re-definition in the Number Operands section. Note that I've left out the exponenetial part from it, as the+
character is no longer valid without quoting.
This seems like a rather large drawback... don't we still want to allow something like {|1e+12| :currency currency=JPY}
? Or for that matter, just allow +
to be unquoted like e.g. unquoted-literal = 1*(name-char / "+" / …)
.
This change also removes the need to quote
2-digit
, a valid literal value for some:datetime
options. Previously, it was the only default function option value that needed quoting.
That's nice.
We've... stalled on expanding number literals to non-integer values. I think this is something that will be wanted, but attempts to address this so far have not succeeded. I do hope that they will in the future. Note that the So, yeah, we could consider allowing the ASCII character Is the meta-goal to allow any literal that does not contain whitespace, bidi formatting characters, or syntax-meaningful "sigils" to be unquoted? If that's the case, maybe we restore a small production for |
Is there really that much utility in supporting literal numerical values with exponents? I would claim that this is exceedingly rare, and could be left out initially. If someone does present a real-world use case, we could extend support later for |
It needs more than one if I still want to keep the number-literal: But it looks like there is no appetite for that. I think it is a mistake, but it also looks like I am in minority. |
@eemeli mentions:
I agree that it's rare and note that we have already left it out. However, it would be good if we had enough foresight to minimize changes of this sort to the syntax in the future. Allowing @mihnita noted:
I think this is a good callout, from the point of view that the current syntax suggests we accept I think that moving to a strictly character production would allow single-lookahead to capture |
Exponential notation seems important to me for big numbers and small numbers, such as are encountered in finance and physics. It's also present in most if not all of the languages in which one can already NumberFormat (cf. MDN Given all that, I would ask what is the benefit of not supporting exponential notation in Number Operands? If it is just "it should be possible to express any number operand as an unquoted literal", then I think the same arguments apply to "+" for exponents as applied to "-" for "2-digit". But for the record, I don't consider quoting to be particularly special or onerous, and would be fine if it is required for
@aphillips what do you mean by "have already left it out"? Exponential notation is certainly a part of the current
Well, yeah. Numeric literals were never valid names, and expanding |
As this PR makes it so that a Pre-emptively parsing the operand of |
@gibson042 asked:
I mean that, while |
As discussed and requested, I've re-included exponents in |
@eemeli Ready for review? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very slight nit, otherwise LGTM
Co-authored-by: Addison Phillips <[email protected]>
It'd be nice to get this merged so I can have the other literal PR based off of head. |
Based on previous discussion in the 2025-02-10 call, merging this change. |
Fixes #724
As discussed today, our
name
definition is a slightly restricted variant of the XMLNCName
, which has not been updated to account for developments in the last 15 years or so that have e.g. included combining marks and the ALM in thename-start
range.So as the
name
production is already messy,unquoted-literal
is not really made much worse by allowingname-char
as its first character. Doing so also allows us to drop thenumber-literal
rule from the syntax.The
number-literal
rule is still needed for number functions, hence its re-definition in the Number Operands section. Note that I've left out the exponenetial part from it, as the+
character is no longer valid without quoting.This change also removes the need to quote
2-digit
, a valid literal value for some:datetime
options. Previously, it was the only default function option value that needed quoting.Filing initially as a draft, as I want to implement this to make sure it works as expected, and to make sure that all test suite changes have been accounted for.