[DESIGN] Number selection design refinements #859

Merged: 9 commits, Nov 4, 2024

`exploration/number-selection.md`: 120 changes (110 additions, 10 deletions)

# Selection on Numerical Values

Status: **Accepted** (moving back to **Proposed**)

<details>
<summary>Metadata</summary>
…
</details>

…

Both JS and ICU PluralRules implementations provide for determining the plural category
of a range based on its start and end values.
Range-based selectors are not initially considered here.

In <a href="https://github.com/unicode-org/message-format-wg/pull/842">PR #842</a>
@eemeli points out a number of gaps or infelicities in the current specification,
and there was extensive discussion of how to address these gaps.

The `key` for exact numeric match in a variant has to be a string.

Collaborator: I don't see how this results from the requirements.
In some cases the key has to be a string; in other cases it is enough for it to be a number.
So the whole section below is only one option: IF we consider the keys to be strings, then ...
The idea that the key can sometimes be a number is not considered.
But it would be natural to map `...foo {}...` and `...|foo| {}...` in syntax to strings, and `...123 {}...` and `...|123| {}...` in syntax to numbers.

Member Author: The key has to be a string because the message is a string. The next line addresses this: if the key is a string, then the format of the string has to be clear so that it can be related to a number.

Collaborator:

> The key has to be a string because the message is a string

I don't see how that one results from the other.
What says that keys and messages should be the same type?
And even if there is something, nothing stops us from changing it.

The format of such strings, therefore, has to be specified if messages are to be portable and interoperable.
In LDML45 Tech Preview we selected JSON's number serialization as a source for `key` values.
The JSON serialization is ambiguous, in that a given number value might be serialized validly in more than one way:
```
123
123.0
1.23E2
... etc...
```
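As a minimal illustration of this ambiguity (plain Python with the standard `json` and `decimal` modules; nothing here is MF2-specific), the three spellings are all accepted as JSON numbers, differ as strings, and denote a single numeric value:

```python
import json
from decimal import Decimal

# Three spellings of the same number; each is a valid JSON number token.
spellings = ["123", "123.0", "1.23E2"]

for s in spellings:
    # json.loads raises ValueError on invalid JSON, so this checks validity.
    # parse_float/parse_int=Decimal keep the exact decimal value.
    json.loads(s, parse_float=Decimal, parse_int=Decimal)

distinct_strings = len(set(spellings))                   # compared as text
distinct_values = len({Decimal(s) for s in spellings})   # compared as numbers
print(distinct_strings, distinct_values)  # 3 1
```

A string-comparing matcher therefore needs one canonical spelling to be chosen; a numeric matcher treats all three as the same key.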

## Use-Cases

As a user, I want to write messages that use the correct plural for
…

As a user, I want to write messages that mix exact matching and
either plural or ordinal selection in a single message.
> For example:
>```
>.match $numRemaining
>0 {{You have no more chances remaining (exact match)}}
>1 {{You have one more chance remaining (exact match)}}
>one {{You have {$numRemaining} chance remaining (plural)}}
>* {{You have {$numRemaining} chances remaining (plural)}}
>```
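A minimal sketch of how such a mixed message could resolve, assuming exact numeric keys are preferred over plural-category keys, which are preferred over `*`. The helper names `english_plural_category` and `select` are hypothetical, and the plural rule is a simplified form of the CLDR English cardinal rule (`one` only for the integer 1 written without fraction digits), not a call into a real `PluralRules` API:

```python
from decimal import Decimal, InvalidOperation
from typing import Sequence

def english_plural_category(n: Decimal) -> str:
    # Simplified CLDR English cardinal rule: "one" iff i = 1 and v = 0,
    # i.e. the integer 1 without visible fraction digits; else "other".
    return "one" if n == 1 and n.as_tuple().exponent >= 0 else "other"

def select(value: Decimal, keys: Sequence[str]) -> str:
    # 1. An exact numeric match wins.
    for key in keys:
        try:
            if Decimal(key) == value:
                return key
        except InvalidOperation:
            pass  # not a numeric literal, e.g. "one" or "*"
    # 2. Otherwise fall back to the plural category, then to "*".
    category = english_plural_category(value)
    return category if category in keys else "*"

KEYS = ["0", "1", "one", "*"]  # the variant keys of the message above
for n in ("0", "1", "5"):
    print(n, "->", select(Decimal(n), KEYS))
```

Under this ordering the exact key `1` shadows `one` for the value 1, while locales whose `one` category covers other values (such as 21) would still reach the plural variant.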

As a user, I want the selector to match the options specified:
```
.local $num = {123.123 :number maximumFractionDigits=2 minimumFractionDigits=2}
.match $num
123.12 {{This matches}}
120 {{This does not match}}
123.123 {{This does not match}}
1.23123E2 {{Does this match?}}
* {{ ... }}
```
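One way to get the behavior written in this example is to apply the digit options to the operand before comparing, then compare numerically. The sketch assumes half-even rounding (ICU's default rounding mode; other platforms may default differently) and uses a hypothetical helper `apply_max_fraction_digits`. `minimumFractionDigits` is ignored here because padding with zeros does not change the numeric value:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def apply_max_fraction_digits(value: Decimal, max_fd: int) -> Decimal:
    # Round to at most max_fd fraction digits, half-even (assumed mode).
    quantum = Decimal(1).scaleb(-max_fd)  # e.g. max_fd=2 -> 1E-2
    return value.quantize(quantum, rounding=ROUND_HALF_EVEN)

# .local $num = {123.123 :number maximumFractionDigits=2 minimumFractionDigits=2}
selected = apply_max_fraction_digits(Decimal("123.123"), 2)

keys = ["123.12", "120", "123.123", "1.23123E2"]
matches = [k for k in keys if Decimal(k) == selected]
print(selected, matches)  # 123.12 ['123.12']
```

Under this assumption `1.23123E2` does not match, because it denotes 123.123 rather than the rounded value.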

Note that badly written keys just don't match, but we want users to be able to intuit whether a given set of keys will work or not.

```
.local $num = {123.456 :integer}
.match $num
123.456 {{Should not match?}}
123 {{Should match}}
123.0 {{Should not match?}}
* {{ ... }}
```

There can be complications, which we might need to define. Consider:

```
.local $num = {123.002 :number maximumFractionDigits=1 minimumFractionDigits=0}
.match $num
123.002 {{Should not match?}}
123.0 {{Does minimumFractionDigits make this not match?}}
123 {{Does minimumFractionDigits make this match?}}
* {{ ... }}
```
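The questions in these two examples come down to whether keys are compared as strings (against a serialization of the formatted value, as in LDML45) or as numbers. The sketch below contrasts the two. It assumes `:integer` behaves like `maximumFractionDigits=0` with half-even rounding, and that the serialization is plain ASCII digits with no grouping or exponent; `round_to`, `serialize`, `string_matches` and `numeric_matches` are hypothetical helpers:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to(value: Decimal, max_fd: int) -> Decimal:
    # Apply maximumFractionDigits (half-even rounding is an assumption).
    return value.quantize(Decimal(1).scaleb(-max_fd), rounding=ROUND_HALF_EVEN)

def serialize(rounded: Decimal, min_fd: int) -> str:
    # Plain digits; trailing zeros kept only as far as minimumFractionDigits.
    int_part, _, frac = format(rounded, "f").partition(".")
    frac = frac.rstrip("0").ljust(min_fd, "0")
    return f"{int_part}.{frac}" if frac else int_part

def string_matches(value, min_fd, max_fd, keys):
    text = serialize(round_to(value, max_fd), min_fd)
    return [k for k in keys if k == text]

def numeric_matches(value, max_fd, keys):
    rounded = round_to(value, max_fd)
    return [k for k in keys if Decimal(k) == rounded]

# .local $num = {123.456 :integer}
int_keys = ["123.456", "123", "123.0"]
print(string_matches(Decimal("123.456"), 0, 0, int_keys))  # ['123']
print(numeric_matches(Decimal("123.456"), 0, int_keys))    # ['123', '123.0']

# .local $num = {123.002 :number maximumFractionDigits=1 minimumFractionDigits=0}
fd_keys = ["123.002", "123.0", "123"]
print(string_matches(Decimal("123.002"), 0, 1, fd_keys))   # ['123']
print(numeric_matches(Decimal("123.002"), 1, fd_keys))     # ['123.0', '123']
```

String comparison makes `minimumFractionDigits` significant for matching (`123.0` fails unless it is set to 1), whereas numeric comparison makes `123` and `123.0` interchangeable.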

As an implementer, I am concerned about the cost of incorporating _options_ into the selector.
This might be accomplished by building a "second formatter".
Some implementations, such as ICU4J's, might use interfaces like `FormattedNumber` to feed the selector.
Implementations might also apply options by modifying the number value of the _operand_
(or by shadowing the options' effect on the value).

As a user, I want to be able to perform exact match using arbitrary-precision numeric types where they are available.

As an implementer, I do **not** want to be required to provide or implement arbitrary precision
numeric types not available in my platform.
Programming/runtime environments vary widely in support of these types.
MF2 should not prevent implementations from using, for example, `BigDecimal` or `BigInt` types,
and should permit their use in MF2 messages.
MF2 should not _require_ implementations to support such types where they do not exist.
The problem of numeric type precision,
which is implementation dependent,
should not affect how message `key` values are specified.

> For example:
>```
>.local $num = {11111111111111.11111111111111 :number}
>.match $num
>11111111111111.11111111111111 {{This works on some implementations.}}
>* {{... but not on others? ...}}
>```
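To make "works on some implementations" concrete: with IEEE 754 double-precision numbers (Python `float`, JavaScript `number`), the key in this example and a shorter, different decimal round to the same double, so they cannot be told apart, while an arbitrary-precision decimal type keeps them distinct. The second key string is a hypothetical addition for illustration:

```python
from decimal import Decimal

key = "11111111111111.11111111111111"  # the key from the message above
other_key = "11111111111111.111"        # hypothetical, numerically different

# Binary64 doubles carry only about 15-17 significant decimal digits,
# so both strings round to the same double here.
equal_as_double = float(key) == float(other_key)

# Decimal keeps every digit, so the two keys stay distinct.
equal_as_decimal = Decimal(key) == Decimal(other_key)

print(equal_as_double, equal_as_decimal)  # True False
```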

## Requirements

…

but can cause problems in target locales that the original developer is not cons…
> considering other locale's need for a `one` plural:
>
> ```
> .input {$var :integer}
> .match $var
> 1 {{You have one last chance}}
> one {{You have {$var} chance remaining}} // needed by languages such as Polish or Russian
> // such locales typically require other keywords
> ```

…

When implementing `style=percent`, the numeric value of the operand
MUST be multiplied by 100 for the purposes of formatting.

> For example,
> ```
> .local $percent = {1 :integer style=percent}
> {{This formats as '100%' in the en-US locale: {$percent}}}
> ```
aphillips marked this conversation as resolved.

### Selection

When implementing [`MatchSelectorKeys`](spec/formatting.md#resolve-preferences),
…

To expand on the last of these,
consider this message:

```
.input {$count :number minimumFractionDigits=1}
.local $selector = {$count :plural}
.match $selector
0 {{You have no apples}}
1 {{You have exactly one apple}}
* {{You have {$count :number minimumFractionDigits=1} apples}}
```

…

With the proposed design, this message would much more naturally be written as:

```
.input {$count :number minimumFractionDigits=1}
.match $count
0.0 {{You have no apples}}
1.0 {{You have exactly one apple}}
one {{You have {$count} apple}}
* {{You have {$count} apples}}
```

Comment on lines +518 to +519

Collaborator: Suggested change: replace `0.0 {{You have no apples}}` and `1.0 {{You have exactly one apple}}` with `0 {{You have no apples}}` and `1 {{You have exactly one apple}}`.

Member Author: This depends on whether the fraction digits apply or not. It doesn't matter, because the context is proposing a separate `:plural` selector.

Collaborator (@mihnita, Nov 4, 2024): There are traps if we don't compare numerically.

Most locales format currencies (by default) with 2 decimals, but some use 3, and some use none.

So if the source message is `0.00 {{This is free today!}}`, then in some locales this will never match, because the value formats as `0` or `0.000`. And that's the default, with no options specified by the developers.

Worse, some regions using the same language format the currency differently, depending on whether the local currency has sub-units.

So, for example, the Arabic translation would need keys for both `0.00` and `0.000` for exact match to work across countries.
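The currency trap raised in the review comment can be sketched as follows. The minor-unit digits are the ISO 4217 values for USD (2), JPY (0) and KWD (3). Treating them as the locale's default fraction digits, and the plain-digit `format_amount` helper, are simplifying assumptions and do not describe how any particular formatter works:

```python
from decimal import Decimal

# ISO 4217 minor-unit digits for a few currencies (illustrative subset).
CURRENCY_DIGITS = {"USD": 2, "JPY": 0, "KWD": 3}

def format_amount(value: Decimal, digits: int) -> str:
    # Hypothetical: fixed fraction digits, plain ASCII, no symbol or grouping.
    return format(value.quantize(Decimal(1).scaleb(-digits)), "f")

key = "0.00"  # from: 0.00 {{This is free today!}}
price = Decimal(0)

results = {}
for code, digits in CURRENCY_DIGITS.items():
    shown = format_amount(price, digits)
    # (formatted text, string match against key, numeric match against key)
    results[code] = (shown, shown == key, Decimal(key) == price)
    print(code, *results[code])
# USD 0.00 True True
# JPY 0 False True
# KWD 0.000 False True
```

Matching against the formatted string only succeeds where the currency happens to use two fraction digits; numeric comparison matches in all three cases.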
Expand All @@ -460,3 +542,21 @@ and they _might_ converge on some overlap that users could safely use across pla
#### Cons

- No guarantees about interoperability for a relatively core feature.

## Alternatives Considered (`key` matching)

### Standardize the Serialization Forms

Using the design above, remove the integer-only and no-sig-digits restrictions from LDML45
and specify numeric matching by specifying the form of matching `key` values.
Comparison is as-if by string comparison of the serialized forms, just as in LDML45.

Collaborator: Given that this alternative leaves "specifying the form of matching key values" undefined, I can't tell what selecting this alternative would mean. This should be either dropped or defined more precisely.

Member Author: I agree that I didn't flesh this out fully, depending instead on the example that we were discussing just above. I will add the details here.

Note well: I'm not married to this as the design, but I'm trying to get at the technical requirements, especially the expectations of message authors (including translators).

Collaborator: This still needs resolution.


### Compare numeric values

This is the design proposed in #842.

This modifies the key-match algorithm to use implementation-defined numeric value exact match:

> 1. Let `exact` be the numeric value represented by `key`.
> 1. If `value` and `exact` are numerically equal, then