-
Notifications
You must be signed in to change notification settings - Fork 327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PatternValidator does not correctly validate input with newlines #495
Comments
@kmalski Thanks a lot for raising this issue and providing detailed explanations and solutions. I would say that multiline support for pattern matching is very rare and that is why nobody found this issue before. What we can do is to make this configurable and default to the single-line. This will maintain backward compatibility and at the same time, allow users with multiline pattern matching to be enabled when necessary. What do you think? Here is the class for the configuration and we just need to add a new entry with default as single-line. |
@stevehu Looks good for me. I found one more case in Joni library, when subtract of Option.MULTILINE is not enough, so I think it's reasonable to wait for answer (at least some time). |
Thanks for the update. Let's keep this issue open until we have the proper solution. |
We are facing the same issue within our project. Any plans to adjust the SchemaValidatorsConfig.java as mentioned by @stevehu above? |
Yes. Let us know if you need any help. Thanks. |
Hey @stevehu , I'm one of @anro87 's colleagues. I've just started writing a test for this feature but it quickly dawned on me that the problem was already fixed a few days ago in #737 It's not configurable though, not sure what that means for a release since it would technically break existing behavior Note: the Joni/ECMA262 side still exhibits this issue, it's the JDK-based matcher that has been fixed. |
Waiting on feedback how I should proceed. These are the options we have:
|
Hm I didn't really think about the use case in #782 . What do you think about keeping it the way it is now (no implicit anchoring) and doing very simple detection if the regex wants to be anchored, in that case we would use |
Re-opening as the fix for this was reverted due to causing other issues in I'm not familiar enough with the issue and it doesn't look like this is easy to fix so using GraalJS might be an option. |
Hi, great progress of the library from the time when I originally opened this issue. @justin-tay to make it easier for you, I have checked to see if this issue still exists and currently I can reproduce it for Joni and JDK implementations (GraalJS implementation seems to work fine). @Test
void multilineString() {
RegularExpression regex = new JoniRegularExpression("^[a-z]{1,10}$");
assertTrue(regex.matches("abc\n"));
} @Test
void multilineString() {
RegularExpression regex = new JDKRegularExpression("^[a-z]{1,10}$");
assertTrue(regex.matches("\nabc\n"));
} Both test cases pass, but because we are using anchors here, I would expect them to fail. |
Hi,
I have found that neither java regex based (
PatternValidatorJava
) nor joni based (PatternValidatorEcma262
) pattern validation does not work correctly with newlines.Any of implementation does not correctly interpret
^
and$
anchors. I would expect that, when I use them at the start and end of pattern (eg.^[a-z]{1,10}$
) they would not allow to pass any trailing newline character (eg.abc\n
should not be matched). There are separate problems with both implementation, so I will describe them individually.The problem is with default configuration for ECMAScript syntax in Joni library, which has multiline matching by default enabled. From the json-schema-validator code:
For the fast fix, the last line can be changed to:
but I have also rised issue in the Joni library, as I believe that this is not correct default (ECMAScript has disabled multiline matching by default).
What's more interesting, because of enabled multiline matching, currently this input
\r\nab\nab\n
will match this pattern^[a-z]{1,10}$
and pass validation. We want to allow single character-only word, and the entire sentence passes.Current implementation of matching with java regex looks like this:
The problem is how the find method works. From the documentation:
Attempts to find the next subsequence of the input sequence that matches the pattern.
So this function matches subsequence of input and for the two JDKs, which I tried, returnstrue
for inputabab\n
and pattern^[a-z]{1,10}$
, despite that I used^
and$
anchors. So any ending newline character will be always allowed for such patterns.Possible solution is to use
matches
method which attempts to match the entire region against the pattern, but this will result in implicit adding^
and$
anchors to every pattern.The text was updated successfully, but these errors were encountered: