Make the e-mail regexp more robust #33

Open
jiripetrzelka opened this issue Apr 7, 2022 · 6 comments

Comments

@jiripetrzelka

Would it be possible to update the regular expression for e-mails so that only a single, valid e-mail address passes the validation test?

Currently, we are getting tens, if not hundreds, of invalid entries, especially in the Ounits API but also in IIAs and LAs, such as:

  • multiple e-mails separated by an arbitrary delimiter, such as a comma, space or semicolon,
  • an e-mail address with whitespace at the beginning or the end of the string,
  • an e-mail address accompanied by the name of the coordinator, directly in the element.

I guess many providers run their output through a validator before exposing it to the outside world, so I think this would be a step towards better quality of the exchanged data.

Currently, since we validate the contents of the e-mail field against a standard e-mail validator, we are forced either to ignore the specific element altogether or to treat the entire record (LA, IIA) as invalid, in which case it never reaches end users.

<xs:simpleType name="Email">
    <xs:annotation>
        <xs:documentation>
            All elements with this type should be valid email addresses.
            Please note that passing the test for the attached regex pattern does NOT imply
            for the content to be a valid email. This pattern is extremely simplified and
            it will reject only a couple of obvious mistakes (as opposed to serious hacking
            attempts).
        </xs:documentation>
    </xs:annotation>
    <xs:restriction base="xs:string">
        <xs:pattern value="[^@]+@[^\.]+\..+"></xs:pattern>
    </xs:restriction>
</xs:simpleType>
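
To illustrate the three cases listed above, here is a minimal sketch in Python. It assumes that `re.fullmatch` is a fair approximation of XSD's implicitly anchored patterns for this simple expression; the sample values are made up for illustration and are not taken from real responses.

```python
import re

# Pattern copied from the Email simpleType above. XSD patterns must match the
# whole value, so re.fullmatch is used to approximate that behaviour.
CURRENT_PATTERN = re.compile(r"[^@]+@[^\.]+\..+")

# Hypothetical sample values modelled on the three cases in the list.
samples = [
    "a@example.org, b@example.org",   # several addresses in one element
    " a@example.org ",                # leading and trailing whitespace
    "Jane Doe <jane@example.org>",    # coordinator name next to the address
]

# Collect every sample that the current pattern lets through.
accepted = [s for s in samples if CURRENT_PATTERN.fullmatch(s)]
print(accepted)
```

All three samples are accepted, because `[^@]+` and `.+` place almost no restriction on what surrounds the first `@` and the first following dot.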

@mkurzydlowski
Contributor

mkurzydlowski commented Apr 11, 2022

@jiripetrzelka are you able to propose a better regexp?

I would like to point out that improving the regexp won't change the data that providers produce in their responses. That said, we do see value in making this regexp better.

@jiripetrzelka
Author

@mkurzydlowski
Contributor

It's unfortunately not that easy to check this regular expression. How should we proceed with validating it?

Can I ask you to create a pull request with the proposed change? There is some work that needs to be done to put this regular expression in the XSD and I'm afraid to break it, as I'm not really able to check it afterwards.

To sum up, I'm afraid that we might introduce more problems than the change is worth. It would be great if there were already a well-established type in XSD that we could simply use.
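
One possible way to approach the validation question is to keep a table of values that should and should not pass, and to run every candidate pattern against it. The sketch below is only an illustration: `CANDIDATE` is a hypothetical pattern that was not proposed in this thread, the test values are made up, and Python's `re` is not identical to the XSD regex flavour, so a final check with an actual XSD validator would still be needed.

```python
import re

# Pattern currently used in the Email simpleType.
CURRENT = r"[^@]+@[^\.]+\..+"

# Hypothetical candidate (an assumption, not from this thread): one local part,
# one "@", and a dotted domain, with no whitespace, commas or semicolons.
CANDIDATE = r"[^@\s,;]+@[^@\s,;.]+(\.[^@\s,;.]+)+"

# Made-up test table: value -> whether it should be accepted.
CASES = {
    "jane@example.org": True,
    "jane.doe@mail.example.org": True,
    "a@example.org, b@example.org": False,
    "a@example.org;b@example.org": False,
    " jane@example.org": False,
    "Jane Doe <jane@example.org>": False,
    "jane@example": False,
}


def failures(pattern, cases):
    """Return the values whose match result differs from the expectation."""
    compiled = re.compile(pattern)
    return [v for v, ok in cases.items() if bool(compiled.fullmatch(v)) != ok]


print("current:  ", failures(CURRENT, CASES))
print("candidate:", failures(CANDIDATE, CASES))
```

With this table, the current pattern wrongly accepts four of the values, while the hypothetical candidate produces no mismatches. The table itself is the useful part: it can be extended with real problem values and reused for any pattern under discussion.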

@jiripetrzelka
Author

I couldn't find any tool that would convert the above-mentioned regexp into an XSD-compliant format.

I have only found some more simplistic examples, for example:

I guess any of them is slightly better than the regexp we currently use.

@mkurzydlowski
Contributor

Doesn't the second solution imply that solutions 1 and 3 are not good enough?

I'm still not able to choose between the solutions you proposed. If we don't find a solution that has already been validated by experts, then we need to wait for an expert to join this discussion and either propose or validate one.

@janinamincer-daszkiewicz
Member

Any suggestions from other developers?
