Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Composer/Pcre Part 1 of Many #4323

Merged
merged 5 commits into from
Jan 21, 2025
Merged

Conversation

oleibman
Copy link
Collaborator

@oleibman oleibman commented Jan 17, 2025

The native preg functions (preg_match, preg_replace, etc.) often require us to add a lot of useless boilerplate code to satisfy Phpstan, Scrutinizer, etc. Composer/Pcre offers us a way to remove that boilerplate, thereby giving us a cleaner codebase. I decided to try it on a few modules, and saw a result that clearly demonstrated the usefulness of doing this (aside from the cleaner codebase).

Sample 22_Reader_issue1767 reads an Xlsx spreadsheet with complex sheet names used in defined names, and writes it to Xlsx and Xls output files. When I changed Writer/Xls/Parser to use Composer/Pcre, this sample failed in many different places writing the Xls file. It turns out that some regexes were failing not because the string didn't match, but because the regex encountered "catastrophic backtracking". Composer/Pcre throws an exception when this happens; the native preg_match does return false, but we were not checking for that. The regexes in question are now changed to something which works, and formal unit tests are added for them. Finding this previously undetected error indicates that we should proceed with this change.

An alternative to using Composer/Pcre would be to test for false after all the preg calls. I have done this in the two samples changed with this PR. That seems adequate for a small number of changes, but it really just makes for more clutter considering the large number of regexps that we use in our code. I think Composer/Pcre is a better choice.

It isn't quite transparent. Composer forces all regexps to use PREG_UNMATCHED_AS_NULL, so some match fields will now be null instead of null-string (or non-existent if the unmatched field comes at the end). Our test suite doesn't report any problem (yet) due to this change, although Phpstan is sensitive to it. Several Phpstan annotations were eliminated due to this change.

It is not necessary to do this all at once. This PR addresses all the calls in Writer. I intend to address other components in several tickets.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests
  • cleaner codebase

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary

Why this change is needed?

Provide an explanation of why this change is needed, with links to any Issues (if appropriate).
If this is a bugfix or a new feature, and there are no existing Issues, then please also create an issue that will make it easier to track progress with this PR.

The native preg functions (preg_match, preg_replace, etc.) often require us to add a lot of useless boilerplate code to satisfy Phpstan, Scrutinizer, etc. Composer/Pcre offers us a way to remove that boilerplate, thereby giving us a cleaner codebase. I decided to try it on a few modules, and saw a result that clearly demonstrated the usefulness of doing this (aside from the cleaner codebase).

Sample 22_Reader_issue1767 reads an Xlsx spreadsheet with complex sheet names used in defined names, and writes it to Xlsx and Xls output files. When I changed Writer/Xls/Parser to use Composer/Pcre, this sample failed in many different places writing the Xls file. It turns out that some regexes were failing not because the string didn't match, but because the regex encountered "catastrophic backtracing". Composer/Pcre throws an exception when this happens; the native preg_match does return false, but we were not checking for that. The regexes in question are now changed to something which works, and formal unit tests are added for them. Finding this previously undetected error indicates that we should proceed with this change.

An alternative to using Composer/Pcre would be to test for false after all the preg calls. I have done this in the two samples changed with this PR. That seems adequate for a small number of changes, but it really just makes for more clutter considering the large number of regexps that we use in our code. I think Composer/Pcre is a better choice.

It isn't quite transparent. Composer forces all regexps to use PREG_UNMATCHED_AS_NULL, so some match fields will now be null instead of null-string (or non-existent if the unmatched field comes at the end). Our test suite doesn't report any problem (yet) due to this change, although Phpstan is sensitive to it. Several Phpstan annotations were eliminated due to this change, but some others are now needed.

It is not necessary to do this all at once. This PR addresses all the calls in Writer. I intend to address other components in several tickets.
Eliminate a number of annotations.
@oleibman oleibman enabled auto-merge January 21, 2025 23:14
@oleibman oleibman added this pull request to the merge queue Jan 21, 2025
Merged via the queue into PHPOffice:master with commit 3f15579 Jan 21, 2025
13 of 14 checks passed
@oleibman oleibman deleted the comppcre branch January 21, 2025 23:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

1 participant