Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

17191 fix is relative function #20077

Merged
merged 5 commits into from
Dec 19, 2023

Conversation

ggh2e3
Copy link
Contributor

@ggh2e3 ggh2e3 commented Dec 3, 2023

Q A
Is bugfix? ✔️
New feature?
Breaks BC?
Fixed issues #17191

I fixed the BaseUrl::isRelative method. I wrote some tests to cover all the cases that came up to my mind.

Thanks

Copy link

what-the-diff bot commented Dec 3, 2023

PR Summary

  • Bug Fix in CHANGELOG.md
    There was an error in the document that logs all significant changes made to the project. The issue was specifically located in the BaseUrl::isRelative($url) function. This function is a part of the helper functions in our codebase.

  • Improvements to BaseUrl.php
    The BaseUrl::isRelative($url) function was also improved, which makes analyzing URLs (web addresses) more efficient. This change ensures that the function is now correctly distinguishing between relative and absolute URLs. In simpler terms, our system can now better understand whether a provided internet address belongs to our own site or an external site.

  • Addition of BaseUrlTest.php
    A new file for testing was added. It focuses on checking the functionality of the BaseUrl class. This step ensures that all features connected to processing web addresses are working correctly. This is important to ensure that future changes will not unintentionally break any of these features.

These changes helped us rectify a crucial error, made our web address analysis more efficient, and strengthened our testing framework, ultimately enhancing the reliability and strength of the product.

Copy link

codecov bot commented Dec 3, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d43341a) 48.02% compared to head (76f1733) 48.02%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #20077   +/-   ##
=======================================
  Coverage   48.02%   48.02%           
=======================================
  Files         445      445           
  Lines       43889    43890    +1     
=======================================
+ Hits        21077    21078    +1     
  Misses      22812    22812           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@samdark samdark requested review from a team December 4, 2023 14:50
@bizley bizley merged commit fcaafd8 into yiisoft:master Dec 19, 2023
66 checks passed
@bizley bizley added this to the 2.0.50 milestone Dec 19, 2023
@ggh2e3 ggh2e3 deleted the 17191_fix_is_relative_function branch December 19, 2023 15:24
@SamMousa
Copy link
Contributor

SamMousa commented Jan 2, 2024

@bizley just noted this today. This is the note from parse_url:

Caution This function may not give correct results for relative or invalid URLs, and the results may not even match common behavior of HTTP clients. If URLs from untrusted input need to be parsed, extra validation is required, e.g. by using filter_var() with the FILTER_VALIDATE_URL filter.

https://www.php.net/parse_url

This means that we're using parse_url incorrectly since we're expecting to use it on relative URLS.

@bizley
Copy link
Member

bizley commented Jan 3, 2024

Ugh, right. Looks like reverse is in order then. On it.

@bizley
Copy link
Member

bizley commented Jan 3, 2024

Hm, after some more research I think we can keep it - we rely here only on the scheme part (whether it's present or not) - the warning is for checking other url parts, there are known bugs with it, but I could not find something about scheme itself being not recognized properly.

@rob006
Copy link
Contributor

rob006 commented Jan 3, 2024

BTW: benchmark that I mentioned in #20089 (comment): https://3v4l.org/fNYfS

parse_url() is 10x slower slower than strpos(), but it is still pretty fast.

@bizley
Copy link
Member

bizley commented Jan 3, 2024

I checked https://3v4l.org/ioiFVo - maybe we should go for triple strpos?

@rob006
Copy link
Contributor

rob006 commented Jan 3, 2024

I would go with regexp. It is comparable to triple strpos in terms of performance for relative URLs (and faster for absolute URLs) and supports all protocols at the same time: https://3v4l.org/Vb0rJ

@SamMousa
Copy link
Contributor

SamMousa commented Jan 3, 2024

Hm, after some more research I think we can keep it - we rely here only on the scheme part (whether it's present or not) - the warning is for checking other url parts, there are known bugs with it, but I could not find something about scheme itself being not recognized properly.

That's good news, so this is no longer a discussion about functional regression.

If it's just about performance it can lower the urgency a bit....

I checked https://3v4l.org/ioiFVo - maybe we should go for triple strpos?

This is not a fair test; you're testing cases that are very favorable; either the pos is 0, or in case of // it's 4. The 3rd case is bad because it doesn't have contain it.
I've added str_starts_with which is even faster but not available on lower versions.

Btw: what do we do with mixed case URLs?

I'm working on a more detailed performance test; give me a few more mins!

@bizley
Copy link
Member

bizley commented Jan 3, 2024

Not sure what you mean by the fair test, I was testing whether the url starts with http:// or https:// or // hence the 0 (in real code this would require inversion of course). But I would go with the regex Rob proposed (@^((\w+):)?//@) since it's fast (and on top of everything it will take care of mixed case protocols :)

@SamMousa
Copy link
Contributor

SamMousa commented Jan 3, 2024

With not I fair meant you're only testing cases where the URL is not relative.

https://3v4l.org/Dpob1

This is a cleaned up version if your original tests @bizley.

  • It has more examples, buth relative and non relative ones.
  • It has 6 implementations

Output php 8.3.1:

preg_match: 0.168
preg_match (case insensitive): 0.173
strpos: 0.226
str_starts_with: 0.243
substr_compare (case insensitive): 0.204
substr_compare (case sensitive): 0.221
parse_url: 0.376

Output 7.4

preg_match: 0.181
preg_match (case insensitive): 0.182
strpos: 0.190
str_starts_with: 0.189
substr_compare (case insensitive): 0.133
substr_compare (case sensitive): 0.144
parse_url: 0.231

This is from doing 200.000 iterations for each of checking 6 examples. (Don't compare the times with your tests since I've added more boiler plate and am verifying the results).

Interestingly, as we're always told: don't optimize too early. As we see here at some point preg_match becomes faster, likely because the regex engine actually compiles the regular expression. This is very expensive but amortized over large number of iterations it becomes cheaper.

With 300.000 reps on 8.3.1 preg_match wins vs all other strategies.

preg_match: 0.184
preg_match (case insensitive): 0.172
strpos: 0.244
str_starts_with: 0.257
substr_compare (case insensitive): 0.219
substr_compare (case sensitive): 0.223
parse_url: 0.322

With 30.000 reps however (a lot less consistent) we see cases where it's twice as slow.

preg_match: 0.034
preg_match (case insensitive): 0.017
strpos: 0.025
str_starts_with: 0.027
substr_compare (case insensitive): 0.021
substr_compare (case sensitive): 0.023
parse_url: 0.033

Given these results I agree that we should just do preg match. It is the most readable code and works more than fast enough.

TLDR: I agree with using preg_match

@bizley
Copy link
Member

bizley commented Jan 3, 2024

Great work, guys! Let's go with preg_match then. @SamMousa you are already working on the PR, right?

@SamMousa
Copy link
Contributor

SamMousa commented Jan 3, 2024

Correct! ;-) Just looking at the RFC for the scheme part:

 scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

https://www.rfc-editor.org/rfc/rfc3986

How'd you guess?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants