17191 fix is relative function #20077

ggh2e3 · 2023-12-03T15:53:59Z

Q	A
Is bugfix?	✔️
New feature?	❌
Breaks BC?	❌
Fixed issues	#17191

I fixed the BaseUrl::isRelative method. I wrote some tests to cover all the cases that came up to my mind.

Thanks

what-the-diff · 2023-12-03T15:54:27Z

PR Summary

Bug Fix in CHANGELOG.md
There was an error in the document that logs all significant changes made to the project. The issue was specifically located in the BaseUrl::isRelative($url) function. This function is a part of the helper functions in our codebase.
Improvements to BaseUrl.php
The BaseUrl::isRelative($url) function was also improved, which makes analyzing URLs (web addresses) more efficient. This change ensures that the function is now correctly distinguishing between relative and absolute URLs. In simpler terms, our system can now better understand whether a provided internet address belongs to our own site or an external site.
Addition of BaseUrlTest.php
A new file for testing was added. It focuses on checking the functionality of the BaseUrl class. This step ensures that all features connected to processing web addresses are working correctly. This is important to ensure that future changes will not unintentionally break any of these features.

These changes helped us rectify a crucial error, made our web address analysis more efficient, and strengthened our testing framework, ultimately enhancing the reliability and strength of the product.

codecov · 2023-12-03T15:56:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d43341a) 48.02% compared to head (76f1733) 48.02%.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #20077   +/-   ##
=======================================
  Coverage   48.02%   48.02%           
=======================================
  Files         445      445           
  Lines       43889    43890    +1     
=======================================
+ Hits        21077    21078    +1     
  Misses      22812    22812

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

SamMousa · 2024-01-02T13:38:52Z

@bizley just noted this today. This is the note from parse_url:

Caution This function may not give correct results for relative or invalid URLs, and the results may not even match common behavior of HTTP clients. If URLs from untrusted input need to be parsed, extra validation is required, e.g. by using filter_var() with the FILTER_VALIDATE_URL filter.

https://www.php.net/parse_url

This means that we're using parse_url incorrectly since we're expecting to use it on relative URLS.

bizley · 2024-01-03T10:22:29Z

Ugh, right. Looks like reverse is in order then. On it.

bizley · 2024-01-03T10:57:38Z

Hm, after some more research I think we can keep it - we rely here only on the scheme part (whether it's present or not) - the warning is for checking other url parts, there are known bugs with it, but I could not find something about scheme itself being not recognized properly.

rob006 · 2024-01-03T11:59:11Z

BTW: benchmark that I mentioned in #20089 (comment): https://3v4l.org/fNYfS

parse_url() is 10x slower slower than strpos(), but it is still pretty fast.

bizley · 2024-01-03T14:27:06Z

I checked https://3v4l.org/ioiFVo - maybe we should go for triple strpos?

rob006 · 2024-01-03T15:01:33Z

I would go with regexp. It is comparable to triple strpos in terms of performance for relative URLs (and faster for absolute URLs) and supports all protocols at the same time: https://3v4l.org/Vb0rJ

SamMousa · 2024-01-03T15:02:10Z

Hm, after some more research I think we can keep it - we rely here only on the scheme part (whether it's present or not) - the warning is for checking other url parts, there are known bugs with it, but I could not find something about scheme itself being not recognized properly.

That's good news, so this is no longer a discussion about functional regression.

If it's just about performance it can lower the urgency a bit....

I checked https://3v4l.org/ioiFVo - maybe we should go for triple strpos?

This is not a fair test; you're testing cases that are very favorable; either the pos is 0, or in case of // it's 4. The 3rd case is bad because it doesn't have contain it.
I've added str_starts_with which is even faster but not available on lower versions.

Btw: what do we do with mixed case URLs?

I'm working on a more detailed performance test; give me a few more mins!

bizley · 2024-01-03T15:19:14Z

Not sure what you mean by the fair test, I was testing whether the url starts with http:// or https:// or // hence the 0 (in real code this would require inversion of course). But I would go with the regex Rob proposed (@^((\w+):)?//@) since it's fast (and on top of everything it will take care of mixed case protocols :)

SamMousa · 2024-01-03T15:29:58Z

With not I fair meant you're only testing cases where the URL is not relative.

https://3v4l.org/Dpob1

This is a cleaned up version if your original tests @bizley.

It has more examples, buth relative and non relative ones.
It has 6 implementations

Output php 8.3.1:

preg_match: 0.168
preg_match (case insensitive): 0.173
strpos: 0.226
str_starts_with: 0.243
substr_compare (case insensitive): 0.204
substr_compare (case sensitive): 0.221
parse_url: 0.376

Output 7.4

preg_match: 0.181
preg_match (case insensitive): 0.182
strpos: 0.190
str_starts_with: 0.189
substr_compare (case insensitive): 0.133
substr_compare (case sensitive): 0.144
parse_url: 0.231

This is from doing 200.000 iterations for each of checking 6 examples. (Don't compare the times with your tests since I've added more boiler plate and am verifying the results).

Interestingly, as we're always told: don't optimize too early. As we see here at some point preg_match becomes faster, likely because the regex engine actually compiles the regular expression. This is very expensive but amortized over large number of iterations it becomes cheaper.

With 300.000 reps on 8.3.1 preg_match wins vs all other strategies.

preg_match: 0.184
preg_match (case insensitive): 0.172
strpos: 0.244
str_starts_with: 0.257
substr_compare (case insensitive): 0.219
substr_compare (case sensitive): 0.223
parse_url: 0.322

With 30.000 reps however (a lot less consistent) we see cases where it's twice as slow.

preg_match: 0.034
preg_match (case insensitive): 0.017
strpos: 0.025
str_starts_with: 0.027
substr_compare (case insensitive): 0.021
substr_compare (case sensitive): 0.023
parse_url: 0.033

Given these results I agree that we should just do preg match. It is the most readable code and works more than fast enough.

TLDR: I agree with using preg_match

bizley · 2024-01-03T15:35:28Z

Great work, guys! Let's go with preg_match then. @SamMousa you are already working on the PR, right?

SamMousa · 2024-01-03T15:36:26Z

Correct! ;-) Just looking at the RFC for the scheme part:

 scheme        = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

https://www.rfc-editor.org/rfc/rfc3986

How'd you guess?

ggh2e3 added 5 commits December 3, 2023 15:07

17191: fixed isRelative method

fc582a2

17191: refacor tests using dataprovider

d3e4df7

yiisoft#17191: improved ensureScheme tests readability

d45c1cc

yiisoft#17191: improved ensureScheme tests readability

005a9c7

yiisoft#17191: updated changelog

76f1733

samdark requested review from a team December 4, 2023 14:50

bizley approved these changes Dec 18, 2023

View reviewed changes

Arhell approved these changes Dec 19, 2023

View reviewed changes

bizley merged commit fcaafd8 into yiisoft:master Dec 19, 2023
66 checks passed

bizley added this to the 2.0.50 milestone Dec 19, 2023

bizley added type:enhancement type:bug Bug and removed type:enhancement labels Dec 19, 2023

ggh2e3 deleted the 17191_fix_is_relative_function branch December 19, 2023 15:24

rob006 mentioned this pull request Dec 27, 2023

fix #17191: improved consistency and readability of UrlManager::creat… #20089

Open

SamMousa mentioned this pull request Jan 3, 2024

fix: improve isRelative performance #20095

Merged

rob006 mentioned this pull request Jun 14, 2024

yii\web\UrlManager::createAbsoluteUrl() creates wrong url in version >= 2.0.50 #20199

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

17191 fix is relative function #20077

17191 fix is relative function #20077

ggh2e3 commented Dec 3, 2023

what-the-diff bot commented Dec 3, 2023

codecov bot commented Dec 3, 2023 •

edited

Loading

SamMousa commented Jan 2, 2024

bizley commented Jan 3, 2024

bizley commented Jan 3, 2024

rob006 commented Jan 3, 2024

bizley commented Jan 3, 2024

rob006 commented Jan 3, 2024

SamMousa commented Jan 3, 2024

bizley commented Jan 3, 2024

SamMousa commented Jan 3, 2024 •

edited

Loading

bizley commented Jan 3, 2024

SamMousa commented Jan 3, 2024

17191 fix is relative function #20077

17191 fix is relative function #20077

Conversation

ggh2e3 commented Dec 3, 2023

what-the-diff bot commented Dec 3, 2023

PR Summary

codecov bot commented Dec 3, 2023 • edited Loading

Codecov Report

SamMousa commented Jan 2, 2024

bizley commented Jan 3, 2024

bizley commented Jan 3, 2024

rob006 commented Jan 3, 2024

bizley commented Jan 3, 2024

rob006 commented Jan 3, 2024

SamMousa commented Jan 3, 2024

bizley commented Jan 3, 2024

SamMousa commented Jan 3, 2024 • edited Loading

bizley commented Jan 3, 2024

SamMousa commented Jan 3, 2024

codecov bot commented Dec 3, 2023 •

edited

Loading

SamMousa commented Jan 3, 2024 •

edited

Loading