17191 fix is relative function #20077
PR Summary
These changes helped us rectify a crucial error, made our web address analysis more efficient, and strengthened our testing framework, ultimately enhancing the reliability and strength of the product. |
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## master #20077 +/- ##
=======================================
Coverage 48.02% 48.02%
=======================================
Files 445 445
Lines 43889 43890 +1
=======================================
+ Hits 21077 21078 +1
Misses 22812 22812
☔ View full report in Codecov by Sentry. |
@bizley just noted this today. This is the note from
This means that we're using |
Ugh, right. Looks like reverse is in order then. On it. |
Hm, after some more research I think we can keep it - here we rely only on the scheme part (whether it's present or not). The warning is about checking other URL parts, where there are known bugs, but I could not find anything about the scheme itself not being recognized properly. |
BTW: benchmark that I mentioned in #20089 (comment): https://3v4l.org/fNYfS
|
I checked https://3v4l.org/ioiFVo - maybe we should go for triple strpos? |
I would go with regexp. It is comparable to triple strpos in terms of performance for relative URLs (and faster for absolute URLs) and supports all protocols at the same time: https://3v4l.org/Vb0rJ |
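For readers following along, here is a minimal sketch of the two approaches being compared. The function names are hypothetical and the regular expression is an assumption based on the scheme grammar in RFC 3986; it is not necessarily the pattern behind the 3v4l links or the one that was merged.

```php
<?php
// Hypothetical helpers for illustration only; not the framework's code.

function isRelativeStrpos(string $url): bool
{
    // "Triple strpos": relative unless the URL starts with one of three
    // hard-coded, case-sensitive prefixes.
    return strpos($url, 'http://') !== 0
        && strpos($url, 'https://') !== 0
        && strpos($url, '//') !== 0;
}

function isRelativeRegex(string $url): bool
{
    // Relative unless the URL starts with "<scheme>://" or "//".
    // Scheme per RFC 3986: a letter followed by letters, digits, "+", "." or "-".
    // The "i" flag makes the scheme match case-insensitively.
    return preg_match('~^(?:[a-z][a-z0-9+.-]*:)?//~i', $url) !== 1;
}

// The two agree on the common cases...
var_dump(isRelativeStrpos('/site/index'), isRelativeRegex('/site/index'));
// ...but differ for other schemes and for mixed-case URLs.
var_dump(isRelativeStrpos('ftp://example.com'), isRelativeRegex('ftp://example.com'));
var_dump(isRelativeStrpos('HTTP://example.com'), isRelativeRegex('HTTP://example.com'));
```

The regex variant covers any scheme and mixed-case input in a single call, which is the "supports all protocols at the same time" point made above.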
That's good news, so this is no longer a discussion about functional regression. If it's just about performance it can lower the urgency a bit....
This is not a fair test; you're testing cases that are very favorable; either the pos is 0, or in case of Btw: what do we do with mixed case URLs? I'm working on a more detailed performance test; give me a few more mins! |
Not sure what you mean by the fair test, I was testing whether the url starts with http:// or https:// or // hence the 0 (in real code this would require inversion of course). But I would go with the regex Rob proposed ( |
By not fair I meant you're only testing cases where the URL is not relative. This is a cleaned up version of your original tests @bizley.
Output php 8.3.1:
Output 7.4
This is from doing 200.000 iterations of checking each of the 6 examples. (Don't compare the times with your tests, since I've added more boilerplate and am verifying the results.) Interestingly, as we're always told: don't optimize too early. As we see here, at some point preg_match becomes faster, likely because the regex engine actually compiles the regular expression. This is very expensive, but amortized over a large number of iterations it becomes cheaper. With 300.000 reps on 8.3.1:
With 30.000 reps however (a lot less consistent) we see cases where it's twice as slow.
Given these results I agree that we should just use preg_match. It is the most readable code and works more than fast enough. TLDR: I agree with using |
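A benchmark of the kind described above could be sketched roughly as follows. The harness name `timeCheck`, the sample URLs and both closures are assumptions made for illustration; this is not the script that produced the numbers quoted in this thread, and timings will vary by machine and PHP version.

```php
<?php
// Hypothetical micro-benchmark harness; all names and sample URLs are assumptions.

/**
 * Runs $check over every URL, $reps times, and returns the elapsed milliseconds.
 */
function timeCheck(callable $check, array $urls, int $reps): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $reps; $i++) {
        foreach ($urls as $url) {
            $check($url);
        }
    }
    return (hrtime(true) - $start) / 1e6;
}

// Mix of absolute and relative examples so neither approach is favored.
$urls = [
    'http://example.com',
    'https://example.com',
    '//example.com',
    '/site/index',
    'site/index',
    'site/index?next=http://example.com',
];

$viaStrpos = static function (string $url): bool {
    return strpos($url, 'http://') !== 0
        && strpos($url, 'https://') !== 0
        && strpos($url, '//') !== 0;
};

$viaRegex = static function (string $url): bool {
    return preg_match('~^(?:[a-z][a-z0-9+.-]*:)?//~i', $url) !== 1;
};

foreach ([30000, 300000] as $reps) {
    printf(
        "%d reps: strpos %.1f ms, preg_match %.1f ms\n",
        $reps,
        timeCheck($viaStrpos, $urls, $reps),
        timeCheck($viaRegex, $urls, $reps)
    );
}
```

Running each repetition count several times and comparing the spread gives a better picture than a single run, since the smaller counts are noticeably less consistent.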
Great work, guys! Let's go with preg_match then. @SamMousa you are already working on the PR, right? |
Correct! ;-) Just looking at the RFC for the scheme part:
https://www.rfc-editor.org/rfc/rfc3986 How'd you guess? |
I fixed the BaseUrl::isRelative method. I wrote some tests to cover all the cases that came to mind. Thanks