Allow a regular expression package to be replaced by another package. #87
Comments
Since that PR, we've added LRU caching to the library, which makes frequently seen user agents parse almost instantly. Is performance really still a problem with that in place? I'd be open to considering faster regex implementations if they give a large benefit, though I think it would make more sense not to expose the regex engine for dependency injection: if an engine is missing features we need, or implements a slightly different regex flavor with observable behavior differences, that would be problematic.
While that is true, it obviously only helps for user agents which have been seen before and are still in the cache on their second hit. FWIW, a while back a Dailymotion employee kindly provided a sample of their access logs, which should be realistic. It has ~75k entries, of which ~20k are unique, with a very long tail of "one hit wonders": user agents seen only once and never again (this is relevant because, as I learned, LRU is pretty bad at one hit wonders at low sizes). On that file, when I investigated different cache algorithms, I got the following hit rates at cache size 1000 (which I understand is what you're using):
So even with a cache, regex performance can be quite impactful. With that said, rather than switching the regex library, what I would recommend is checking whether somebody has implemented FilteredRE2 in Go: regexes.yaml has a lot of regexes (more than 600 device parsers as of 0.18), so I think a faster regex engine alone will not make that much of a difference in the end. Another thing you may want to look at (though this one you'll really have to bench for Go specifically, as regexp may or may not have this issue) is using
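To illustrate the prefiltering idea behind FilteredRE2 mentioned above, here is a minimal sketch in Go. It is not FilteredRE2 itself and not code from this library: the `rule` type, `newRule`, and `match` are hypothetical names, and the required literal for each pattern is picked by hand, whereas FilteredRE2 derives such literals from the patterns and looks them all up in one pass. The point is only that a cheap substring check can skip most regexes before any of them runs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// rule pairs a compiled regex with a literal that must appear in any
// string the regex can match. The literal is hand-picked here (an
// assumption of this sketch).
type rule struct {
	literal string
	re      *regexp.Regexp
}

// newRule is a hypothetical helper that compiles the pattern and panics
// on an invalid one.
func newRule(literal, pattern string) rule {
	return rule{literal: literal, re: regexp.MustCompile(pattern)}
}

// match returns the submatches of the first rule that matches ua, plus
// the number of regexes that were actually executed.
func match(rules []rule, ua string) (sub []string, tried int) {
	for _, r := range rules {
		// Cheap prefilter: skip the regex when its literal is absent.
		if !strings.Contains(ua, r.literal) {
			continue
		}
		tried++
		if sub = r.re.FindStringSubmatch(ua); sub != nil {
			return sub, tried
		}
	}
	return nil, tried
}

func main() {
	rules := []rule{
		newRule("Firefox/", `Firefox/(\d+)`),
		newRule("Chrome/", `Chrome/(\d+)`),
		newRule("Safari/", `Version/(\d+).*Safari/`),
	}
	sub, tried := match(rules, "Mozilla/5.0 Chrome/120.0")
	fmt.Println(sub, tried)
}
```

Whether this pays off for Go's regexp on the real regexes.yaml would have to be benchmarked; the sketch only shows the shape of the approach.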
The standard library's regexp is already quite fast for capturing groups and is often the fastest choice when working in Go. While faster libraries like Hyperscan exist, Go lacks a direct Hyperscan-like (streaming DFA) library that also supports capturing-group offsets. Libraries like PCRE or Oniguruma can do that, but it's a toss-up whether they'll actually be faster. I agree with masklinn, BTW.
Related to #59: the regexp package is very slow.
I suggest defining an interface so that regular-expression processing can be injected externally as a dependency.
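A minimal sketch of what such an interface could look like, assuming the parser only needs submatch extraction. The names `Matcher`, `Compiler`, `StdlibCompiler`, `Parser`, `NewParser`, and `FirstMatch` are hypothetical and not part of this library's API; the standard-library `*regexp.Regexp` satisfies `Matcher` as-is because it already has a `FindStringSubmatch` method.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is a hypothetical minimal interface covering the one regex
// operation this sketch needs.
type Matcher interface {
	FindStringSubmatch(s string) []string
}

// Compiler is a hypothetical factory that turns a pattern into a Matcher.
// Swapping the regex engine means supplying a different Compiler.
type Compiler func(pattern string) (Matcher, error)

// StdlibCompiler is the default, backed by the standard regexp package.
func StdlibCompiler(pattern string) (Matcher, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	return re, nil
}

// Parser holds matchers built by whichever Compiler was injected.
type Parser struct {
	matchers []Matcher
}

// NewParser compiles every pattern with the injected Compiler.
func NewParser(compile Compiler, patterns []string) (*Parser, error) {
	p := &Parser{}
	for _, pat := range patterns {
		m, err := compile(pat)
		if err != nil {
			return nil, fmt.Errorf("compile %q: %w", pat, err)
		}
		p.matchers = append(p.matchers, m)
	}
	return p, nil
}

// FirstMatch returns the submatches of the first matcher that matches s,
// or nil when none does.
func (p *Parser) FirstMatch(s string) []string {
	for _, m := range p.matchers {
		if sub := m.FindStringSubmatch(s); sub != nil {
			return sub
		}
	}
	return nil
}

func main() {
	p, err := NewParser(StdlibCompiler, []string{`Firefox/(\d+)`, `Chrome/(\d+)`})
	if err != nil {
		panic(err)
	}
	fmt.Println(p.FirstMatch("Mozilla/5.0 Chrome/120.0"))
}
```

An alternative engine would plug in by providing its own `Compiler`; any difference in regex flavor between engines would still surface as behavior differences, which this sketch does nothing to guard against.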