-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Javascript: non-regex version about 16x-20x faster on my machine #52
Conversation
Does it pass all the tests? |
yes! Sorry for the delay. Just saw this today. I had a small fix needed for test 3, but yes - now it is passing all tests. |
Sorry to ask after such a long delay, but... is this still valid to merge? Could you re-verify both the test passing and the performance benchmark? And can you post the performance tests/results in some way so they can be independently verified? |
@getify I just tested again and it's still passing all tests. I don't have time at the moment to setup a formal performance benchmark, but you could definitely grab a large json file and pipe it through a few thousand times in both versions. Mine will definitely outperform the current version. |
We've just addressed a significant performance bug. As such, I'd like to re-validate whether the performance issue here is still as significant. Closing for now, but if performance benchmarks can highlight further issues to address, please re-open. |
@getify I don't see how to reopen this, but yes it absolutely should be reopened. Try testing with Mullvad's wireguard.json file (github doesn't take .json filenames so I renamed to wireguard.txt), which I've attached here: wireguard.txt You can test performance easily with: fetch('wireguard.txt').then(x=>x.text()).then(x=>{
console.log(+new Date)
var res = JSON.minify(x);
console.log(+new Date)
}) Your version: 4223ms (4.223 seconds) My version is over 324 times faster than yours |
The problem has nothing to do with quadratic performance in your algorithm. It has to do with the performance of regular expressions. An iterative loop (mine) will outperform regex (yours) every day of the week and twice on sunday. Regex is notoriously slow. So long as you are doing this: while (tmp = tokenizer.exec(json)) { ... then my version will be many, many times faster than yours. |
I can't re-open since the repository you submitted it from has been deleted. But I am looking into the performance issues more closely. I appreciate the test file you provided. And I'm sorry we didn't address your concerns earlier. I wasn't willing to accept a PR with a full-rewrite that doesn't handle the ports to all the other languages. Also, we had no formal performance benchmarking, so the assertions of performance improvements were harder to verify. In any case, now that I have that file you provided, I can definitely see performance issues, and I can work on fixing them. And I am. FWIW, on my machine, the current algorithm takes almost 5 seconds on your provide file, which is... BAD. But if I just comment-out the look-behind regex (the one we were trying to optimize) to prevent it from running, the process takes just 34 ms. That's 150x better, which is of course pretty significant. So, it's CLEARLY actually this look-behind regex that's the big performance culprit here, not just any regex usage. I have some approaches to fix this. So I'm playing with which option is best. |
No worries. I'll still contend that an iterative loop is always going to be faster than a regex (and less memory-intensive), but concede that a non-lookback version of regex may be good enough for your usage. I could probably port to the other languages, but I just don't have the time right now. Good luck either way |
Just discovered JSON.minify; I had written this function a couple years ago. It's a non-regex version that's about 16x-20x faster than your current version (based on tests on my machine).
Thought it might be worthwhile to share. The code is definitely a little ugly and not easy to read (I actually wrote this based on a C-version for a client, and that's pretty obvious looking at the code).
If you decide to use this you'll probably want to change some variable names and clean it up a bit, but the performance speaks for itself :)