Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Javascript: non-regex version about 16x-20x faster on my machine #52

Closed
wants to merge 2 commits into from
Closed

Javascript: non-regex version about 16x-20x faster on my machine #52

wants to merge 2 commits into from

Conversation

cegfault
Copy link

Just discovered JSON.minify; I had written this function a couple years ago. It's a non-regex version that's about 16x-20x faster than your current version (based on tests on my machine).

Thought it might be worthwhile to share. The code is definitely a little ugly and not easy to read (I actually wrote this based on a C-version for a client, and that's pretty obvious looking at the code).

If you decide to use this you'll probably want to change some variable names and clean it up a bit, but the performance speaks for itself :)

@getify
Copy link
Owner

getify commented Jan 10, 2020

Does it pass all the tests?

@cegfault
Copy link
Author

cegfault commented Feb 2, 2020

yes!

Sorry for the delay. Just saw this today. I had a small fix needed for test 3, but yes - now it is passing all tests.

@getify
Copy link
Owner

getify commented Jan 23, 2021

Sorry to ask after such a long delay, but... is this still valid to merge? Could you re-verify both the test passing and the performance benchmark? And can you post the performance tests/results in some way so they can be independently verified?

@cegfault
Copy link
Author

@getify I just tested again and it's still passing all tests.

I don't have time at the moment to setup a formal performance benchmark, but you could definitely grab a large json file and pipe it through a few thousand times in both versions. Mine will definitely outperform the current version.

@getify
Copy link
Owner

getify commented Dec 15, 2021

We've just addressed a significant performance bug. As such, I'd like to re-validate whether the performance issue here is still as significant. Closing for now, but if performance benchmarks can highlight further issues to address, please re-open.

@getify getify closed this Dec 15, 2021
@cegfault
Copy link
Author

@getify I don't see how to reopen this, but yes it absolutely should be reopened.

Try testing with Mullvad's wireguard.json file (github doesn't take .json filenames so I renamed to wireguard.txt), which I've attached here: wireguard.txt

You can test performance easily with:

fetch('wireguard.txt').then(x=>x.text()).then(x=>{
  console.log(+new Date)
  var res = JSON.minify(x);
  console.log(+new Date)
})

Your version: 4223ms (4.223 seconds)
My version : 13ms (0.013 seconds)

My version is over 324 times faster than yours

@cegfault
Copy link
Author

The problem has nothing to do with quadratic performance in your algorithm. It has to do with the performance of regular expressions.

An iterative loop (mine) will outperform regex (yours) every day of the week and twice on sunday. Regex is notoriously slow. So long as you are doing this:

while (tmp = tokenizer.exec(json)) {

... then my version will be many, many times faster than yours.

@getify
Copy link
Owner

getify commented Dec 22, 2021

I can't re-open since the repository you submitted it from has been deleted. But I am looking into the performance issues more closely. I appreciate the test file you provided. And I'm sorry we didn't address your concerns earlier. I wasn't willing to accept a PR with a full-rewrite that doesn't handle the ports to all the other languages. Also, we had no formal performance benchmarking, so the assertions of performance improvements were harder to verify. In any case, now that I have that file you provided, I can definitely see performance issues, and I can work on fixing them. And I am.


FWIW, on my machine, the current algorithm takes almost 5 seconds on your provide file, which is... BAD. But if I just comment-out the look-behind regex (the one we were trying to optimize) to prevent it from running, the process takes just 34 ms. That's 150x better, which is of course pretty significant. So, it's CLEARLY actually this look-behind regex that's the big performance culprit here, not just any regex usage.

I have some approaches to fix this. So I'm playing with which option is best.

@cegfault
Copy link
Author

No worries. I'll still contend that an iterative loop is always going to be faster than a regex (and less memory-intensive), but concede that a non-lookback version of regex may be good enough for your usage.

I could probably port to the other languages, but I just don't have the time right now.

Good luck either way

@getify getify mentioned this pull request Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants