Add support for CBOR and perhaps ascii85 for RPC calls #12
Comments
Based on https://github.com/brianolson/cbor_py , it looks like CBOR is expected to be quite a bit faster than JSON if both libraries are equally well optimized. However, Python's JSON implementation seems to be accelerated by using a lot of native C code, so a pure-Python CBOR implementation may be slower than Python's JSON. (From a security standpoint I'd be more comfortable with pure Python since it's memory-safe and C isn't, but it's not obvious that this trumps other factors.) |
Yes, I will do some investigation; I have some pretty efficient encoders and decoders now. If you could send me (either as a link or to my email) some examples of typical JSON messages you get (one large, perhaps one or two medium ones), I'll try them out. If we do CBOR, is it worth bothering with ascii85? I don't mind adding it too if it's a small change. CBOR will be useful for many other things I have planned. |
Here are two JSON messages I'm observing in Electrum-NMC: http://3q4jhw6htdrfftl3.onion/mastiff-shrug Chunk 80 is the largest chunk in the Namecoin blockchain; Chunk 215 is the most recent chunk that has 2016 headers. I obtained those by hacking Electrum-NMC to dump them after they were decoded, so there may be subtle differences between those files versus what came over the wire, but for the purpose of testing different encoding schemes they should be good enough. Let me know once you've downloaded the files so I can shut off the onion service that's hosting those files.
ascii85 does perform some useful compression for data that has a lot of 0's (which CBOR doesn't do AFAIK), so there's a chance it may still be useful. Empirical data would be helpful here. Speaking of empirical data and compression, I noticed that running the two JSON files in the above link through |
Bitcoin headers contain a non-trivial amount of redundant data, btw, that you could just cut out of the chunks. See e.g. https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2017-December/015385.html |
@SomberNight Yes, the idea of excluding the previous block hash occurred to me, and I think it's worth trying (orthogonally to the other ideas discussed here). I wasn't aware of the other tricks covered in that email; I'd have no objection to trying them too. However, I suspect that gzip and/or ascii85 compression will still be useful even with the previous block hash excluded, because AuxPoW headers include the parent block header, and I don't think it's possible to guess the parent block's previous block hash from context (since not all blocks in the parent chain will be visible in the sidechain). Similarly, the nBits of the parent chain won't be constant, because a sidechain can have multiple parent chains (Namecoin has at least 3 commonly used parent chains right now AFAIK). |
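To make the "exclude the previous block hash" idea concrete, here is a minimal sketch for plain 80-byte Bitcoin-style headers only. It does not handle AuxPoW headers, which is exactly the case described above where the parent block's previous hash can't be inferred. The function names and the `fake_chain` helper are hypothetical, and the chain it builds is filler data, not real headers.

```python
import hashlib

HEADER_SIZE = 80   # plain header: version(4) prev_hash(32) merkle(32) time(4) bits(4) nonce(4)
SHORT_SIZE = HEADER_SIZE - 32   # header with the prev-hash field removed


def sha256d(data):
    """Double SHA-256, the hash that links one header to the next."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def strip_prev_hashes(headers):
    """Keep the first header whole; drop bytes 4..35 from every later one."""
    out = [headers[0]]
    for header in headers[1:]:
        out.append(header[:4] + header[36:])
    return b"".join(out)


def restore_prev_hashes(blob):
    """Rebuild each prev-hash field by hashing the header before it."""
    headers = [blob[:HEADER_SIZE]]
    pos = HEADER_SIZE
    while pos < len(blob):
        part = blob[pos:pos + SHORT_SIZE]
        headers.append(part[:4] + sha256d(headers[-1]) + part[4:])
        pos += SHORT_SIZE
    return headers


def fake_chain(n):
    """Hypothetical test data: n linked headers with filler in the other fields."""
    chain = [bytes(HEADER_SIZE)]
    for i in range(1, n):
        filler = i.to_bytes(4, "little") * 11   # 44 bytes standing in for the real fields
        chain.append(bytes(4) + sha256d(chain[-1]) + filler)
    return chain


chain = fake_chain(5)
packed = strip_prev_hashes(chain)
assert restore_prev_hashes(packed) == chain
print(len(b"".join(chain)), "->", len(packed), "bytes")
```

The receiver pays one double SHA-256 per header to rebuild the field, which a client verifying the chain would have to compute anyway.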
@kyuupichan I had to shut off the onion service; let me know when you'd like me to start it up again. |
Can't you just upload the files here on GitHub? Or are they too large? |
@JeremyRand sorry, I don't have a Tor browser installed. Can you put them on Dropbox, Google Drive or something? |
@kyuupichan here are the chunks that were available previously on the onion service. |
Based on my testing with real-world dumps of Namecoin ElectrumX traffic (I'll upload my data and code shortly), it looks like compressing CBOR with DEFLATE (i.e. the compression algorithm of zlib, but without the redundant zlib header) yields substantially better compression and substantially better CPU usage than compressing JSON with DEFLATE. |
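For anyone unsure what "DEFLATE without the zlib header" means in practice: with Python's standard `zlib` module, passing a negative `wbits` produces a raw DEFLATE stream. This is only a sketch of the general technique, not the code used for the measurements; the helper names and the sample payload are made up.

```python
import zlib


def deflate_raw(data, level=1):
    """Compress to a raw DEFLATE stream (no zlib header or checksum)."""
    # Negative wbits tells zlib to omit its container format.
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data):
    """Decompress a raw DEFLATE stream produced by deflate_raw()."""
    decompressor = zlib.decompressobj(-15)
    return decompressor.decompress(data) + decompressor.flush()


# Hypothetical payload: a run of zeros followed by some varied bytes.
payload = b"\x00" * 1000 + bytes(range(256)) * 4
raw = deflate_raw(payload)
assert inflate_raw(raw) == payload

# The zlib container wraps the same kind of stream in a 2-byte header
# and a 4-byte Adler-32 trailer, so the raw stream is slightly smaller.
print(len(zlib.compress(payload, 1)), len(raw))
```

The saving from dropping the container is a few bytes per message, so it matters mostly for small RPC responses rather than for large header chunks.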
OK, what about the various compressed sizes vs original JSON? Anything on ascii85? CBOR is quite a bit of work (at least, to do it properly, which I would want). |
Here's the data I have so far: https://github.com/namecoin/electrum-nmc-compression-test Using chunk 80 as an example (it's the largest chunk in the Namecoin chain), the status quo (JSON with hex-encoded headers) is 5602176 bytes. Compressing the JSON with DEFLATE at the max-speed, default, and max-compression settings yields 2487984, 2306142, and 2289435 bytes respectively, with compress+decompress times of 107, 254, and 657 milliseconds. Encoding the same data as CBOR instead (with binary encoding for the headers) yields 2801099 bytes without compression. Compressing the CBOR with DEFLATE (with the same 3 settings) yields 2027342, 1982980, and 1972675 bytes, with compress+decompress times of 79, 113, and 526 milliseconds. So CBOR with speed-optimized DEFLATE is both faster (79 ms vs. 107 ms at best) and smaller (2027342 vs. 2289435 bytes at best) than JSON with any DEFLATE setting. Raising the compression level for the CBOR version does yield a little additional compression, but I'm highly skeptical that the increased CPU usage is worth it.
I haven't yet tried ascii85, I'll see if I can get some data on that shortly.
Any particular reason not to just use the |
I'd need to look a bit more at cbor2 but I suspect I'd be fine with it. Do you want to suggest protocol and code changes to enable this feature? |
Updated the aforementioned repo with analysis on ascii85; the tl;dr is that ascii85 is much worse than CBOR.
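As a rough illustration of where ascii85 sits relative to hex and raw binary, here is a small size comparison using Python's standard `base64.a85encode`. The 80-byte header below is made up (not real chain data), so the numbers only show the encoding overhead, not the results in the repo.

```python
import base64

# Hypothetical 80-byte header: 76 non-zero bytes followed by 4 zero bytes.
header = bytes(range(1, 77)) + b"\x00" * 4

as_hex = header.hex().encode("ascii")   # what JSON carries today: 2 chars per byte
as_a85 = base64.a85encode(header)       # 5 chars per 4 bytes, "z" for an all-zero group
as_raw = header                         # what a binary format such as CBOR can carry

assert base64.a85decode(as_a85) == header
print(len(as_hex), len(as_a85), len(as_raw))

# The zero shortcut only helps when a whole aligned 4-byte group is zero.
print(base64.a85encode(b"\x00" * 4), base64.a85encode(b"\x00\x00\x00\x01"))
```

Since hashes and merkle roots are mostly high-entropy bytes, few aligned groups in a real header are likely to be all zero, so ascii85 mostly pays its 25% expansion without much help from the shortcut.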
My experience with the aiorpcx codebase is basically nonexistent, so I won't be very efficient at making those changes, but I learn relatively quickly and I'm willing to attempt it if you don't have time to do it anytime soon. I might not have time to attempt it until circa next week though. |
No problem. It's mainly you pushing for this, though it would be a nice addition in general I think. |
@kyuupichan Should I submit the protocol changes as a PR to the ElectrumX repo, or is there a better place given that a lot of this code will live in aiorpcX rather than ElectrumX itself? |
Whatever you think is best |
@JeremyRand regarding cbor2, are your timings above based on that? Is encoding to CBOR with the cbor2 package a lot slower than encoding to JSON? I don't think you gave plain JSON encoding times above. |
Looking at your code, it seems that your timings don't include the time taken to convert a Python data structure to CBOR with cbor2. I would want to know how long that takes, and also how long converting to JSON takes. I suspect I could make cbor2 faster, but probably not by much. It might benefit from being compiled with Cython or run under PyPy. |
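Something along these lines could measure the serialization step on its own. It is a sketch under assumptions: the payload is a made-up stand-in for a chunk response (the field names are hypothetical), the `time_roundtrip` helper is mine, and the cbor2 part only runs if the third-party `cbor2` package is installed.

```python
import json
import time

try:
    import cbor2  # third-party; assumed installed via `pip install cbor2`
except ImportError:
    cbor2 = None


def time_roundtrip(dumps, loads, obj, repeats=5):
    """Return (best encode seconds, best decode seconds, encoded length)."""
    best_enc = best_dec = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encoded = dumps(obj)
        best_enc = min(best_enc, time.perf_counter() - start)
        start = time.perf_counter()
        loads(encoded)
        best_dec = min(best_dec, time.perf_counter() - start)
    return best_enc, best_dec, len(encoded)


# Hypothetical stand-in for a chunk: 2016 fake 80-byte headers.
raw_headers = [bytes([i % 256]) * 80 for i in range(2016)]
json_obj = {"hex": "".join(h.hex() for h in raw_headers), "count": 2016}
cbor_obj = {"headers": b"".join(raw_headers), "count": 2016}

# json.dumps returns str; the output here is pure ASCII, so length equals bytes.
print("json ", time_roundtrip(json.dumps, json.loads, json_obj))
if cbor2 is not None:
    print("cbor2", time_roundtrip(cbor2.dumps, cbor2.loads, cbor_obj))
```

Running this against the real chunk dumps rather than the filler data would give numbers comparable to the DEFLATE timings above.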
If this turns out to be beneficial for bandwidth and not detrimental to performance it should become the default.
kyuupichan/electrumx#712
The CBOR would probably not follow the standard 100% as I'm not convinced some of the parts are worth it, but all the key parts for RPC communication would be present.