Feature/optimize collections for json #38
Nice improvement Gareth! I just left a couple of very minor comments, but this whole thing looks very nice to me. Thanks! I'm approving this ahead of time.
Interesting approach, what are the % perf gains?
@rracariu in answer to your question about performance gains, roughly speaking: if you read a large grid and then exhaustively read all of its information, there's no performance gain. If you read a large grid and only use some of the grid's meta, it's about 1000% faster. If you read a grid and only read a few of the tags on each dict, it's about 30% faster. If you read a large grid and then immediately transfer it back to JSON, it's again thousands of times faster. There are quite a few situations in our server-side usage where this happens, so we should see some very large performance gains. Note the added support for JSON strings and for byte buffers containing encoded JSON strings.
Refactor Haystack core so that it decodes values lazily.
Before this change, decoding a grid would eagerly create rows and dicts (HVal) at the point the data was decoded. This adds a lot of unnecessary overhead if only part of the data structure is ever used. In our server side usage, we create lots of haystack values that may or may not be used at all.
This change makes Haystack collections extremely lazy for JSON: data is only decoded when it is accessed for the first time. The refactoring could also be applied to other encodings, but since JSON is by far the fastest to parse, it makes sense to start there.
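To illustrate the idea of decoding on first access, here is a minimal sketch of a lazily decoded grid. Everything in it (`JsonGrid`, `DecodedRow`, `LazyJsonGrid`, the static decode counter) is hypothetical and far simpler than the real HGrid/HDict classes; it only shows that reading meta decodes no rows, that each row is decoded at most once, and, assuming the grid is unmodified, how writing it straight back out to JSON could skip decoding entirely.

```typescript
// Toy JSON value type for this sketch (hypothetical, not haystack-core's).
type JsonValue =
  | null
  | boolean
  | number
  | string
  | JsonValue[]
  | { [key: string]: JsonValue };

type JsonObject = { [key: string]: JsonValue };

// Hypothetical wire shape of a grid: meta, columns and raw row objects.
interface JsonGrid {
  meta: JsonObject;
  cols: { name: string }[];
  rows: JsonObject[];
}

// Stand-in for a decoded row (real code would build an HDict). The static
// counter exists only so that the laziness can be observed.
class DecodedRow {
  static decodeCount = 0;
  readonly tags: Map<string, JsonValue>;

  constructor(json: JsonObject) {
    DecodedRow.decodeCount++;
    this.tags = new Map(Object.entries(json));
  }
}

// Grid that keeps the raw JSON and decodes each row on first access only.
class LazyJsonGrid {
  private readonly json: JsonGrid;
  private readonly rowCache = new Map<number, DecodedRow>();

  constructor(json: JsonGrid) {
    this.json = json;
  }

  // Reading meta never touches (or decodes) any rows.
  get meta(): JsonObject {
    return this.json.meta;
  }

  get length(): number {
    return this.json.rows.length;
  }

  getRow(index: number): DecodedRow {
    if (index < 0 || index >= this.json.rows.length) {
      throw new RangeError(`Row index out of range: ${index}`);
    }
    let row = this.rowCache.get(index);
    if (row === undefined) {
      row = new DecodedRow(this.json.rows[index]);
      this.rowCache.set(index, row);
    }
    return row;
  }

  // Assumption: the grid is immutable here, so serializing it back to JSON
  // can return the original object without decoding or re-encoding rows.
  toJSON(): JsonGrid {
    return this.json;
  }
}

// Usage: meta is read without decoding rows; a row is decoded once.
const grid = new LazyJsonGrid({
  meta: { ver: "3.0" },
  cols: [{ name: "id" }],
  rows: [{ id: "r:a" }, { id: "r:b" }, { id: "r:c" }],
});
console.log(grid.meta.ver); // logs 3.0, no rows decoded yet
grid.getRow(1);
grid.getRow(1); // served from the cache
console.log(DecodedRow.decodeCount); // logs 1
```

Caching decoded rows keeps repeated access cheap, while a caller that never touches the rows pays nothing for them.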
Each collection (HList, HGrid and HDict) now has a backing store. Different stores abstract how the data is loaded, which is what allows loading to be lazy.
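A minimal sketch of the backing-store idea, assuming a simplified store contract of `get`/`has`/`keys`. The names `DictStore`, `EagerDictStore`, `LazyJsonDictStore`, `SimpleDict` and `decodeValue` are hypothetical and are not the actual DictObjStore/DictJsonStore classes or signatures from this PR; the decoder only handles a few Haystack JSON prefixes (`m:`, `n:`, `s:`) and drops units, purely for illustration.

```typescript
type JsonValue =
  | null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical store contract: the collection delegates every read to it.
interface DictStore {
  get(name: string): unknown;
  has(name: string): boolean;
  keys(): string[];
}

// Eager store: values are already decoded (e.g. a dict built in code).
class EagerDictStore implements DictStore {
  private readonly values: Map<string, unknown>;

  constructor(values: Record<string, unknown>) {
    this.values = new Map(Object.entries(values));
  }

  get(name: string): unknown { return this.values.get(name); }
  has(name: string): boolean { return this.values.has(name); }
  keys(): string[] { return Array.from(this.values.keys()); }
}

// Lazy store: keeps the raw JSON and decodes a tag on its first read only.
class LazyJsonDictStore implements DictStore {
  decodeCount = 0; // only here to make the laziness observable
  private readonly cache = new Map<string, unknown>();
  private readonly json: Record<string, JsonValue>;
  private readonly decode: (value: JsonValue) => unknown;

  constructor(json: Record<string, JsonValue>, decode: (value: JsonValue) => unknown) {
    this.json = json;
    this.decode = decode;
  }

  get(name: string): unknown {
    if (this.cache.has(name)) {
      return this.cache.get(name);
    }
    if (!this.has(name)) {
      return undefined;
    }
    const value = this.decode(this.json[name]);
    this.decodeCount++;
    this.cache.set(name, value);
    return value;
  }

  has(name: string): boolean {
    return Object.prototype.hasOwnProperty.call(this.json, name);
  }

  keys(): string[] {
    return Object.keys(this.json);
  }
}

// The collection does not know (or care) which store backs it.
class SimpleDict {
  private readonly store: DictStore;

  constructor(store: DictStore) {
    this.store = store;
  }

  get(name: string): unknown { return this.store.get(name); }
  has(name: string): boolean { return this.store.has(name); }
  keys(): string[] { return this.store.keys(); }
}

// Toy decoder for a few Haystack JSON prefixes; illustrative only.
function decodeValue(value: JsonValue): unknown {
  if (typeof value === "string") {
    if (value === "m:") return true; // marker
    if (value.startsWith("n:")) return parseFloat(value.slice(2)); // number, unit dropped
    if (value.startsWith("s:")) return value.slice(2); // string
  }
  return value;
}
```

With this split, `has` and `keys` on the lazy store never decode anything; only `get` does, and at most once per tag.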
I've also added support for creating Haystack data structures from JSON strings and from JSON strings encoded in byte buffers. This way a Haystack value can be created, and the byte buffer is only read, decoded to a string and then parsed as JSON if something is actually done with the value.
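The deferred bytes-to-string-to-JSON pipeline can be sketched roughly as below. `LazyJsonSource` and its members are hypothetical names, not the API added in this PR, and the bytes are assumed to be UTF-8. The sketch uses the standard `TextDecoder` and `JSON.parse`, and performs each step at most once, only when the value is first requested.

```typescript
type JsonValue =
  | null | boolean | number | string | JsonValue[] | { [key: string]: JsonValue };

// Hypothetical lazy JSON source: bytes -> string -> parsed JSON, with each
// step performed at most once and only when the value is first requested.
class LazyJsonSource {
  private bytes: Uint8Array | undefined;
  private text: string | undefined;
  private parsed: JsonValue = null;
  private parsedYet = false; // separate flag, since JSON "null" is a valid result

  private constructor(bytes: Uint8Array | undefined, text: string | undefined) {
    this.bytes = bytes;
    this.text = text;
  }

  static fromBytes(bytes: Uint8Array): LazyJsonSource {
    return new LazyJsonSource(bytes, undefined);
  }

  static fromString(text: string): LazyJsonSource {
    return new LazyJsonSource(undefined, text);
  }

  get isDecoded(): boolean {
    return this.parsedYet;
  }

  get value(): JsonValue {
    if (!this.parsedYet) {
      let text = this.text;
      if (text === undefined) {
        // Only now are the bytes decoded (assumed to be UTF-8).
        text = new TextDecoder("utf-8").decode(this.bytes);
        this.text = text;
        this.bytes = undefined; // the raw bytes are no longer needed
      }
      this.parsed = JSON.parse(text) as JsonValue;
      this.parsedYet = true;
    }
    return this.parsed;
  }
}
```

Creating a source from a buffer costs almost nothing; if the value is never read (for example, the buffer is only passed along), no decoding or parsing ever happens.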
I appreciate there's a lot of code here. To zoom in on the core pieces that do the work, please take a look at DictJsonStore, GridJsonStore and ListJsonStore (the new JSON-specific stores) versus DictObjStore, GridObjStore and ListObjStore (the old approach, which decodes everything up front; it is still used when creating dicts on the fly from code, which is fine).