Feature/optimize collections for json #38

garethj2 · 2025-01-09T10:59:58Z

Refactor Haystack core so it's lazy for decoding values.

Before this change, decoding a grid would eagerly create rows and dicts (HVal) at the point the data was decoded. This adds a lot of unnecessary overhead if only part of the data structure is ever used. In our server side usage, we create lots of haystack values that may or may not be used at all.

This change makes working with haystack collections extremely lazy for JSON. Data is only decoded when it is accessed for the first time. The refactoring could potentially also be applied to other encodings, but since JSON is by far the fastest to parse it makes sense to start there.

Each collection (HList, HGrid and HDict) now has a backing store. Different stores abstract how the data is loaded, which is what lets us make it lazy.
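To illustrate the backing store idea, here is a minimal sketch and not the code from this PR. Every name in it (`DictStore`, `EagerObjStore`, `LazyJsonStore`, `Dict`, `Val`, `decode`) is hypothetical and heavily simplified; the real stores are DictJsonStore, DictObjStore and friends. It only shows how one collection class can delegate to either an eager or a lazy store.

```typescript
// Hypothetical, simplified types. Not the library's actual API.
type JsonValue =
  | string
  | number
  | boolean
  | null
  | JsonValue[]
  | { [key: string]: JsonValue }

/** Stands in for a decoded haystack value (HVal). */
class Val {
  constructor(readonly raw: JsonValue) {}
}

/** Counts decodes so the difference between the stores is observable. */
let decodeCount = 0

function decode(raw: JsonValue): Val {
  decodeCount++
  return new Val(raw)
}

/** The contract a dict delegates to. */
interface DictStore {
  get(name: string): Val | undefined
  keys(): string[]
}

/** Eager store: every entry is decoded in the constructor. */
class EagerObjStore implements DictStore {
  private readonly values = new Map<string, Val>()

  constructor(json: Record<string, JsonValue>) {
    for (const [name, raw] of Object.entries(json)) {
      this.values.set(name, decode(raw))
    }
  }

  get(name: string): Val | undefined {
    return this.values.get(name)
  }

  keys(): string[] {
    return [...this.values.keys()]
  }
}

/** Lazy store: keeps the raw JSON and decodes an entry on first access. */
class LazyJsonStore implements DictStore {
  private readonly cache = new Map<string, Val>()

  constructor(private readonly json: Record<string, JsonValue>) {}

  get(name: string): Val | undefined {
    let val = this.cache.get(name)
    if (
      val === undefined &&
      Object.prototype.hasOwnProperty.call(this.json, name)
    ) {
      val = decode(this.json[name] as JsonValue)
      this.cache.set(name, val)
    }
    return val
  }

  keys(): string[] {
    return Object.keys(this.json)
  }
}

/** The collection itself does not know how its data is loaded. */
class Dict {
  constructor(private readonly store: DictStore) {}

  get(name: string): Val | undefined {
    return this.store.get(name)
  }

  keys(): string[] {
    return this.store.keys()
  }
}
```

In this sketch, `new Dict(new LazyJsonStore(json))` performs no decoding until `get` is called, whereas `new Dict(new EagerObjStore(json))` decodes every entry immediately.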

I've also added support for creating haystack data structures from JSON strings and from JSON strings encoded in byte buffers. This way a haystack value can be created, and only if something is done with it will the byte buffer be read, decoded to a string and then parsed as JSON.
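The deferred chain of byte buffer, string and JSON can be sketched roughly as follows. `LazyJsonSource` and `parseCount` are hypothetical names used only for illustration and are not part of the library; the sketch relies only on the standard `TextDecoder` and `JSON.parse`.

```typescript
// Hypothetical sketch. The class name and its members are illustrative only.
class LazyJsonSource {
  private parsed: unknown
  private isParsed = false

  /** Number of times the source has actually been parsed. */
  parseCount = 0

  constructor(private source: Uint8Array | string | undefined) {}

  /** Decodes bytes to a string and parses it on first access, then reuses the result. */
  get value(): unknown {
    if (!this.isParsed) {
      const src = this.source
      const text =
        typeof src === 'string' ? src : new TextDecoder().decode(src)
      this.parsed = JSON.parse(text)
      this.parseCount++
      this.isParsed = true
      // Drop the raw input so it can be garbage collected once decoded.
      this.source = undefined
    }
    return this.parsed
  }
}
```

Constructing a `LazyJsonSource` costs nothing beyond holding a reference to the buffer. If `value` is never read, the bytes are never decoded or parsed.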

I appreciate there's a lot of code here. To zoom in on the core pieces that do the work, please take a look at DictJsonStore, GridJsonStore and ListJsonStore (the new JSON-specific stores) versus DictObjStore, GridObjStore and ListObjStore (the old way, which decodes everything up front and is still used when creating dicts on the fly from code, which is fine).

jaxgzz

Nice improvement Gareth! Just left a couple of very minor comments, but this whole thing looks very nice to me. Thanks! I'm approving this ahead of time.

spec/core/TrioReader.spec.ts

spec/core/grid/GridColumn.spec.ts

src/core/dict/DictJsonStore.ts

perf/readGrid.js

spec/core/TrioReader.spec.ts

src/core/grid/GridColumn.ts

rracariu

Interesting approach, what are the % perf gains?

spec/core/dict/DictHValObjStore.spec.ts

garethj2 · 2025-01-10T12:38:43Z

@rracariu in answer to your question regarding performance gains. Roughly speaking...

If you read a large grid and then exhaustively read all information, there's no performance gain.

If you read a large grid and just use some of the grid's meta, it's 1000% faster. If you read a grid and only read a few of the tags on each dict, it's about 30% faster.

If you read a large grid and then immediately transfer it back to JSON, then again it's 1000s of times faster.
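One plausible reason the round trip back to JSON is so cheap is that an untouched lazy store can hand back the JSON it was created from without decoding anything. That is an assumption about the mechanism, not something stated above, and the sketch below uses hypothetical names (`LazyList`, `encodeCount`) with a trivial stand-in for real decoding and encoding.

```typescript
// Hypothetical sketch. Assumes values decode and encode as themselves.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json }

/** Counts per-value encodes so the fast path is observable. */
let encodeCount = 0

class LazyList {
  /** Populated on first access. */
  private decoded?: Json[]

  constructor(private raw: Json[] | undefined) {}

  private values(): Json[] {
    if (!this.decoded) {
      // Stand-in for real decoding: copy the raw values.
      this.decoded = [...(this.raw ?? [])]
      // Drop the raw JSON once the decoded form exists.
      this.raw = undefined
    }
    return this.decoded
  }

  get(index: number): Json | undefined {
    return this.values()[index]
  }

  /** Called by JSON.stringify. */
  toJSON(): Json[] {
    // Fast path: nothing was accessed, so return the original JSON as-is.
    if (this.raw) {
      return this.raw
    }
    // Slow path: re-encode each decoded value.
    return this.values().map((v) => {
      encodeCount++
      return v
    })
  }
}
```

With this shape, `JSON.stringify(new LazyList(json))` never touches individual values, while a list that has been read from falls back to encoding value by value.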

There are quite a few situations in our server-side usage where this happens, so we should see some very large performance gains. Note the added support for JSON strings and byte buffers with encoded JSON strings.

garethj2 added 30 commits January 6, 2025 17:05

Optimize dict for JSON

32301ab

Refactor grid

a285160

Move column caching into HGrid implementation

c1da280

Refactor grid

2313d94

Add grid stores

82b7827

Add refreshColumns to grid

eca0c8a

Manually refresh columns in a grid

ab4ab62

Remove dict validate

637454b

Tweak grid toZinc

b043088

Remove observable hack

1c7323a

Add tests for grid json encoding

48de7b5

Add lazy grid json store

50173e9

Tweak index

dc17823

Add toJSONString to all hval types

7aa5a02

Ensure dict lazily decodes gradually

d11e1ff

Add JSON string store implementations for grid and dict

e5ba439

Encode to JSON byte buffer

ba0bfbf

Add JSON byte buffer support to grid and dict store

d2ce258

Add Uint8Array buffer stores

ac924a6

Move list

e48b3bd

Make symbols readonly

9071bda

Add list store

5c638ff

Add list JSON handling

b342cd3

Add list JSON store

98e55b7

Optimize JSON encoding of a grid

a8aa469

Tweak JSON list encoding

b7e991a

Ensure dict json store saves on memory

bf8a1c7

Tweak position of dict store symbol

4367dc4

Refactor grid data handling

b39b7a4

Tweak list json save memory

4fc4cdd

garethj2 added 2 commits January 9, 2025 10:37

Add list JSON string store

f4e95f4

Add list byte buffer support

f155b8b

garethj2 requested review from EliteScientist, rracariu, hecsalazarf, riccardoleder, jaxgzz and kianj2 January 9, 2025 10:59

garethj2 self-assigned this Jan 9, 2025

Add performance test

a1c3e7a

jaxgzz approved these changes Jan 9, 2025

spec/core/TrioReader.spec.ts (outdated, resolved)

spec/core/grid/GridColumn.spec.ts (outdated, resolved)

src/core/dict/DictJsonStore.ts (resolved)

garethj2 added 2 commits January 9, 2025 16:04

Remove unnecessary toJSON

20778cf

Fix test comments

1a59c5c

hecsalazarf approved these changes Jan 9, 2025

perf/readGrid.js (outdated, resolved)

spec/core/TrioReader.spec.ts (outdated, resolved)

src/core/grid/GridColumn.ts (outdated, resolved)

rracariu approved these changes Jan 10, 2025

spec/core/dict/DictHValObjStore.spec.ts (resolved)

PR feedback

489e44f

garethj2 merged commit 564c01e into master Jan 10, 2025
1 check passed

garethj2 deleted the feature/optimize-grid-and-dict-for-json branch January 10, 2025 12:39
