hashmap: differentiate serialization of string and byte keys #192

rvagg · 2019-09-11T00:49:40Z

This is an alternative to both #180 and #184; I'd like to retire those discussions.

The current state of keys in the HashMap spec: the algorithm accepts both strings and bytes as keys. For the purpose of indexing, keys are hashed as bytes (this is implied but not explicitly stated by the current spec form). For the purpose of serialisation into block form, keys are stored as bytes regardless of whether you provide a string or bytes.

The primary problem with this approach is that we lose the ability to differentiate when we deserialise. You need context to know whether keys should be used as bytes or converted back into strings. The algorithm has to be agnostic to this, so the decision ends up getting pushed up the application stack. In naive usage, where you don't have much context or haven't brought that context along for the ride (perhaps you're inspecting objects through the IPLD explorer), you just get byte arrays, even if you were using them as strings. I believe it's fair to say that common usage of this data structure will be as it is in most programming languages: string keys. So being able to differentiate would be nice.
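A minimal sketch of the problem in Python, for illustration only. It assumes string keys are turned into bytes with UTF-8 (the encoding is an assumption, not something stated here), and `store_key` and `load_key` are hypothetical helpers rather than functions from any HashMap implementation.

```python
def store_key(key):
    """Model the current behaviour: every key is stored as bytes."""
    if isinstance(key, str):
        return key.encode("utf-8")  # assumption: UTF-8 for string keys
    return bytes(key)


def load_key(stored):
    """Without outside context, all we can hand back is the raw bytes."""
    return stored


from_string = load_key(store_key("name"))
from_bytes = load_key(store_key(b"name"))

# The two keys are indistinguishable after a round trip,
# so the original kind (string or bytes) is lost.
print(from_string == from_bytes)
print(type(from_string).__name__)
```

Recovering the string would need an explicit `decode` call, and only the application knows whether that is appropriate.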

The proposed solution here is to (1) explicitly allow both string and bytes in the spec, (2) define some basic rules for how these things should be consistently hashed, and (3) serialize them in their original form. So on the block, a string key would be stored as a string. A byte array provided as a key would be stored as bytes.
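The three points could be sketched roughly as follows. This is not the spec text: UTF-8 as the string encoding and SHA-256 as the hash are stand-in assumptions, and `key_bytes`, `hash_key` and `serialize_entry` are hypothetical names. The entry is modelled as plain Python values; a codec that distinguishes strings from bytes (CBOR has separate text-string and byte-string types) would preserve the kind on the block.

```python
import hashlib


def key_bytes(key):
    """(1) Accept string or bytes; reduce both to bytes for hashing."""
    if isinstance(key, str):
        return key.encode("utf-8")  # assumption: UTF-8
    if isinstance(key, (bytes, bytearray)):
        return bytes(key)
    raise TypeError("key must be str or bytes")


def hash_key(key):
    """(2) One hashing rule for both kinds; SHA-256 is only a stand-in."""
    return hashlib.sha256(key_bytes(key)).digest()


def serialize_entry(key, value):
    """(3) Keep the key in its original kind in the stored entry."""
    key_bytes(key)  # validates the key kind, result not needed here
    return [key, value]


# Same index position for both forms, but the stored kind differs.
print(hash_key("name") == hash_key(b"name"))
print(type(serialize_entry("name", 1)[0]).__name__)
print(type(serialize_entry(b"name", 1)[0]).__name__)
```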

Minor complications exist if you use a HashMap with both string and byte keys. I don't expect this will happen much in reality. In the typed languages in particular, you should have a consistent interface, especially if such interfaces are defined through schemas where you'd hopefully do something like type MyMap { String : Foo } representation advanced HashMap (there's your context). Implementations have to do some awkward things: sorting buckets of mixed types requires a bit of care, and so does checking for the existence of a key, because the same key could be provided as bytes or string and the hash would be the same, so you have to make sure that "does this already exist?" works properly. IMO these should be left to the implementation for now, and implementations should probably also carry suggestions against mixed types, which I'm doing here: https://github.com/rvagg/iamap/pull/8/files#diff-04c6e90faac2675aa89e2176d2eec7d8R244
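One way an implementation might handle a mixed bucket, sketched in Python. The policy shown here (a string key and a bytes key with the same byte form count as the same key, and buckets are ordered by byte form) is only one possible choice and an assumption on my part; `key_bytes`, `find` and `put` are hypothetical helpers, not the iamap API.

```python
def key_bytes(key):
    """Byte form of a key; UTF-8 for strings is an assumption."""
    return key.encode("utf-8") if isinstance(key, str) else bytes(key)


def find(bucket, key):
    """Index of the entry whose key has the same byte form, or -1."""
    target = key_bytes(key)
    for i, (existing, _) in enumerate(bucket):
        if key_bytes(existing) == target:
            return i
    return -1


def put(bucket, key, value):
    """Insert or replace, then keep the bucket sorted by byte form."""
    i = find(bucket, key)
    if i >= 0:
        bucket[i] = (key, value)  # same byte form: treated as the same key
    else:
        bucket.append((key, value))
    # Python cannot order str against bytes directly, so sort on the
    # byte form instead of on the raw keys.
    bucket.sort(key=lambda entry: key_bytes(entry[0]))


bucket = []
put(bucket, "b", 1)
put(bucket, b"a", 2)
put(bucket, b"b", 3)  # replaces the entry stored under "b"
print(bucket)
```

Comparing on the byte form keeps "does this already exist?" consistent with the hash, at the cost of letting a later write change the stored kind of a key.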

rvagg · 2019-09-11T00:51:12Z

I also included original notes about int keys and possible sparse array usage from #180 and put them into design/history/ for possible future reference. That's not a spec change, just a tracking of some thinking here.

mikeal · 2020-10-08T21:16:06Z

@rvagg what’s the status of this one?

rvagg · 2020-10-13T21:57:26Z

We're going to drop this (for now at least). The clear preference is for an ADL to behave just like its data model equivalent. There may be reason for an implementation to offer this, maybe as a side API, but it's probably not something we care about at the spec level, where our data structures should be ADLs.

hashmap: differentiate serialization of string and byte keys

c1c256d

rvagg mentioned this pull request Sep 11, 2019

hashmap: 3 kinds of map keys, string, bytes, integers (for discussion) #180

Closed

rvagg mentioned this pull request Sep 11, 2019

hashmap: serialize keys as strings not bytes #184

Closed

vmx approved these changes Sep 11, 2019

hashmap: include discussion of int keys in design-history

9955731

Stebalien pushed a commit to Stebalien/specs that referenced this pull request Sep 18, 2019

Merge pull request ipld#192 from filecoin-project/feat/post-and-seal-…

6daf85f

…proofs-to-datastructures add PoStProof description and note size of arrays

rvagg closed this Oct 13, 2020

rvagg deleted the rvagg/hashmap-key-kind-differentiation branch October 13, 2020 21:57
