This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

hashmap: differentiate serialization of string and byte keys #192

Closed
wants to merge 2 commits into from


rvagg
Member

@rvagg rvagg commented Sep 11, 2019

This is an alternative to both #180 and #184; I'd like to retire those discussions.

The current state of keys in the HashMap spec: the algorithm can accept both strings and bytes as keys. For the purpose of indexing, keys are hashed as bytes (the current form of the spec implies this but does not state it explicitly). For the purpose of serialisation into block form, keys are stored as bytes regardless of whether you provide a string or bytes.

The primary problem with this approach is that we lose the ability to differentiate when we deserialise: you need context to know whether keys should be used as bytes or converted back into strings. The algorithm has to be agnostic to this, so the decision ends up getting pushed up the application stack. In naive usage, where you don't have much context or haven't brought that context along for the ride (perhaps you're inspecting objects through the IPLD explorer), you just get byte arrays, even if you were using them as strings. I believe it's fair to say that common usage of this data structure will be as it is in most programming languages: string keys. So being able to differentiate would be nice.
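To make the ambiguity concrete, here is a minimal Python sketch of the behaviour described above. The function name `store_key_current` is hypothetical, and the UTF-8 conversion for strings is an assumption, since the spec only implies that keys become bytes.

```python
def store_key_current(key):
    """Sketch of the current behaviour: every key is stored as bytes."""
    # Assumption: strings are converted to bytes with UTF-8.
    if isinstance(key, str):
        return key.encode("utf-8")
    return bytes(key)


stored_from_str = store_key_current("name")
stored_from_bytes = store_key_current(b"name")

# Both stored forms are identical bytes, so a reader of the block
# cannot tell whether the original key was a string or a byte array.
same_on_block = stored_from_str == stored_from_bytes
```

Anything reading the block back sees only bytes, so the choice between "decode to string" and "leave as bytes" has to come from outside the data.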

The proposed solution here is to (1) explicitly allow both string and bytes in the spec, (2) define some basic rules for how these things should be consistently hashed, and (3) serialize them as their original form. So on the block, a string key would be stored as a string. A byte array provided as a key would be stored as bytes.
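A rough Python sketch of points (2) and (3), under stated assumptions: UTF-8 is assumed as the string-to-bytes rule, SHA-256 is only a stand-in for whatever hash function a HashMap is configured with, and the names `key_bytes`, `hash_key` and `serialize_key` are hypothetical.

```python
import hashlib


def key_bytes(key):
    """Normalise a key to bytes for hashing (assumed rule: UTF-8 for strings)."""
    if isinstance(key, str):
        return key.encode("utf-8")
    if isinstance(key, (bytes, bytearray)):
        return bytes(key)
    raise TypeError("key must be a string or bytes")


def hash_key(key):
    """Hash the byte form of the key; SHA-256 is a stand-in hash function."""
    return hashlib.sha256(key_bytes(key)).digest()


def serialize_key(key):
    """Keep the key in its original kind when writing it to the block."""
    key_bytes(key)  # reuse the type check; the result is not needed here
    return bytes(key) if isinstance(key, bytearray) else key
```

With these rules, `"name"` and `b"name"` land at the same index because they hash identically, but the block records which kind was supplied, so deserialisation needs no extra context.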

Minor complications exist if you use a HashMap with both string and byte keys. I don't expect this to happen much in reality, particularly in typed languages, where you should have a consistent interface (especially if such interfaces are defined through schemas, where you'd hopefully do something like type MyMap { String : Foo } representation advanced HashMap - there's your context). Implementations have to do some awkward things: sorting buckets of mixed types requires a bit of care, and so does checking for the existence of a key, because the same key could be provided as bytes or as a string and the hash would be the same, so you have to make sure that "does this already exist?" works properly. IMO these should be left to the implementation for now, and implementations should probably also carry suggestions against mixed types, which I'm doing here: https://github.com/rvagg/iamap/pull/8/files#diff-04c6e90faac2675aa89e2176d2eec7d8R244
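One possible way an implementation might handle a mixed-kind bucket, sketched in Python. This is an illustration of the care required, not what the spec or iamap prescribes: treating `"a"` and `b"a"` as the same key, the UTF-8 rule, the string-before-bytes tie-break, and the names `find_entry` and `sort_bucket` are all assumptions.

```python
def key_bytes(key):
    """Normalise a key to bytes (assumed rule: UTF-8 for strings)."""
    return key.encode("utf-8") if isinstance(key, str) else bytes(key)


def find_entry(bucket, key):
    """Return the index of the entry whose key has the same bytes, or -1.

    Assumed policy: a string key and a byte key with equal byte forms
    are treated as the same key, since they hash identically.
    """
    target = key_bytes(key)
    for index, (existing_key, _value) in enumerate(bucket):
        if key_bytes(existing_key) == target:
            return index
    return -1


def sort_bucket(bucket):
    """Sort (key, value) entries by the byte form of the key.

    Comparing str and bytes directly raises TypeError in Python 3,
    so the sort key is the byte form, with strings ordered first on ties.
    """
    return sorted(
        bucket,
        key=lambda entry: (key_bytes(entry[0]), isinstance(entry[0], bytes)),
    )


bucket = [(b"b", 1), ("a", 2)]
ordered = sort_bucket(bucket)
position = find_entry(bucket, b"a")
```

Here the lookup for `b"a"` finds the entry stored under the string `"a"`, which is the kind of collision the "does this already exist?" check has to decide how to handle.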

@rvagg
Member Author

rvagg commented Sep 11, 2019

I also included original notes about int keys and possible sparse array usage from #180 and put them into design/history/ for possible future reference. That's not a spec change, just a tracking of some thinking here.

Stebalien pushed a commit to Stebalien/specs that referenced this pull request Sep 18, 2019
…proofs-to-datastructures

add PoStProof description and note size of arrays
@mikeal
Contributor

mikeal commented Oct 8, 2020

@rvagg what’s the status of this one?

@rvagg
Member Author

rvagg commented Oct 13, 2020

We're going to drop this (for now at least). The clear preference is for an ADL to behave just like its data model equivalent. There may be reason for an implementation to offer this, maybe as a side API, but it's probably not something we care about at the spec level, where our data structures should be ADLs.

@rvagg rvagg closed this Oct 13, 2020
@rvagg rvagg deleted the rvagg/hashmap-key-kind-differentiation branch October 13, 2020 21:57