This repository has been archived by the owner on Jun 29, 2022. It is now read-only.
hashmap: differentiate serialization of string and byte keys #192
This is an alternative to both #180 and #184; I'd like to retire those discussions.
The current state of keys in the HashMap spec: the algorithm accepts both `string` and `bytes` as keys. They are hashed as `bytes` for the purpose of indexing (this is implied but not explicitly stated by the current form of the spec), and for the purpose of serialisation into block form they are stored as `bytes` regardless of whether you provide `string` or `bytes`.

The primary problem with this approach is that we lose the ability to differentiate when we deserialise. You need context to know whether keys should be used as `bytes` or converted back into `string`. The algorithm has to be agnostic to this, so it ends up getting pushed up the application stack. In naive usage, where you don't have much context or haven't brought that context along for the ride (perhaps you're inspecting objects through the IPLD explorer), you just get byte arrays, even if you were using them as strings. I believe it's fair to say that common usage of this data structure will be as it is in most programming languages: string keys. So being able to differentiate would be nice.

The proposed solution here is to (1) explicitly allow both `string` and `bytes` in the spec, (2) define some basic rules for how these should be consistently hashed, and (3) serialize them in their original form. So in the block, a `string` key would be stored as a `string`, and a byte array provided as a key would be stored as `bytes`.

Minor complications exist if you use a HashMap with both `string` and `bytes` keys. I don't expect this will happen much in reality, particularly in typed languages, where you should have a consistent interface (especially if such interfaces are defined through schemas, where you'd hopefully do something like `type MyMap { String : Foo } representation advanced HashMap`; there's your context). Implementations have to do some awkward things: sorting buckets of mixed types requires a bit of care, and checking for the existence of a key also requires care, because the same key could be provided as `bytes` or `string` and the hash would be the same, but you have to make sure that "does this already exist?" works properly. IMO these should be left to the implementation for now, and implementations should probably also carry suggestions against mixed types, which I'm doing here: https://github.com/rvagg/iamap/pull/8/files#diff-04c6e90faac2675aa89e2176d2eec7d8R244
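To make the mixed-type concerns concrete, here is a minimal Python sketch of one way an implementation might handle them. Everything in it is an assumption made for illustration rather than something the spec or iamap defines: the helper names (`key_bytes`, `key_hash`, `same_key`, `serialize_key`), the choice of UTF-8 for turning strings into bytes, SHA-256 as a stand-in hash function, and the tagged tuple standing in for the block's `string` or `bytes` kind.

```python
import hashlib
from typing import Tuple, Union

Key = Union[str, bytes]


def key_bytes(key: Key) -> bytes:
    """Byte form used for hashing and comparison.

    Assumption: strings are encoded as UTF-8; bytes pass through unchanged.
    """
    if isinstance(key, str):
        return key.encode("utf-8")
    if isinstance(key, (bytes, bytearray)):
        return bytes(key)
    raise TypeError("key must be str or bytes")


def key_hash(key: Key) -> bytes:
    """Hash the byte form. SHA-256 is only a stand-in hash function here."""
    return hashlib.sha256(key_bytes(key)).digest()


def same_key(a: Key, b: Key) -> bool:
    """One possible "does this already exist?" policy: compare byte forms,
    so a string key and its UTF-8 bytes count as the same key."""
    return key_bytes(a) == key_bytes(b)


def serialize_key(key: Key) -> Tuple[str, Key]:
    """Keep the original kind when writing the key out.

    The tagged tuple is a hypothetical stand-in for the block's
    string or bytes kind.
    """
    if isinstance(key, str):
        return ("string", key)
    return ("bytes", bytes(key))


# A bucket holding mixed key types can be ordered by byte form,
# which avoids comparing str with bytes directly.
bucket = ["b", b"a", "c"]
ordered = sorted(bucket, key=key_bytes)
```

With these helpers, `"foo"` and `b"foo"` hash identically and `same_key` treats them as one key, while `serialize_key` still records which kind was originally provided. Whether mixed types should be treated as equal at all is a policy choice that the description above leaves to the implementation.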