Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ordered Maps: Updates #1656

Open
ChristianGruen opened this issue Dec 15, 2024 · 18 comments
Open

Ordered Maps: Updates #1656

ChristianGruen opened this issue Dec 15, 2024 · 18 comments
Labels
Discussion A discussion on a general topic. XDM An issue related to the XPath Data Model

Comments

@ChristianGruen
Copy link
Contributor

If maps are updated, insertion/deletion order may be an issue, even more if maps will be ordered by default (#1651).

This topic needs to be discussed in more depth before we take any actions.

Personally, I think we should focus on XML updates first (related: #1225).

@ChristianGruen ChristianGruen added XDM An issue related to the XPath Data Model Discussion A discussion on a general topic. labels Dec 15, 2024
@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Dec 15, 2024

Examples for updating ordered maps with existing syntax:

Insert an entry at index 10 (could be map:insert-before):

let $map := map:build(1 to 20, string#1, void#0)
return map:merge((
  $map?[position() <= 10],
  { 'INSERTED': () },
  $map?[position() > 10]
))

…or (if we add an optional positional argument to map functions):

let $map := map:build(1 to 20, string#1, void#0)
return map:merge((
  map:filter($map, fn($key, $value, $pos) { $pos <= 10 }),
  { 'INSERTED': () },
  map:filter($map, fn($key, $value, $pos) { $pos > 10 })
))

Append an entry (could be map:append):

map:merge(($map, { 'appended': () }))

Prepend an entry:

map:merge({ 'prepended': () }, $map))

@MarkNicholls
Copy link

I can't really comment on #1225 it requires too much local knowledge.

For maps, lets assume, for the moment, maps are ordered.

we have this

map:merge($maps as map(*)*) as map(*)
map:of-pairs()

which presumably constructs a map, based on the order of the sequence of maps (and their inherent order)

then

map:keys
map:keys-where
map:entries
map:for-each
map:pairs
map:replace
map:values
map:filter

presumably these obey/preserve order

then we have the alleged awkward squad

map:remove
I would have thought this would obviously preserve order? you remove the entry and the rest stay in order? (am I missing something?)

map:put

ok, this what i'd call update/insert, and for me the 'obvious' behaviour is it overwrites the existing entry (in its ordered position) OR it adds on on the end.

I accept IF you are interested in order, then that's probably not very flexible, but I think losing the order completely is at best confusing, but then there is the question of how to insert things in order.

well we already have merge and of-pairs, so we just need a way to create the parts of the map, for an insert, you need the bit before the place you want to insert and the bit after.

I can't see how I would use filter to insert an item at the n'th position, the index isnt in the filter predicate, and to be frank, if it was, it feels a little clunky, I'd have to filter the map after the position i want to insert, and then before, and then merge.

I think splitAt is a bit cleaner, an order is really just something that maps to the natural numbers (i.e. an index) so inserting etc at a position, is really just about specifying indexes.

P.S.

fn:deep-equal

seems a more awkward problem, 2 maps with different orders but otherwise identical, for me are not equal, but I sense that may be a problem.

@MarkNicholls
Copy link

MarkNicholls commented Dec 15, 2024

@ChristianGruen

actually I think your position example is probably good enough, and splitAt is an unnecessary luxury (unless there is some big performance benefit that can be gained by having a dedicated function to split, I sense it could be a common task, and the maps can be large)

what do you think about deep-equals?

@ChristianGruen
Copy link
Contributor Author

I accept IF you are interested in order, then that's probably not very flexible, but I think losing the order completely is at best confusing, but then there is the question of how to insert things in order.

Your perspective is certainly interesting to hear. For me, it is the other way round: I would be surprised if someone used map:put and expected to get an ordered result, because to me “putting something into something” has no notion of orderedness at all. I would rather like us to have well-defined positional update functionality. After that, we could still define how map:put and map:delete (which seems easier) fit into that.

From an implementer’s perspective (but this should not be the deciding factor), one has certainly more freedom to decide what is going to happen internally if orderedness after updates is undefined.

I can't see how I would use filter to insert an item at the n'th position, the index isnt in the filter predicate.

I have edited my comment to show how map:filter could be used for positional tests. But indeed it’s more verbose than the new map filter syntax.

As far as fn:deep-equal is concerned, order will remain irrelevant (otherwise, we would lose backward compatibility). An option could be added to respect order.

@MarkNicholls
Copy link

let me appeal to principle rather than intuition

For me the driving motivation here is to derive a map that is "pure" especially in the context of me constructing and testing JSON.

I actually don't care what the order is, as long as its consistent, and it being inconsistent is an (seemingly unnecessary) impure side effect, that causes me no end of pain when I test (thus I have to impose an order in the tests, to make the code effectively pure again).

So for me a put with an undefined order, is a side effect, and actually, because I cannot guarantee any piece of code doesn't contain it, I then have to write code to impose an order....the only difference now would be that I can do that inside XSLT rather than externally in F# (I actually probably wouldn't bother because I test in C#, so plugging F# into the test pipeline would be easier than plugging XSLT 4 in).

So I cannot reason about the code, my tests can't either.

Intuitively the word "put" is just a word, if I rewrote your statement like this

I would be surprised if someone used map:append and expected to get an ordered result, because to me “appending something onto something” has no notion of orderedness at all

this statement now seems highly dubious, yet all I've done is change the label 'put' to 'append', ok, we don't want to change the label in the language for backwards compatibility reasons, but interpreting 'put' as effective "update or append" rather than its current "update or inject (at random)" seems to solve the issue without causing any real harm, its just the same as the old put, except it has a well defined position, no previous implementations will break (based on the spec) because no order was ever previously defined, if like you, I didn't expect it to have a defined order, I wouldn't care, having an order does not break logic that expects no consistent order.

So I'm struggling to see a problem. The cost is to the implementor of the language, but it doesn't seem (naively) onerous.

For the deep-equal issue, I have reservations about the whole notion of deep-equals, but intuitively if I compare 2 lists in a different order, then they are not equal, even if the entries are the same, this seems to be the same principle, but I accept there is a backwards compatibility cost to that, for me deep-equals is the wrinkle in this proposal.

(it would be solved by having ordered and unordered maps as separate data types, my observation is I would then no longer ever use unordered maps, and really for me this would just be a new version of map and thus a versioning issue, which I would happily accept, I don't share much of the backwards compatibility concerns)

@ChristianGruen
Copy link
Contributor Author

@MarkNicholls Thanks for your thoughts.

Immutable/persistent maps are more complex than their mutable counterparts, and supporting efficient updates with a notion of orderedness makes it even harder. It would be interesting to get feedback from other implementers on how far we should go.

@MarkNicholls
Copy link

yep I'm aware of the issues of persistent data structures, for me, the specifics of what the order is in a put is not important, whats important is its defined, so it can be the easiest option.

@michaelhkay
Copy link
Contributor

I believe that the primary purpose of making maps ordered is for human readability of serialized content, not to provide a way of capturing additional information, and we should design the feature with this as the primary goal. That's why, for example, I proposed that deep-equal() should ignore order. I'm therefore not in favour of introducing additional features for manipulating maps as sequences.

There's always the fallback of using map:pairs() to reduce the map to a sequence of key-value pairs, then manipulating this sequence using sequence-oriented functions, then reconstructing the map using map:of-pairs().

As for implementation, I found that on the Java platform there's a persistent ordered map implementation in the VAVR library. I couldn't find one on .NET, but I've implemented my own by running a two-way linked list through the entries of a non-ordered ImmutableDictionary.

I think we should allow bulk map creation operations like map:build and parse-json the options of either retaining or not retaining input order. To me it then seems natural that if you built the map retaining order, then put() and remove() should also respect the order, while if you built it as unordered, put() and remove() shouldn't impose an order on the result. That goes back to the idea that maps have orderedness as a property that affects the behaviour of put() and remove().

@MarkNicholls
Copy link

MarkNicholls commented Dec 16, 2024

I cant see a C# or an F# one.
Haskell has an ordered map which is just 2 maps, one indexed by key, the other by the 'index', where each underlying map already preserves the order via key, i.e. so the 2nd map effectively stores the order. You could build it with 2 persistent maps (there is one in F#).
It doesn't look like you would get contiguous indexes, i.e. if you delete something you would get a hole. So using 'position()' described above would be a headache (i.e. probably impractical), but it would be a map, persistent, and ordered.

@ChristianGruen
Copy link
Contributor Author

I spent some time analyzing the existing uses of maps: In most cases, maps are only created once. map:put was mostly used for small maps; map:remove is almost non-existent, which makes me wonder if it’s worth the effort to optimize it at this point.

We will probably offer only ordered maps. Internally, we will use a mutable map implementation for new maps, as it seems to be much faster than all immutable implementations that I encountered so far.

At the moment, our current HAMT extensions for ordered entries have a similar weak spot like VAVR’s LinkedHashMap, which is the linear worst-case complexity for put operations. Maybe we will tackle this by keeping the ordered entry list unchanged, but remembering updated (and possibly deleted) entries in auxiliary data structures, expecting that those do not get too large. If they do, the map can be cleaned up and reorganized.

@ChristianGruen
Copy link
Contributor Author

ChristianGruen commented Dec 20, 2024

After further tests and experiments, I have come to the opinion that we should drop unordered maps completely:

  • The memory overhead for preserving order seems small enough, compared to the overhead that maps and, in particular, XDM values create in general.
  • If if turns out that specific update patterns on ordered maps result in a bad runtime, it should be possible to tackle those with further optimizations strategies, or benefit from advancements in existing libraries.

We could still provide options to disable orderedness, but I wonder who will use it? It seems like a fairly technical tweak that may only affect specific implementations, so maybe it should rather be a vendor-specific feature than part of the standard.

I would still love to learn more about large XDM maps that are used in practical applications. The most impressive use cases I have seen so far did not process more than 10 million entries. For testing purposes, I have successfully created created, updated, serialized/stored and parsed/retrieved maps with ten times the amount of data in a reasonable amount of time.

With regard to the update operations, I still think it would be helpful to have functions to control the insertion position, but I also think that many other language features should have a higher priority at this stage.

@MarkNicholls With regard to map:put, if we always preserve order, I agree it will be most intuitive to append new entries.

@MarkNicholls
Copy link

I tend to agree with the comment about unordered maps, having a map with an unspecified order I wince at (I'd rather it defaulted to order by key, which tends to be out the box), but if the cost is minimal to adopt write sequence ordered maps everywhere then that's may just be simpler all round.

@benibela
Copy link

map:put

ok, this what i'd call update/insert, and for me the 'obvious' behaviour is it overwrites the existing entry (in its ordered position) OR it adds on on the end.

Overriding is the most efficient, since it does not have to change the key storage

@michaelhkay
Copy link
Contributor

Overriding is the most efficient, since it does not have to change the key storage

That's a gross simplification, it depends entirely on the data structures that are chosen. And a put() that replaces an existing entry does sometimes have to change the key that's stored: the fact that the keys are equal doesn't mean they are indistinguishable.

@michaelhkay
Copy link
Contributor

if the cost is minimal to adopt write sequence ordered maps everywhere then that's may just be simpler all round.

There will certainly be cases where the cost is noticeable. Certainly the cost in memory consumption, which can be very significant for some applications. I certainly think I owe it to my users to provide an option to drop ordering for cases where it's an unnecessary overhead, though I don't object to ordering by default.

@ChristianGruen
Copy link
Contributor Author

There will certainly be cases where the cost is noticeable. Certainly the cost in memory consumption, which can be very significant for some applications. I certainly think I owe it to my users to provide an option to drop ordering for cases where it's an unnecessary overhead, though I don't object to ordering by default.

After having played around with different approaches, I am pretty convinced now that the cost (both in terms of runtime and memory) depends a lot on the implementation, and I think the most important question rather is if we want to confront vendors with the effort involved in finding or writing an efficient implementations.

It reminds me a lot of the XQuery unordered expression, which (I guess) was introduced to make life easier for the implementer, and which was never really used in practice. Similarly, if Python or ECMAScript should still have an option to disable orderedness, it is difficult to spot.

Overriding is the most efficient, since it does not have to change the key storage

That's a gross simplification, it depends entirely on the data structures that are chosen. And a put() that replaces an existing entry does sometimes have to change the key that's stored: the fact that the keys are equal doesn't mean they are indistinguishable.

We believe to have found an efficient solution for both approaches (replace vs. delete/insert). From a user point of view, a replace would probably be most intuitive, as I assume that hardly anyone is available of the fact that map:put may also change keys. It will remain an edge case anyway – but certainly one that needs to be defined.

@adamretter
Copy link

I would much prefer to see either:

  1. Maps remain unordered by default (as they are now). There could be an additional option or syntax added to turn on ordering in some manner.
  2. A new Data Type in the XDM, which is not called Map, which provides an ordered map like capability. I believe Python calls this type 'Dictionary'.

My primary concerns here are:

  1. Preserving existing behaviour of XQuery 3.1.
  2. Performance.

Regards performance, I am aware that there are various persistent and non-persistent data structures for Immutable Maps that may offer ordering. However I suspect that these can never be as efficient as a simple mutable Open Hash Map using something like Robin Hood hashing. I would expect an immutable structure, to exhibit worse CPU cache locality and increase the amount of GC required. I am willing to be corrected though.

@ChristianGruen
Copy link
Contributor Author

@adamretter Thanks for sharing your thoughts.

However I suspect that these can never be as efficient as a simple mutable Open Hash Map using something like Robin Hood hashing. I would expect an immutable structure, to exhibit worse CPU cache locality and increase the amount of GC required. I am willing to be corrected though.

My observation is that the general overhead for immutability is much more significant than making an already immutable data structure ordered. One tweak that we have introduced just recently is to:

  • use a mutable (ordered) hash map for bulk create operations (with map:merge, map:build, etc), and
  • transform the map to an immutable HAMT to perform further updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion A discussion on a general topic. XDM An issue related to the XPath Data Model
Projects
None yet
Development

No branches or pull requests

5 participants