77 Lookup returning path selection #832

michaelhkay · 2023-11-15T10:29:38Z

Note that unlike many of the functions we have added, these are non-trivial: they cannot easily be implemented in XSLT or XQuery.

This is a first cut and I expect some refinement will be needed, but reviews are invited.

I might subsequently propose layering some XSLT syntax on top of this for convenience.

ChristianGruen · 2023-11-15T12:27:36Z

Exciting! And ambitious. Some questions (that could possibly have been answered by reading the rules in more detail):

If a subitem of a candidate is updated, I assume the result is dropped in favor of the updated parent item?

(: will probably return [ 'x' ] ? :)
array:deep-update(
  [ [ '1' ] ],
  fn { ?1, ?1?1 },  (: → or, possibly, with #297: ??1 :)
  fn { 'x' }
)

(: will probably return map { 'a': 1 } ? :)
map:deep-update(
  map { 'a': map { 'a': map { 'a': 0 } } },
  map:find(?, 'a'),
  fn { 1 }
)

Can be items updated as well?

map:deep-update(
  map { 'x': (1, 2) },
  fn { head(?x) },
  fn { 0 }
)

michaelhkay · 2023-11-15T12:37:37Z

Q1: I chose to make this an error; you can't select two items A and B if one contains the other. Dropping the update on the child would have been another option.

Q2: yes, that example should work fine.

ChristianGruen · 2023-11-15T12:54:15Z

Q1: I chose to make this an error; you can't select two items A and B if one contains the other. Dropping the update on the child would have been another option.

With XQuery Update, it’s sometimes convenient to be able to write delete nodes //x.
Being strict in the beginning might be a good choice, though.

ChristianGruen · 2023-11-16T16:28:56Z

Upfront: The evolve function from Ramda is interesting as well: https://ramdajs.com/docs/#evolve

Thanks again for the proposal. As I was somewhat thrilled by the powerful semantics of the single update function, I presented this in an internal meeting. The first feedback I got was: Gosh, updates are really becoming cryptic now…

One reason why I believe people are fond of XQuery Update is its verbosity (that doesn’t necessarily include myself). For the following XML document…

<xml>
  <person><name>John</name><city>London</city></person>
  <person><name>Jack</name><city>Manchester</city></person>
</xml>

…there’s hardly a need to explain what the following expressions do:

delete nodes //city,
replace node /xml/person[1]/name/text() with 'Joe',
insert node <country>UK</country> into /xml/person[1]

With the proposed functions, and with the full notation, we would probably get something like…

let $persons := [
  map { 'name': 'John', 'city': 'London' },
  map { 'name': 'Jack', 'city': 'Manchester' }
]
return $persons
=> array:deep-update( (: delete :)
  function($persons) { array:values($persons) },
  function($person) { map:remove($person, 'city') }
)
=> array:deep-update( (: insert :)
  function($persons) { $persons(1) },
  function($person) { map:put($person, 'country', 'UK') }
)
=> array:deep-update( (: replace :)
  function($persons) { $persons(1)('name') },
  function($person) { 'Joe' }
)

With a syntax as compacted as possible we get:

$persons
=> array:deep-update(fn { ?* }, fn { map:remove(., 'city') })       (: delete :)
=> array:deep-update(fn { ?1 }, fn { map:put(., 'country', 'UK' })  (: insert :)
=> array:deep-update(fn { ?1?name }, fn { 'Joe' })                  (: replace :)

This probably looks accessible to people who use lookup operators and function items every day, but it could get a mean hassle for more occasional users. One approach to mitigate this could be dedicated functions for deletions, insertions, and replacements:

$persons
=> deep-delete( fn($persons) { $persons?*?city } )
=> deep-insert( fn($persons) { $persons?1 }, fn { 'country' }, fn { 'UK' } )
=> deep-replace( fn($persons) { $persons?1?name }, fn { 'Joe' } )

The syntax could be further simplified by providing a simple sequence of steps to the target nodes, and getting rid of function items, at least for simple operations:

$persons
=> deep-delete( [1 to array:size($persons), 'city'] )
=> deep-insert( [1], 'country', 'UK' )
=> deep-replace( [1, 'name'], 'Joe' )

Remarks:

I’ve dropped the map: and array: prefixes. I think a user shouldn’t care about the exact input type as long as it’s an array or a map.
Instead, we may need to focus on the target type (deep-insert requires different arguments for arrays and maps).
I’ve used an array structure for the first parameter, as it allows us to specify multiple child nodes.
There could be a better wildcard solution instead of 1 to array:size($persons).
In addition, the definition of descendant paths is an open issue here.

Maybe we need just both: Simple variants for non-experts, and deep-update as a more powerful and generic solution with function items?

Personally, I have a hard time convincing myself that annotations can be implemented without compromising the vast majority of our non-updating use cases. Even if the data model isn’t affected, it would seem to affect numerous places in the code that are critical in terms of runtime and memory consumption.

An separate path array, as suggested for the simpler functions variants, would certainly simplify things a lot.

michaelhkay · 2023-11-16T17:13:04Z

I see the deep-update function as a primitive onto which we can put XQuery/XSLT syntactic sugar, so I'm not too concerned about the usability of the function. The main reason I did it this way was that it made it easier to define the semantics in a way that was common to XSLT and XQuery.

I've certainly considered the idea of restricting the expression that can be used in the select function - perhaps making it something like the XSLT pattern syntax. In fact the current Saxon implementation does impose restrictions - see https://www.saxonica.com/documentation12/index.html#!functions/saxon/with-pedigree But I'm not convinced they are necessary, since most of the functions that navigate maps and arrays can be defined in terms of a few primitives.

An important aim is to ensure that the implementation can use immutable/persistent structures so that parts of the tree that don't change don't need to be physically copied. I've never managed to achieve that with node trees (despite several attempts) because node identity and parentage are so deeply embedded in the data model. I think it's achievable here because identity and parentage are transient.

I do recognise that implementation is challenging and that there's a need to demonstrate feasibility with a proof-of-concept implementation. But I think it's a good idea to start with a spec that we would like to be able to deliver to users, and then work out how to solve the implementation problems.

michaelhkay · 2023-11-17T07:31:12Z

One potential simplification would be to select (for replacement) only values that are immediate members of an array or entry-values of a map, rather than individual items within those values. The main problem with this is that the lookup operators (? and ??) select individual items, not values. It might be possible to refine the semantics of those operators so that in some contexts they select the values rather than the individual items.

There might also be mileage in borrowing from JSONPath, where the result of a selection expression is not just a value, but a path to a value - equivalent in essentials to the "annotated value" in this proposal, but described rather differently.

A related problem is handling entries where the value is an empty sequence. It's hard to select such values using lookup operators, and we can't currently replace an empty sequence by something else. If we replace a non-empty sequence by an empty sequence, it doesn't delete the array member or map entry, it merely sets it to empty.

michaelhkay · 2023-11-17T07:34:38Z

Note that the proposal could be considerably simplified if we were to adopt the suggestion in #298 (comment) that arrays should be treated as a subtype of maps.

ChristianGruen · 2023-11-17T10:39:29Z

I'm optimistic about the chance to avoid full copies and reference unmodified parts of the original map/array as that's a general requirement for immutable data structures – and I agree it's not applicable to XML node trees unless we drop node identities.

My main concern is about the selection process, though. The annotation/pedigree approach reminds me of something we have done for XQuery Full Text scoring in the past (for $n score $s in //x[. contains text 'y'], etc.). In our current implementation, most of it has been dropped, as the optional enrichment affected too many expressions (the additional challenge here was that scores needed to be propagated to subsequent results). We never became happy with it, but of course this may have partly been due to the deficiencies how we tackled it.

My conclusion was that properties that can get lost unnoticed during processing (possibly via distinct-values, sorts or casts, in our case) – such properties shouldn't exist at all in a formally strict language; they should rather be made explicit.

I think the XQUF spec is a valuable document to learn about various obstacles users might encounter for map/array updates. Luckily we don't need to care about different node types, namespaces or attributes vs. texts, but there are good reasons why more than one update primitive was introduced (technically, it would certainly have been possible to define fewer primitives). With op:deep-update, for example, the target of a deletion would need to be split across both the selection and the update function, as indicated above. I believe it's more than syntactic sugar:

deep-update( fn { ?1 }, fn { array:remove(., 2) } )
vs deep-delete( fn { ?1?2 )
or deep-delete( [1, 2] )

Arithmeticus · 2024-01-05T22:39:51Z

specifications/xpath-functions-40/src/function-catalog.xml

+               <fos:expression><eg>map:deep-update(
+   map{'x':map{'p':1, 'q':2}, 'y':map{'p':11, 'q':12}}, 
+   fn{map:find(.,'p')?*}, 
+   f{.+5})</eg></fos:expression>


f{.+5} - > fn{.+5}

ChristianGruen · 2024-02-29T15:20:59Z

Related: https://lists.w3.org/Archives/Public/public-xslt-40/2024Feb/0021.html

ChristianGruen · 2024-03-04T17:45:37Z

Thanks to the re-enabled previews; I’ve just spotted the new UpdateExpr.

Our experience with XQUF shows that chains/pipelines of updates are important, if not essential. We should understand how this would look like with the new syntax. I’ve seen the following example:

let $temp := 
      update map $data 
      replace ??Paris 
      with array:append(map{"to":"Frankfurt", "distance":480})
   return update $temp 
          replace ??cities 
          with map:put("Frankfurt", [map{"to":"Paris", "distance":357}])

I assume it’s supposed to be array:append(., ...) and map:put(., ...)?

We should head for a flat syntax; maybe something like:

update { $data } {
  replace value ??Paris with array:append(., map { "to": "Frankfurt", "distance": 480 }),
  for $city in ??cities
  return replace value $city with map:put($city, "Frankfurt", [ map { "to": "Paris", "distance": 357 } ]))
} {
  delete values ??phone
}

And I bet that XQUF users will demand a similar syntax for nodes. In principle, the same construct could be used to wrap existing XQUF expressions:

update { $node } {
  for $n in .//*[. = 'Paris']
  return insert nodes (<to/>, <distance/>) into $n
} {
  delete nodes .//phone
}

michaelhkay · 2024-03-04T17:59:15Z

Yes, doing the examples led me to similar conclusions. But I'm inclined to get something simple working first. It's clearly already quite powerful.

That's why I stuck with "replace" for the time being, though it would clearly be useful to also have insert/remove.

I'm not comfortable using "," between expressions for something other than sequence concatenation - though I know that's what we do in function calls!

ChristianGruen · 2024-03-04T18:05:02Z

I'm not comfortable using "," between expressions for something other than sequence concatenation - though I know that's what we do in function calls!

…and for basically all other operations that cannot be represented by a single expression ;·) In non-trivial XQuery code, it’s very common indeed.

ChristianGruen · 2024-03-06T10:01:38Z

@michaelhkay A remark regarding test cases:

Would it be possible to create PRs in the qt4tests repository for tests of features that haven’t been added to the specification yet, and only merge those once the PR has been finalized and accepted?

michaelhkay · 2024-03-06T10:17:18Z

Yes, it would probably be a good discipline to start using branches and PRs on the test suite.

ChristianGruen · 2024-03-06T10:39:12Z

Yes, it would probably be a good discipline to start using branches and PRs on the test suite.

Glad to hear. To avoid too much overhead, I believe it’s fine to just merge things that have already been accepted by the group.

michaelhkay · 2024-05-17T23:27:37Z

I have revised the proposal very substantially. The changes also impact on the semantics of deep lookup, and the concept of labels.

An update expression can now have multiple clauses, including delete, replace, and extend sub-operations.

The creation of parent, ancestor, and root functions is now built-in to the deep lookup operator when the entry modifier (formerly pairs) is used. It is no longer required to explicitly pin the root first. The fn:pin and fn:label functions are therefore dropped, along with the section in the XPath book describing them.

The semantics of the update expression are now described formally in terms of an XSLT-style recursive-descent transformation of the tree of maps and arrays.

There's a glitch I still need to fix - multi-step lookup expressions like ??x??y don't retain ancestry information all the way back to the root. But I think the proposal is now in a state where detailed review is very welcome.

ChristianGruen · 2024-05-18T07:47:29Z

Very interesting again; it will take me a while to give acceptable feedback.
To make things easier for me, I’ll create a related issue on updating nodes first.

ChristianGruen · 2024-05-20T12:13:37Z

Some feedback (not as comprehensive as I had hoped):

What’s the rationale behind the additional map/array keywords in the syntax? Is it required to differentiate the two types for the update semantics, or is it a tentative step to add a node keyword later on?
Our experience with node updates has shown that it is very useful and powerful to support sequences as input for updates (and to avoid ugly constructs like for $map in $maps return update map $map { ... }). Similar to the XQUF node/nodes keywords, the plural forms maps/arrays could be allowed as aliases (without attaching further semantics).
In the introductory examples, all leaf targets are maps. Will it be possible to do things like update map { delete ?array?entry::1 }, or is the syntax restricted to updating map leaves?
I suggest sticking to the XQUF wording – even if the order of arguments is swapped – and changing extend TARGET with DATA to insert DATA into TARGET.
The syntax of some examples in “4.13.5.1 Update example” deviates from the previous ones (curly braces are missing in the first example; no ::entry specifiers).
In your comment further above, you already mentioned the lack of support for multi-step lookup expressions. I would propose to enforce the entry specified for all lookup steps, not only the last: ?entry::a?entry::b.
If I understood the formal rules correctly, sequences of update statements are supposed to result in multiple subsequent updates (even though an implementation might merge them if the result would be identical). In my view, this contrasts with the presented term Pending Update List, which suggests (to me) that multiple updates will be performed together on the same copy R', in a defined order, after potential normalizations. I think it would be less misleading to say that with the first update, a copy R' is created, with the second updated R'' is created from R', and so on.
I assume the imply that code like the following one should be supported as well. To start with, should we rather replace SingleExpr with UnaryLookup, or does the target expression need to be arbitrarily composable?

update map $map {
  delete (
    let $root := .?entry::sub?parent()
    let $sub := $root?sub
    return $sub ! (?entry::leav)
  )
}

I believe the hardest challenges remain terminology and usability. It doesn’t feel overwhelmingly intuitive that (only?) the last segment of a lookup expression must use the entry specifier to be able to do updates:

First, we would again need to clarify what exactly a ”map entry” is supposed to be… obviously something else than what the map:entry function yields.
We would need to introduce the concept of “array entries” in the language.
It’s not really clear why entry::X yields something updatable, but not ?X, ?key::X, ?value::X, or ?content::X.

The terminology of the update clauses is fairly descriptive: “delete Y”, “replace X with Y”. Maybe we should try something similar for the lookup and rename entry to something that describes what’s going on, like lookup or track? Examples:

update map $cities {
  delete ?lookup::*??lookup::mail,
  delete ?track::*??track::mail
}

An alternative (similar to the ones discussed in previous weeks) would be to get rid of the explicit specifier and implicitly track the path for delete ?a?b?c. To make this possible, the target expression should be limited to a UnaryLookup expression. Examples:

update map $cities {
  delete .?*??mail
}

…and a possible extension for nodes:

update nodes $cities {
  delete ./*//mail
}

michaelhkay · 2024-06-21T16:55:33Z

Fix #1247

dnovatchev · 2024-08-25T16:36:22Z

A minor error found in one of the examples:

Expression: [ "a", "b", (), "d" ]?value::*
Result: ( ["a"], ["b"], [], ["d"] )

The correct result is actually:

( ["a"], ["b"], [()], ["d"] )

michaelhkay · 2024-09-17T11:32:23Z

I think the changes in this PR were all either abandoned, or reworked as part of other PRs. Therefore closing this one with no further action.

ChristianGruen · 2024-09-17T11:38:35Z

@michaelhkay I think we would need a new version of this PR before we can look at #1283. Things I would expect to be added/revised:

Singular forms of the keys/values/pairs modifiers.
The clarification if we keep item or items, or use content.
The note that arrays behave like map with consecutive integer keys.

…and various changes proposed for expressions.xml: https://github.com/qt4cg/qtspecs/pull/832/files#diff-b37a92a9eb3ab9ba48a00de9627a1124466b9c86ecb2b4989d04be3942c597a6

michaelhkay · 2024-09-17T14:41:13Z

Thanks, yes. I have just spent an hour or so reviewing PR #1283 and I agree it has dependencies on changes that need to be reintroduced or revised.

michaelhkay added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature Tests Needed Tests need to be written or merged labels Nov 15, 2023

ChristianGruen mentioned this pull request Nov 16, 2023

Allow manipulation of maps and arrays #77

Open

michaelhkay mentioned this pull request Nov 22, 2023

Need more discussion and explanation of deep-lookup operator #854

Closed

Arithmeticus reviewed Jan 5, 2024

View reviewed changes

michaelhkay mentioned this pull request Jan 22, 2024

Simulating Objects: Performance #961

Closed

michaelhkay force-pushed the 77-deep-update branch from 2c147ba to 035bae7 Compare February 29, 2024 12:24

michaelhkay added the Revise PR has been discussed and substantive changes requested label Mar 14, 2024

michaelhkay force-pushed the 77-deep-update branch from 9490835 to d2954e0 Compare May 17, 2024 23:20

michaelhkay requested a review from ndw as a code owner May 17, 2024 23:20

ChristianGruen mentioned this pull request May 18, 2024

Generalization of Deep Updates #1225

Open

michaelhkay mentioned this pull request Jun 17, 2024

77b Update expressions #1283

Open

michaelhkay force-pushed the 77-deep-update branch from 78cc64e to 986153c Compare June 21, 2024 17:02

michaelhkay added 8 commits June 25, 2024 08:44

77 Add map:deep-update and array:deep-update

9c37d91

Revised spec for deep update

31100a3

Add formal part of update specification, and examples

5cf4312

fix typo

8c7f9b5

Substantial revision of deep update proposal

550caa3

Ensure that update expressions can select ?x?y?z

b36c465

Move deep update itself to a separate PR

20fb0ad

Minor corrections

3e0226b

michaelhkay force-pushed the 77-deep-update branch from 986153c to 3e0226b Compare June 25, 2024 07:46

michaelhkay removed the Revise PR has been discussed and substantive changes requested label Jun 26, 2024

ChristianGruen mentioned this pull request Jul 22, 2024

Arrays and maps: Members, entries, values, contents, pairs, … #1338

Open

ChristianGruen added the Blocked PR is blocked (has merge conflicts, doesn't format, etc.) label Sep 6, 2024

michaelhkay closed this Sep 17, 2024

michaelhkay added Abandoned PR was rejected, withdrawn, or superseded and removed XQFO An issue related to Functions and Operators Blocked PR is blocked (has merge conflicts, doesn't format, etc.) Tests Needed Tests need to be written or merged labels Sep 17, 2024

ChristianGruen mentioned this pull request Jan 20, 2025

Array Lookups: partial removal of out-of-bounds checks #1715

Open

ChristianGruen mentioned this pull request Jan 27, 2025

Node Updates: Functions #1702

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

77 Lookup returning path selection #832

77 Lookup returning path selection #832

michaelhkay commented Nov 15, 2023 •

edited

Loading

ChristianGruen commented Nov 15, 2023

michaelhkay commented Nov 15, 2023

ChristianGruen commented Nov 15, 2023

ChristianGruen commented Nov 16, 2023 •

edited

Loading

michaelhkay commented Nov 16, 2023 •

edited

Loading

michaelhkay commented Nov 17, 2023 •

edited

Loading

michaelhkay commented Nov 17, 2023

ChristianGruen commented Nov 17, 2023 •

edited

Loading

Arithmeticus Jan 5, 2024

ChristianGruen commented Feb 29, 2024

ChristianGruen commented Mar 4, 2024 •

edited

Loading

michaelhkay commented Mar 4, 2024

ChristianGruen commented Mar 4, 2024

ChristianGruen commented Mar 6, 2024

michaelhkay commented Mar 6, 2024

ChristianGruen commented Mar 6, 2024

michaelhkay commented May 17, 2024

ChristianGruen commented May 18, 2024

ChristianGruen commented May 20, 2024

michaelhkay commented Jun 21, 2024

dnovatchev commented Aug 25, 2024

michaelhkay commented Sep 17, 2024

ChristianGruen commented Sep 17, 2024

michaelhkay commented Sep 17, 2024

77 Lookup returning path selection #832

77 Lookup returning path selection #832

Conversation

michaelhkay commented Nov 15, 2023 • edited Loading

ChristianGruen commented Nov 15, 2023

michaelhkay commented Nov 15, 2023

ChristianGruen commented Nov 15, 2023

ChristianGruen commented Nov 16, 2023 • edited Loading

michaelhkay commented Nov 16, 2023 • edited Loading

michaelhkay commented Nov 17, 2023 • edited Loading

michaelhkay commented Nov 17, 2023

ChristianGruen commented Nov 17, 2023 • edited Loading

Arithmeticus Jan 5, 2024

Choose a reason for hiding this comment

ChristianGruen commented Feb 29, 2024

ChristianGruen commented Mar 4, 2024 • edited Loading

michaelhkay commented Mar 4, 2024

ChristianGruen commented Mar 4, 2024

ChristianGruen commented Mar 6, 2024

michaelhkay commented Mar 6, 2024

ChristianGruen commented Mar 6, 2024

michaelhkay commented May 17, 2024

ChristianGruen commented May 18, 2024

ChristianGruen commented May 20, 2024

michaelhkay commented Jun 21, 2024

dnovatchev commented Aug 25, 2024

michaelhkay commented Sep 17, 2024

ChristianGruen commented Sep 17, 2024

michaelhkay commented Sep 17, 2024

michaelhkay commented Nov 15, 2023 •

edited

Loading

ChristianGruen commented Nov 16, 2023 •

edited

Loading

michaelhkay commented Nov 16, 2023 •

edited

Loading

michaelhkay commented Nov 17, 2023 •

edited

Loading

ChristianGruen commented Nov 17, 2023 •

edited

Loading

ChristianGruen commented Mar 4, 2024 •

edited

Loading