Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

77 Lookup returning path selection #832

Closed
wants to merge 8 commits into from

Conversation

michaelhkay
Copy link
Contributor

@michaelhkay michaelhkay commented Nov 15, 2023

Note that unlike many of the functions we have added, these are non-trivial: they cannot easily be implemented in XSLT or XQuery.

This is a first cut and I expect some refinement will be needed, but reviews are invited.

I might subsequently propose layering some XSLT syntax on top of this for convenience.

@michaelhkay michaelhkay added XQFO An issue related to Functions and Operators Feature A change that introduces a new feature Tests Needed Tests need to be written or merged labels Nov 15, 2023
@ChristianGruen
Copy link
Contributor

Exciting! And ambitious. Some questions (that could possibly have been answered by reading the rules in more detail):

  1. If a subitem of a candidate is updated, I assume the result is dropped in favor of the updated parent item?
(: will probably return [ 'x' ] ? :)
array:deep-update(
  [ [ '1' ] ],
  fn { ?1, ?1?1 },  (: → or, possibly, with #297: ??1 :)
  fn { 'x' }
)

(: will probably return map { 'a': 1 } ? :)
map:deep-update(
  map { 'a': map { 'a': map { 'a': 0 } } },
  map:find(?, 'a'),
  fn { 1 }
)
  1. Can be items updated as well?
map:deep-update(
  map { 'x': (1, 2) },
  fn { head(?x) },
  fn { 0 }
)

@michaelhkay
Copy link
Contributor Author

Q1: I chose to make this an error; you can't select two items A and B if one contains the other. Dropping the update on the child would have been another option.

Q2: yes, that example should work fine.

@ChristianGruen
Copy link
Contributor

Q1: I chose to make this an error; you can't select two items A and B if one contains the other. Dropping the update on the child would have been another option.

With XQuery Update, it’s sometimes convenient to be able to write delete nodes //x.
Being strict in the beginning might be a good choice, though.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Nov 16, 2023

Upfront: The evolve function from Ramda is interesting as well: https://ramdajs.com/docs/#evolve


Thanks again for the proposal. As I was somewhat thrilled by the powerful semantics of the single update function, I presented this in an internal meeting. The first feedback I got was: Gosh, updates are really becoming cryptic now…

One reason why I believe people are fond of XQuery Update is its verbosity (that doesn’t necessarily include myself). For the following XML document…

<xml>
  <person><name>John</name><city>London</city></person>
  <person><name>Jack</name><city>Manchester</city></person>
</xml>

…there’s hardly a need to explain what the following expressions do:

delete nodes //city,
replace node /xml/person[1]/name/text() with 'Joe',
insert node <country>UK</country> into /xml/person[1]

With the proposed functions, and with the full notation, we would probably get something like…

let $persons := [
  map { 'name': 'John', 'city': 'London' },
  map { 'name': 'Jack', 'city': 'Manchester' }
]
return $persons
=> array:deep-update( (: delete :)
  function($persons) { array:values($persons) },
  function($person) { map:remove($person, 'city') }
)
=> array:deep-update( (: insert :)
  function($persons) { $persons(1) },
  function($person) { map:put($person, 'country', 'UK') }
)
=> array:deep-update( (: replace :)
  function($persons) { $persons(1)('name') },
  function($person) { 'Joe' }
)

With a syntax as compacted as possible we get:

$persons
=> array:deep-update(fn { ?* }, fn { map:remove(., 'city') })       (: delete :)
=> array:deep-update(fn { ?1 }, fn { map:put(., 'country', 'UK' })  (: insert :)
=> array:deep-update(fn { ?1?name }, fn { 'Joe' })                  (: replace :)

This probably looks accessible to people who use lookup operators and function items every day, but it could get a mean hassle for more occasional users. One approach to mitigate this could be dedicated functions for deletions, insertions, and replacements:

$persons
=> deep-delete( fn($persons) { $persons?*?city } )
=> deep-insert( fn($persons) { $persons?1 }, fn { 'country' }, fn { 'UK' } )
=> deep-replace( fn($persons) { $persons?1?name }, fn { 'Joe' } )

The syntax could be further simplified by providing a simple sequence of steps to the target nodes, and getting rid of function items, at least for simple operations:

$persons
=> deep-delete( [1 to array:size($persons), 'city'] )
=> deep-insert( [1], 'country', 'UK' )
=> deep-replace( [1, 'name'], 'Joe' )

Remarks:

  • I’ve dropped the map: and array: prefixes. I think a user shouldn’t care about the exact input type as long as it’s an array or a map.
  • Instead, we may need to focus on the target type (deep-insert requires different arguments for arrays and maps).
  • I’ve used an array structure for the first parameter, as it allows us to specify multiple child nodes.
  • There could be a better wildcard solution instead of 1 to array:size($persons).
  • In addition, the definition of descendant paths is an open issue here.

Maybe we need just both: Simple variants for non-experts, and deep-update as a more powerful and generic solution with function items?


Personally, I have a hard time convincing myself that annotations can be implemented without compromising the vast majority of our non-updating use cases. Even if the data model isn’t affected, it would seem to affect numerous places in the code that are critical in terms of runtime and memory consumption.

An separate path array, as suggested for the simpler functions variants, would certainly simplify things a lot.

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Nov 16, 2023

I see the deep-update function as a primitive onto which we can put XQuery/XSLT syntactic sugar, so I'm not too concerned about the usability of the function. The main reason I did it this way was that it made it easier to define the semantics in a way that was common to XSLT and XQuery.

I've certainly considered the idea of restricting the expression that can be used in the select function - perhaps making it something like the XSLT pattern syntax. In fact the current Saxon implementation does impose restrictions - see https://www.saxonica.com/documentation12/index.html#!functions/saxon/with-pedigree But I'm not convinced they are necessary, since most of the functions that navigate maps and arrays can be defined in terms of a few primitives.

An important aim is to ensure that the implementation can use immutable/persistent structures so that parts of the tree that don't change don't need to be physically copied. I've never managed to achieve that with node trees (despite several attempts) because node identity and parentage are so deeply embedded in the data model. I think it's achievable here because identity and parentage are transient.

I do recognise that implementation is challenging and that there's a need to demonstrate feasibility with a proof-of-concept implementation. But I think it's a good idea to start with a spec that we would like to be able to deliver to users, and then work out how to solve the implementation problems.

@michaelhkay
Copy link
Contributor Author

michaelhkay commented Nov 17, 2023

One potential simplification would be to select (for replacement) only values that are immediate members of an array or entry-values of a map, rather than individual items within those values. The main problem with this is that the lookup operators (? and ??) select individual items, not values. It might be possible to refine the semantics of those operators so that in some contexts they select the values rather than the individual items.

There might also be mileage in borrowing from JSONPath, where the result of a selection expression is not just a value, but a path to a value - equivalent in essentials to the "annotated value" in this proposal, but described rather differently.

A related problem is handling entries where the value is an empty sequence. It's hard to select such values using lookup operators, and we can't currently replace an empty sequence by something else. If we replace a non-empty sequence by an empty sequence, it doesn't delete the array member or map entry, it merely sets it to empty.

@michaelhkay
Copy link
Contributor Author

Note that the proposal could be considerably simplified if we were to adopt the suggestion in #298 (comment) that arrays should be treated as a subtype of maps.

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Nov 17, 2023

I'm optimistic about the chance to avoid full copies and reference unmodified parts of the original map/array as that's a general requirement for immutable data structures – and I agree it's not applicable to XML node trees unless we drop node identities.

My main concern is about the selection process, though. The annotation/pedigree approach reminds me of something we have done for XQuery Full Text scoring in the past (for $n score $s in //x[. contains text 'y'], etc.). In our current implementation, most of it has been dropped, as the optional enrichment affected too many expressions (the additional challenge here was that scores needed to be propagated to subsequent results). We never became happy with it, but of course this may have partly been due to the deficiencies how we tackled it.

My conclusion was that properties that can get lost unnoticed during processing (possibly via distinct-values, sorts or casts, in our case) – such properties shouldn't exist at all in a formally strict language; they should rather be made explicit.

I think the XQUF spec is a valuable document to learn about various obstacles users might encounter for map/array updates. Luckily we don't need to care about different node types, namespaces or attributes vs. texts, but there are good reasons why more than one update primitive was introduced (technically, it would certainly have been possible to define fewer primitives). With op:deep-update, for example, the target of a deletion would need to be split across both the selection and the update function, as indicated above. I believe it's more than syntactic sugar:

deep-update( fn { ?1 }, fn { array:remove(., 2) } )
vs deep-delete( fn { ?1?2 )
or deep-delete( [1, 2] )

<fos:expression><eg>map:deep-update(
map{'x':map{'p':1, 'q':2}, 'y':map{'p':11, 'q':12}},
fn{map:find(.,'p')?*},
f{.+5})</eg></fos:expression>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f{.+5} - > fn{.+5}

@ChristianGruen
Copy link
Contributor

@ChristianGruen
Copy link
Contributor

ChristianGruen commented Mar 4, 2024

Thanks to the re-enabled previews; I’ve just spotted the new UpdateExpr.

Our experience with XQUF shows that chains/pipelines of updates are important, if not essential. We should understand how this would look like with the new syntax. I’ve seen the following example:

let $temp := 
      update map $data 
      replace ??Paris 
      with array:append(map{"to":"Frankfurt", "distance":480})
   return update $temp 
          replace ??cities 
          with map:put("Frankfurt", [map{"to":"Paris", "distance":357}])

I assume it’s supposed to be array:append(., ...) and map:put(., ...)?

We should head for a flat syntax; maybe something like:

update { $data } {
  replace value ??Paris with array:append(., map { "to": "Frankfurt", "distance": 480 }),
  for $city in ??cities
  return replace value $city with map:put($city, "Frankfurt", [ map { "to": "Paris", "distance": 357 } ]))
} {
  delete values ??phone
}

And I bet that XQUF users will demand a similar syntax for nodes. In principle, the same construct could be used to wrap existing XQUF expressions:

update { $node } {
  for $n in .//*[. = 'Paris']
  return insert nodes (<to/>, <distance/>) into $n
} {
  delete nodes .//phone
}

@michaelhkay
Copy link
Contributor Author

Yes, doing the examples led me to similar conclusions. But I'm inclined to get something simple working first. It's clearly already quite powerful.

That's why I stuck with "replace" for the time being, though it would clearly be useful to also have insert/remove.

I'm not comfortable using "," between expressions for something other than sequence concatenation - though I know that's what we do in function calls!

@ChristianGruen
Copy link
Contributor

I'm not comfortable using "," between expressions for something other than sequence concatenation - though I know that's what we do in function calls!

…and for basically all other operations that cannot be represented by a single expression ;·) In non-trivial XQuery code, it’s very common indeed.

@ChristianGruen
Copy link
Contributor

@michaelhkay A remark regarding test cases:

Would it be possible to create PRs in the qt4tests repository for tests of features that haven’t been added to the specification yet, and only merge those once the PR has been finalized and accepted?

@michaelhkay
Copy link
Contributor Author

Yes, it would probably be a good discipline to start using branches and PRs on the test suite.

@ChristianGruen
Copy link
Contributor

Yes, it would probably be a good discipline to start using branches and PRs on the test suite.

Glad to hear. To avoid too much overhead, I believe it’s fine to just merge things that have already been accepted by the group.

@michaelhkay michaelhkay added the Revise PR has been discussed and substantive changes requested label Mar 14, 2024
@michaelhkay michaelhkay requested a review from ndw as a code owner May 17, 2024 23:20
@michaelhkay
Copy link
Contributor Author

I have revised the proposal very substantially. The changes also impact on the semantics of deep lookup, and the concept of labels.

An update expression can now have multiple clauses, including delete, replace, and extend sub-operations.

The creation of parent, ancestor, and root functions is now built-in to the deep lookup operator when the entry modifier (formerly pairs) is used. It is no longer required to explicitly pin the root first. The fn:pin and fn:label functions are therefore dropped, along with the section in the XPath book describing them.

The semantics of the update expression are now described formally in terms of an XSLT-style recursive-descent transformation of the tree of maps and arrays.

There's a glitch I still need to fix - multi-step lookup expressions like ??x??y don't retain ancestry information all the way back to the root. But I think the proposal is now in a state where detailed review is very welcome.

@ChristianGruen
Copy link
Contributor

Very interesting again; it will take me a while to give acceptable feedback.
To make things easier for me, I’ll create a related issue on updating nodes first.

@ChristianGruen
Copy link
Contributor

Some feedback (not as comprehensive as I had hoped):

  1. What’s the rationale behind the additional map/array keywords in the syntax? Is it required to differentiate the two types for the update semantics, or is it a tentative step to add a node keyword later on?
  2. Our experience with node updates has shown that it is very useful and powerful to support sequences as input for updates (and to avoid ugly constructs like for $map in $maps return update map $map { ... }). Similar to the XQUF node/nodes keywords, the plural forms maps/arrays could be allowed as aliases (without attaching further semantics).
  3. In the introductory examples, all leaf targets are maps. Will it be possible to do things like update map { delete ?array?entry::1 }, or is the syntax restricted to updating map leaves?
  4. I suggest sticking to the XQUF wording – even if the order of arguments is swapped – and changing extend TARGET with DATA to insert DATA into TARGET.
  5. The syntax of some examples in “4.13.5.1 Update example” deviates from the previous ones (curly braces are missing in the first example; no ::entry specifiers).
  6. In your comment further above, you already mentioned the lack of support for multi-step lookup expressions. I would propose to enforce the entry specified for all lookup steps, not only the last: ?entry::a?entry::b.
  7. If I understood the formal rules correctly, sequences of update statements are supposed to result in multiple subsequent updates (even though an implementation might merge them if the result would be identical). In my view, this contrasts with the presented term Pending Update List, which suggests (to me) that multiple updates will be performed together on the same copy R', in a defined order, after potential normalizations. I think it would be less misleading to say that with the first update, a copy R' is created, with the second updated R'' is created from R', and so on.
  8. I assume the imply that code like the following one should be supported as well. To start with, should we rather replace SingleExpr with UnaryLookup, or does the target expression need to be arbitrarily composable?
update map $map {
  delete (
    let $root := .?entry::sub?parent()
    let $sub := $root?sub
    return $sub ! (?entry::leav)
  )
}

I believe the hardest challenges remain terminology and usability. It doesn’t feel overwhelmingly intuitive that (only?) the last segment of a lookup expression must use the entry specifier to be able to do updates:

  • First, we would again need to clarify what exactly a ”map entry” is supposed to be… obviously something else than what the map:entry function yields.
  • We would need to introduce the concept of “array entries” in the language.
  • It’s not really clear why entry::X yields something updatable, but not ?X, ?key::X, ?value::X, or ?content::X.

The terminology of the update clauses is fairly descriptive: “delete Y”, “replace X with Y”. Maybe we should try something similar for the lookup and rename entry to something that describes what’s going on, like lookup or track? Examples:

update map $cities {
  delete ?lookup::*??lookup::mail,
  delete ?track::*??track::mail
}

An alternative (similar to the ones discussed in previous weeks) would be to get rid of the explicit specifier and implicitly track the path for delete ?a?b?c. To make this possible, the target expression should be limited to a UnaryLookup expression. Examples:

update map $cities {
  delete .?*??mail
}

…and a possible extension for nodes:

update nodes $cities {
  delete ./*//mail
}

@michaelhkay
Copy link
Contributor Author

Fix #1247

@michaelhkay michaelhkay removed the Revise PR has been discussed and substantive changes requested label Jun 26, 2024
@dnovatchev
Copy link
Contributor

A minor error found in one of the examples:

Expression: [ "a", "b", (), "d" ]?value::*
Result: ( ["a"], ["b"], [], ["d"] )

The correct result is actually:

( ["a"], ["b"], [()], ["d"] )

@ChristianGruen ChristianGruen added the Blocked PR is blocked (has merge conflicts, doesn't format, etc.) label Sep 6, 2024
@michaelhkay
Copy link
Contributor Author

I think the changes in this PR were all either abandoned, or reworked as part of other PRs. Therefore closing this one with no further action.

@michaelhkay michaelhkay added Abandoned PR was rejected, withdrawn, or superseded and removed XQFO An issue related to Functions and Operators Blocked PR is blocked (has merge conflicts, doesn't format, etc.) Tests Needed Tests need to be written or merged labels Sep 17, 2024
@ChristianGruen
Copy link
Contributor

@michaelhkay I think we would need a new version of this PR before we can look at #1283. Things I would expect to be added/revised:

  • Singular forms of the keys/values/pairs modifiers.
  • The clarification if we keep item or items, or use content.
  • The note that arrays behave like map with consecutive integer keys.

…and various changes proposed for expressions.xml: https://github.com/qt4cg/qtspecs/pull/832/files#diff-b37a92a9eb3ab9ba48a00de9627a1124466b9c86ecb2b4989d04be3942c597a6

@michaelhkay
Copy link
Contributor Author

Thanks, yes. I have just spent an hour or so reviewing PR #1283 and I agree it has dependencies on changes that need to be reintroduced or revised.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Abandoned PR was rejected, withdrawn, or superseded Feature A change that introduces a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants