Remove the $position argument from the $action function passed to folds #1341
A couple of weeks ago you pointed out that the "formal equivalents" used to specify many functions were untested and (therefore) in some cases incorrect. Over the last week I have been preparing PR #1331 to address this problem. This includes technology to not only ensure that these expressions are syntactically valid, but also that it is possible to execute them and to get the correct expected results for all the examples. To achieve this I had to rewrite some of the equivalent expressions to prevent circular dependencies (resulting in infinite recursion when testing). I found that in many cases it was convenient to define the functions using fold-left, and in many cases the folds are indeed positional. An example is
I think this demonstrates that if you allow orthogonality to drive your design decisions, people will soon find use cases. Your line of argument seems to reduce to "Microsoft got this right and Javascript got it wrong". I don't find that line of reasoning compelling. You don't provide any evidence that Javascript users find this feature unnecessary or difficult. And as a side remark, if you want to convince people to come round to your point of view, it's not a good strategy to start by insulting them. Your first two points seem to suggest that you think those who disagree with you are stupid. That is not a constructive way to have a technical debate, and it's certainly not a line of reasoning that helps you to achieve your goals. I used to be a member of a standards group that was chaired by an industrial psychologist, and after one 5-day meeting he was able to demonstrate that the proposals that succeeded were characterised by the fact that the proposer showed understanding and respect for opposing positions. |
Why use folds here at all? Isn't this exactly what fn:filter was intended to do - and in a much simpler way?
filter(1 to count($input), fn($x, $pos) {$predicate($input[$pos], $pos)})
[Screenshot: complete example]
Nope, not that "Microsoft got this right"... But that Microsoft did the necessary work and found (exactly as we did) no convincing use-cases for position-aware
Nothing like that is intended. The truth is that when I initially proposed to add a And there is nothing "stupid" in such a multi-step approach. Usually the first step encompasses a broad action, which then is narrowed down and made more and more precise in the following phases. We did a good first step, the problem is that we didn't proceed to the next steps - and this is exactly what this issue is aiming to achieve. Someone who learns from each phase and improves the result in every next phase, is not stupid - on the contrary. And I strongly believe we can improve what we did in our first step - this is why I am raising this issue at all. |
"A more complicated way": yes, there are other ways of expressing this, but the aim of the exercise was to eliminate circular dependencies in the language definition, and from that point of view I found that fold-left, as currently defined, was a very useful thing to have as core functionality that everything else could be defined on top of. The definition using fold-left essentially relies only on function calling, and not on predicates, the "to" operator, or "count" - all things that can easily lead to circularity. |
It's quite common to use folds as effectively 'primitive' in functional contexts (because they correspond directly to the underlying maths - i.e. they ARE primitive), so I think it's a sensible strategy. |
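To make the two formulations in this exchange concrete, here is a minimal TypeScript sketch (not XPath, and not taken from PR #1331). The names `indexWhereViaFold` and `indexWhereViaFilter` are hypothetical. Both return the 1-based positions at which a position-aware predicate holds; JavaScript's `reduce` happens to pass an index to its callback, so it stands in here for a position-aware fold.

```typescript
type PosPredicate<T> = (item: T, pos: number) => boolean;

// Fold-based: relies only on calling the callback; the position comes from the fold.
function indexWhereViaFold<T>(input: T[], predicate: PosPredicate<T>): number[] {
  return input.reduce<number[]>(
    (acc, item, i) => (predicate(item, i + 1) ? acc.concat(i + 1) : acc),
    []
  );
}

// Filter-based: build the positions 1..n first, then index back into the input.
function indexWhereViaFilter<T>(input: T[], predicate: PosPredicate<T>): number[] {
  const positions = input.map((_, i) => i + 1);
  return positions.filter((pos) => predicate(input[pos - 1] as T, pos));
}

const sampleWords = ["a", "", "c", "d"];
const nonEmptyAtOddPos: PosPredicate<string> = (s, pos) => s !== "" && pos % 2 === 1;

console.log(indexWhereViaFold(sampleWords, nonEmptyAtOddPos));   // [1, 3]
console.log(indexWhereViaFilter(sampleWords, nonEmptyAtOddPos)); // [1, 3]
```

The filter-based version needs a range, a count and indexed access in addition to the callback, which is the dependency concern raised above; the fold-based version needs only the fold itself.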
So is this proposal an objection to?
In other languages I would zip the collection against the natural numbers, e.g. you take ['a','b','c'] and then fold that. As long as the zip is easy in the language, I would take the simplicity of dropping the position argument. So I think I agree with the proposal. P.S. As an aside, Microsoft did add positional arguments to |
A shame that the word "zip" has so many meanings... But I have some sympathy with this idea. It's something to consider along with the discussion of array:pairs() etc; and perhaps we should revive the discussion of automatic destructuring of the "zipped" values. It's also related to the thread on how the lookup operator should return labelled values, in which array members are labelled with their position in the array. A thought: perhaps if the callback function supplied to for-each, filter, fold, etc is a focus function, then the body of the function should have access not only to the context item/value, but also to position() and last()? I've never been a great fan of these implicit variables presented as functions, mainly because of the problematic scoping rules, but the language is more orthogonal if we reuse existing concepts rather than inventing new ones. My concern here is that anything we invent in this area might make things even more complicated. |
This suggests ideas that lean towards the grand unification of sequences and arrays - Dimitre's Kollection concept. If we define a structure that is a sequence of (integer, value) pairs with the integers being (1,2,3,...) - let's call it a numbered list -, then it's easy to construct such a structure from either a sequence or an array, and many of the (new) functions/operators we are defining could be defined to operate on a numbered list rather than having separate versions that operate on sequences or arrays. A function expecting to be passed an (integer, value) pair might perhaps declare it in the function signature as But is this really an improvement on what we have now? It delivers the same functionality in a different way, but is it simpler or more powerful? |
Yes, zip is unfortunate; it's just that that's what it's called historically in its most abstract definition. You're 'zipping' two sequences in pairs... like... a zip on a coat... it's grown on me over the years. I'm not completely following your comments, probably because I don't read all the comments elsewhere and so am unfamiliar with other floated suggestions, but for context, or maybe an alternative reality, I'll write some F# code. You don't need to completely understand it, but you'll get the feel. There's more than one way to do these things, but this one works for me. ("//" are comments)
so here because the language supports a function
The F# people decided that labelling index positions was common enough to have a special function to do it, and then they didn't need positions to pollute any other functions. Very similar people, probably employed mostly by the same organisation, decided in C# though for SelectMany to have 2 different versions of the method, one with position and one without, because they deemed it so common that a user would want the position of something inside a collection that it was worth it. But in C# at the time, tuples sort of didn't really exist, and it was all a bit clunky to do it the above way (use of folds in C# is pretty rare as it's imperative). I think they did this because they wanted to kick the tuple can down the road for later, but wanted the language to be convenient. I think you're a better judge of how easily or clunkily the above F# code would translate into XPath. Personally, if 'indexing' is easy already, then I would drop the position parameter; if not, then my inclination is to make it easy, and drop the position parameter. |
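The F# snippet did not survive in this comment, so the following is only a TypeScript sketch of the approach it describes: pair each item with its position first, then use a fold whose callback has no position parameter. The helpers `indexed` and `foldLeft` are hypothetical names for this illustration (F# offers `Seq.indexed` for the pairing step, but whether the lost snippet used it is an assumption).

```typescript
// Hypothetical helper: pair each item with its 1-based position.
function indexed<T>(items: T[]): Array<[number, T]> {
  return items.map((item, i): [number, T] => [i + 1, item]);
}

// A fold whose callback sees only (accumulator, item): no position parameter.
function foldLeft<T, A>(items: T[], zero: A, action: (acc: A, item: T) => A): A {
  let acc = zero;
  for (const item of items) {
    acc = action(acc, item);
  }
  return acc;
}

// The position reaches the callback as ordinary data, via destructuring.
const numbered = foldLeft(indexed(["a", "b", "c"]), "", (acc, [pos, item]) =>
  acc + pos + "." + item + " "
);

console.log(numbered.trim()); // "1.a 2.b 3.c"
```

The fold stays position-unaware; only callers that need the position pay for the extra `indexed` step.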
So a possible option would be:
and then optionally: |
Yes, I think this is similar to the situation the C# people found themselves in 15 years ago (ish): they didn't have tuples or pattern matching, but they had a class that represented a tuple, and it was all a bit clunky, so they added the extra positional argument as an additional method overload on some core methods. I find your construction a bit clunky, but I suspect it's as clean as you can get without extra machinery, and it means you don't pollute the core functions with position AND if you do revisit how to generate an index, let's say in version 6 tuples exist as first order types, then you simply add a new function (i.e. personally I think it's what the C# people should have done, though maybe they had other stuff to worry about, who knows; as Dimitre has pointed out, I think they only polluted Select and SelectMany as these were 'core' functions, and they did it as an extra overload). Actually with the extra machinery |
Absolutely. I would simplify the definitions of the functions - seems there is no need to create maps.
let $index-base := fn($seq as item()*) {index-where($seq, boolean#1)},
    $enumeration :=
      fn($seq as item()*){
        for-each-pair($seq, $index-base($seq), fn($x, $y) { [$x, $y] } )
      }
return
  ( $enumeration(('a', 'b', 'c', 'd')),
    '==================',
    [$enumeration(())]
  )
Very similarly for Maybe we need to give a default value for And maybe the return value of |
Editorial notes:
Please note that all these are opinions, and opinions are biased by nature and tend to be solipsistic. As Michael indicated, inclusive language (e.g., using phrases like “I think”, “I could imagine”, or by putting your observations into context) can be helpful to increase your audience. And for clarification: In the discussion that caused this issue, I mentioned that both Microsoft and JavaScript have chosen a consistent approach. What would be your approach for XPath: Would you propose to remove the positional argument from all higher-order functions, or only for folds as suggested in the title? And my assessment: I am strongly in favor of a consistent approach, covering all HOF/callback functions, and I would certainly like to keep positional arguments. They have already proven to be helpful in several of our use cases, and we are already seeing developers starting to use them. If we drop them, I believe that the tendency would be to rely on existing solutions rather than using an additional enumeration function (no matter how the position is wrapped in the result of this function):
(: positional argument :)
for-each($data, fn($item, $pos) { $pos || '. ' || $item })
(: already popular if there are more than two input sequences :)
for-each(1 to count($data), fn($pos) { $pos || '. ' || $data[$pos] })
(: the most common approach, which can also be wrapped into a function item, e.g. in chains :)
for $item at $pos in $data
return $pos || '. ' || $item
For folds, the group of developers who use them is small enough and won’t increase a lot. I’m fairly optimistic that the targeted coders will have enough mental capability to understand what a positional argument does. In case it is dropped, they should be capable of creating a custom approach. |
I agree about trying to be consistent. You could always consistently have the simplest signature, but choose to add additional special cases if you think there are specific benefits to be had? I.e. I wouldn't view it as a ban on positional arguments, but rather as a suggestion to always have the simplest signature available. |
The simplest signature would indeed remain available for all functions, and the positional argument would be optional. If we add it to e.g. |
Apologies if I've misunderstood; the spec is actually not especially clear about what the parameters mean in HOFs, you have to look at the semantically equivalent code to decipher it. So this..
and it requires that the $action is a function that takes an integer as its 3rd parameter? And this parameter is the position? And this is the parameter that it is being suggested should be removed (to make it simpler)? I don't know much about JavaScript to be honest; the languages I do know with fold, I'm pretty sure, don't have a position argument in the folder. I'm suggesting that if you think this parameter is particularly useful, then have 2 fold-lefts, one with and one without, but mentally the one without is the base definition to me, and the 2nd one is the exception that's there for usability reasons. Or have I missed the point? |
That's an open issue indeed, with some discussion on it in the GitHub issue #981. One suggestion is to attach comments for the parameters in the function signatures.
This is how reduce and forEach are defined in JavaScript. The current solution in XPath is very similar: It is possible to pass functions with fewer arguments. |
oh...that's what I'm missing. Is that just in 4.0? hmmmm......let me reconsider, my instinct is still to drop all parameters except the inherent ones, and if the user wants to pass extra stuff in, they simply map the array/sequence/map to inject the new values. but in half an hour I may have changed my mind. |
It's been an hour... Referring back to Hoare's quote about simplicity and things being obviously wrong, for me this sits in that camp. I googled the JavaScript (it's not my world) and almost the first relevant thing I found was this.
which evaluates to
To me that's baffling, there's nothing obvious about it at all, but it all hinges on optional arguments in the parseInt function being passed unintentionally to optional arguments in the mapper lambda. For me this isn't obviously wrong, and I think the grammar is too complex, and to actually make it do what's intended involves a lot of boilerplate that the language designers (of JavaScript) are presumably trying to help you remove with their flexible definitions. There is a minimal inherent complexity that, no matter how you tweak the grammar, if you try to remove boilerplate in one place will just emerge somewhere else, and here at the cost of my bewilderment. For me it falls in the "so complicated, there are no obvious errors" camp: I have to mentally align the 'extra' hidden parameters in the map with the optional parameters in parseInt and then deduce it's wrong, and I'm not clever enough to do this in general. So I would drop position (I don't like the JavaScript thing with the missing parameters at all, but that's out of scope here). |
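The snippet and its result are missing from this comment. The description matches the widely cited `map(parseInt)` pitfall; assuming that is what was meant, a minimal TypeScript sketch of it looks like this (the variable names are made up for the illustration):

```typescript
const digits = ["1", "2", "3"];

// map calls its callback as (value, index, array). parseInt reads its second
// argument as a radix, so the index is silently used as the radix:
// parseInt("1", 0), parseInt("2", 1), parseInt("3", 2).
const surprising = digits.map(parseInt);

// Wrapping the call pins the arity: only the value is forwarded.
const intended = digits.map((s) => parseInt(s, 10));

console.log(surprising); // [1, NaN, NaN]
console.log(intended);   // [1, 2, 3]
```

Radix 0 falls back to base 10, radix 1 is invalid, and "3" is not a binary digit, which explains the two `NaN` values.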
Thank you, Mark (@MarkNicholls), for showing and asking for common sense! I absolutely agree about the unnecessary complexity being introduced into the clean definition of folds. I also pointed out several times in this discussion that we can decide to have a separate, new function for folds with position - let's call it
No, you absolutely hit the focus of discussion and you very correctly pointed out by using the word "pollute" - that this would be the result of inserting the position into the long-established and pristine-clear folds. |
@dnovatchev To be sure, would it make sense then to change the title of this issue to something like “Remove the $position argument from all higher-order functions”, or would your suggestion be to only drop it for folds (and probably scans)? |
@MarkNicholls That’s baffling indeed. We are safe in this regard, as the arity of the passed function must be specified when passing functions. A valid XPath 4 expression is:
I appreciate your assessment. |
Yes, I did realise it wasn't quite as bad in XPath, though if the function was in a variable, then it would be more opaque: you would have to trace it back. I'm not convinced, but I haven't got anything concrete to give you in response, and it's off topic. If I think of something concrete I'll raise it separately. |
It is not only black and white, in reality there are more than 50 shades of grey. I would be happy if the position-aware
As @MarkNicholls pointed out, even though some of the methods of the Enumerable class in .NET have a position-aware action-argument, these are specified as separate overloads, thus not adding complexity to the already existing, simpler overloads. The reason we cannot do the same in XPath is purely technical -- we cannot have two overloads that have the same arity. One could specify though, a definition for
fn:for-each(
  $input as item()*,
  $action as fn(*) as item()*,
  $position-aware as xs:boolean := false()
) as item()*
And we can specify a rule that for the position-aware case the signature of the
$action as fn(item(), xs:integer) as item()*
and in the default case (when by default
$action as fn(item()) as item()*
Thus, in the simple, default, general case these functions will continue to be simple, readable and position unaware. But should someone need to use a position-aware
I think this is the best we could do for the functions (sans folds - which we agree are not position-aware) so that the extra position-awareness / complexity stays hidden in the general case. Certainly, we may well decide not to use the above formalism and, instead, to give to every position-unaware function a corresponding double - position-aware and with a separate name - but then the number of functions will be doubled. |
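As a rough illustration of the flag-based idea above, here is a TypeScript sketch, not a proposal for the XPath specification. The function `forEachSeq` and the type names are hypothetical; TypeScript overloads are used only to model the rule that the callback shape depends on the flag.

```typescript
type PlainAction<T, R> = (item: T) => R[];
type PositionalAction<T, R> = (item: T, pos: number) => R[];

// Overloads: the flag decides which callback shape is expected.
function forEachSeq<T, R>(input: T[], action: PlainAction<T, R>): R[];
function forEachSeq<T, R>(input: T[], action: PositionalAction<T, R>, positionAware: true): R[];
function forEachSeq<T, R>(
  input: T[],
  action: PositionalAction<T, R>,
  positionAware: boolean = false
): R[] {
  let out: R[] = [];
  input.forEach((item, i) => {
    // Only the position-aware mode ever passes a second argument.
    const piece = positionAware
      ? action(item, i + 1)
      : (action as PlainAction<T, R>)(item);
    out = out.concat(piece);
  });
  return out;
}

const letters = ["a", "b", "c"];
console.log(forEachSeq(letters, (s) => [s.toUpperCase()]));           // ["A", "B", "C"]
console.log(forEachSeq(letters, (s, pos) => [pos + ". " + s], true)); // ["1. a", "2. b", "3. c"]
```

In the default call the callback never receives a position, so the simple case stays simple; the cost is an extra parameter on every such function, as opposed to doubling the number of function names.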
I think I know who you are referring to. To counter this – hopefully non-polemically – allowing shades of grey might be to accept diversity, and to agree that positional arguments will be helpful to some while they may not be helpful to all.
Thanks for the clarification. |
ah, actually I would prefer them to be removed from all functions, and actually folds are the ones I'm least troubled by (though I still don't like them). There is some reasoning behind this, it isn't just stylistic or blind prejudice. so functions like
you may be tempted to think that this is quite specific to processing sequences but in actual fact it is a 'map'/'bind' and applies (in principle) to all sorts of data types. in ML for sequences it would be
This function appears to be quite specific to a datatype, but in many languages this is generalised to
where M is a data type over some other type (e.g. a parametric type/a generic), i.e. you can replace M with your data type (as long as it's a Functor in this case) and it holds, so we can write
and this function exists, but also (here M is "Map Key", i.e. a partially applied type function; parametric types are functions over types)
and these functions exist (actually I don't know if they exist in XPath, but they do in principle). As soon as you introduce position this generalisation breaks: an integer position of a Sequence makes sense, but an integer position of a Map doesn't. You could try to generalise your position to item()*, but if we had a Set, then that by definition doesn't have a position (there are lots and lots of data types that are functors; Map for example is a Functor twice over, once in the Key, and once in the Value). So the observation is: the more complex you make the signature (abstraction), the less generally that signature can be applied across different data types (models).
Ok, so who cares? Well, Scala/Haskell/(in some cases C#/F#) programmers care because these things are explicit in the language, but in other languages these patterns are preserved implicitly (i.e. without a formal mechanism), idiomatically. It makes the language more regular, more predictable, and more general (it can be quite liberating, because rather than search for a function that does X, if you know the signature, then you usually immediately know what the function is called even if you've never used the data type before).
Ironically this DOESN'T directly apply to fold (or unfold). These functions are "general" (in the sense that they correspond to some mind-boggling category theory thing I don't understand but that can be applied across all data structures), but their signature is directly and 'mechanically' linked to the structure of the data type itself, i.e. a fold on a tree has a different signature to a fold on a sequence. BUT the general principle applies: the more complex you make an abstraction (which a signature is), the less generally it can be applied, and if you formally derive the signature for a fold over a sequence/array it won't have position in it.
I understand the convenience argument, and actually, given that I now know that function parameters can be coerced, I completely understand and sympathise with the temptation to make the users' lives easier, i.e. to give them things they may want. But I genuinely think it is better to give the users the tools to get the data they need and compose functionality in the simplest (and thus most general) possible manner, i.e. empower them, rather than try to do it for them. So, IF it's easy for a user to inject the position of an entry in a sequence, then it should be unnecessary to have functions that give them position later down the line (it was clunky in C# at the time, so they added it to overloads of Select/SelectMany). |
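The signatures referred to in this comment were lost, so the following TypeScript sketch only illustrates the general point under that caveat: a callback of shape `(a: A) => B` carries over unchanged from arrays to map values and sets, whereas a callback that also expects an integer position has no natural meaning for a set. The helper names `mapArray`, `mapValues` and `mapSet` are hypothetical.

```typescript
// The same callback shape, (a: A) => B, works for each container.
function mapArray<A, B>(xs: A[], f: (a: A) => B): B[] {
  return xs.map((x) => f(x));
}

function mapValues<K, A, B>(m: Map<K, A>, f: (a: A) => B): Map<K, B> {
  const out = new Map<K, B>();
  m.forEach((v, k) => {
    out.set(k, f(v));
  });
  return out;
}

function mapSet<A, B>(s: Set<A>, f: (a: A) => B): Set<B> {
  const out = new Set<B>();
  s.forEach((v) => {
    out.add(f(v));
  });
  return out;
}

// One callback, three containers; none of the calls needs a position.
const double = (n: number) => n * 2;

const doubledArray = mapArray([1, 2, 3], double);
const doubledMap = mapValues(new Map<string, number>([["x", 1], ["y", 2]]), double);
const doubledSet = mapSet(new Set<number>([1, 2, 3]), double);

console.log(doubledArray);        // [2, 4, 6]
console.log(doubledMap.get("y")); // 4
console.log(doubledSet.has(6));   // true
```

A position-aware variant could be written for `mapArray`, but not in the same form for `mapSet`, which is the loss of generality described above.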
@MarkNicholls Here are their signatures:
fn:for-each($input as item()*, $action as fn(item()) as item()*) as item()*
array:for-each($array as array(*), $action as fn(item()*) as item()*) as array(*)
map:for-each($map as map(*), $action as fn(xs:anyAtomicType, item()*) as item()*) as item()*
With version 4.0, the position argument was attached to the first two functions. Thanks for the interesting and generic perspective. To complete the picture, @dnovatchev started an attempt to introduce a Kollection type (#910). Another (rather pragmatic) attempt to assimilate maps and arrays can be found in #1338. Of course the 3.1 solution cannot be reversed (i.e., in contrast to sequences & arrays, the function parameters of map functions come with at least 2 arguments for keys & values, as shown above). From a pragmatic perspective, this has certainly been an intuitive design choice. |
The $position argument, passed to the $action-function-argument of the folds is unnecessary and artificial:
$action-function-argument to all functions processing sequences and producing result based on their values.
And @michaelhkay himself: "I'm inclined to propose dropping the position argument for both fold and scan. It complicates the specification and the use cases are unconvincing. I believe it has been incorrectly specified (for fold-left, the first time $action is called, the value supplied for $pos is 2, whereas for fold-right it is count($input)-1; and the "Error conditions" section talks of $action being applied to 2 arguments). For the -right forms in particular, the semantics are mind-bending enough without introducing this complication."
$action function when a 3-arg. function was meant (or the other way around).
Aggregate
All
Any
Average
First
FirstOrDefault
Last
LastOrDefault
Max
MaxBy
Min
MinBy
Zip
This is not an accidental mistake, as Microsoft added to other Enumerable methods overloads that do require position-aware $action-function arguments:
Select
SelectMany
SkipWhile
TakeWhile
Where
{"position": $input[$pos]} and then a fold-operation that only has a non-position-aware $action-function argument, and has this map as input.
Proposed solution:
$action-function.