720 Allow methods in maps with access to $this #916

Closed
wants to merge 3 commits into from

michaelhkay
Contributor

This proposal allows functions within maps to access the containing map using the variable $this.

The proposal needs editorial work to integrate it fully into the text, but it is intended to be sufficiently complete to enable a full technical review.

Fix #720

michaelhkay added the "Tests Needed" label on Dec 21, 2023
ChristianGruen changed the title from "Allow methods in maps with access to $this" to "720 Allow methods in maps with access to $this" on Dec 21, 2023
@ChristianGruen
Contributor

ChristianGruen commented Dec 22, 2023

I appreciate the effort spent in the new proposal. There are basically two aspects that continue to make me feel uneasy:

  1. Currently, $map?X and $map('X') are equivalent (as a result, our processor uses the same internal representation for such expressions). By introducing a custom strategy for lookups that differs from dynamic function calls, the analogy would break apart. If we are going to activate methods when they are requested, we should not limit the activation step to the lookup operator: It should instead be done generically.
  2. The lookup strategy feels like a good ad-hoc solution. But don’t we restrict ourselves in the long term if we activate maps at such a late stage? A processor will have more freedom to optimize code if bindings take place as early as possible. – I think we should at least try to update method closures when a map is created (from scratch, or as an updated map).

As I indicated in the original thread, I’d feel much more comfortable if we had a custom data type for objects that allowed for more static optimizations. Maps certainly have properties that make them suitable for simulating objects. Other properties are contradictory and error-prone: Variables can be removed at runtime that functions rely on; functions can be overwritten with arbitrary data or functionality; the initial record type can be easily changed by performing updates that don’t match the item-type definition; etc. etc.

Another important aspect is that error messages are already wildly cryptic when functions are involved. A basic example:

declare item-type num:int as record(n as xs:integer, square as %method function() as xs:integer);
declare function new-int($n as xs:integer) as num:int {
  map { 'n': $n, 'square': %method fn() { $this?n * $this?n } }
};
let $int := new-int(3)
let $updated-int := map:remove($int, 'square')
return $updated-int?square()

The error raised would be something like “An empty sequence cannot be invoked”. The actual “bug” occurs earlier in the code, though: why can methods be removed at all? Shouldn’t it be the task of the programming language to prevent users from doing this? If we had an object type…

declare function new-int($n as xs:integer) as object(*) {
  object { 'n': $n, 'square': fn() { $this?n * $this?n } }
};

…we wouldn’t provide functions like object:remove as it simply makes no sense. $this references could be attached to the object, it could be referenced from function items without %method annotations, and a processor would have much more freedom to statically check and optimize the resulting code.

However, I’m aware that it may be unrealistic to already achieve this with version 4.0 of the language if we decide to finalize it in 2024.
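To make the contrast concrete, here is a minimal sketch in Python rather than XQuery (the proposed syntax is not implemented anywhere yet). It assumes a plain dict as a stand-in for a map and a frozen dataclass as a stand-in for a closed object type; `new_int`, `map_remove` and `Int` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


def new_int(n):
    # Map-based "object": nothing stops a later update from dropping 'square'.
    return {"n": n, "square": lambda this: this["n"] * this["n"]}


def map_remove(m, key):
    """Immutable removal, loosely modelling map:remove."""
    return {k: v for k, v in m.items() if k != key}


@dataclass(frozen=True)
class Int:
    """Stand-in for a closed object type whose members cannot be removed."""
    n: int

    def square(self):
        return self.n * self.n


# Map version: the removal succeeds silently and the failure surfaces later.
updated = map_remove(new_int(3), "square")
try:
    updated.get("square")(updated)
    map_error = None
except TypeError as err:
    map_error = err  # raised at the call site, far from the removal

# Closed version: the removal itself is rejected.
obj = Int(3)
try:
    del obj.square
    object_error = None
except FrozenInstanceError as err:
    object_error = err

print(type(map_error).__name__, type(object_error).__name__)
```

In the map version the error only names the failed call, which mirrors the "empty sequence cannot be invoked" complaint; in the closed version the error points at the offending operation.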

@ChristianGruen
Contributor

An editorial note: When I first read the proposal, it was not obvious to me that the record definition is not a prerequisite for declaring maps with methods. Perhaps the first example could be stripped down to the basics:

declare function geo:rectangle($height, $width) {
  map {
    "height": $height,
    "width": $width,
    "area": %method function() { $this?height * $this?width }
  } 
};

…or even…

let $rectangle := function($height, $width) {
  map {
    "height": $height,
    "width": $width,
    "area": %method function() { $this?height * $this?width }
  } 
};

@michaelhkay
Contributor Author

Thanks for the feedback.

  1. I did consider associating this behaviour with map:get() rather than with the lookup operator per se. But then you start wondering about other operations such as map:for-each. In the end I felt it was cleaner to redefine "?" as doing two things: a map:get(), followed by a binding of $this.
  2. I think there's plenty of scope for the implementation strategy to be different from the model described in the specification. But the specification is a lot simpler if $this is bound at the last possible moment, avoiding complications of operations like map:put() rebinding $this to the new map.
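The two points above can be sketched as a small Python model, not as specification text. It assumes that a method stays "dormant" until a lookup binds it, and that calling a dormant method is an error; the names `Method`, `map_get`, `lookup` and `map_put` are hypothetical stand-ins for `%method`, `map:get`, the `?` operator and `map:put`.

```python
class Method:
    """A dormant method: a function that expects the containing map as `this`."""

    def __init__(self, body):
        self.body = body  # callable taking (this, *args)

    def __call__(self, *args):
        # Assumed behaviour: a dormant method cannot be invoked directly.
        raise RuntimeError("dormant method: $this is not bound")


def map_get(m, key):
    """Plain retrieval with no binding (models map:get)."""
    return m.get(key)


def lookup(m, key):
    """Models the redefined `?`: a map:get followed by a binding of $this."""
    value = map_get(m, key)
    if isinstance(value, Method):
        # Bind at the last possible moment, to the map the lookup was applied to.
        return lambda *args: value.body(m, *args)
    return value


def map_put(m, key, value):
    """Immutable update: returns a new map and rebinds nothing."""
    return {**m, key: value}


rect = {
    "height": 3,
    "width": 4,
    "area": Method(lambda this: this["height"] * this["width"]),
}
wider = map_put(rect, "width", 10)

print(lookup(rect, "area")())   # 12
print(lookup(wider, "area")())  # 30
```

Because binding happens inside `lookup`, `map_put` needs no special handling: the updated map simply carries the same dormant method, which is bound to whichever map it is later looked up in.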

I think the question of maps being rather too flexible and therefore too error-prone is valid, but it's orthogonal to the proposal. I think you're looking for some way of labelling a map with a type that constrains what operations can be performed on the map; I think that's a separate issue. It's more ambitious than this proposal, and I would couple it with the equally desirable (and ambitious) aim of defining an explicit class hierarchy. The point about this proposal is that it gives you a lot of bangs for the buck.

@ChristianGruen
Contributor

ChristianGruen commented Dec 22, 2023

Another editorial note: $self should probably be $this.

I've rewritten the examples to use a potential object type:

let $calc := object {
  "product": fn($in) {
    if (empty($in))
    then 1
    else head($in) * $this?product(tail($in))
  }
}
return $calc?product((1.05, 1.04, 1.03))

let $lib := object {
  "even": fn($v) { $v = 0 or $this?odd(abs($v)-1) },
  "odd" : fn($v) { not($this?even($v)) }
}
return $lib?odd(23)

…but I think we should go further and provide a custom declarator (maybe there's a better name than object):

declare object int($value as xs:integer) {
  (: variables; could also be generated automatically
     from the defined parameters :)
  value := $value,
  (: functions.. here, `$value` would work as well :)
  square = fn() { int($this?value * $this?value)  }
}
let $int := int(5)
return $int?square()?square()?value

It would be much easier for processors (@rhdunn maybe even IDEs?) to raise static errors for undefined functions, such as $int?product(), and to give better user feedback.

@ChristianGruen
Contributor

ChristianGruen commented Dec 22, 2023

First of all: Nothing is easier than countering elaborate work with contrasting suggestions, and hoping that someone will fix all the bits and bobs for you. That being said, I’ll continue with exactly that…

I think you're looking for some way of labelling a map with a type that constrains what operations can be performed on the map.

I think I’m looking for more than that, and yet I believe it’s not too far away. One aspect that I consider essential, besides user feedback, is efficiency: objects and functions can be implemented very slickly and straightforwardly if their structure is statically known: you can work with fixed offsets to address their contents, etc. “Looking up” functions in maps already implies that it’s not necessarily a cheap approach: There is usually no need to go and search for something if it always exists in the same place. Just think of sequences: While it is possible to store their contents in maps, no implementor would do that as there are smarter and more efficient ways. This may all be irrelevant for an occasional computation of a square or product value, but it makes a big difference if we want to allow users to write serious applications.

By spending some more time on the fundamentals, I believe we can push the concept much further than by just extending an existing data structure that was primarily designed to hold a variable number of keys and values. If we introduce methods as proposed, I doubt we’ll envisage a more thorough solution later on.

The good thing is: Hardly any map function would need to be duplicated for a new object/struct/record type. Next, if we had a declarator for this data structure, the types could be derived from that constructor. Just think of the example in the proposal:

declare function geo:rectangle(
       $height as xs:double, $width as xs:double)
    as record(height as xs:double, 
              width as xs:double,
              area as %method function() as xs:double) {
    map{"height": $height,
        "width": $width,
        "area": %method function(){$this?height * $this?width}
       } 
};

Wouldn’t it be much more convenient to drop the redundant information and derive the structure from the declaration?…

declare record geo:rectangle($height as xs:double, $width as xs:double) {
  height := $height,
  width := $width,
  area := function() { $this?height * $this?width }
};

…or even

(: definition: record: height as xs:double, width as xs:double, area as function() as xs:double :)
declare record geo:rectangle($height as xs:double, $width as xs:double) {
  area := function() { $height * $width }
};

It’s tempting to combine this with the already existing record definitions, but then we would again resort to maps.

The point about this proposal is that it gives you a lot of bangs for the buck.

I like the phrase. It’s probably exactly the bangs I’m pretty much afraid of…

@rhdunn
Contributor

rhdunn commented Dec 22, 2023

I think records should be sufficient w.r.t. static checking, as they define the properties that are required and optional. That includes function type signatures.

The object concept would help in the case where the functions are the same across all instances of the object. In that case, an IDE could more easily link/navigate to the function definition. Otherwise, the IDE would need to analyse where/how the instance was created in order to try and extract that information.

There should be enough context with the current proposal for an IDE to infer the presence and type of the $this variable. Although, I've not yet tried supporting this within an IDE context.

@Arithmeticus
Contributor

As a potential user, I find some aspects of this PR opaque and unexpected.

I am under the general impression that the expressions $rect?area, map:get($rect, "area"), and $rect("area") are all equivalent, and one just adopts their preferred syntactic flavor (I myself prefer the second flavor because I think it better communicates my intentions to subsequent readers of my code). To support the concept of methods in one syntactic approach and not the others appears to break that implicit contract. If my understanding is incorrect, and these are not syntactic alternatives, perhaps the difference can be communicated more effectively. At any rate, I recommend a note that briefly explains why the choice was made. I don't think I'd be the only one puzzled/concerned why calling on the map value "area" will trigger either a response or a dynamic error, depending on which syntax I've adopted.

Further, if the method is not activated under the last two of the three expressions, what can a user expect returned when calling map:get($rect, "area") or $rect("area")? I believe the paragraph that discusses the difference between terms dormant and active implies an error, but it should be restated in the paragraph starting "It does not happen as a result...."

I assume that a map can refer to a function defined externally for its method function, e.g., "area": %method $myfunction. What happens when $myfunction, which refers to $this, is invoked in a context where it is not a singleton map entry value? Will $this raise a dynamic (or static) error? If so, the prose should say so.

@michaelhkay
Contributor Author

Takeaway from today's discussion.

I think there are really two things we are trying to achieve.

One is the ability to write functions that refer to the map/record with which they belong using a variable such as $this. As pointed out, this can be done easily enough using the => operator: $rectangle => area() invoking a function that takes the $rectangle map/record as its only argument.

The second is the ability to give the function a name with local scope, so that we can have an area() function for rectangles that is different from the area() function for circles, without having to use globally-scoped names.

One solution would be to have an operator that is like "?" in that it looks for the function within the map (or interprets the function name with local scope in some other way), but is like "=>" in that it uses the LH operand as the implicit first argument of the function.

Perhaps we could define $rectangle => ?area() (new syntax not currently allowed) as a shorthand for $rectangle?area($rectangle), where the value of $rectangle?area is a perfectly ordinary function that accepts $rectangle as its first argument?

Or perhaps we could allow the syntax $rectangle?area($) with the same effect: again interpreting it as a syntactic macro. The '$' is supposed to connote a magic argument: we are calling an arity-1 function here.

In both cases area is a regular function, so if you want to do a regular function call you can, for example

$rectangle?(if ($random) then 'area' else 'perimeter')($rectangle)
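A rough Python model of this "ordinary function plus syntactic macro" idea follows. It assumes dicts as maps and a hypothetical helper `call_member` that plays the role of the macro; the `circle` map and the `pick` helper are added purely for illustration.

```python
import math

# Each map carries its own 'area' entry, so the name is local to the map.
rectangle = {
    "height": 3.0,
    "width": 4.0,
    "area": lambda self: self["height"] * self["width"],
    "perimeter": lambda self: 2 * (self["height"] + self["width"]),
}
circle = {
    "radius": 1.0,
    "area": lambda self: math.pi * self["radius"] ** 2,
}


def call_member(m, key, *args):
    """Models the macro: m?key($) expands to m?key(m)."""
    return m[key](m, *args)


def pick(use_area):
    # The entries are ordinary functions, so the key can be computed
    # and the map passed explicitly, as in the last example above.
    key = "area" if use_area else "perimeter"
    return rectangle[key](rectangle)


print(call_member(rectangle, "area"))  # 12.0
print(call_member(circle, "area"))     # pi for radius 1.0
print(pick(False))                     # 14.0
```

Nothing here needs an annotation or a hidden variable: the map is just the first argument, and the macro only saves the caller from writing it twice.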

@ChristianGruen
Contributor

Perhaps we could define $rectangle => ?area() (new syntax not currently allowed) as a shorthand for $rectangle?area($rectangle), where the value of $rectangle?area is a perfectly ordinary function that accepts $rectangle as its first argument?

Another already existing variant for $rectangle => ?area() would be $rectangle !? area(.). Probably both of them are similarly intuitive (or counterintuitive, depending on what one is used to)?

$rectangle?(if ($random) then 'area' else 'perimeter')($rectangle) would be $rectangle !? (if ($random) then 'area' else 'perimeter')(.).

@ndw
Contributor

ndw commented Jan 17, 2024

If we're going to end up adding magic, can we just define the problem out of existence with more magic? Can we simply say that if the expression on the right hand side of ? is a function item, we pass the map as the first argument implicitly?

$map?key === map:get($map, 'key')

$map?f(x) === $map => f(x)

Given that we have ? and ??, I'm a little leery of trying to explain to users that we also have => ? or !?

@michaelhkay
Contributor Author

Can we simply say that if the expression on the right hand side of ? is a function item, we pass the map as the first argument implicitly?

That breaks compatibility, and it stops you putting ordinary ("static") functions in a map.

Also note that $map?f(x) === $map => f(x) isn't what we want - we want f to have local scope rather than global scope.

There's a range of possibilities for making $map?f(x) work based on the characteristics of f. One is my original proposal: f is annotated as a %method. Another would be that the arity of f is one greater than the number of arguments supplied. Another is that the first argument is named $this (or something reserved like $fn:this). Another is that the function body accesses the map using a function call this() which involves adding something to its captured context.

Let's explore that idea:

  • The dynamic context is extended with an item called the "containing map".
  • When the result of map:get, or operations that invoke map:get, is a singleton function item, the returned function item has its captured context augmented by setting the "containing map" to the map from which it was obtained.
  • The value of the containing map is available using the function call this().

Semantically kludgey, but it doesn't impose much cognitive load on the typical user.
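The three bullets above can be sketched in Python as follows. This is only a model under stated assumptions: the "containing map" slot is represented by a `contextvars.ContextVar`, every callable map entry is treated as a function item, and the names `this`, `map_get` and `product` are hypothetical.

```python
import contextvars

# Hypothetical slot in the dynamic context holding the "containing map".
_containing_map = contextvars.ContextVar("containing_map", default=None)


def this():
    """Return the containing map, as the proposed this() call would."""
    m = _containing_map.get()
    if m is None:
        raise LookupError("this() called outside a function obtained from a map")
    return m


def map_get(m, key):
    """Retrieve an entry; function values come back with their context augmented."""
    value = m.get(key)
    if not callable(value):
        return value

    def with_containing_map(*args):
        token = _containing_map.set(m)  # the map the function was obtained from
        try:
            return value(*args)
        finally:
            _containing_map.reset(token)

    return with_containing_map


def product(xs):
    # The recursive self-reference goes through this(), not a global name.
    return 1 if not xs else xs[0] * map_get(this(), "product")(xs[1:])


calc = {"product": product}
print(map_get(calc, "product")((2, 3, 4)))  # 24
```

The function stored in the map stays an ordinary function; only the item returned by `map_get` carries the extra context, which is why no annotation is needed on the declaration.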

@ChristianGruen
Contributor

Another is that the function body accesses the map using a function call this() which involves adding something to its captured context.

Good to read, that’s what I also had in mind yesterday. It would be my preferred approach (and I believe it would cause fewer complications than dormant variables at compile time).

@ndw
Contributor

ndw commented Jan 17, 2024

My notation was a bit sloppy, I guess. When I said $map?f(x) === $map => f(x), I didn't mean the global function item f(), I meant f the function item that was taken from the map. But I see your point about breaking compatibility with "ordinary" function items in the map.

@dnovatchev
Contributor

dnovatchev commented Jan 19, 2024

I hope it is not yet too late.

I propose to have a simple syntax for invoking a "member-function" with key myFunName on a map $m like this:

$m ?> myFunName({any-arguments-here})

The rules are simple:

  1. The left-hand-side (LHS) must be a map.

  2. The right-hand-side (RHS) must be a key-name of the map specified by the LHS.

  3. The value of the key-name specified by the RHS must be a function.

  4. The function will be invoked with a first, implicitly-provided argument, which is $m
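The four rules can be modelled in a few lines of Python. This is a sketch only: `invoke_member` is a hypothetical name for the operator, dicts stand in for maps, and the choice of exception types is an assumption, since the rules do not say which errors would be raised.

```python
def invoke_member(m, key, *args):
    """Sketch of `$m ?> key(args)` following the four rules above."""
    if not isinstance(m, dict):          # rule 1: the LHS must be a map
        raise TypeError("left-hand side must be a map")
    if key not in m:                     # rule 2: the RHS must be a key of that map
        raise KeyError(f"map has no key {key!r}")
    fn = m[key]
    if not callable(fn):                 # rule 3: the value must be a function
        raise TypeError(f"value of {key!r} is not a function")
    return fn(m, *args)                  # rule 4: the map is the implicit first argument


counter = {"count": 5, "add": lambda self, n: self["count"] + n}
print(invoke_member(counter, "add", 2))  # 7
```

The member function is an ordinary function of arity one greater than the call, so it can also be invoked directly as `counter["add"](counter, 2)`.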

@michaelhkay
Contributor Author

$m ?> myFunName({any-arguments-here})

That's certainly viable, though the number of unfamiliar operators is becoming a bit daunting.

@dnovatchev
Contributor

dnovatchev commented Jan 19, 2024

$m ?> myFunName({any-arguments-here})

That's certainly viable, though the number of unfamiliar operators is becoming a bit daunting.

There are nice Unicode symbols for this [symbol images omitted].

We could establish a good business selling keyboards with keys for these 😄

@dnovatchev
Contributor

More seriously, going with a dedicated operator such as ?> eliminates the need to add annotations to the member functions.

@ChristianGruen
Contributor

An advantage of this() or %method would be that the declared function would be bound to the map (i.e., it wouldn’t be possible to invoke it with a map different from the one in which it was declared).

From the implementer’s point of view, the $self argument would certainly be the most trivial option (and it can be an important aspect if we want to have more than just 2 implementations of this feature in the future). What remains is syntactic sugar, i.e., an optional alternative to the existing $map!?f(.) syntax

@dnovatchev
Contributor

OK, after two days of consideration, I find the one below the best: it is a very clear visual sign and takes just two ordinary keystrokes:

|>

@dnovatchev
Contributor

An advantage of this() or %method would be that the declared function would be bound to the map (i.e., it wouldn’t be possible to invoke it with a map different from the one in which it was declared).

I don't see why this would be an "advantage"; it seems more like an unnecessary limitation.
The same function can be used in two or more unrelated objects, and this is a good thing: there is no redundancy, since a single update to the function applies to every object that uses it.

We do have a much simpler alternative now in this discussion and this is significant.
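The difference between the two positions can be illustrated with a short Python sketch. It assumes dicts as maps; `label`, `invoke_member` and `bind` are hypothetical names, and `bind` only models the "bound to the declaring map" behaviour described in the quoted sentence.

```python
def label(self):
    """One function item, written once, usable from any map that holds it."""
    return f"{self['kind']} #{self['id']}"


invoice = {"kind": "invoice", "id": 17, "label": label}
customer = {"kind": "customer", "id": 4, "label": label}


def invoke_member(m, key, *args):
    """Explicit-receiver call: the map is passed as the first argument."""
    return m[key](m, *args)


def bind(m, fn):
    """Alternative: fix the receiver once, so the function belongs to one map."""
    return lambda *args: fn(m, *args)


# Shared function, receiver chosen at each call.
print(invoke_member(invoice, "label"))   # invoice #17
print(invoke_member(customer, "label"))  # customer #4

# Bound function: copying it into another map does not change its receiver.
moved = {**customer, "label": bind(invoice, label)}
print(moved["label"]())                  # invoice #17
```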

@benibela

I do not like this %method annotation

It might hurt the performance if the processor has to check at each lookup if it returns a function and what annotations the function has.

I propose to have a simple syntax for invoking a "member-function" with key myFunName on a map $m like this:

I think that is better

With a syntax, the processor only has to deal with member functions if the user actually wants to use member functions

@michaelhkay
Contributor Author

michaelhkay commented Jan 22, 2024

I'm coming to the conclusion that a custom operator is indeed probably the best way to go. While I would love to exploit some of the richness of Unicode, one of the problems is that many of the symbols are not easily distinguishable - that applies in particular to the many arrow shapes; it would be a nightmare for users to find the right one. So I think an ASCII composite symbol is probably prudent. Both "?>" and "|>" seem viable, we can take a majority vote. Another alternative might be middle dot, · (which is harder to write but easier to read):

$rectangle?>resize(2)?>area()
$rectangle|>resize(2)|>area()
$rectangle ?> resize(2) ?> area()
$rectangle |> resize(2) |> area()
$rectangle·resize(2)·area()
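Whichever spelling wins, the chained form reads as nested member calls. The Python sketch below models that reading; `invoke_member` and `make_rectangle` are hypothetical helpers, and the behaviour of `resize` (scaling both dimensions and returning a new map) is an assumption, since the thread does not define it.

```python
def invoke_member(m, key, *args):
    """Models `m ?> key(args)`: look up key in m and pass m as first argument."""
    return m[key](m, *args)


def make_rectangle(height, width):
    return {
        "height": height,
        "width": width,
        # Assumed semantics: resize returns a new rectangle scaled by `factor`.
        "resize": lambda self, factor: make_rectangle(
            self["height"] * factor, self["width"] * factor
        ),
        "area": lambda self: self["height"] * self["width"],
    }


rectangle = make_rectangle(3, 4)

# $rectangle ?> resize(2) ?> area()
resized = invoke_member(rectangle, "resize", 2)
print(invoke_member(resized, "area"))  # 48
```

Chaining works because each step returns a map that again contains its own member functions; the original `rectangle` is left unchanged.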

@ChristianGruen
Contributor

ChristianGruen commented Jan 22, 2024

$rectangle?>resize(2)?>area()
$rectangle|>resize(2)|>area()
$rectangle·resize(2)·area()

If we are convinced that we want to introduce syntactic sugar, …

  • I would suggest using ?>. Its main task is still a lookup (since I expect that the operator will be limited to maps and arrays), i.e., not very different from the existing ? operator.
  • In addition, we should also change =!> to !> (which I would prefer anyway). If we keep =!>, we should use =?> instead of ?>.
  • I wouldn’t be happy with the middle dot. Coincidentally, we already use it ourselves, but only for a very specific feature for advanced users (which is better typing for Java Bindings).

@michaelhkay Would A !? f(.) and A ?> f() be equivalent, or are there minor semantic differences between the two?

@michaelhkay
Contributor Author

Would A !? f(.) and A ?> f() be equivalent

Yes, provided that A is a singleton map and f() doesn't take the form f(x, y) where x and y are context-sensitive expressions.
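One way to read the caveat about context-sensitive arguments is sketched below in Python. This is an interpretation, not a statement of the specification: it assumes that in the `!` form every argument is evaluated with the current item of A as context, while in the `?>` form (read as `A?f(A, x)`) arguments are evaluated in the caller's context. Argument expressions are modelled as functions of the context item, and all names are hypothetical.

```python
def bang_form(items, key, arg_exprs):
    """Models `A ! ?key(., args)`: evaluated once per item of A,
    with that item as the context for every argument expression."""
    return [item[key](item, *(expr(item) for expr in arg_exprs)) for item in items]


def method_form(a, key, arg_exprs, outer_context):
    """Models `A ?> key(args)` read as `A?key(A, args)`:
    argument expressions are evaluated in the caller's context."""
    return a[key](a, *(expr(outer_context) for expr in arg_exprs))


box = {"n": 5, "scale": lambda self, k: self["n"] * k}
outer = {"n": 100}  # whatever the context item happens to be at the call site

constant = [lambda ctx: 2]          # like scale(2): ignores the context
sensitive = [lambda ctx: ctx["n"]]  # like scale(?n): reads the context item

print(bang_form([box], "scale", constant), method_form(box, "scale", constant, outer))
print(bang_form([box], "scale", sensitive), method_form(box, "scale", sensitive, outer))
```

With a constant argument both forms agree; with a context-sensitive argument they diverge, because `?n` sees the map in one form and the outer context item in the other. The singleton condition shows up in the model as well: `bang_form` yields one result per item of its input sequence.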

@dnovatchev
Contributor

If we are convinced that we want to introduce syntactic sugar, …

  • I would suggest using ?>. Its main task is still a lookup (since I expect that the operator will be limited to maps and arrays), i.e., not very different from the existing ? operator.

I am in favor of |> .

We are using "?" in so many different operators that this places a significant cognitive load on the brain when matching the text to one of the many operators that contain "?".

Also, a "?" intuitively implies doubt or a question, which may interfere with straightforward reading and comprehension. When developers specify that a certain member function should be invoked, they should have no doubt about whether this is possible: it is an order, not a question.

@michaelhkay
Contributor Author

This proposal has been superseded.

Labels: Feature, Tests Needed, XPath, XQuery
Development

Successfully merging this pull request may close these issues.

From Records to Objects
7 participants