Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Why force instantiate for constant=True? #287

Closed
jbednar opened this issue Oct 2, 2018 · 15 comments · Fixed by #776
Closed

Why force instantiate for constant=True? #287

jbednar opened this issue Oct 2, 2018 · 15 comments · Fixed by #776
Labels
component: type/value stuff system of parameter type/value checking, inheritnace, etc etc status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues. type-bug Bug report
Milestone

Comments

@jbednar
Copy link
Member

jbednar commented Oct 2, 2018

I'm confused by this behavior:

import param

class A(param.Parameterized):
    a = param.Number(2)

aa = A(name="AAA")

class B(param.Parameterized):
    b = param.ObjectSelector(aa)
    c = param.ObjectSelector(aa, readonly=True)
    c = param.ObjectSelector(aa, constant=True)

bb = B(name="BBB")
print(aa, B.b, B.c, bb.b, bb.c)

<A AAA> <A AAA> <A AAA> <A AAA> <A A00004>

I don't see why bb.c differs from any of the other objects shown -- why is it not the specific object that I supplied for parameter b, and instead is a new copy of that object? This was just the source of a very confusing behavior when using ObjectSelector.

The code involved states that constant parameters must be instantiated, but I don't see why that should be true for ObjectSelector, which when instantiate=False (the default) is meant to select between existing objects. Should we start overriding this general constant=True behavior to avoid it for ObjectSelector?

@philippjfr
Copy link
Member

Should we start overriding this general constant=True behavior to avoid it for ObjectSelector?

I definitely think so, as you say it should be selecting between existing objects not making new copies. In fact I would have inspected the validation to fail because the copy is not one of the existing objects.

@ceball
Copy link
Contributor

ceball commented Oct 4, 2018

Hmm, yes, I can see how that's a problem. I can't get back into param justnow while I'm in the middle of the current pyct/nbsite stuff, but I can say that instantiate happens for constant to avoid the normal behavior where setting on the class would change the value on an existing instance. The problem is more to do with instantiate always meaning copy (which was presumably done for other mutable things, such as lists) as opposed to being only about the value existing in instance dict vs parameter.default. So modifying how instantiate works for object selector seems like a good idea to me (or having general support for saying something about copying alongside instantiate, which could then be set appropriately for object selector).

In fact I would have inspected the validation to fail because the copy is not one of the existing objects.

As faras I remember, the default is not validated unless objects are supplied (no objects supplied in example above). Not sure how well tested, though.

@philippjfr
Copy link
Member

As faras I remember, the default is not validated unless objects are supplied (no objects supplied in example above). Not sure how well tested, though.

Yes, thanks, didn't look at the example closely enough.

@ceball ceball removed their assignment Nov 28, 2019
@ceball ceball added component: type/value stuff system of parameter type/value checking, inheritnace, etc etc status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues. type-bug Bug report labels Apr 13, 2020
@ceball
Copy link
Contributor

ceball commented Apr 13, 2020

xref #28

@tonyfast tonyfast added this to the 2.0 milestone Aug 24, 2020
@jbednar
Copy link
Member Author

jbednar commented May 12, 2023

This still seems like it needs addressing.

@maximlt
Copy link
Member

maximlt commented Jun 22, 2023

instantiate happens for constant to avoid the normal behavior where setting on the class would change the value on an existing instance

Just some comments to make this issue a little easier to understand. The normal Param behavior when constant=False is that setting on the class changes the value of an existing instance if the value hasn't been set:

import param

class P(param.Parameterized):
    x = param.Parameter(1)

p = P()
assert p.x  == 1

# Changing the class value (Parameter default)
P.x = 10
# affects the instance if the value hasn't yet been set (instantiation or setattr)
assert p.x  == 10

With constant=True, which under the hood means that instantiate=True and creates a deepcopy of default stored in the instance __dict__ (_{attributename}_param_value), one can no longer change the value of an existing instance by setting on the class.

import param

class P(param.Parameterized):
    x = param.Parameter(1, constant=True)

p = P()
assert p.x  == 1

# Changing the value on the class
P.x = 10
# No longer affects the instance
assert p.x  == 1

# New instances
new_p = P()
# obviously get the updated class value
assert new_p.x == 10

We had a discussion about this yesterday. @jbednar and/or @philippjfr could you please chime in and describe further what you believe should be the way forward here?

Out of curiosity I listed the Parameters that have instantiate=True:

List
  HookList
ClassSelector
  Dict
  Array
  DataFrame
  Series

Incidentally, constant=True leading to instantiate=True is currently broken on the main branch as a consequence of the work on Undefined 🙃 I have a fix for that I will push soon.

@jbednar
Copy link
Member Author

jbednar commented Jun 22, 2023

Ok, let's try to sort this out with examples. First, let's get a list of some objects that have identity and internal state:

import param

class A(param.Parameterized):
    a = param.Number(0)

obj1 = A(name="Obj_1", a=1)
obj2 = A(name="Obj_2", a=2)

objects = [obj1, obj2]
objects
image

Here objects is a list of two specific objects, each with a name and with different internal state.

Now let's define a parameter whose values is meant to be either obj1 or obj2, but with no other allowed value:

class B(param.Parameterized):
    c = param.Selector(default=objects[0], objects=objects)
    
bb = B()

My expectation is that B.c will be obj1, which is true:

print(bb.c)
image

So far, so good. What if I want B's parameter c not to be changeable after B is instantiated? If so I declare c to be constant, and indeed I am unable to change it:

class B(param.Parameterized):
    c = param.Selector(default=objects[0], objects=objects, constant=True)
    
bb = B()
bb.c = objects[1]
image

But I still expect that the value of c will always be one of the items in the given list of objects, because that is the fundamental semantics of a Selector -- it selects between different options provided as a list or dictionary. However, that expectation is now violated, because bb.c defaults to a deepcopy of the default value, which is an entirely different object from those in the list, with its own name and its own internal state:

bb = B()
print(bb.c, bb.c in objects)
image

To me, that behavior is not just suprising but entirely wrong, and it prevents constant from being usable when the intent is to select between existing objects.

From the comments in the code, I think what happened here was that someone was trying to ensure that constant parameters were immune to changes in that parameter in a superclass:

class C(param.Parameterized):
    c = param.Selector(default=objects[0], objects=objects, constant=True)

class B(C):
    pass

cc = B()
print(cc.c)

C.c = objects[1]
print(cc.c, objects[1])
image

I.e., c was declared constant and thus is not changeable after B is instantiated, so cc.c shouldn't be affected by a change in C. As you can see, that constraint has been achieved, thanks to instantiate being forced to True. But the fundamental semantics of this parameter has been violated in the process, because as you can see cc.c is not any of the objects from the list; it's a separate, deepcopied impersonator.

I think the upshot is that instantiate should be forced to true for a constant Parameter, as ensured by #771, affirmatively answering the question posed in the title of this issue. But it should not ever do a deepcopy of the default value for a Selector. I am not able to imagine how deepcopying could be appropriate for a Selector, as it violates the semantics of "selecting". So the problem isn't instantiate per se, it's that instantiate does a deepcopy rather than simply assigning the attribute a value in the instance's dictionary.

So, are there cases where deepcopying is appropriate, or was that always a mistake for instantiate? Some of the Parameter types actually force instantiate (and thus deepcopying) not just in the case of constants, but by default, as @maximlt lists above (e.g. param.List):

options = [1, 2, 3, 4]

class E(param.Parameterized):
    e = param.List(options)

ee = E()
print(ee.e)
image
ee.e += [5]
print(ee.e)
print(options)
image

As you can see, the options list has been deepcopied by default because instantiate=True and thus ee.e can diverge from options. If we force instantiate=False, changes to ee.e are confusingly also changes to options:

options = [1, 2, 3, 4]

class E(param.Parameterized):
    e = param.List(options, instantiate=False)

ee = E()
ee.e += [5]
print(ee.e)
image

With that in mind, deepcopying does seem reasonable by default for a non-Selector type like List, Hooklist, Dict, Array, DataFrame, and Series; in each case the Parameter accepts an object with state, and I don't think (unlike with Selectors) there is any implication that it is a specific object with identity. So it seems better to prioritize avoiding surprise by having the state shared as in the List exampleg.

For ClassSelector (the other Parameter type where instantiate=True by default), even though it's a Selector, it's not selecting from a list of existing items, just an item that meets some criterion (that it is from a certain class). So I think deepcopying also makes sense by default for a ClassSelector.

So, where does that leave us? I think it leaves us needing to make a distinction between a non-deepcopy instantiate=True and a deepcopy instantiate=True. I.e., is our instantiation to stop inheritance from the superclass (which is why it is forced to True when constant=True), or is it to stop sharing of internal state (which is why it is forced to True for Parameter types that are typically mutable like Lists)?

What I propose is that we need to separate instantiation from deepcopying. My guess is that the least disruptive way to do that is to leave instantiate largely as it is (i.e., changed to True for List and those other mutable Parameter types), but add a new slot shallowcopy that can force instantiation not to deepcopy. shallowcopy would default to being False, but if we force instantiate to True solely because of constant being true, we'd set shallowcopy to True by default. That way instantiate acts as it always has except for avoiding the bug in this issue plus also allowing someone to set shallowcopy to True explicitly if they want instantiation for some other reason but don't want deepcopying (which is a valid use case not currently covered at all, e.g. for a List parameter that is meant to point to an existing list, not a copied one).

Note that shallowcopy is a bit of a misnomer because it would actually not be a copy at all, just an attribute pointing to the same object; not sure if there is a better word for that. I.e., for shallowcopy=True, the value wouldn't be a copy.deepcopy() but it also wouldn't be a copy.copy(); it would simply be an attribute pointing to the same thing by assignment.

Also note that I called it shallowcopy=True rather than deepcopy=False so that it only needs special behavior when constant=True rather than for all those other Parameter types. I might be confused, but I think that keeps the code simpler!

How does this proposal sound? Did I get confused anywhere in this long chain of reasoning?

@maximlt
Copy link
Member

maximlt commented Jun 23, 2023

The regression that affected the behavior described in this issue was fixed in #771.

@philippjfr
Copy link
Member

philippjfr commented Jun 23, 2023

Thanks for the excellent writeup, I agree with everything you said until the point where you started talking about deepcopy vs shallowcopy. As you point out it's a misnomer, the real question is whether it should copy at all on instantiate, so to me the right name is something like copy_on_instantiate (ugly, I know). The description of how that should behave when setting it explicitly vs when instantiate is set to true automatically still applies.

I would then also disable copy_on_instantiate for non-mutable types such as Number, String, Integer etc. because it was always pointless to copy those.

@maximlt
Copy link
Member

maximlt commented Jun 23, 2023

is there a case where forcing a deepcopy is clearly right for some other Parameter type?

It seems like for Parameters that refers mutable objects, you need a mechanism to make sure you're not going to share the attribute across all instances of the Parameterized class.

Some other libraries, like attrs, allow to provide a factory, that can be leveraged to avoid sharing the default object across instances:

from attrs import define, field

@define
class C:
    a: list = field(factory=list)

C()
# C(a=[])

Param has chosen the opposite approach I think, making it less likely for its users not to shoot themselves in the foot, by automatically deepcopying the default value for these kinds of Parameter (e.g. List, Dict).

class P(param.Parameterized):
    # not shared as instantiate is True by default, to deepcopy the default object when an instance is created
    x = param.List([])
    # shared
    y = param.List([], instantiate=False)

which would be equivalent? to an hypothetical:

class P(param.Parameterized):
    # not shared (would it be called to set the class value?)
    x = param.List(factory=list)
    # shared, assuming instantiate is no longer True by default for List by False
    y = param.List([])

Certainly having a factory attribute (Selector already has compute_default_fn) would be nice to have. Aliasing instantiate to something more understandable and explicit (e.g. deepcopy, deepcopy_default, deepcopy_default_on_init) would also be nice.

@maximlt
Copy link
Member

maximlt commented Jun 23, 2023

I would then also disable copy_on_instantiate for non-mutable types such as Number, String, Integer etc. because it was always pointless to copy those.

instantiate is False by default.

@philippjfr
Copy link
Member

Sure, but if you set constant it's forced to True, right?

@maximlt
Copy link
Member

maximlt commented Jun 23, 2023

Yes!

@jbednar
Copy link
Member Author

jbednar commented Jun 23, 2023

Yes, if you set constant, instantiate is forced to True, but in my proposal, it would be forced to True without a deepcopy, so I don't think we need special handling for mutable types just to cover constants. That's because deepcopying would now apply only when instantiate is set to true by the user or defaulting to True for certain types, which are all (not coincidentally) types that are mutable anyway.

shallowcopy. As you point out it's a misnomer, the real question is whether it should copy at all on instantiate, so to me the right name is something like copy_on_instantiate (ugly, I know).

I think the correct name would be copy_default_on_instantiation; it's specifically about handling the default value.

I would then also disable copy_on_instantiate for non-mutable types such as Number, String, Integer etc. because it was always pointless to copy those.

I'd reformulate this as a question: Should instantiate being set imply deepcopy? Right now, yes. We could either (a) continue doing that, and then people explicitly need to add copy_default_on_instantiation=False if they set instantiate and don't want a deepcopy, or (b) we could change the semantics of instantiate so that it doesn't immediately imply copy_default_on_instantiation, and instead update List and other mutable classes to explicitly enable both instantiate and copy_default_on_instantiation. The difference between (a) and (b) is that with a user who explicitly set instantiate on their own Parameter declaration would find that with proposal (a) their parameters are no longer deepcopied, while with proposal (b) their code would continue to operate as before.

So my guess is that we should go with proposal (a) even though (b) is arguably clearer behavior. (b) makes instantiate and copy_default_on_instantiation completely orthogonal, which is what makes sense to me semantically, i.e. separately deciding whether to put the default value into the object's dictionary and whether to deepcopy when doing so. Those are separate issues and setting them separately is more explicit and I think clearer. But (a) is less disruptive given that previously our docs have stated that the main reason to set instantiate is to get deepcopying:

class Parameter(_ParameterBase):

        instantiate: controls whether the value of this Parameter will
        be deepcopied when a Parameterized object is instantiated (if
        True), or if the single default value will be shared by all
        Parameterized instances (if False). For an immutable Parameter
        value, it is best to leave instantiate at the default of
        False, so that a user can choose to change the value at the
        Parameterized instance level (affecting only that instance) or
        at the Parameterized class or superclass level (affecting all
        existing and future instances of that class or superclass). For
        a mutable Parameter value, the default of False is also appropriate
        if you want all instances to share the same value state, e.g. if
        they are each simply referring to a single global object like
        a singleton. If instead each Parameterized should have its own
        independently mutable value, instantiate should be set to
        True, but note that there is then no simple way to change the
        value of this Parameter at the class or superclass level,
        because each instance, once created, will then have an
        independently instantiated value.

I.e., because our docs previously associated deepcopying with instantiate, my vote is for proposal (a) so that we are simply adding a new, optional way to decouple deepcopying from instantiate, without changing the semantics of what people previously wrote. Anyone want to argue for (b) instead, the purist option that's a breaking change?

@jbednar
Copy link
Member Author

jbednar commented Jun 23, 2023

Certainly having a factory attribute (Selector already has compute_default_fn) would be nice to have. Aliasing instantiate to something more understandable and explicit (e.g. deepcopy, deepcopy_default, deepcopy_default_on_init) would also be nice.

I agree having clearer and better documented and more universal support for factory attributes would be nice to have, but I think that's orthogonal.

I don't think we should alias instantiate to deepcopy because instantiation is an independent concept from deepcopying. Instantiation is about putting an item for the default value into the instance's dictionary. Deepcopying is about whether that item is a deepcopy, or is simply the item, when doing so. So I'm focusing on controlling whether the deepcopying is done or not, and I would strongly disagree with replacing instantiate with an alias deepcopy because then we would somehow still need to provide control over instantiation, which would be deeply (no pun intended) confusing if we treat instantiate as identical in meaning to deepcopy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: type/value stuff system of parameter type/value checking, inheritnace, etc etc status: discussion Discussion. Not yet a specific feature/bug. Likely to result in multiple PRs/issues. type-bug Bug report
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants