use `fill` adjoint from ChainRules #202

DhairyaLGandhi · 2021-10-20T20:08:37Z

There might be reasons to overload it here, but the current implementation would override the generic fill adjoint, which can cause breakages. If you have an MWE, we can write a more targeted adjoint, and I 100% agree that our definitions should not assume numbers as eltypes of arrays. Presently, this causes SciML/NeuralPDE.jl#412

cc @ChrisRackauckas

ChrisRackauckas

At least, this isn't where this definition should live.

DhairyaLGandhi · 2021-10-20T20:13:06Z

Agreed, this also assumes pullback(x, dims::Int...) ... end and fails if one passes dims::Tuple. That's basically what caused the NeuralPDE breakage.
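
As an illustration of the mismatch described above, here is a minimal sketch in plain Julia. It is not the DistributionsAD or Zygote code: `sketch_fill_pullback` is a hypothetical stand-in that only shows how a method written for splatted `Int` dims has no match when the dims arrive as a tuple, even though `Base.fill` accepts both forms.

```julia
# Hypothetical stand-in for a pullback that only accepts splatted integer dims.
# `sketch_fill_pullback` is an illustrative name, not part of any package.
function sketch_fill_pullback(x, dims::Int...)
    y = fill(x, dims...)
    # Gradient w.r.t. x is the sum of the incoming cotangent; dims get `nothing`.
    back(Δ) = (sum(Δ), map(_ -> nothing, dims)...)
    return y, back
end

# Splatted Ints match the method.
y, back = sketch_fill_pullback(2.0, 2, 3)
grads = back(ones(2, 3))

# Dims passed as a tuple do not match `dims::Int...`, so dispatch fails.
tuple_call_fails = try
    sketch_fill_pullback(2.0, (2, 3))
    false
catch err
    err isa MethodError
end
```

The actual failure in the package may surface differently (for example inside the pullback rather than at dispatch); the sketch only shows why a rule has to account for both calling conventions.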

devmotion · 2021-10-20T20:15:27Z

Haha yeah this whole package is just full of type piracy and hacks that were accumulated over time. The more we can remove or move to the packages it actually belongs to, the better.

Let's run the tests and see if it breaks something (fingers crossed that they actually pass currently, it seems recent Zygote upgrades broke tests in DynamicPPL...).

DhairyaLGandhi · 2021-10-20T20:19:54Z

I would suggest fixing the errant adjoint anyway since it can't handle many other cases of interest. xref JuliaDiff/ChainRules.jl#537 so I suspect the tests wouldn't pass. We can of course handle it with a different rule.

devmotion · 2021-10-20T20:29:49Z

Even though I agree it really should not be part of DistributionsAD we can't just remove it if this in turn breaks Turing or other downstream packages.

> Agreed, this also assumes pullback(x, dims::Int...) ... end and fails if one passes dims::Tuple. That's basically what caused the NeuralPDE breakage.

Then I guess a temporary workaround would be to just fix the dims::Tuple case and handle it correctly in the adjoint.

devmotion · 2021-10-20T20:59:55Z

So it seems some recent changes in upstream dependencies broke Tracker and ReverseDiff support (the last tests on master, at the end of August, passed with ReverseDiff and Tracker). However, Zygote also errors (before the tests were aborted), mainly due to errors such as https://github.com/TuringLang/DistributionsAD.jl/pull/202/checks?check_run_id=3956438041#step:5:364 when testing arrays of distributions, which could potentially be caused by the removal of the fill adjoint.

ChrisRackauckas · 2021-10-21T11:45:12Z

> Then I guess a temporary workaround would be to just fix the dims::Tuple case and handle it correctly in the adjoint.

A better thing would be to narrow this dispatch. Which case is it actually fitting? Making it Any is clearly incorrect.

devmotion · 2021-10-21T11:53:52Z

I assume it can be restricted to e.g. fill(d::Distribution, dims::Int, dims2::Int...) and possibly fill(d::Distribution, dims::Tuple{Int,Vararg{Int}}) (if we have to handle tuples) since it is used mainly to make AD work with the filldist and arraydist product distributions. This is still type piracy and should not exist here but at least better than the current implementation. It was added in #19 originally.
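
To illustrate what narrowing the dispatch means here, the following sketch uses plain Julia with hypothetical stand-ins: `SketchDistribution` plays the role of `Distribution`, and `which_rule` plays the role of the rule lookup. It is not the actual adjoint; it only shows that methods restricted to the two signatures suggested above leave every other `fill`-like call to the generic fallback.

```julia
# Hypothetical stand-ins; none of these names exist in Distributions or Zygote.
abstract type SketchDistribution end
struct SketchNormal <: SketchDistribution
    μ::Float64
end

# Generic fallback, playing the role of the upstream rule.
which_rule(x, dims...) = :generic

# Narrowed methods mirroring the two signatures suggested in the comment above.
which_rule(d::SketchDistribution, dims::Int, dims2::Int...) = :distribution_specific
which_rule(d::SketchDistribution, dims::Tuple{Int,Vararg{Int}}) = :distribution_specific

r1 = which_rule(SketchNormal(0.0), 2, 3)     # distribution with splatted dims
r2 = which_rule(SketchNormal(0.0), (2, 3))   # distribution with tuple dims
r3 = which_rule(1.0, 2, 3)                   # number with splatted dims
r4 = which_rule(1.0, (2, 3))                 # number with tuple dims
```

With an unrestricted first argument, the specialised rule would also capture the last two calls, which is the overriding behaviour the thread objects to.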

mohamed82008 · 2021-10-21T12:01:32Z

This adjoint wasn't exactly one of my finest works. I agree it was a horrible idea in retrospect.

ChrisRackauckas · 2021-10-21T12:03:12Z

I think we should restrict it and merge, and then upstream the fix later.

devmotion · 2021-10-21T13:00:34Z

I wonder if the adjoint is only needed in the tests due to

DistributionsAD.jl/test/ad/distributions.jl

Line 477 in 44a57e9

f_arraydist = (θ...,) -> arraydist(fill(d.f(θ...), n...))

and similar lines. I can't find any occurrences of fill(::Distribution, dims...) in the package, filldist always uses FillArrays.Fill and the implementation of arraydist does not use fill (unsurprisingly). So maybe we can just move the adjoint to the tests? Should still improve it a bit and only cover Distributions (if it's only needed in the tests we don't have to handle Tuples).

ChrisRackauckas · 2021-10-21T14:01:51Z

I think adding it to the tests is good. SciML/NeuralPDE.jl#412 is showing pretty ample evidence that this adjoint is pretty breaking downstream, so its removal is at least not bad. @DhairyaLGandhi update the PR?

mcabbott · 2021-10-21T14:16:25Z

> f_arraydist = (θ...,) -> arraydist(fill(d.f(θ...), n...))

Is there a MWE of the problem this causes? I got lost in the tests here.

(I see that CI complains with this PR that some results are wrong, but they don't give errors.)

If the definition in ChainRules is not correct, then we should fix it, as it may cause problems we haven't thought of elsewhere.

devmotion · 2021-10-21T14:24:39Z

There's already an issue regarding fill with non-numbers: JuliaDiff/ChainRules.jl#537

Yeah, it's a bit unclear currently what test failures are actually caused by this PR and what by changes in upstream packages such as Zygote, ChainRules, ReverseDiff, Distributions etc. since also some Tracker and ReverseDiff tests error that passed on the master branch the last time they were run. I guess I complain too much in this issue and about AD in general lately but this package and in general AD support is just a mess and immensely time consuming to maintain. Nothing changed in this package but now different things are broken 🤷

ChrisRackauckas · 2021-10-21T14:27:55Z

IMO, the tests here should move to ChainRules, or it should get downstream tested (@oxinabox). These are all pretty core and it should be an issue if they are broken, and the solution shouldn't be type piracy fixes. We can slowly move to fix all of that, but for now can we at least remove the known incorrect adjoint 😅.

DhairyaLGandhi · 2021-10-21T14:34:28Z

Moving it to tests would still define it in the tests, so it could make the DistributionsAD tests brittle indirectly. Either way, I want to see what CI says. I don't know if downstream testing is sufficient (better to do it than not, for sure). @devmotion would you mind triggering CI?

devmotion · 2021-10-21T14:34:43Z

There's only a single ChainRules definition left; everything else I already moved to Distributions and StatsFuns. Therefore I don't think ChainRules can help with running tests. The main problems are:

- the fixes and workarounds for Zygote, Tracker, and ReverseDiff, which should be moved to the respective packages if they are needed and useful, and
- the alternative AD-friendlier distributions such as TuringUniform, TuringMvNormal etc., which ideally should not be needed if the originals in Distributions are made AD-friendlier (they are currently used to intercept calls to e.g. MvNormal or Uniform, which is another source of type piracy).

devmotion · 2021-10-21T14:39:05Z

> Either way, I want to see what CI says.

It will still fail, I checked it locally some minutes ago.

devmotion · 2021-10-21T14:42:44Z

Closed in favour of #203 which contains some additional fixes that seem sufficient for tests to pass locally.

DhairyaLGandhi · 2021-10-21T14:43:38Z

Okay then the answer is to use a restricted definition. +1 to make the regular distributions AD-able.

I agree that our adjoints should not be assuming too much about what specific arguments are passed to them.

use ChainRules fill

c491ef0

DhairyaLGandhi changed the title ~~use fill from ChainRules~~ use fill adjoint from ChainRules Oct 20, 2021

ChrisRackauckas approved these changes Oct 20, 2021

DhairyaLGandhi mentioned this pull request Oct 20, 2021

Neural adapter test is broken SciML/NeuralPDE.jl#412

Closed

re-add fill adjoint

aad188b

devmotion mentioned this pull request Oct 21, 2021

Remove adjoint for fill and fix tests #203

Merged

devmotion closed this Oct 21, 2021

DhairyaLGandhi deleted the dg/neuraladapter branch October 21, 2021 14:43
