This package implements a set of functions for transforming constrained random variables (e.g. simplexes, intervals) to Euclidean space. The 3 main functions implemented in this package are `link`, `invlink` and `logpdf_with_trans` for a number of distributions. The supported distributions are:

- `RealDistribution`: `Union{Cauchy, Gumbel, Laplace, Logistic, NoncentralT, Normal, NormalCanon, TDist}`
- `PositiveDistribution`: `Union{BetaPrime, Chi, Chisq, Erlang, Exponential, FDist, Frechet, Gamma, InverseGamma, InverseGaussian, Kolmogorov, LogNormal, NoncentralChisq, NoncentralF, Rayleigh, Weibull}`
- `UnitDistribution`: `Union{Beta, KSOneSided, NoncentralBeta}`
- `SimplexDistribution`: `Union{Dirichlet}`
- `PDMatDistribution`: `Union{InverseWishart, Wishart}`
- `TransformDistribution`: `Union{T, Truncated{T}} where T<:ContinuousUnivariateDistribution`
All exported names from the Distributions.jl package are reexported from Bijectors.
Bijectors.jl also provides a nice interface for working with these maps: composition, inversion, etc. The following table lists mathematical operations for a bijector and the corresponding code in Bijectors.jl.
| Operation | Method | Automatic |
|:---|:---|:---:|
| `b ↦ b⁻¹` | `inv(b)` | ✓ |
| `(b₁, b₂) ↦ (b₁ ∘ b₂)` | `b₁ ∘ b₂` | ✓ |
| `(b₁, b₂) ↦ [b₁, b₂]` | `stack(b₁, b₂)` | ✓ |
| `x ↦ b(x)` | `b(x)` | × |
| `y ↦ b⁻¹(y)` | `inv(b)(y)` | × |
| `x ↦ log\|det J(b, x)\|` | `logabsdetjac(b, x)` | AD |
| `x ↦ b(x), log\|det J(b, x)\|` | `forward(b, x)` | ✓ |
| `p ↦ q := b_* p` | `q = transformed(p, b)` | ✓ |
| `y ∼ q` | `y = rand(q)` | ✓ |
| `p ↦ b` s.t. `support(b_* p) = ℝᵈ` | `bijector(p)` | ✓ |
| `(x ∼ p, b(x), log\|det J(b, x)\|, log q(y))` | `forward(q)` | ✓ |
In this table, `b` denotes a `Bijector`, `J(b, x)` denotes the jacobian of `b` evaluated at `x`, `b_*` denotes the push-forward of `p` by `b`, and `x ∼ p` denotes `x` sampled from the distribution with density `p`.

The "Automatic" column in the table refers to whether or not you are required to implement the feature for a custom `Bijector`. "AD" refers to the fact that it can be implemented "automatically" using automatic differentiation, i.e. `ADBijector`.
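To make the table concrete, here is a small sketch exercising a few of these operations with the (unexported) `Exp` bijector; note that for `Exp` we have `log|det J(b, x)| = x`, since `∂ₓ exp(x) = exp(x)`.

```julia
using Bijectors
using Bijectors: Exp

b = Exp()
x = 0.5

y = b(x)              # x ↦ b(x)
inv(b)(y) ≈ x         # y ↦ b⁻¹(y); true
logabsdetjac(b, x)    # x ↦ log|det J(b, x)|; equal to x for Exp
forward(b, x)         # (rv = b(x), logabsdetjac = logabsdetjac(b, x))
```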
`link`: maps a sample `x` of the distribution `dist` from its support to a value in ℝⁿ. Example:

```julia
julia> using Bijectors

julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> x = rand(dist)
0.7472542331020509

julia> y = link(dist, x)
1.084021356473311
```
`invlink`: the inverse of the `link` function. Example:

```julia
julia> z = invlink(dist, y)
0.7472542331020509

julia> x ≈ z
true
```
`logpdf_with_trans`: computes the `log` of the (transformed) probability density function of the distribution `dist` at the sample `x`. Example:

```julia
julia> using Bijectors

julia> dist = Dirichlet(2, 3)
Dirichlet{Float64}(alpha=[3.0, 3.0])

julia> x = rand(dist)
2-element Array{Float64,1}:
 0.46094823621110165
 0.5390517637888984

julia> logpdf_with_trans(dist, x, false) # ignoring the transformation
0.6163709733893024

julia> logpdf_with_trans(dist, x, true)  # considering the transformation
-0.7760422307471244
```
A `Bijector` is a differentiable bijection with a differentiable inverse. That's basically it.

The primary application of `Bijector`s is the (very profitable) business of transforming (usually continuous) probability densities. If we transform a random variable `x ~ p(x)` to `y = b(x)` where `b` is a `Bijector`, we also get a canonical density `q(y) = p(b⁻¹(y)) |det J(b⁻¹, y)|` for `y`. Here `J(b⁻¹, y)` is the jacobian of the inverse transform evaluated at `y`. `q` is also known as the push-forward of `p` by `b` in measure theory.
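As a quick sketch, we can verify this change-of-variables formula numerically using the interface from the table above (here with `Beta(2, 2)` and its default bijector):

```julia
using Bijectors

p = Beta(2, 2)
b = bijector(p)   # constrained-to-unconstrained bijection for p
ib = inv(b)

x = rand(p)
y = b(x)

# q(y) = p(b⁻¹(y)) |det J(b⁻¹, y)|, computed in log-space
logq_manual = logpdf(p, ib(y)) + logabsdetjac(ib, y)
logq_manual ≈ logpdf(transformed(p, b), y)   # true
```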
There are plenty of reasons why one would want to do something like this. It can be because your `p` has non-zero probability (support) on a closed interval `[a, b]` and you want to use AD without having to worry about reaching the boundary. E.g. `Beta` has support `[0, 1]`, so if we could transform `p = Beta` into a density `q` with support on ℝ, we could instead compute the derivative of `logpdf(q, y)` wrt. `y`, and then transform back `x = b⁻¹(y)`. This is very useful for certain inference methods, e.g. Hamiltonian Monte Carlo, where we need to take the derivative of the logpdf-computation wrt. the input.
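A minimal sketch of this idea, assuming ForwardDiff.jl is installed and can differentiate through `logpdf` here:

```julia
using Bijectors, ForwardDiff

q = transformed(Beta(2, 2))   # density with support on all of ℝ

# derivative of logpdf(q, y) wrt. y; y can be any real number, so AD
# never has to evaluate the logpdf at the boundary of [0, 1]
∇logq(y) = ForwardDiff.derivative(z -> logpdf(q, z), y)

∇logq(0.5)
```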
Another use-case is constructing a parameterized `Bijector` and transforming a "simple" density, e.g. `MvNormal`, to match a more complex density. One class of such bijectors is Normalizing Flows (NFs), which are compositions of differentiable and invertible neural networks, i.e. compositions of a particular family of parameterized bijectors.[1] We'll see an example of this later on.
Other than the `logpdf_with_trans` methods, the package also provides a more composable interface through the `Bijector` types. Consider for example the one from above with `Beta(2, 2)`.
```julia
julia> using Random; Random.seed!(42);

julia> using Bijectors; using Bijectors: Logit

julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> x = rand(dist)
0.36888689965963756

julia> b = bijector(dist) # bijection (0, 1) → ℝ
Logit{Float64}(0.0, 1.0)

julia> y = b(x)
-0.5369949942509267
```
In this case we see that `bijector(d::Distribution)` returns the corresponding constrained-to-unconstrained bijection for `Beta`, which indeed is a `Logit` with `a = 0.0` and `b = 1.0`. The resulting `Logit <: Bijector` has a method `(b::Logit)(x)` defined, allowing us to call it just like any other function. Comparing with the above example, `b(x) ≈ link(dist, x)`. Just to convince ourselves:

```julia
julia> b(x) ≈ link(dist, x)
true
```
What about `invlink`?

```julia
julia> b⁻¹ = inv(b)
Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0))

julia> b⁻¹(y)
0.3688868996596376

julia> b⁻¹(y) ≈ invlink(dist, y)
true
```
Pretty neat, huh? `Inverse{Logit}` is also a `Bijector` where we've defined `(ib::Inverse{<:Logit})(y)` as the inverse transformation of `(b::Logit)(x)`. Note that it's not always the case that `inv(b) isa Inverse`, e.g. the inverse of `Exp` is simply `Log`, so `inv(Exp()) isa Log` is true.
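A quick sketch of that (importing the unexported `Exp` and `Log`):

```julia
julia> using Bijectors: Exp, Log

julia> inv(Exp()) isa Log
true
```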
One more thing. See the `0` in `Inverse{Logit{Float64}, 0}`? It represents the dimensionality of the bijector, in the same sense as for an `AbstractArray`, with the exception of `0`, which means it expects 0-dim input and output, i.e. `<:Real`. This can also be accessed through `dimension(b)`:

```julia
julia> Bijectors.dimension(b)
0

julia> Bijectors.dimension(Exp{1}())
1
```
In most cases specification of the dimensionality is unnecessary, as a `Bijector{N}` is usually only defined for a particular value of `N`, e.g. `Logit isa Bijector{0}` since it only makes sense to apply `Logit` to a real number (or a vector of reals if you're doing batch-computation). As a user, you'll rarely have to deal with this dimensionality specification. Unfortunately there are exceptions, e.g. `Exp`, which can be applied to both real numbers and vectors of real numbers, in both cases treating a vector as a single input. This means that when `Exp` receives a vector `x` as input, it's ambiguous whether to treat `x` as a batch of 0-dim inputs or as a single 1-dim input. As a result, to support batch-computation it is necessary to know the expected dimensionality of the input and output. Notice that we assume the dimensionality of the input and output to be the same; this is a reasonable assumption considering we're working with bijections.
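A sketch of the difference, assuming the behaviour described above: the transformed values agree elementwise, but `logabsdetjac` returns one term per batch element for `Exp{0}` and a single term for `Exp{1}`.

```julia
using Bijectors
using Bijectors: Exp

x = [0.5, 1.0]

b0 = Exp{0}()   # a vector is a batch of 0-dim inputs
b1 = Exp{1}()   # a vector is a single 1-dim input

b0(x) == b1(x)        # same values either way
logabsdetjac(b0, x)   # [0.5, 1.0]: one log-jacobian per batch element
logabsdetjac(b1, x)   # 1.5: a single log-jacobian for the whole input
```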
Also, we can compose bijectors:

```julia
julia> id_y = (b ∘ b⁻¹)
Composed{Tuple{Inverse{Logit{Float64},0},Logit{Float64}},0}((Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0)), Logit{Float64}(0.0, 1.0)))

julia> id_y(y) ≈ y
true
```

And since `Composed isa Bijector`:

```julia
julia> id_x = inv(id_y)
Composed{Tuple{Inverse{Logit{Float64},0},Logit{Float64}},0}((Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0)), Logit{Float64}(0.0, 1.0)))

julia> id_x(x) ≈ x
true
```
So far we've seen that we can replicate the functionality provided by `link` and `invlink`. To replicate `logpdf_with_trans` we instead provide a `TransformedDistribution <: Distribution` implementing the `Distribution` interface from Distributions.jl:

```julia
julia> using Bijectors: TransformedDistribution

julia> td = transformed(dist)
TransformedDistribution{Beta{Float64},Logit{Float64},Univariate}(
dist: Beta{Float64}(α=2.0, β=2.0)
transform: Logit{Float64}(0.0, 1.0)
)

julia> td isa UnivariateDistribution
true

julia> logpdf(td, y)
-1.123311289915276

julia> logpdf_with_trans(dist, x, true)
-1.123311289915276
```
When computing `logpdf(td, y)` where `td` is the transformed distribution corresponding to `Beta(2, 2)`, it makes more semantic sense to compute the pdf of the transformed variable `y` rather than using the "un-transformed" variable `x` to do so, as we do in `logpdf_with_trans`. With that being said, we can also do

```julia
julia> logpdf_forward(td, x)
-1.123311289915276
```
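By the change-of-variables formula from earlier, this amounts to the following identity (a quick sketch to convince ourselves):

```julia
julia> logpdf_forward(td, x) ≈ logpdf(td.dist, x) - logabsdetjac(td.transform, x)
true
```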
In the computation of both `logpdf` and `logpdf_forward` we need to compute `log(abs(det(jacobian(inv(b), y))))` and `log(abs(det(jacobian(b, x))))`, respectively. This computation is available using the `logabsdetjac` method:

```julia
julia> logabsdetjac(b⁻¹, y)
-1.4575353795716655

julia> logabsdetjac(b, x)
1.4575353795716655
```
Notice that

```julia
julia> logabsdetjac(b, x) ≈ -logabsdetjac(b⁻¹, y)
true
```

which is always the case for a differentiable bijection with differentiable inverse. Therefore if you want to compute `logabsdetjac(b⁻¹, y)` and we know that `logabsdetjac(b, b⁻¹(y))` is actually more efficient, we'll return `-logabsdetjac(b, b⁻¹(y))` instead. For some bijectors it might be easy to compute, say, the forward pass `b(x)`, but expensive to compute `b⁻¹(y)`. Because of this you might want to avoid doing anything "backwards", i.e. using `b⁻¹`. This is where `forward` comes to good use:

```julia
julia> forward(b, x)
(rv = -0.5369949942509267, logabsdetjac = 1.4575353795716655)
```
Similarly:

```julia
julia> forward(inv(b), y)
(rv = 0.3688868996596376, logabsdetjac = -1.4575353795716655)
```
In fact, the purpose of `forward` is to just do the right thing, not necessarily "forward". In this function we have access to both the original value `x` and the transformed value `y`, so we can compute `logabsdetjac` in whichever direction is more efficient. Furthermore, in a lot of cases we can re-use parts of the computation of `b(x)` in the computation of `logabsdetjac(b, x)`, or vice-versa. `forward(b, x)` will take advantage of such opportunities (if implemented).
At this point we've only shown that we can replicate the existing functionality. But we said `TransformedDistribution isa Distribution`, so we also have `rand`:

```julia
julia> y = rand(td)              # ∈ ℝ
0.999166054552483

julia> x = inv(td.transform)(y)  # transform back to interval [0, 1]
0.7308945834125756
```
This can be quite convenient if you have computations assuming input to be on the real line.
But the real utility of `TransformedDistribution` becomes more apparent when using `transformed(dist, b)` for any bijector `b`. To get the transformed distribution corresponding to the `Beta(2, 2)`, we called `transformed(dist)` before. This is simply an alias for `transformed(dist, bijector(dist))`. Remember that `bijector(dist)` returns the constrained-to-unconstrained bijector for that particular `Distribution`. But we can of course construct a `TransformedDistribution` using different bijectors with the same `dist`. This is particularly useful in something called Automatic Differentiation Variational Inference (ADVI).[2] An important part of ADVI is to approximate a constrained distribution, e.g. `Beta`, as follows:
- Sample `x` from a `Normal` with parameters `μ` and `σ`, i.e. `x ~ Normal(μ, σ)`.
- Transform `x` to `y` s.t. `y ∈ support(Beta)`, with the transform being a differentiable bijection with a differentiable inverse (a "bijector").
This then defines a probability density with the same support as `Beta`! Of course, it's unlikely that it will be the same density, but it's an approximation. Creating such a distribution becomes trivial with `Bijector` and `TransformedDistribution`:
```julia
julia> dist = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)

julia> b = bijector(dist)              # (0, 1) → ℝ
Logit{Float64}(0.0, 1.0)

julia> b⁻¹ = inv(b)                    # ℝ → (0, 1)
Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0))

julia> td = transformed(Normal(), b⁻¹) # x ∼ 𝓝(0, 1) then b(x) ∈ (0, 1)
TransformedDistribution{Normal{Float64},Inverse{Logit{Float64},0},Univariate}(
dist: Normal{Float64}(μ=0.0, σ=1.0)
transform: Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0))
)

julia> x = rand(td)                    # ∈ (0, 1)
0.538956748141868
```
It's worth noting that `support(Beta)` is the closed interval `[0, 1]`, while the constrained-to-unconstrained bijection, `Logit` in this case, is only well-defined as a map `(0, 1) → ℝ` for the open interval `(0, 1)`. This is of course not just an implementation detail: ℝ is itself open, thus no continuous bijection exists from a closed interval to ℝ. But since the boundaries of a closed interval have what's known as measure zero, this doesn't end up affecting the resulting density with support on the entire real line. In practice, this means that

```julia
td = transformed(Beta())

inv(td.transform)(rand(td))
```

will never result in `0` or `1`, though any sample arbitrarily close to either `0` or `1` is possible. Disclaimer: numerical accuracy is limited, so you might still see `0` and `1` if you're lucky.
We can also do multivariate ADVI using the `Stacked` bijector. `Stacked` gives us a way to combine univariate and/or multivariate bijectors into a single multivariate bijector. Say you have a vector `x` of length 2 and you want to transform the first entry using `Exp` and the second entry using `Log`. `Stacked` gives you an easy and efficient way of representing such a bijector, as sketched below.
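A minimal sketch of exactly that (first entry exponentiated, second entry log-transformed):

```julia
using Bijectors
using Bijectors: Exp, Log

b = stack(Exp(), Log())   # entry 1 ↦ exp, entry 2 ↦ log
b([0.0, ℯ])               # ≈ [1.0, 1.0]
```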
```julia
julia> Random.seed!(42);

julia> using Bijectors: Exp, Log, SimplexBijector

julia> # Original distributions
       dists = (
           Beta(),
           InverseGamma(),
           Dirichlet(2, 3)
       );

julia> # Construct the corresponding ranges
       ranges = [];

julia> idx = 1;

julia> for i = 1:length(dists)
           d = dists[i]
           push!(ranges, idx:idx + length(d) - 1)

           global idx
           idx += length(d)
       end;

julia> ranges
3-element Array{Any,1}:
 1:1
 2:2
 3:4

julia> # Base distribution; mean-field normal
       num_params = ranges[end][end]
4

julia> d = MvNormal(zeros(num_params), ones(num_params))
DiagNormal(
dim: 4
μ: [0.0, 0.0, 0.0, 0.0]
Σ: [1.0 0.0 0.0 0.0; 0.0 1.0 0.0 0.0; 0.0 0.0 1.0 0.0; 0.0 0.0 0.0 1.0]
)

julia> # Construct the transform
       bs = bijector.(dists) # constrained-to-unconstrained bijectors for dists
(Logit{Float64}(0.0, 1.0), Log{0}(), SimplexBijector{true}())

julia> ibs = inv.(bs)        # invert, so we get unconstrained-to-constrained
(Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0)), Exp{0}(), Inverse{SimplexBijector{true},1}(SimplexBijector{true}()))

julia> sb = Stacked(ibs, ranges) # => Stacked <: Bijector
Stacked{Tuple{Inverse{Logit{Float64},0},Exp{0},Inverse{SimplexBijector{true},1}},3}((Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0)), Exp{0}(), Inverse{SimplexBijector{true},1}(SimplexBijector{true}())), (1:1, 2:2, 3:4))

julia> # Mean-field normal with unconstrained-to-constrained stacked bijector
       td = transformed(d, sb);

julia> y = rand(td)
4-element Array{Float64,1}:
 0.36446726136766217
 0.6412195576273355
 0.5067884173521743
 0.4932115826478257

julia> 0.0 ≤ y[1] ≤ 1.0   # => true
true

julia> 0.0 < y[2]         # => true
true

julia> sum(y[3:4]) ≈ 1.0  # => true
true
```
A very interesting application is that of normalizing flows.[1] Usually this is done by sampling from a multivariate normal distribution and then transforming this to a target distribution using invertible neural networks. Currently there are two such transforms available in Bijectors.jl: `PlanarLayer` and `RadialLayer`. Let's create a flow with a single `PlanarLayer`:
```julia
julia> d = MvNormal(zeros(2), ones(2));

julia> b = PlanarLayer(2)
PlanarLayer{Array{Float64,2},Array{Float64,1}}([1.77786; -1.1449], [-0.468606; 0.156143], [-2.64199])

julia> flow = transformed(d, b)
TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{Array{Float64,2},Array{Float64,1}},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{Array{Float64,2},Array{Float64,1}}([1.77786; -1.1449], [-0.468606; 0.156143], [-2.64199])
)

julia> flow isa MultivariateDistribution
true
```
That's it. Now we can sample from it using `rand` and compute the `logpdf`, like any other `Distribution`.
```julia
julia> y = rand(flow)
2-element Array{Float64,1}:
 1.3337915588180933
 1.010861989639227

julia> logpdf(flow, y)         # uses inverse of `b`; not very efficient for `PlanarLayer` and not 100% accurate
-2.8996106373788293

julia> x = rand(flow.dist)
2-element Array{Float64,1}:
 0.18702790710363
 0.5181487878771377

julia> logpdf_forward(flow, x) # more efficient and accurate
-1.9813114667203335
```
Similarly to the multivariate ADVI example, we can use `Stacked` to get a bounded flow:
```julia
julia> d = MvNormal(zeros(2), ones(2));

julia> ibs = inv.(bijector.((InverseGamma(2, 3), Beta())));

julia> sb = stack(ibs...)    # == Stacked(ibs) == Stacked(ibs, [i:i for i = 1:length(ibs)])
Stacked{Tuple{Exp{0},Inverse{Logit{Float64},0}},2}((Exp{0}(), Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0))), (1:1, 2:2))

julia> b = sb ∘ PlanarLayer(2)
Composed{Tuple{PlanarLayer{Array{Float64,2},Array{Float64,1}},Stacked{Tuple{Exp{0},Inverse{Logit{Float64},0}},2}},1}((PlanarLayer{Array{Float64,2},Array{Float64,1}}([1.49138; 0.367563], [-0.886205; 0.684565], [-1.59058]), Stacked{Tuple{Exp{0},Inverse{Logit{Float64},0}},2}((Exp{0}(), Inverse{Logit{Float64},0}(Logit{Float64}(0.0, 1.0))), (1:1, 2:2))))

julia> td = transformed(d, b);

julia> y = rand(td)
2-element Array{Float64,1}:
 2.6493626783431035
 0.1833391433092443

julia> 0 < y[1]
true

julia> 0 ≤ y[2] ≤ 1
true
```
Want to fit the flow?
```julia
julia> using Tracker

julia> b = PlanarLayer(2, param) # construct parameters using `param`
PlanarLayer{TrackedArray{…,Array{Float64,2}},TrackedArray{…,Array{Float64,1}}}([-1.05099; 0.502079] (tracked), [-0.216248; -0.706424] (tracked), [-4.33747] (tracked))

julia> flow = transformed(d, b)
TransformedDistribution{MvNormal{Float64,PDMats.PDiagMat{Float64,Array{Float64,1}},Array{Float64,1}},PlanarLayer{TrackedArray{…,Array{Float64,2}},TrackedArray{…,Array{Float64,1}}},Multivariate}(
dist: DiagNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.0; 0.0 1.0]
)
transform: PlanarLayer{TrackedArray{…,Array{Float64,2}},TrackedArray{…,Array{Float64,1}}}([-1.05099; 0.502079] (tracked), [-0.216248; -0.706424] (tracked), [-4.33747] (tracked))
)

julia> rand(flow)
Tracked 2-element Array{Float64,1}:
  0.5992818950827451
 -0.6264187818605164

julia> x = rand(flow.dist)
2-element Array{Float64,1}:
 -0.37240087577993225
  0.36901028455183293

julia> Tracker.back!(logpdf_forward(flow, x), 1.0) # backprop

julia> Tracker.grad(b.w)
2×1 Array{Float64,2}:
 -0.00037431072968105417
  0.0013039074681623036
```
We can easily create more complex flows by simply doing `PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)` and so on.
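For example, a quick sketch of a deeper flow over a 2-dimensional base distribution:

```julia
using Bijectors

d = MvNormal(zeros(2), ones(2))
b = PlanarLayer(2) ∘ PlanarLayer(2) ∘ RadialLayer(2)  # ∘ applies right-to-left
deep_flow = transformed(d, b)

rand(deep_flow)   # a sample pushed through all three layers
```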
In those cases, it might be useful to use Flux.jl's `treelike` to extract the parameters:
```julia
julia> using Flux

julia> @Flux.treelike Composed

julia> @Flux.treelike TransformedDistribution

julia> @Flux.treelike PlanarLayer

julia> Flux.params(flow)
Params([[-1.05099; 0.502079] (tracked), [-0.216248; -0.706424] (tracked), [-4.33747] (tracked)])
```
Though we might just do this for you in the future, so then all you'll have to do is call `Flux.params`.
Another useful function is the `forward(d::Distribution)` method. It is similar to `forward(b::Bijector)` in the sense that it does a forward pass of the entire process "sample then transform" and returns all the most useful quantities in the process, using the most efficient computation path.
```julia
julia> x, y, logjac, logpdf_y = forward(flow) # sample + transform, returning all the useful quantities in one pass
(x = [-0.839739, 0.169613], y = [-0.810354, 0.963392] (tracked), logabsdetjac = -0.0017416108706436628 (tracked), logpdf = -2.203100286792651 (tracked))
```
This method is for example useful when computing quantities such as the evidence lower bound (ELBO) between this transformed distribution and some other joint density. If no analytical expression is available, we have to approximate the ELBO by a Monte Carlo estimate. But one term in the ELBO is the entropy of the base density, which we do know analytically in this case. Using the analytical expression for the entropy and a Monte Carlo estimate for the remaining terms in the ELBO gives an estimate with lower variance than if we used the Monte Carlo estimate for the entire expectation.
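A minimal sketch of such an estimator, assuming a hypothetical target log-density `logπ` (the names here are illustrative, not part of the Bijectors.jl API):

```julia
using Statistics: mean

# hypothetical target log-density we want the flow to match
logπ(y) = logpdf(MvNormal(ones(2), ones(2)), y)

# ELBO ≈ (1/n) Σᵢ [logπ(yᵢ) + log|det J(b, xᵢ)|] + H(base distribution),
# using the analytical entropy of the base density for the entropy term
function elbo_estimate(flow, logπ; n = 100)
    samples = [forward(flow) for _ = 1:n]
    return mean(logπ(s.y) + s.logabsdetjac for s in samples) + entropy(flow.dist)
end

elbo_estimate(flow, logπ)
```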
There are mainly two ways you can implement your own `Bijector`, and which way you choose mainly depends on the following question: are you bothered enough to manually implement `logabsdetjac`? If the answer is "Yup!", then you subtype from `Bijector`; if "Naaaah", then you subtype `ADBijector`.
Here's a simple example taken from the source code, the `Identity`:

```julia
import Bijectors: logabsdetjac

struct Identity{N} <: Bijector{N} end
(::Identity)(x) = x                  # the transform itself, "forward"
(::Inverse{<:Identity})(y) = y       # the inverse transform, "backward"

# see the proper implementation for `logabsdetjac` in general
logabsdetjac(::Identity{0}, y::Real) = zero(eltype(y)) # ∂ₓ id(x) = ∂ₓ x = 1 → log(abs(1)) = log(1) = 0
```
A slightly more complex example is `Logit`:

```julia
using StatsFuns: logit, logistic

struct Logit{T<:Real} <: Bijector{0}
    a::T
    b::T
end

(b::Logit)(x::Real) = logit((x - b.a) / (b.b - b.a))
(b::Logit)(x) = map(b, x)
# `orig` contains the `Bijector` which was inverted
(ib::Inverse{<:Logit})(y::Real) = (ib.orig.b - ib.orig.a) * logistic(y) + ib.orig.a
(ib::Inverse{<:Logit})(y) = map(ib, y)

logabsdetjac(b::Logit, x::Real) = - log((x - b.a) * (b.b - x) / (b.b - b.a))
logabsdetjac(b::Logit, x) = map(x -> logabsdetjac(b, x), x)
```

(Batch computation is not fully supported by all bijectors yet (see issue #35), but is actively worked on. In the particular case of `Logit` there's only one thing that makes sense, which is elementwise application. Therefore we've added the elementwise `map` methods above, so this works for any `AbstractArray{<:Real}`.)
Then:

```julia
julia> b = Logit(0.0, 1.0)
Logit{Float64}(0.0, 1.0)

julia> b(0.6)
0.4054651081081642

julia> inv(b)(y)
Tracked 2-element Array{Float64,1}:
 0.3078149833748082
 0.72380041667891

julia> logabsdetjac(b, 0.6)
1.4271163556401458

julia> logabsdetjac(inv(b), y) # defaults to `- logabsdetjac(b, inv(b)(x))`
Tracked 2-element Array{Float64,1}:
 -1.546158373866469
 -1.6098711387913573

julia> forward(b, 0.6)         # defaults to `(rv=b(x), logabsdetjac=logabsdetjac(b, x))`
(rv = 0.4054651081081642, logabsdetjac = 1.4271163556401458)
```
For further efficiency, one could manually implement `forward(b::Logit, x)`:

```julia
julia> import Bijectors: forward, Logit

julia> function forward(b::Logit{<:Real}, x)
           totally_worth_saving = @. (x - b.a) / (b.b - b.a) # spoiler: it's probably not
           y = logit.(totally_worth_saving)
           logjac = @. - log((b.b - x) * totally_worth_saving)
           return (rv=y, logabsdetjac = logjac)
       end
forward (generic function with 16 methods)

julia> forward(b, 0.6)
(rv = 0.4054651081081642, logabsdetjac = 1.4271163556401458)

julia> @which forward(b, 0.6)
forward(b::Logit{#s4} where #s4<:Real, x) in Main at REPL[43]:2
```

As you can see it's a very contrived example, but you get the idea.
We could also have implemented `Logit` as an `ADBijector`:

```julia
using StatsFuns: logit, logistic
using Bijectors: ADBackend

struct ADLogit{T, AD} <: ADBijector{AD, 0}
    a::T
    b::T
end

# ADBackend() returns ForwardDiffAD, which means we use ForwardDiff.jl for AD
ADLogit(a::T, b::T) where {T<:Real} = ADLogit{T, ADBackend()}(a, b)

(b::ADLogit)(x) = @. logit((x - b.a) / (b.b - b.a))
(ib::Inverse{<:ADLogit{<:Real}})(y) = @. (ib.orig.b - ib.orig.a) * logistic(y) + ib.orig.a
```
No implementation of `logabsdetjac`, but:

```julia
julia> b_ad = ADLogit(0.0, 1.0)
ADLogit{Float64,Bijectors.ForwardDiffAD}(0.0, 1.0)

julia> logabsdetjac(b_ad, 0.6)
1.4271163556401458

julia> y = b_ad(0.6)
0.4054651081081642

julia> inv(b_ad)(y)
0.6

julia> logabsdetjac(inv(b_ad), y)
-1.4271163556401458
```
Neat! And just to verify that everything works:

```julia
julia> b = Logit(0.0, 1.0)
Logit{Float64}(0.0, 1.0)

julia> logabsdetjac(b, 0.6)
1.4271163556401458

julia> logabsdetjac(b_ad, 0.6) ≈ logabsdetjac(b, 0.6)
true
```
We can also use Tracker.jl for the AD, rather than ForwardDiff.jl:

```julia
julia> Bijectors.setadbackend(:reversediff)
:reversediff

julia> b_ad = ADLogit(0.0, 1.0)
ADLogit{Float64,Bijectors.TrackerAD}(0.0, 1.0)

julia> logabsdetjac(b_ad, 0.6)
1.4271163556401458
```
Most of the methods and types mentioned below have docstrings with more elaborate explanations and examples, e.g.
```
help?> Bijectors.Composed
  Composed(ts::A)

  ∘(b1::Bijector{N}, b2::Bijector{N})::Composed{<:Tuple}
  composel(ts::Bijector{N}...)::Composed{<:Tuple}
  composer(ts::Bijector{N}...)::Composed{<:Tuple}

  where A refers to either

    •    Tuple{Vararg{<:Bijector{N}}}: a tuple of bijectors of dimensionality N

    •    AbstractArray{<:Bijector{N}}: an array of bijectors of dimensionality N

  A Bijector representing composition of bijectors. composel and composer result in a
  Composed for which application occurs from left-to-right and right-to-left, respectively.

  Note that all the alternative ways of constructing a Composed return a Tuple of bijectors.
  This ensures type-stability of implementations of all related methods, e.g. inv.

  If you want to use an Array as the container instead you can do

  Composed([b1, b2, ...])

  In general this is not advised since you lose type-stability, but there might be cases
  where this is desired, e.g. if you have an insanely large number of bijectors to compose.

  Examples
  ≡≡≡≡≡≡≡≡≡≡

  It's important to note that ∘ does what is expected mathematically, which means that the
  bijectors are applied to the input right-to-left, e.g. first applying b2 and then b1:

  (b1 ∘ b2)(x) == b1(b2(x))      # => true

  But in the Composed struct itself, we store the bijectors left-to-right, so that

  cb1 = b1 ∘ b2                  # => Composed.ts == (b2, b1)
  cb2 = composel(b2, b1)         # => Composed.ts == (b2, b1)
  cb1(x) == cb2(x) == b1(b2(x))  # => true
```
If anything is lacking or not clear in docstrings, feel free to open an issue or PR.
The following are the bijectors available:

- Abstract:
  - `Bijector`: super-type of all bijectors.
  - `ADBijector{AD} <: Bijector`: subtypes of this only require the user to implement `(b::UserBijector)(x)` and `(ib::Inverse{<:UserBijector})(y)`. Automatic differentiation will be used to compute the `jacobian(b, x)` and thus `logabsdetjac(b, x)`.
- Concrete:
  - `Composed`: represents a composition of bijectors.
  - `Stacked`: stacks univariate and multivariate bijectors.
  - `Identity`: does what it says, i.e. nothing.
  - `Logit`
  - `Exp`
  - `Log`
  - `Scale`: scaling by a scalar value, though at the moment `logabsdetjac` is only well-defined for the univariate case (see the sketch after this list).
  - `Shift`: shifts by a scalar value.
  - `Permute`: permutes the input array using matrix multiplication.
  - `SimplexBijector`: mostly used as the constrained-to-unconstrained bijector for `SimplexDistribution`, e.g. `Dirichlet`.
  - `PlanarLayer`: §4.1 Eq. (10) in [1].
  - `RadialLayer`: §4.1 Eq. (14) in [1].
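As a small sketch of the scalar affine bijectors, assuming `Scale` and `Shift` are constructed from scalars as described above (they are not exported, so we qualify the import):

```julia
using Bijectors
using Bijectors: Scale, Shift

b = Shift(1.0) ∘ Scale(2.0)   # x ↦ 2x + 1, since ∘ applies right-to-left
b(0.5)                        # 2.0
inv(b)(2.0)                   # 0.5
```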
The distribution interface consists of:

- `TransformedDistribution <: Distribution`: implements the `Distribution` interface from Distributions.jl. This means `rand` and `logpdf` are provided at the moment.
The following methods are implemented by all subtypes of `Bijector`; this also includes bijectors such as `Composed`.

- `(b::Bijector)(x)`: implements the transform of the `Bijector`.
- `inv(b::Bijector)`: returns the inverse of `b`, i.e. `ib::Bijector` s.t. `(ib ∘ b)(x) ≈ x`. In most cases this is `Inverse{<:Bijector}`.
- `logabsdetjac(b::Bijector, x)`: computes `log(abs(det(jacobian(b, x))))`.
- `forward(b::Bijector, x)`: returns the named tuple `(rv=b(x), logabsdetjac=logabsdetjac(b, x))` in the most efficient manner.
- `∘`, `composel`, `composer`: convenient and type-safe constructors for `Composed`. `composel(bs...)` composes s.t. the resulting composition is evaluated left-to-right, while `composer(bs...)` is evaluated right-to-left. `∘` is right-to-left, as expected from standard mathematical notation.
- `jacobian(b::Bijector, x)` [OPTIONAL]: returns the jacobian of the transformation. In some cases the analytical jacobian has been implemented for efficiency.
- `dimension(b::Bijector)`: returns the dimensionality of `b`.
- `isclosedform(b::Bijector)`: returns `true` or `false` depending on whether or not `b(x)` has a closed-form implementation.
For `TransformedDistribution`, together with default implementations for `Distribution`, we have the following methods:

- `bijector(d::Distribution)`: returns the default constrained-to-unconstrained bijector for `d`.
- `transformed(d::Distribution)`, `transformed(d::Distribution, b::Bijector)`: constructs a `TransformedDistribution` from `d` and `b`.
- `logpdf_forward(d::Distribution, x)`, `logpdf_forward(d::Distribution, x, logjac)`: computes the `logpdf(td, td.transform(x))` using the forward pass, which is potentially faster depending on the transform at hand.
- `forward(d::Distribution)`: returns `(x = rand(dist), y = b(x), logabsdetjac = logabsdetjac(b, x), logpdf = logpdf_forward(td, x))` where `b = td.transform`. This combines sampling from the base distribution and transforming into one function. The intention is that this entire process should be performed in the most efficient manner, e.g. the `logabsdetjac(b, x)` call might instead be implemented as `- logabsdetjac(inv(b), b(x))` depending on which is most efficient.
1. Rezende, D. J., & Mohamed, S. (2015). Variational Inference with Normalizing Flows. arXiv:1505.05770.
2. Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2016). Automatic Differentiation Variational Inference. arXiv:1603.00788.