Implementation of Non-linear Independent Components Estimation (NICE) and RealNVP in TensorFlow 2
This repository provides a TensorFlow 2 implementation of the NICE and RealNVP models, as described in the 2014 paper by Laurent Dinh,
David Krueger, and Yoshua Bengio and in the 2016 paper by Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. The NICE model serves as the foundation for subsequent normalizing-flow models.
Model Explanation
Main Components
The main idea behind the model is to:
Transform the initial, unknown data distribution into a latent space with a known density via an invertible function.
Train the model by maximizing the exact likelihood of the mapped data, computed with the change-of-variables rule.
Sample from the known density and invert the sampled points to map them back to the original data space.
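In equations, the training objective used by both papers is the standard change-of-variables log-likelihood (reconstructed here, since the original rendered image is missing): for an invertible $f$ mapping data $x$ to latent $h = f(x)$,

$$\log p_X(x) = \log p_H\big(f(x)\big) + \log\left|\det \frac{\partial f(x)}{\partial x}\right|$$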
The process above is formalized as follows:
Define the known distribution of the "latent" hidden space as a product of independent logistic or Gaussian univariate densities:
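Written out (a reconstruction following the NICE paper, where each $p_{H_d}$ is e.g. a standard logistic or standard Gaussian):

$$p_H(h) = \prod_{d=1}^{D} p_{H_d}(h_d)$$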
Since $f$ must be invertible in order to evaluate the likelihood, update the parameters, and invert samples drawn from the base prior distribution, the authors choose to implement
an additive coupling rule, which takes the following form:
Partition the initial data space into two parts, $x_{a}\in\mathbb{R}^{d}$ and $x_{b}\in\mathbb{R}^{D-d}$.
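With this partition, the additive coupling layer (as given in the NICE paper) is

$$y_a = x_a, \qquad y_b = x_b + m(x_a),$$

where $m$ is an arbitrary function, typically a neural network. The inverse is immediate, $x_b = y_b - m(y_a)$, and the Jacobian is lower triangular with unit diagonal, so $\det \frac{\partial y}{\partial x} = 1$.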
To make the function more flexible, the authors propose multiplying the output of the final coupling transformation by an invertible scaling that is applied element-wise:
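In NICE this is a diagonal scaling layer with learned parameters $S_{ii}$ (a reconstruction of the missing equation):

$$h_i = S_{ii}\, y_i, \qquad \log\left|\det \frac{\partial h}{\partial y}\right| = \sum_i \log |S_{ii}|.$$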
In the RealNVP paper, the authors combine the additive and scaling couplings to jointly translate and scale the base density with input-dependent translation and scaling parameters.
The coupling takes the following form:
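With a scale network $s$ and a translation network $t$, the RealNVP affine coupling (reconstructed from the paper) is

$$y_a = x_a, \qquad y_b = x_b \odot \exp\!\big(s(x_a)\big) + t(x_a), \qquad \log\left|\det \frac{\partial y}{\partial x}\right| = \sum_j s(x_a)_j.$$

A minimal NumPy sketch of this transform and its inverse; the linear "networks" `W_s` and `W_t` below are hypothetical placeholders standing in for the real neural networks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder scale s(.) and translation t(.) functions; in the actual
# model these would be neural networks conditioned on x_a.
W_s = rng.normal(size=(2, 2)) * 0.1
W_t = rng.normal(size=(2, 2)) * 0.1
s = lambda xa: np.tanh(xa @ W_s)  # bounded scale output for stability
t = lambda xa: xa @ W_t           # translation output

def affine_coupling_forward(x):
    """y_a = x_a, y_b = x_b * exp(s(x_a)) + t(x_a)."""
    xa, xb = x[:, :2], x[:, 2:]
    yb = xb * np.exp(s(xa)) + t(xa)
    log_det = np.sum(s(xa), axis=1)  # log|det J| = sum of scale outputs
    return np.concatenate([xa, yb], axis=1), log_det

def affine_coupling_inverse(y):
    """Exact inverse: x_b = (y_b - t(y_a)) * exp(-s(y_a))."""
    ya, yb = y[:, :2], y[:, 2:]
    xb = (yb - t(ya)) * np.exp(-s(ya))
    return np.concatenate([ya, xb], axis=1)

x = rng.normal(size=(4, 4))
y, log_det = affine_coupling_forward(x)
x_rec = affine_coupling_inverse(y)
print(np.allclose(x, x_rec))  # the coupling inverts exactly
```

Because $y_a = x_a$ passes through unchanged, the inverse can recompute $s(x_a)$ and $t(x_a)$ exactly, so no iterative inversion is needed.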