Merge pull request #13 from fabian-sp/f-handle-differentiable
Handle differentiable functions and build docs
fabian-sp authored May 6, 2024
2 parents db7c5e0 + 15f01ab commit 05c8e90
Showing 15 changed files with 482 additions and 82 deletions.
119 changes: 90 additions & 29 deletions README.md
@@ -1,49 +1,75 @@
# ncOPT
This repository is for solving **constrained optimization problems** where objective and/or constraint functions are (arbitrary) **Pytorch modules**. It is mainly intended for optimization with pre-trained networks, but might be useful also in other contexts.

The algorithms in this package can solve problems of the form

```
min  f(x)
s.t. g(x) <= 0
     h(x) = 0
```

where `f`, `g` and `h` are locally Lipschitz functions.

Key functionalities of the package:

* forward and backward pass of functions can be done on GPU
* batched evaluation and Jacobian computation using Pytorch's `autograd`; no gradient implementation needed
* support for inequality *and* equality constraints; constraints can use only a subset of the optimization variable as input


### Table of contents

1. [Installation](#installation)
2. [Getting started](#getting-started)
- [Solver interface](#solver-interface)
- [The `ObjectiveOrConstraint` class](#the-objectiveorconstraint-class)
- [Functionalities](#functionalities)
3. [Examples](#examples)
4. [References](#references)

### Disclaimer

1) We have not (yet) extensively tested the solver on large-scale problems.
2) The implemented solver is designed for nonsmooth, nonconvex problems, and as such, can solve a very general problem class. If your problem has a specific structure (e.g. convexity), then you will almost certainly get better performance by using software/solvers that are specifically written for the respective problem type. As a starting point, check out [`cvxpy`](https://www.cvxpy.org/).
3) The solver is not guaranteed to converge to a (global) solution (see the theoretical results in [1] for convergence to local solutions under certain assumptions).

## Installation

For an editable version of this package in your Python environment, run the command

```
python -m pip install --editable .
```

## Getting started

### Solver interface

The main solver implemented in this package is called SQP-GS, and has been developed by Curtis and Overton in [1]. See [the detailed documentation](src/ncopt/sqpgs/).

The SQP-GS solver can be called via

```python
from ncopt.sqpgs import SQPGS
problem = SQPGS(f, gI, gE)
problem.solve()
```
The three main arguments, called `f`, `gI` and `gE`, are the objective, the list of inequality constraints, and the list of equality constraints, respectively (see below for details).

### The `ObjectiveOrConstraint` class

The objective `f` and each element of `gI` and `gE` should be passed as an instance of [`ncopt.functions.ObjectiveOrConstraint`](src/ncopt/functions/main.py) (a simple wrapper around a `torch.nn.Module`).

* Each constraint function is allowed to have multi-dimensional output (see example below).
* An empty list can be passed if no (in)equality constraints are needed.
* **IMPORTANT:** For the objective, we further need to specify the dimension of the optimization variable with the argument `dim`. For each constraint, we need to specify the output dimension with `dim_out`.

For example, a linear constraint function `Ax - b <= 0` can be implemented as follows:

```python
from torch import nn

from ncopt.functions import ObjectiveOrConstraint

# Assumes A (shape (2, 2)) and b (shape (2,)) are given tensors.
# nn.Linear computes x @ W^T + bias, so W = A and bias = -b give Ax - b.
g = ObjectiveOrConstraint(nn.Linear(2, 2), dim_out=2)
g.model.weight.data = A  # pass A
g.model.bias.data = -b  # pass b
```
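
To see how such a constraint plugs into the solver, here is a hedged end-to-end sketch. It reuses `g` from the block above and builds the objective with the `Quadratic` helper, following the same pattern as `examples/example_checkpoint.py`; treat it as a sketch under these assumptions, not as canonical usage.

```python
import torch

from ncopt.functions import ObjectiveOrConstraint
from ncopt.functions.quadratic import Quadratic
from ncopt.sqpgs import SQPGS

# Objective f(x) = 0.5*||x||^2, encoded via Quadratic with
# params = (A, b, c) representing 0.5 x^T A x + b^T x + c.
params = (torch.eye(2), torch.zeros(2), torch.tensor(0.0))
f = ObjectiveOrConstraint(Quadratic(params=params), dim=2, is_differentiable=True)

problem = SQPGS(f, [g], [])  # g is the linear constraint from above; no equality constraints
x = problem.solve()
print("Final iterate:", x)
```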

### Functionalities

Let's assume we have a `torch.nn.Module`, call it `model`, which we want to use as objective/constraint. For the solver, we can pass the function as

```
f = ObjectiveOrConstraint(model)
```

* **Dimension handling:** Each function must be designed for batched evaluation (that is, the first dimension of the input is the batch size). Also, the output shape of `model` must be two-dimensional, that is `(batch_size, dim_out)`, where `dim_out` is the output dimension for a constraint, and `dim_out=1` for the objective.

* **Device handling:** The forward pass and Jacobian computation are done on the device where the parameters of your model live. For example, you can use `model.to(device)` before creating `f`. See this [Colab example](https://colab.research.google.com/drive/1scsusR4Fggo-vT-IPYsoa3ccROmGQkZ8?usp=sharing) for how to use a GPU.

* **Input preparation**: Different constraints might need only a part of the optimization variable as input, or might require additional preparation such as reshaping from vector to image. (Note that the optimization variable is always handled as a vector.) For this, you can specify a callable `prepare_input` when initializing an `ObjectiveOrConstraint` object; any reshaping or cropping can be handled with this function. Note that `prepare_input` should be compatible with batched forward passes; see the sketch below.
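
As a small, hypothetical sketch of this last point, consider a constraint that only sees the first two coordinates of a 10-dimensional optimization variable; the helper name `first_two` is made up for this illustration.

```python
from torch import nn

from ncopt.functions import ObjectiveOrConstraint

def first_two(x):
    # x has shape (batch_size, 10); slicing keeps the batch dimension
    return x[:, :2]

# nn.Linear(2, 1) outputs shape (batch_size, 1), matching dim_out=1
g = ObjectiveOrConstraint(nn.Linear(2, 1), dim_out=1, prepare_input=first_two)
```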

## Examples
### 2D Nonsmooth Rosenbrock

This example is taken from Example 5.1 in [1] and involves minimizing a nonsmooth version of the Rosenbrock function, constrained with a maximum function. For this problem, the analytical solution is known. The picture below shows the trajectory of the SQP-GS solver for different starting points: the final iterates are marked with a black plus, while the analytical solution is marked with a golden star. We can see that the algorithm finds the minimizer consistently.

[Link to example script](examples/example_rosenbrock.py)

![SQP-GS trajectories for a 2-dim example](data/img/rosenbrock.png "SQP-GS trajectories for a 2-dim example")

### Sparse signal recovery

This example is taken from Example 5.3 in [1]. We minimize the q-norm $\|x\|_q$ under the constraint of approximate signal recovery $\|Rx-y\| \leq \delta$. Here $R$ comes from the Discrete Cosine Transform.

[Link to example script](examples/example_residual.py)
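
For illustration, the constraint $\|Rx-y\| \leq \delta$ could be written as a module along the following lines. This is a hypothetical sketch, assuming `R`, `y` and `delta` are given; the class name `ResidualNorm` and its internals are illustrative, not the code of the example script.

```python
from torch import nn

from ncopt.functions import ObjectiveOrConstraint

class ResidualNorm(nn.Module):
    """Computes g(x) = ||Rx - y|| - delta, batched over the first dimension."""

    def __init__(self, R, y, delta):
        super().__init__()
        self.register_buffer("R", R)  # shape (m, d)
        self.register_buffer("y", y)  # shape (m,)
        self.delta = delta

    def forward(self, x):
        # x has shape (batch_size, d)
        res = x @ self.R.T - self.y  # shape (batch_size, m)
        return res.norm(dim=1, keepdim=True) - self.delta  # shape (batch_size, 1)

gI = [ObjectiveOrConstraint(ResidualNorm(R, y, delta), dim_out=1)]
```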

When $q \geq 1$, the problem is convex and rather easy to solve. For $q < 1$, we observed that the number of sample points needs to be increased substantially in order to make the subproblems solvable in reasonable time.

This problem has dimension 256, with one scalar constraint. To give a feeling for runtime, below is the runtime per iteration (for $q=1$), split into the main parts: sampling points and computing gradients (`sample_and_grad`), solving the quadratic subproblem (`subproblem`), updating the iterate and the approximate Hessian (`step`), and all other routines.

![Timings for SQP-GS with dim 256](data/img/timings_residual.png "Timings for SQP-GS with dim 256")

### Pretrained neural network constraint

This toy example illustrates how to use a pretrained neural network as a constraint function in `ncOPT`. We train a simple model to learn the mapping $(x_1,x_2) \mapsto \max(\sqrt{2}x_1, 2x_2) - 1$, then load the model checkpoint and use it as a constraint.

Below we show the feasible set (in blue) and the final iterate, using the squared distance to the vector of ones as the objective.

[Link to example script](examples/example_checkpoint.py)

[Link to training script](scripts/train_max_fun.py)


![SQP-GS final iterate with learned constraint](data/img/checkpoint.png "SQP-GS final iterate with learned constraint")

## References
[1] Frank E. Curtis and Michael L. Overton, A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization, SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.

[2] Frank E. Curtis and Michael L. Overton, MATLAB implementation of SLQP-GS, https://coral.ise.lehigh.edu/frankecurtis/software/.
Binary file added data/img/checkpoint.png
Binary file modified data/img/rosenbrock.png
Binary file added data/img/timings_residual.png
74 changes: 74 additions & 0 deletions examples/example_checkpoint.py
@@ -0,0 +1,74 @@
"""
Illustrates how to use a pretrained neural network as constraint function.
We load a checkpoint that has been trained with the script scripts/train_max_fun.py.
"""

import matplotlib.pyplot as plt
import numpy as np
import torch

from ncopt.functions import ObjectiveOrConstraint
from ncopt.functions.max_linear import MaxOfLinear
from ncopt.functions.quadratic import Quadratic
from ncopt.sqpgs import SQPGS

# %% Load the checkpoint
checkpoint_dir = "../data/checkpoints/max2d.pt"

model = MaxOfLinear(input_dim=2, output_dim=2)

checkpoint = torch.load(checkpoint_dir)
model.load_state_dict(checkpoint["model_state_dict"])

print("Weights:", model.linear.weight)
print("Bias:", model.linear.bias)


# %% Problem setup

# Define the constraint
g = ObjectiveOrConstraint(model, dim_out=1)

# Define the objective: f(x) = 0.5*||x-(1,1)||^2
params = (torch.eye(2), -torch.ones(2), torch.tensor(1.0))
f = ObjectiveOrConstraint(Quadratic(params=params), dim=2, is_differentiable=True)

problem = SQPGS(f, [g], [], x0=None, tol=1e-10, max_iter=500, verbose=True)
x = problem.solve()

print("Final iterate: ", x)

# %% Plot the feasible region

_x, _y = np.linspace(-2, 2, 100), np.linspace(-2, 2, 100)
X, Y = np.meshgrid(_x, _y)
Z = np.zeros_like(X)

for j in np.arange(X.shape[0]):
    for i in np.arange(X.shape[1]):
        Z[i, j] = g.single_eval(np.array([X[i, j], Y[i, j]]))

Z[Z > 0] = np.nan # only show feasible set

fig, ax = plt.subplots(figsize=(4, 4))

# Plot contour of feasible set
ax.contourf(X, Y, Z, cmap="Blues", levels=np.linspace(-4, 0, 20), antialiased=True, lw=0, zorder=0)

# plot circle
ax.scatter(1, 1, marker="o", s=50, c="k")
dist_to_ones = np.linalg.norm(x - np.ones(2))
c = plt.Circle((1, 1), radius=dist_to_ones, facecolor=None, fill=False, edgecolor="grey")
ax.add_patch(c)
# plot final iterate
ax.scatter(x[0], x[1], marker="o", s=50, c="silver", label="Final iterate")


ax.set_xlim(-2, 2)
ax.set_ylim(-2, 2)
ax.legend(loc="upper left")
ax.set_title("Minimize distance to black dot")

fig.tight_layout()
fig.savefig("../data/img/checkpoint.png")
13 changes: 8 additions & 5 deletions scripts/example_residual.py → examples/example_residual.py
@@ -1,6 +1,4 @@
"""
Implements Example 5.3 in
Frank E. Curtis and Michael L. Overton, A sequential quadratic programming
@@ -24,7 +22,7 @@

d = 256 # problem dimension
m = 32 # number of samples
q = 1.0  # residual norm order; values less than 1.0 make the problem nonconvex

device = "cuda" if torch.cuda.is_available() else "cpu"

@@ -61,9 +59,13 @@
gI = [ObjectiveOrConstraint(const, dim_out=1)]
gE = []

options = {"num_points_obj": 5, "num_points_gI": 5, "qp_solver": "osqp"}
if q >= 1:
    options = {"num_points_obj": 5, "num_points_gI": 5, "qp_solver": "osqp"}
else:
    # only tested for q = 0.7
    options = {"num_points_obj": 100, "num_points_gI": 100, "qp_solver": "osqp"}

problem = SQPGS(f, gI, gE, x0=None, tol=1e-10, max_iter=450, options=options, verbose=True)

# %% Solve

@@ -74,6 +76,7 @@
# %% Plotting

fig, ax = problem.plot_timings()
fig.savefig("../data/img/timings_residual.png")
fig, ax = problem.plot_metrics()


19 changes: 9 additions & 10 deletions example_rosenbrock.py → examples/example_rosenbrock.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,10 @@
"""
Implements Example 5.1 in
Frank E. Curtis and Michael L. Overton, A sequential quadratic programming
algorithm for nonconvex, nonsmooth constrained optimization,
SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.
"""

import matplotlib.pyplot as plt
@@ -12,8 +17,6 @@
from ncopt.functions.rosenbrock import NonsmoothRosenbrock
from ncopt.sqpgs import SQPGS


# %% Setup

f = ObjectiveOrConstraint(NonsmoothRosenbrock(a=8.0), dim=2)
@@ -37,7 +40,7 @@

# %% How to use the solver

problem = SQPGS(f, gI, gE, x0=None, tol=1e-10, max_iter=100, verbose=True)
x = problem.solve()

print("Distance to solution:", f"{np.linalg.norm(x - xstar):.9f}")
@@ -60,15 +63,11 @@

# Plot contour and solution
ax.contourf(X, Y, Z, cmap="gist_heat", levels=20, alpha=0.7, antialiased=True, lw=0, zorder=0)
ax.scatter(xstar[0], xstar[1], marker="*", s=200, c="gold", zorder=1)

for i in range(7):
x0 = np.random.randn(2)
problem = SQPGS(f, gI, gE, x0, tol=1e-10, max_iter=100, verbose=False, store_history=True)
x_k = problem.solve()
print(x_k)

@@ -90,4 +89,4 @@
ax.legend(handles=legend_elements, ncol=3, fontsize=8)

fig.tight_layout()
fig.savefig("data/img/rosenbrock.png")
fig.savefig("../data/img/rosenbrock.png")
