diff --git a/CITATION.cff b/CITATION.cff index 221380e..5ed186f 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -2,6 +2,8 @@ cff-version: 1.2.0 authors: - family-names: Schaipp given-names: Fabian -title: "Python implementation of the SQP-GS algorithm" + - family-names: Schiele + given-names: Philipp +title: "Constrained optimization for Pytorch using the SQP-GS algorithm." version: 0.1.0 -date-released: 2021-09-03 \ No newline at end of file +date-released: 2024-xx-xx \ No newline at end of file diff --git a/LICENSE b/LICENSE index df9141e..a07f450 100644 --- a/LICENSE +++ b/LICENSE @@ -1,29 +1,9 @@ -BSD 3-Clause License +MIT License -Copyright (c) 2021, Fabian Schaipp -All rights reserved. +Copyright 2024 Fabian Schaipp, Philipp Schiele -Redistribution and use in source and binary forms, with or without -modification, are permitted provided that the following conditions are met: +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: -1. Redistributions of source code must retain the above copyright notice, this - list of conditions and the following disclaimer. +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. -2. Redistributions in binary form must reproduce the above copyright notice, - this list of conditions and the following disclaimer in the documentation - and/or other materials provided with the distribution. - -3. Neither the name of the copyright holder nor the names of its - contributors may be used to endorse or promote products derived from - this software without specific prior written permission. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" -AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE -IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. \ No newline at end of file diff --git a/README.md b/README.md index bc330e9..d87d244 100644 --- a/README.md +++ b/README.md @@ -1,64 +1,142 @@ # ncOPT -This repository contains a `Python` implementation of the SQP-GS (*Sequential Quadratic Programming - Gradient Sampling*) algorithm by Curtis and Overton [1]. 
+This repository is for solving **constrained optimization problems** where objective and/or constraint functions are (arbitrary) **PyTorch modules**. It is mainly intended for optimization with pre-trained networks, but might also be useful in other contexts.
 
-**Note:** this implementation is a **prototype code**, it has been tested only for a simple problem and it is not performance-optimized. A Matlab implementation is available from the authors of the paper, see [2].
+## Short Description
 
-## Installation
-
-If you want to install an editable version of this package in your Python environment, run the command
+The algorithms in this package can solve problems of the form
 
 ```
-    python -m pip install --editable .
+    min f(x)
+    s.t. g(x) <= 0
+         h(x) = 0
 ```
 
-## Mathematical description
+where `f`, `g` and `h` are locally Lipschitz functions.
+
+Key functionalities of the package:
+
+* forward and backward pass of functions can be done on GPU
+* batched evaluation and Jacobian computation using PyTorch's `autograd`; no gradient implementation needed
+* support for inequality *and* equality constraints; constraints can use only a subset of the optimization variable as input
-The algorithm can solve problems of the form
+
+### Table of contents
+
+1. [Installation](#installation)
+2. [Getting started](#getting-started)
+    - [Solver interface](#solver-interface)
+    - [The `ObjectiveOrConstraint` class](#the-objectiveorconstraint-class)
+    - [Functionalities](#functionalities)
+3. [Examples](#examples)
+4. [References](#references)
+
+### Disclaimer
+
+1) We have not (yet) extensively tested the solver on large-scale problems.
+2) The implemented solver is designed for nonsmooth, nonconvex problems, and as such, can solve a very general problem class. If your problem has a specific structure (e.g. convexity), then you will almost certainly get better performance by using software/solvers that are specifically written for the respective problem type. As a starting point, check out [`cvxpy`](https://www.cvxpy.org/).
+3) The solver is not guaranteed to converge to a (global) solution (see the theoretical results in [1] for convergence to local solutions under certain assumptions).
+
+
+
+## Installation
+
+The package is available from PyPI with
 ```
-    min f(x)
-    s.t. g(x) <= 0
-         h(x) = 0
+    python -m pip install ncopt
 ```
 
-where `f`, `g` and `h` are locally Lipschitz functions. Hence, the algorithm can solve problems with nonconvex and nonsmooth objective and constraints. For details, we refer to the original paper.
-
-## Example
+Alternatively, for an editable version of this package in your Python environment, clone this repository and run the command
 
-The code was tested for a 2-dim nonsmooth version of the Rosenbrock function, constrained with a maximum function. See Example 5.1 in [1]. For this problem, the analytical solution is known. The picture below shows the trajectory of SQP-GS for different starting points. The final iterates are marked with the black plus while the analytical solution is marked with the golden star. We can see that the algorithm finds the minimizer consistently.
+```
+    python -m pip install --editable .
+```
-To reproduce this experiment, see the file `example_rosenbrock.py`.
+## Getting started
-![SQP-GS trajectories for a 2-dim example](data/img/rosenbrock.png "SQP-GS trajectories for a 2-dim example")
+### Solver interface
+The main solver implemented in this package is called SQP-GS and was developed by Curtis and Overton in [1]. See [the detailed documentation](src/ncopt/sqpgs/).
-## Implementation details
-The solver can be called via
+The SQP-GS solver can be called via
 
 ```python
 from ncopt.sqpgs import SQPGS
 problem = SQPGS(f, gI, gE)
 problem.solve()
 ```
-It has three main arguments, called `f`, `gI` and `gE`. `f` is the objective. `gI` and `gE` are lists of inequality and equality constraint functions. Each element of `gI` and `gE` as well as the objective `f` needs to be an instance of a class which contains the following properties. The constraint functions are allowed to have multi-dimensional output.
+Here `f` is the objective function, and `gI` and `gE` are lists of inequality and equality constraint functions. An empty list can be passed if no (in)equality constraints are needed.
+
+### The `ObjectiveOrConstraint` class
+
+The objective `f` and each element of `gI` and `gE` should be passed as an instance of [`ncopt.functions.ObjectiveOrConstraint`](src/ncopt/functions/main.py) (a simple wrapper around a `torch.nn.Module`).
+
+* Each constraint function is allowed to have multi-dimensional output (see example below).
+* **IMPORTANT:** For the objective, we further need to specify the dimension of the optimization variable with the argument `dim`. For each constraint, we need to specify the output dimension with `dim_out`.
+
+
+For example, a linear constraint function `Ax - b <= 0` can be implemented as follows:
+
+```python
+    import torch
+    from ncopt.functions import ObjectiveOrConstraint
+    A = ..  # your data
+    b = ..  # your data
+    g = ObjectiveOrConstraint(torch.nn.Linear(2, 2), dim_out=2)
+    g.model.weight.data = A  # pass A
+    g.model.bias.data = -b  # pass b
+```
+
+### Functionalities
+
+Let's assume we have a `torch.nn.Module`, call it `model`, which we want to use as objective/constraint. For the solver, we can pass the function as
+
+```python
+f = ObjectiveOrConstraint(model)
+```
+
+* **Dimension handling:** Each function must be designed for batched evaluation (that is, the first dimension of the input is the batch size). Also, the output shape of `model` must be two-dimensional, that is `(batch_size, dim_out)`, where `dim_out` is the output dimension for a constraint, and `dim_out=1` for the objective.
+
+* **Device handling:** The forward pass and Jacobian calculation are done on the device on which the parameters of your model lie. For example, you can use `model.to(device)` before creating `f`. See this [Colab example](https://colab.research.google.com/drive/1scsusR4Fggo-vT-IPYsoa3ccROmGQkZ8?usp=sharing) for how to use a GPU.
-### Attributes
+* **Input preparation**: Different constraints might only need a part of the optimization variable as input, or might require additional preparation such as reshaping from vector to image. (Note that the optimization variable is always handled as a vector.) For this, you can specify a callable `prepare_inputs` when initializing an `ObjectiveOrConstraint` object. Any reshaping or cropping etc. can be handled with this function. Please note that `prepare_inputs` should be compatible with batched forward passes.
-* `self.dim`: integer, specifies dimension of the input argument.
-* `self.dimOut`: integer, specifies dimension of the output.
+## Examples
+### 2D Nonsmooth Rosenbrock
-### Methods
+This example is taken from Example 5.1 in [1] and involves minimizing a nonsmooth Rosenbrock function, constrained with a maximum function. The picture below shows the trajectory of the SQP-GS solver for different starting points. The final iterates are marked with a black plus, while the analytical solution is marked with a golden star. We can see that the algorithm finds the minimizer consistently.
+
+[Link to example script](examples/example_rosenbrock.py)
+
+![SQP-GS trajectories for a 2-dim example](data/img/rosenbrock.png "SQP-GS trajectories for a 2-dim example")
-* `self.eval`: evaluates the function at a point `x`.
-* `self.grad`: evaluates the gradient at a point `x`. Here, the Jacobian must be returned, i.e. an array of shape `dimOut x dim`.
+### Sparse signal recovery
+
+This example is taken from Example 5.3 in [1]. We minimize the q-norm $||x||_q$ under the constraint of approximate signal recovery $||Rx-y|| \leq \delta$. Here $R$ comes from the Discrete Cosine Transform.
+
+[Link to example script](examples/example_residual.py)
+
+When $q\geq 1$ the problem is convex and rather easy to solve. For $q<1$, we observed that the number of sample points needs to be increased considerably in order to make the subproblems solvable in reasonable time.
+
+This problem has dimension 256, with one scalar constraint. To give a feeling for runtime, below is the runtime per iteration (for $q=1$), split into the main parts: sample points and compute gradients (`sample_and_grad`), solve the quadratic subproblem (`subproblem`), update the iterate and the Hessian approximation (`step`), and all other routines.
+
+![Timings for SQP-GS with dim 256](data/img/timings_residual.png "Timings for SQP-GS with dim 256")
+
+### Pretrained neural network constraint
+
+This toy example illustrates how to use a pretrained neural network as a constraint function in `ncOPT`. We train a simple model to learn the mapping $(x_1,x_2) \mapsto \max(\sqrt{2}x_1, 2x_2) -1$. Then, we load the model checkpoint to use it as a constraint.
+
+Below we show the feasible set (in blue) and the final iterate, when using the squared distance to the vector of ones as objective.
+
+[Link to example script](examples/example_checkpoint.py)
+
+[Link to training script](scripts/train_max_fun.py)
+
+
+![SQP-GS final iterate with learned constraint](data/img/checkpoint.png "SQP-GS final iterate with learned constraint")
-For an example, see the classes defined in `ncopt/funs.py`.
-Moreover, we implemented a class for a constraint coming from a Pytorch neural network (i.e. `g_i(x)` is an already trained neural network). For this, see `ncopt/torch_obj.py`.
 
 ## References
 
 [1] Frank E. Curtis and Michael L. Overton, A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization, SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.
-
-[2] Frank E. Curtis and Michael L. Overton, MATLAB implementation of SLQP-GS, https://coral.ise.lehigh.edu/frankecurtis/software/.
diff --git a/data/checkpoints/max2d.pt b/data/checkpoints/max2d.pt
index 8a0b456..f6f37cc 100644
Binary files a/data/checkpoints/max2d.pt and b/data/checkpoints/max2d.pt differ
diff --git a/data/img/checkpoint.png b/data/img/checkpoint.png
new file mode 100644
index 0000000..62a09fe
Binary files /dev/null and b/data/img/checkpoint.png differ
diff --git a/data/img/rosenbrock.png b/data/img/rosenbrock.png
index c7f7564..656b981 100644
Binary files a/data/img/rosenbrock.png and b/data/img/rosenbrock.png differ
diff --git a/data/img/timings_residual.png b/data/img/timings_residual.png
new file mode 100644
index 0000000..6752833
Binary files /dev/null and b/data/img/timings_residual.png differ
diff --git a/examples/example_checkpoint.py b/examples/example_checkpoint.py
new file mode 100644
index 0000000..d46782c
--- /dev/null
+++ b/examples/example_checkpoint.py
@@ -0,0 +1,74 @@
+"""
+Illustrates how to use a pretrained neural network as a constraint function.
+
+We load a checkpoint that was trained with the script in scripts/train_max_fun.py.
+"""
+
+import matplotlib.pyplot as plt
+import numpy as np
+import torch
+
+from ncopt.functions import ObjectiveOrConstraint
+from ncopt.functions.max_linear import MaxOfLinear
+from ncopt.functions.quadratic import Quadratic
+from ncopt.sqpgs import SQPGS
+
+# %% Load the checkpoint
+checkpoint_dir = "../data/checkpoints/max2d.pt"
+
+model = MaxOfLinear(input_dim=2, output_dim=2)
+
+checkpoint = torch.load(checkpoint_dir)
+model.load_state_dict(checkpoint["model_state_dict"])
+
+print("Weights:", model.linear.weight)
+print("Bias:", model.linear.bias)
+
+
+# %% Problem setup
+
+# Define the constraint
+g = ObjectiveOrConstraint(model, dim_out=1)
+
+# Define the objective: f(x) = 0.5*||x-(1,1)||^2
+params = (torch.eye(2), -torch.ones(2), torch.tensor(1.0))
+f = ObjectiveOrConstraint(Quadratic(params=params), dim=2, is_differentiable=True)
+
+problem = SQPGS(f, [g], [], x0=None, tol=1e-10, max_iter=500, verbose=True)
+x = problem.solve()
+
+print("Final iterate: ", x)
+
+# %% Plot the feasible region
+
+_x, _y = np.linspace(-2, 2, 100), np.linspace(-2, 2, 100)
+X, Y = np.meshgrid(_x, _y)
+Z = np.zeros_like(X)
+
+for j in np.arange(X.shape[0]):
+    for i in np.arange(X.shape[1]):
+        Z[i, j] = g.single_eval(np.array([X[i, j], Y[i, j]]))
+
+Z[Z > 0] = np.nan  # only show feasible set
+
+fig, ax = plt.subplots(figsize=(4, 4))
+
+# Plot contour of feasible set
+ax.contourf(X, Y, Z, cmap="Blues", levels=np.linspace(-4, 0, 20), antialiased=True, lw=0, zorder=0)
+
+# plot circle
+ax.scatter(1, 1, marker="o", s=50, c="k")
+dist_to_ones = np.linalg.norm(x - np.ones(2))
+c = plt.Circle((1, 1), radius=dist_to_ones, facecolor=None, fill=False, edgecolor="grey")
+ax.add_patch(c)
+# plot final iterate
+ax.scatter(x[0], x[1], marker="o", s=50, c="silver", label="Final iterate")
+
+
+ax.set_xlim(-2, 2)
+ax.set_ylim(-2, 2)
+ax.legend(loc="upper left")
+ax.set_title("Minimize distance to black dot")
+
+fig.tight_layout()
+fig.savefig("../data/img/checkpoint.png")
diff --git a/examples/example_residual.py b/examples/example_residual.py
new file mode 100644
index 0000000..622f7d1
--- /dev/null
+++ b/examples/example_residual.py
@@ -0,0 +1,91 @@
+"""
+Implements Example 5.3 in
+
+    Frank E. Curtis and Michael L. Overton, A sequential quadratic programming
+    algorithm for nonconvex, nonsmooth constrained optimization,
+    SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.
+
+Useful script for testing and performance optimization.
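+
+The problem reads  min ||x||_q  s.t.  ||Rx - y||_2 <= delta,
+where R consists of the first m rows of a discrete cosine transform matrix.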
+""" + +import matplotlib.pyplot as plt +import numpy as np +import torch +from scipy.fft import dct + +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.norm_residual import NormResidual +from ncopt.sqpgs import SQPGS + +# %% +np.random.seed(1234) + +d = 256 # problem dimension +m = 32 # number of samples +q = 1.0 # residual norm order. Value less than 1.0 makes problem non-convex. + +device = "cuda" if torch.cuda.is_available() else "cpu" + +# %% Objective function: ||x||_q + +Id, zeros = torch.eye(d, d), torch.zeros(d) +obj = NormResidual(params=(Id, zeros), q=q, offset=0.0) + +obj.to(device) + +# %% Constraint: ||Rx-y|| <= delta + +num_zeros = int(0.9 * d) +oracle = np.concatenate((np.zeros(num_zeros), np.random.randn(d - num_zeros))) +np.random.shuffle(oracle) + +# first m rows of discrete dxd cosine transformation matrix +R = torch.from_numpy(dct(np.eye(d), axis=0)[:m, :]).type(torch.float32) +y = (R @ oracle).type(torch.float32) +delta = 1.0 + +const = NormResidual(params=(R, y), q=2, offset=delta) +const.to(device) + +assert np.allclose( + const.forward(torch.from_numpy(oracle).type(torch.float32).reshape(1, -1).to(device)).item(), + -delta, +) + +# %% Set up problem + +f = ObjectiveOrConstraint(obj, dim=d) + +gI = [ObjectiveOrConstraint(const, dim_out=1)] +gE = [] + +if q >= 1: + options = {"num_points_obj": 5, "num_points_gI": 5, "qp_solver": "osqp"} +else: + # only tested for q=0.7 + options = {"num_points_obj": 100, "num_points_gI": 100, "qp_solver": "osqp"} + +problem = SQPGS(f, gI, gE, x0=None, tol=1e-10, max_iter=450, options=options, verbose=True) + +# %% Solve + +np.random.seed(123) +torch.manual_seed(123) +x = problem.solve() + +# %% Plotting + +fig, ax = problem.plot_timings() +fig.savefig("../data/img/timings_residual.png") +fig, ax = problem.plot_metrics() + + +# plot solution vs oracle +fig, ax = plt.subplots(figsize=(7, 3)) +ax.plot(oracle, c="k", lw=1, label="Oracle") +ax.plot(x, c="steelblue", lw=2, label="Final iterate") +ax.set_xlabel("Coordinate") +ax.set_ylabel(r"$x_i$") +fig.tight_layout() + +# %% diff --git a/example_rosenbrock.py b/examples/example_rosenbrock.py similarity index 57% rename from example_rosenbrock.py rename to examples/example_rosenbrock.py index 83c4056..ef16ec0 100755 --- a/example_rosenbrock.py +++ b/examples/example_rosenbrock.py @@ -1,32 +1,46 @@ """ -@author: Fabian Schaipp +Implements Example 5.1 in + + Frank E. Curtis and Michael L. Overton, A sequential quadratic programming + algorithm for nonconvex, nonsmooth constrained optimization, + SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201. 
+ """ import matplotlib.pyplot as plt import numpy as np +import torch from matplotlib.lines import Line2D -from ncopt.funs import f_rosenbrock, g_max +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.max_linear import MaxOfLinear +from ncopt.functions.rosenbrock import NonsmoothRosenbrock from ncopt.sqpgs import SQPGS # %% Setup -f = f_rosenbrock() -g = g_max() - -# could add this equality constraint -# A = np.eye(2); b = np.ones(2)*5; g1 = g_linear(A, b) +f = ObjectiveOrConstraint(NonsmoothRosenbrock(a=8.0), dim=2) +g = MaxOfLinear( + params=(torch.diag(torch.tensor([torch.sqrt(torch.tensor(2.0)), 2.0])), -torch.ones(2)) +) # inequality constraints (list of functions) -gI = [g] -# equality constraints (list of functions) +gI = [ObjectiveOrConstraint(g, dim_out=1)] + +# equality constraints gE = [] +# Optional equality constraint (for testing and education purpose) +# g2 = ObjectiveOrConstraint(torch.nn.Linear(2, 2), dim_out=2) +# g2.model.weight.data = torch.eye(2) +# g2.model.bias.data = torch.zeros(2) +# gE = [g2] + xstar = np.array([1 / np.sqrt(2), 0.5]) # solution # %% How to use the solver -problem = SQPGS(f, gI, gE, x0=None, tol=1e-6, max_iter=100, verbose=True) +problem = SQPGS(f, gI, gE, x0=None, tol=1e-10, max_iter=100, verbose=True) x = problem.solve() print("Distance to solution:", f"{np.linalg.norm(x - xstar):.9f}") @@ -39,25 +53,21 @@ for j in np.arange(X.shape[0]): for i in np.arange(X.shape[1]): - Z[i, j] = f.eval(np.array([X[i, j], Y[i, j]])) + Z[i, j] = f.single_eval(np.array([X[i, j], Y[i, j]])) # %% +np.random.seed(1) +torch.manual_seed(1) fig, ax = plt.subplots(figsize=(5, 4)) -np.random.seed(123) - # Plot contour and solution ax.contourf(X, Y, Z, cmap="gist_heat", levels=20, alpha=0.7, antialiased=True, lw=0, zorder=0) -# ax.contourf(X, Y, Z, colors='lightgrey', levels=20, alpha=0.7, -# antialiased=True, lw=0, zorder=0) -# ax.contour(X, Y, Z, cmap='gist_heat', levels=8, alpha=0.7, -# antialiased=True, linewidths=4, zorder=0) ax.scatter(xstar[0], xstar[1], marker="*", s=200, c="gold", zorder=1) -for i in range(20): +for i in range(7): x0 = np.random.randn(2) - problem = SQPGS(f, gI, gE, x0, tol=1e-6, max_iter=100, verbose=False, store_history=True) + problem = SQPGS(f, gI, gE, x0, tol=1e-10, max_iter=100, verbose=False, store_history=True) x_k = problem.solve() print(x_k) @@ -79,4 +89,4 @@ ax.legend(handles=legend_elements, ncol=3, fontsize=8) fig.tight_layout() -fig.savefig("data/img/rosenbrock.png") +fig.savefig("../data/img/rosenbrock.png") diff --git a/pyproject.toml b/pyproject.toml index adacb29..48faccb 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -5,13 +5,14 @@ build-backend = "hatchling.build" [project] name = "ncopt" dynamic = ["version"] -description = 'This repository contains a Python implementation of the SQP-GS (Sequential Quadratic Programming - Gradient Sampling) algorithm by Curtis and Overton.' +description = 'Constrained optimization for Pytorch using the SQP-GS algorithm.' 
readme = "README.md" requires-python = ">=3.9" license = "BSD-3-Clause" keywords = [] authors = [ - { name = "fabian-sp", email = "fabian.schaipp@gmail.com" }, + { name = "Fabian Schaipp", email = "fabian.schaipp@tum.de" }, + {name = "Philipp Schiele", email = ""}, ] classifiers = [ "Development Status :: 4 - Beta", @@ -27,7 +28,7 @@ dependencies = [ "numpy", "torch", "cvxpy-base", - "clarabel", + "osqp", ] [project.urls] diff --git a/scripts/timing_rosenbrock.py b/scripts/timing_rosenbrock.py index 5c0d19e..7f1a818 100644 --- a/scripts/timing_rosenbrock.py +++ b/scripts/timing_rosenbrock.py @@ -5,16 +5,22 @@ import timeit import numpy as np +import torch -from ncopt.funs import f_rosenbrock, g_linear, g_max +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.max_linear import MaxOfLinear +from ncopt.functions.rosenbrock import NonsmoothRosenbrock from ncopt.sqpgs import SQPGS # %% -f = f_rosenbrock() -g = g_max() +f = ObjectiveOrConstraint(NonsmoothRosenbrock(a=8.0), dim=2) +g = MaxOfLinear( + params=(torch.diag(torch.tensor([torch.sqrt(torch.tensor(2.0)), 2.0])), -torch.ones(2)) +) + # inequality constraints (list of functions) -gI = [g] +gI = [ObjectiveOrConstraint(g, dim_out=1)] xstar = np.array([1 / np.sqrt(2), 0.5]) @@ -34,9 +40,9 @@ # %% Timing equality constraint -A = np.eye(2) -b = np.zeros(2) -g1 = g_linear(A, b) +g1 = ObjectiveOrConstraint(torch.nn.Linear(2, 2), dim_out=2) +g1.model.weight.data = torch.eye(2) +g1.model.bias.data = torch.zeros(2) # equality constraints (list of scalar functions) gE = [g1] diff --git a/scripts/train_max_fun.py b/scripts/train_max_fun.py index cab709c..386e40c 100755 --- a/scripts/train_max_fun.py +++ b/scripts/train_max_fun.py @@ -11,6 +11,10 @@ import torch from torch.optim.lr_scheduler import StepLR +from ncopt.functions.max_linear import MaxOfLinear + +# %% Generate data + c1 = np.sqrt(2) c2 = 2.0 @@ -44,28 +48,15 @@ def generate_data(grid_points): num_samples = len(tX) # number of training points # %% - - -class Max2D(torch.nn.Module): - def __init__(self): - super().__init__() - self.l1 = torch.nn.Linear(2, 2) - - def forward(self, x): - x = self.l1(x) - x, _ = torch.max(x, dim=-1) - return x - - loss_fn = torch.nn.MSELoss(reduction="mean") -model = Max2D() +model = MaxOfLinear(input_dim=2, output_dim=2) -print(model.l1.weight) -print(model.l1.bias) +print(model.linear.weight.data) +print(model.linear.bias.data) # testing -x = torch.tensor([1.0, 4.0]) -print("True value: ", g(x[0], x[1]), ". Predicted value: ", model(x).item()) +x = torch.tensor([[1.0, 4.0]]) +print("True value: ", g(x[0, 0], x[0, 1]), ". 
Predicted value: ", model(x)[0].item()) # %% Training @@ -88,7 +79,7 @@ def sample_batch(num_samples, b): for t in range(num_samples // batch_size): S = sample_batch(num_samples, batch_size) x_batch = tX[S] - z_batch = tZ[S] + z_batch = tZ[S][:, None] # dummy dimension to match model output optimizer.zero_grad() @@ -101,8 +92,8 @@ def sample_batch(num_samples, b): scheduler.step() print("Learned parameters:") -print(model.l1.weight) -print(model.l1.bias) +print(model.linear.weight.data) +print(model.linear.bias.data) # %% Save checkpoint diff --git a/src/ncopt/functions/__init__.py b/src/ncopt/functions/__init__.py new file mode 100644 index 0000000..1b9edd8 --- /dev/null +++ b/src/ncopt/functions/__init__.py @@ -0,0 +1 @@ +from .main import ObjectiveOrConstraint # noqa diff --git a/src/ncopt/functions/main.py b/src/ncopt/functions/main.py new file mode 100644 index 0000000..df87516 --- /dev/null +++ b/src/ncopt/functions/main.py @@ -0,0 +1,128 @@ +from typing import Callable, Optional, Union + +import numpy as np +import torch + + +class ObjectiveOrConstraint(torch.nn.Module): + def __init__( + self, + model: torch.nn.Module, + dim: Optional[int] = None, + dim_out: int = 1, + name: Optional[str] = None, + device: Optional[Union[str, torch.device]] = None, + dtype: Optional[torch.dtype] = torch.float32, + prepare_inputs: Optional[Callable] = None, + is_differentiable: bool = False, + ) -> None: + """Wrapper for objective and constraint functions. + + The function g(x) is represented by ``model`` which computes g(x) when given x as input. + Important: + * The mapping ``model`` should be compatible with batched inputs, + where the first dimension is the batch dimension. + * The output of ``model`` must be of shape ``(batch_size, dim_out)``, + where ``dim_out`` must be one if the function is used as objective. + In particular, if the output shape is ``(batch_size, )``, + we don't guarantee that the solver will work. + + Note that we set ``model`` into ``.eval()`` mode during initialization. + + Parameters + ---------- + model : torch.nn.Module + The Pytorch module for evaluating the function. + dim : Optional[int], optional + Input dimension of the function, by default None. + Needs to be specified for objective functions for the solver. + dim_out : int, optional + Output dimension of the function, by default 1. + Needs to be specified for each constraint function for the solver. + name : Optional[str], optional + A name for the function, by default None. + Not needed for solver, only for convenience. + device : Optional[Union[str, torch.device]], optional + A device for the forward pass, by default None. + You normally don't need to specify this: we automatically use the device + where the parameters of the model lie on. + If you specify the device, please make sure, that the parameters of the model + lie on the correct device. + dtype : Optional[torch.dtype], optional + Format to convert when evaluating a tensor that was converted + from ``np.ndarray``, by default torch.float32. + prepare_inputs : Optional[Callable], optional + Callable to prepare the model inpt, by default None. + Note that the solver always inputs the full optimization variable (as batch). + If your function only needs a subset of the vector as input, or + needs reshaping operations etc, then you can do this via this callable. + Note that the callable should be compatible with batched tensors. + is_differentiable : bool, optional + Whether the function is (continuously) differentiable, by default False. 
+            If ``True``, then no extra points will be sampled for this function.
+
+        """
+        super().__init__()
+
+        # TODO: Load from checkpoint if this is a string
+        self.model = model
+        self.dim = dim
+        self.dim_out = dim_out
+        self.name = name
+        self.device = device
+        self.dtype = dtype
+        self.prepare_inputs = prepare_inputs
+        self.is_differentiable = is_differentiable
+
+        # If no device is provided, set it to the same as the model parameters
+        # if model has no parameters, we set device to cpu
+        if not self.device:
+            if sum(p.numel() for p in model.parameters() if p.requires_grad) > 0:
+                devices = {p.device for p in model.parameters()}
+                if len(devices) == 1:
+                    self.device = devices.pop()
+                else:
+                    raise KeyError(
+                        "Model parameters lie on more than one device. Currently not supported."
+                    )
+            else:
+                self.device = torch.device("cpu")
+
+        # Go into eval mode
+        self.model.eval()
+        self.eval()
+
+        return
+
+    def forward(self, x: torch.tensor) -> torch.tensor:
+        """Method for evaluating the function.
+        This is a simple wrapper around ``model.forward()``.
+        We apply the ``prepare_inputs`` method, and move ``x`` to the correct device.
+
+        Parameters
+        ----------
+        x : torch.tensor
+            Batched input tensor. This will be of shape ``(batch_size, dim)``,
+            where ``dim`` is the dimension of the optimization variable.
+            Note that ``x`` can be cropped or reshaped via ``prepare_inputs`` if necessary.
+
+        Returns
+        -------
+        torch.tensor
+            Output of the function. Of shape ``(batch_size, dim_out)``.
+        """
+        x = self.prepare_inputs(x) if self.prepare_inputs else x
+        out = self.model.forward(x.to(self.device))
+        return out
+
+    def single_eval(self, x: Union[torch.tensor, np.ndarray]) -> np.ndarray:
+        """Convenience function for (only!) evaluating at a single point.
+        This is needed for the Armijo line search in SQP-GS.
+        """
+        if isinstance(x, np.ndarray):
+            x = torch.from_numpy(x).to(self.dtype)
+
+        with torch.no_grad():
+            out = self.forward(x.reshape(1, -1))
+
+        return out.squeeze(dim=0).detach().cpu().numpy()
diff --git a/src/ncopt/functions/max_linear.py b/src/ncopt/functions/max_linear.py
new file mode 100644
index 0000000..78ee73c
--- /dev/null
+++ b/src/ncopt/functions/max_linear.py
@@ -0,0 +1,62 @@
+import torch
+
+
+class MaxOfLinear(torch.nn.Module):
+    """A composition of the maximum function with an affine mapping. Implements the mapping
+
+        x --> max(Ax + b)
+
+    where the maximum is taken over the components of Ax + b.
+    """
+
+    def __init__(
+        self,
+        input_dim: int = None,
+        output_dim: int = None,
+        params: tuple = None,
+        dtype=torch.float32,
+    ):
+        """
+        Either specify input_dim and output_dim, or specify params in the form
+        (A, b) where A and b are each a torch.Tensor.
+
+        Parameters
+        ----------
+        input_dim : int, optional
+            Input dimension of the linear layer, by default None
+        output_dim : int, optional
+            Output dimension of the linear layer, by default None
+        params : tuple, optional
+            If you want to fix the linear mapping, specify as (A, b).
+            By default None.
+        dtype : torch.dtype, optional
+            Will convert params to this type for the linear layer, by default torch.float32
+        """
+        super().__init__()
+
+        assert params is not None or (
+            input_dim is not None and output_dim is not None
+        ), "Specify either dimensions or parameters."
+
+        if params is not None:
+            assert (
+                len(params) == 2
+            ), f"params should contain two elements (A, b), but contains only {len(params)}."
+            output_dim, input_dim = params[0].shape
+            assert params[1].shape == torch.Size([output_dim]), "Shape of bias term does not match."
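+        # If params=(A, b) was given, the input/output dimensions were inferred
+        # above from A, which has shape (output_dim, input_dim).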
+
+        self.linear = torch.nn.Linear(input_dim, output_dim)
+
+        # Set the weights if the mapping is given
+        # Default type of torch.nn.Linear is float32
+        if params is not None:
+            self.linear.weight.data = params[0].type(dtype)
+            self.linear.bias.data = params[1].type(dtype)
+
+        return
+
+    def forward(self, x):
+        x = self.linear(x)
+        # make sure to have output shape [batch_size, 1] by keepdim=True
+        x, _ = torch.max(x, dim=-1, keepdim=True)
+        return x
diff --git a/src/ncopt/functions/norm_residual.py b/src/ncopt/functions/norm_residual.py
new file mode 100644
index 0000000..d99f80f
--- /dev/null
+++ b/src/ncopt/functions/norm_residual.py
@@ -0,0 +1,75 @@
+import torch
+
+
+class NormResidual(torch.nn.Module):
+    """A residual error mapping. Implements the function
+
+        x --> ||Ax-b||_q^p - offset
+    """
+
+    def __init__(
+        self,
+        input_dim: int = None,
+        output_dim: int = None,
+        params: tuple = None,
+        q: float = 2.0,
+        p: float = 1.0,
+        offset: float = 1.0,
+        dtype=torch.float32,
+    ):
+        """
+        Either specify input_dim and output_dim, or specify params in the form
+        (A, b) where A and b are each a torch.Tensor.
+
+        Parameters
+        ----------
+        input_dim : int, optional
+            Input dimension of the linear layer, by default None
+        output_dim : int, optional
+            Output dimension of the linear layer, by default None
+        params : tuple, optional
+            If you want to fix the linear mapping, specify as (A, b).
+            By default None.
+        q : float, optional
+            Order of the norm, by default 2.0 (standard Euclidean norm).
+        p : float, optional
+            Power of the residual, by default 1.0.
+        offset : float, optional
+            A constant value to offset the function, by default 1.0.
+        dtype : torch.dtype, optional
+            Will convert params to this type for the linear layer, by default torch.float32
+        """
+        super().__init__()
+
+        assert q > 0, f"Order must be positive, but is given as {q}."
+        assert p > 0, f"Power must be positive, but is given as {p}."
+
+        self.p = p
+        self.q = q
+        self.offset = offset
+
+        assert params is not None or (
+            input_dim is not None and output_dim is not None
+        ), "Specify either dimensions or parameters."
+
+        if params is not None:
+            assert (
+                len(params) == 2
+            ), f"params should contain two elements (A, b), but contains only {len(params)}."
+            output_dim, input_dim = params[0].shape
+            assert params[1].shape == torch.Size([output_dim]), "Shape of bias term does not match."
+
+        self.linear = torch.nn.Linear(input_dim, output_dim)
+
+        # Set the weights if the mapping is given
+        # Default type of torch.nn.Linear is float32
+        if params is not None:
+            self.linear.weight.data = params[0].type(dtype)
+            self.linear.bias.data = (-1) * params[1].type(dtype)
+
+        return
+
+    def forward(self, x):
+        x = self.linear(x)
+        x = torch.linalg.norm(x, ord=self.q, dim=1, keepdim=True) ** (self.p) - self.offset
+        return x
diff --git a/src/ncopt/functions/quadratic.py b/src/ncopt/functions/quadratic.py
new file mode 100644
index 0000000..23364a3
--- /dev/null
+++ b/src/ncopt/functions/quadratic.py
@@ -0,0 +1,63 @@
+import torch
+
+
+class Quadratic(torch.nn.Module):
+    """A quadratic function. Implements the mapping
+
+        x --> (1/2)*x.T*A*x + b.T*x + c
+    """
+
+    def __init__(self, input_dim: int = None, params: tuple = None):
+        """
+
+        Parameters
+        ----------
+        input_dim : int, optional
+            If no params are specified, this specifies the dimension of the tensors, by default None
+        params : tuple, optional
+            3-Tuple of tensors, by default None.
+            If specified, should contain the values of A, b and c.
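+            For instance, ``params=(torch.eye(2), -torch.ones(2), torch.tensor(1.0))``
+            implements f(x) = 0.5*||x - (1,1)||^2, as used in examples/example_checkpoint.py.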
+ """ + super().__init__() + + assert params is not None or ( + input_dim is not None + ), "Specify either a dimension or parameters" + if params is not None: + assert ( + len(params) == 3 + ), f"params should contains three elements (A,b,c), but contains only {len(params)}." + + if params is None: + self.A = torch.nn.Parameter(torch.randn(input_dim, input_dim)) + self.b = torch.nn.Parameter( + torch.randn( + input_dim, + ) + ) + self.c = torch.nn.Parameter( + torch.randn( + 1, + ) + ) + else: + self.A = torch.nn.Parameter(params[0]) + self.b = torch.nn.Parameter(params[1]) + self.c = torch.nn.Parameter(params[2]) + + def forward(self, x: torch.tensor) -> torch.tensor: + """ + + Parameters + ---------- + x : torch.tensor + Should be batched, and have shape [batch_size, input_dim] + + Returns + ------- + torch.tensor of shape [batch_size, 1] + """ + out = 0.5 * torch.sum((x @ self.A) * x, dim=1, keepdim=True) + out = out + (x * self.b).sum(dim=1, keepdim=True) + out = out + self.c + return out diff --git a/src/ncopt/functions/rosenbrock.py b/src/ncopt/functions/rosenbrock.py new file mode 100644 index 0000000..eef01d8 --- /dev/null +++ b/src/ncopt/functions/rosenbrock.py @@ -0,0 +1,22 @@ +import torch + + +class NonsmoothRosenbrock(torch.nn.Module): + """A 2D nonsmooth Rosenbrock function, given by + + x --> a*|x_1^2 - x_2| + (1-x_1)^2 + + """ + + def __init__(self, a: float = 8.0): + super().__init__() + + self.a = a + return + + def forward(self, x: torch.tensor) -> torch.tensor: + """x should be of shape (batch_size, 2)""" + out = self.a * torch.abs(x[:, 0] ** 2 - x[:, 1]) + out += (1 - x[:, 0]) ** 2 + out = out[:, None] # need output shape (batch_size, 1) + return out diff --git a/src/ncopt/funs.py b/src/ncopt/funs.py deleted file mode 100755 index 0880023..0000000 --- a/src/ncopt/funs.py +++ /dev/null @@ -1,96 +0,0 @@ -""" -author: Fabian Schaipp -""" - -import numpy as np - - -class f_rosenbrock: - """ - Nonsmooth Rosenbrock function (see 5.1 in Curtis, Overton - "SQP FOR NONSMOOTH CONSTRAINED OPTIMIZATION") - - x -> w|x_1^2 - x_2| + (1 - x_1)^2 - """ - - def __init__(self, w=8): - self.name = "rosenbrock" - self.dim = 2 - self.dimOut = 1 - self.w = w - - def eval(self, x): - return self.w * np.abs(x[0] ** 2 - x[1]) + (1 - x[0]) ** 2 - - def differentiable(self, x): - return np.abs(x[0] ** 2 - x[1]) > 1e-10 - - def grad(self, x): - a = np.array([-2 + x[0], 0]) - sign = np.sign(x[0] ** 2 - x[1]) - - if sign == 1: - b = np.array([2 * x[0], -1]) - elif sign == -1: - b = np.array([-2 * x[0], 1]) - else: - b = np.array([-2 * x[0], 1]) - - # b = np.sign(x[0]**2 -x[1]) * np.array([2*x[0], -1]) - return a + b - - -class g_max: - """ - maximum function (see 5.1 in Curtis, Overton "SQP FOR NONSMOOTH CONSTRAINED OPTIMIZATION") - - x -> max(c1*x_1, c2*x_2) - 1 - """ - - def __init__(self, c1=np.sqrt(2), c2=2.0): - self.name = "max" - self.c1 = c1 - self.c2 = c2 - self.dimOut = 1 - return - - def eval(self, x): - return np.maximum(self.c1 * x[0], self.c2 * x[1]) - 1 - - def differentiable(self, x): - return np.abs(self.c1 * x[0] - self.c2 * x[1]) > 1e-10 - - def grad(self, x): - sign = np.sign(self.c1 * x[0] - self.c2 * x[1]) - if sign == 1: - g = np.array([self.c1, 0]) - elif sign == -1: - g = np.array([0, self.c2]) - else: - g = np.array([0, self.c2]) - return g - - -class g_linear: - """ - linear constraint: - - x -> Ax - b - """ - - def __init__(self, A, b): - self.name = "linear" - self.A = A - self.b = b - self.dim = A.shape[0] - self.dimOut = A.shape[1] - return - - def eval(self, x): - 
-        return self.A @ x - self.b
-
-    def differentiable(self, x):
-        return True
-
-    def grad(self, x):
-        return self.A
diff --git a/src/ncopt/plot_utils.py b/src/ncopt/plot_utils.py
new file mode 100644
index 0000000..eca408b
--- /dev/null
+++ b/src/ncopt/plot_utils.py
@@ -0,0 +1,61 @@
+import numpy as np
+
+
+def plot_timings(timings, ax=None):
+    try:
+        import matplotlib.pyplot as plt
+    except ModuleNotFoundError:
+        raise ImportError("Matplotlib is needed for the plotting functionalities.")
+
+    if ax is None:
+        fig, ax = plt.subplots(figsize=(6, 5))
+    else:
+        fig = ax.get_figure()
+
+    summed = np.zeros_like(timings["total"])
+    for key, val in timings.items():
+        ax.plot(val, lw=1, marker="o", markevery=(1, 10), markersize=5, label=key)
+
+        if key == "total":
+            mean_total = np.mean(val)
+            ax.set_ylim(0, mean_total)
+        else:
+            if None not in val:
+                summed += np.array(val)
+
+    # ax.plot(summed, lw=1, ls="--", color="grey", label="summed")
+
+    ax.set_xlabel("Iteration")
+    ax.set_ylabel("Runtime [sec]")
+    ax.grid(axis="both", lw=0.2, ls="--", zorder=-10)
+    fig.legend(loc="upper right")
+
+    return fig, ax
+
+
+def plot_metrics(metrics, log_every, ax=None):
+    try:
+        import matplotlib.pyplot as plt
+    except ModuleNotFoundError:
+        raise ImportError("Matplotlib is needed for the plotting functionalities.")
+
+    # use the given axis if provided, analogous to plot_timings
+    if ax is None:
+        fig, ax = plt.subplots(figsize=(6, 5))
+    else:
+        fig = ax.get_figure()
+
+    for key, val in metrics.items():
+        ax.plot(
+            np.arange(len(val)) * log_every,
+            val,
+            lw=2,
+            marker="o",
+            markevery=(-1, len(val)),
+            markersize=7,
+            label=key,
+        )
+
+    ax.set_xlabel("Iteration")
+    ax.set_ylabel("Value")
+    ax.set_yscale("log")
+    ax.grid(axis="both", lw=0.2, ls="--", zorder=-10)
+    fig.legend(loc="upper right")
+
+    return fig, ax
diff --git a/src/ncopt/sqpgs/README.md b/src/ncopt/sqpgs/README.md
new file mode 100644
index 0000000..55c00d5
--- /dev/null
+++ b/src/ncopt/sqpgs/README.md
@@ -0,0 +1,71 @@
+# The SQP-GS Algorithm
+
+SQP-GS is a method for solving nonsmooth, nonconvex constrained optimization problems. It has been proposed in [1].
+As the name suggests, it combines Sequential Quadratic Programming (SQP) with Gradient Sampling (GS) to handle nonconvexity as well as nonsmoothness.
+
+An example of how to call the SQP-GS solver:
+
+```python
+from ncopt.sqpgs import SQPGS
+
+problem = SQPGS(f, gI, gE, x0=None, tol=1e-6, max_iter=100, verbose=True)
+x = problem.solve()
+```
+
+## How SQP-GS works in short
+
+Below we briefly describe the SQP-GS algorithm. **For more details on the algorithm, we refer to the paper [1].** The iteration cost of the algorithm splits mainly into two steps:
+
+1) Evaluate and compute the gradient/Jacobian for each function at multiple points in a neighborhood of the current iterate.
+
+2) Approximate the original problem by a quadratic program (QP). Solve this QP to compute the update direction.
+
+The technique in 1. is called Gradient Sampling (GS) and is a widely used, robust technique for handling nonsmooth objective or constraint functions.
+As all functions are PyTorch modules in this package, this amounts to **batch evaluation and Jacobian computation**. This can be done efficiently using the `autograd` functionalities of PyTorch.
+
+For 2., we solve the QP with the package `osqp`. We also implement a general interface to `cvxpy`, which seems slightly slower due to overhead costs, but is more flexible, as the solver can be exchanged easily.
+Further, the quadratic approximation of SQP naturally involves an approximation of the Hessian, which is done in L-BFGS style.
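+
+As a rough sketch (cf. the subproblem construction in [`cvxpy_subproblem.py`](cvxpy_subproblem.py)), the QP solved in step 2. has the form
+
+```
+min_{d, z, rI, rE}   rho * z + sum(rI) + sum(rE) + (1/2) d'Hd
+s.t.                 f(x_k) + D_f d <= z
+                     gI(x_k) + D_gI d <= rI
+                     -rE <= gE(x_k) + D_gE d <= rE,    rI >= 0, rE >= 0
+```
+
+where `H` is the current Hessian approximation and the rows of `D_f`, `D_gI`, `D_gE` contain the gradients at the sampled points.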
+
+
+
+### Options
+
+SQP-GS has many hyperparameters, for all of which we use the default values from the paper [1] and the Matlab code [2]. The values can be controlled via the argument `options`, which can be a dictionary with all values that need to be overwritten. See the default values [here](defaults.py).
+
+One important hyperparameter is the number of sample points; see the next section for details.
+
+The starting point can be specified with the argument `x0`, and we use the zero vector if not specified.
+
+### Gradient Sampling with `autograd`
+
+For the objective and each constraint function, in every iteration, a number of points is sampled in a neighborhood of the current iterate. The function is then evaluated at those points, plus at the iterate itself, and the Jacobian of the function is computed at all points.
+This is done with the PyTorch `autograd` functionalities `jacrev` and `vmap`. See the function [`compute_value_and_jacobian`](main.py#L435).
+
+* If a function is differentiable, setting `is_differentiable=True` when calling `ObjectiveOrConstraint` will have the effect that for this function only the current iterate itself is evaluated.
+* The authors of [1] suggest setting the number of sample points to roughly 2 times the (effective!) input dimension of each function. See the end of Section 4 in [1]. As this is hard to implement by default, we recommend setting the number of points manually. This can be done by adjusting (before calling `problem.solve()`) the values `problem.p0` (number of points for the objective), `problem.pI_` (number of points for each inequality constraint) and `problem.pE_` (number of points for each equality constraint).
+
+### Solving the QP
+
+Solving the QP will likely amount to most of the runtime per iteration. There are two options for solving the QP:
+
+1. (Default): Use the `osqp` solver [4]. This directly calls the `osqp` solver after constructing the subproblem. See the [implementation](osqp_subproblem.py).
+2. Use `cvxpy` [3]. This creates the subproblem with `cvxpy` and then solves the QP with the specified solver. This option has some overhead costs, but it allows the solver to be exchanged flexibly. See the [implementation](cvxpy_subproblem.py).
+
+The authors of [1] report that MOSEK works well for their numerical examples, but here we use `osqp` by default.
+
+
+### Further notes
+
+
+* For numerical stability, we add a Tikhonov regularization to the approximate Hessian. Its magnitude can be controlled via the key `reg_H` in `options`.
+
+## References
+[1] Frank E. Curtis and Michael L. Overton, A sequential quadratic programming algorithm for nonconvex, nonsmooth constrained optimization,
+SIAM Journal on Optimization 2012 22:2, 474-500, https://doi.org/10.1137/090780201.
+
+[2] Frank E. Curtis and Michael L. Overton, MATLAB implementation of SLQP-GS, https://coral.ise.lehigh.edu/frankecurtis/software/.
+
+[3] Steven Diamond and Stephen Boyd, CVXPY: A Python-embedded modeling language for convex optimization, Journal of Machine Learning Research 2016, https://www.cvxpy.org/.
+
+[4] Bartolomeo Stellato, Goran Banjac, Paul Goulart, Alberto Bemporad, and Stephen Boyd, OSQP: an operator splitting solver for quadratic programs, Mathematical Programming Computation 2020, https://osqp.org/docs/index.html.
+
diff --git a/src/ncopt/sqpgs/cvxpy_subproblem.py b/src/ncopt/sqpgs/cvxpy_subproblem.py
new file mode 100644
index 0000000..d7496b9
--- /dev/null
+++ b/src/ncopt/sqpgs/cvxpy_subproblem.py
@@ -0,0 +1,177 @@
+from typing import List
+
+import cvxpy as cp
+import numpy as np
+
+from ncopt.sqpgs.defaults import DEFAULT_OPTION
+
+CVXPY_SOLVER_DICT = {
+    "osqp-cvxpy": cp.OSQP,
+    "clarabel": cp.CLARABEL,
+    "cvxopt": cp.CVXOPT,
+    "mosek": cp.MOSEK,
+    "gurobi": cp.GUROBI,
+}
+
+
+class CVXPYSubproblemSQPGS:
+    def __init__(
+        self,
+        dim: int,
+        p0: np.ndarray,
+        pI: np.ndarray,
+        pE: np.ndarray,
+        assert_tol: float,
+        solver: str = DEFAULT_OPTION.qp_solver,
+    ) -> None:
+        """
+        dim : solution space dimension
+        p0 : number of sample points for f (excluding x_k itself)
+        pI : array, number of sample points for inequality constraint (excluding x_k itself)
+        pE : array, number of sample points for equality constraint (excluding x_k itself)
+        """
+
+        self.dim = dim
+        self.p0 = p0
+        self.pI = pI
+        self.pE = pE
+
+        self.assert_tol = assert_tol
+
+        self.d = cp.Variable(self.dim)
+        self._problem = None
+        self._qp_solver = CVXPY_SOLVER_DICT.get(solver, cp.OSQP)
+
+    @property
+    def nI(self) -> int:
+        return len(self.pI)
+
+    @property
+    def nE(self) -> int:
+        return len(self.pE)
+
+    @property
+    def has_ineq_constraints(self) -> bool:
+        return self.nI > 0
+
+    @property
+    def has_eq_constraints(self) -> bool:
+        return self.nE > 0
+
+    @property
+    def problem(self) -> cp.Problem:
+        assert self._problem is not None, "Problem not yet initialized."
+        return self._problem
+
+    @property
+    def status(self) -> str:
+        return self.problem.status
+
+    @property
+    def objective_val(self) -> float:
+        return self.problem.value
+
+    @property
+    def setup_time(self) -> float:
+        return self.problem.solver_stats.setup_time
+
+    @property
+    def solve_time(self) -> float:
+        return self.problem.solver_stats.solve_time
+
+    def solve(
+        self,
+        L: np.ndarray,
+        rho: float,
+        D_f: np.ndarray,
+        D_gI: List[np.ndarray],
+        D_gE: List[np.ndarray],
+        f_k: float,
+        gI_k: np.ndarray,
+        gE_k: np.ndarray,
+    ) -> None:
+        """
+        This solves the quadratic program.
+
+        Parameters
+
+        L : array
+            Cholesky factor of Hessian approximation
+        rho : float
+            penalty parameter
+        D_f : array
+            gradients of f at the sampled points
+        D_gI : list
+            j-th element is the gradient array of c^j at the sampled points.
+        D_gE : list
+            j-th element is the gradient array of h^j at the sampled points.
+        f_k : float
+            evaluation of f at x_k.
+        gI_k : array
+            evaluation of inequality constraints at x_k.
+        gE_k : array
+            evaluation of equality constraints at x_k.
+
+        Updates
+        self.d: Variable
+            search direction
+
+        self.lambda_f: array
+            KKT multiplier for objective.
+
+        self.lambda_gE: list
+            KKT multiplier for equality constraints.
+
+        self.lambda_gI: list
+            KKT multiplier for inequality constraints.
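+
+        The assembled QP minimizes ``rho * z + 0.5 * ||L.T @ d||^2 + sum(r_I) + sum(r_E)``
+        subject to the linearized constraints constructed below.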
+ + """ + + d = cp.Variable(self.dim) + z = cp.Variable() + if self.has_ineq_constraints: + r_I = cp.Variable(gI_k.size, nonneg=True) + if self.has_eq_constraints: + r_E = cp.Variable(gE_k.size, nonneg=True) + + objective = rho * z + (1 / 2) * cp.sum_squares(L.T @ d) + + obj_constraint = f_k + D_f @ d <= z + constraints = [obj_constraint] + + if self.has_ineq_constraints: + ineq_constraints = [gI_k[j] + D_gI[j] @ d <= r_I[j] for j in range(self.nI)] + constraints += ineq_constraints + objective = objective + cp.sum(r_I) + + if self.has_eq_constraints: + eq_constraints_plus = [gE_k[j] + D_gE[j] @ d <= r_E[j] for j in range(self.nE)] + eq_constraints_neg = [gE_k[j] + D_gE[j] @ d >= r_E[j] for j in range(self.nE)] + constraints += eq_constraints_plus + eq_constraints_neg + objective = objective + cp.sum(r_E) + + problem = cp.Problem(cp.Minimize(objective), constraints) + problem.solve(solver=self._qp_solver, verbose=False) + + assert problem.status in {cp.OPTIMAL, cp.OPTIMAL_INACCURATE} + self._problem = problem + + # Extract primal solution + self.d = d.value.copy() + + # Extract dual variables + duals = problem.solution.dual_vars + self.lambda_f = duals[obj_constraint.id] + + if self.has_ineq_constraints: + self.lambda_gI = [duals[c.id] for c in ineq_constraints] + else: + self.lambda_gI = [] + + if self.has_eq_constraints: + self.lambda_gE = [ + duals[c_plus.id] - duals[c_neg.id] + for c_plus, c_neg in zip(eq_constraints_plus, eq_constraints_neg) + ] + else: + self.lambda_gE = [] diff --git a/src/ncopt/sqpgs/defaults.py b/src/ncopt/sqpgs/defaults.py index 783ac0a..0c880bb 100644 --- a/src/ncopt/sqpgs/defaults.py +++ b/src/ncopt/sqpgs/defaults.py @@ -29,9 +29,11 @@ class Dotdict(dict): "xi_y": 1e3, "xi_sy": 1e-6, "iter_H": 10, - "num_points_obj": 2, - "num_points_gI": 3, + "num_points_obj": 4, + "num_points_gI": 4, "num_points_gE": 4, + "qp_solver": "osqp", + "reg_H": 1e-03, } DEFAULT_ARG = Dotdict(_defaults) diff --git a/src/ncopt/sqpgs/main.py b/src/ncopt/sqpgs/main.py index cbf81f2..fea0f23 100644 --- a/src/ncopt/sqpgs/main.py +++ b/src/ncopt/sqpgs/main.py @@ -1,6 +1,4 @@ """ -@author: Fabian Schaipp - Implements the SQP-GS algorithm from Frank E. Curtis and Michael L. Overton, A sequential quadratic programming @@ -12,21 +10,25 @@ import copy import time -from typing import Optional +from typing import List, Optional, Tuple, Union -import cvxpy as cp import numpy as np +import torch +from ncopt.functions import ObjectiveOrConstraint +from ncopt.plot_utils import plot_metrics, plot_timings +from ncopt.sqpgs.cvxpy_subproblem import CVXPYSubproblemSQPGS from ncopt.sqpgs.defaults import DEFAULT_ARG, DEFAULT_OPTION -from ncopt.utils import get_logger +from ncopt.sqpgs.osqp_subproblem import OSQPSubproblemSQPGS +from ncopt.utils import compute_batch_jacobian_vmap, get_logger class SQPGS: def __init__( self, - f, - gI, - gE, + f: ObjectiveOrConstraint, + gI: List[ObjectiveOrConstraint], + gE: List[ObjectiveOrConstraint], x0: Optional[np.array] = None, tol: float = DEFAULT_ARG.tol, max_iter: int = DEFAULT_ARG.max_iter, @@ -36,6 +38,47 @@ def __init__( store_history: bool = DEFAULT_ARG.store_history, log_every: int = DEFAULT_ARG.log_every, ) -> None: + """The problem object for using the SQP-GS method. + + Parameters + ---------- + f : ObjectiveOrConstraint + The objective funtion. Argument ``dim`` must be specified. + Output must be of shape ``(batch_size, 1)``. + gI : List[ObjectiveOrConstraint] + List of inequality constraint functions. + Argument ``dim_out`` must be specified for each. 
+            Output must be of shape ``(batch_size, dim_out)``.
+            Pass empty list for no inequality constraints.
+        gE : List[ObjectiveOrConstraint]
+            List of equality constraint functions.
+            Argument ``dim_out`` must be specified for each.
+            Output must be of shape ``(batch_size, dim_out)``.
+            Pass empty list for no equality constraints.
+        x0 : Optional[np.array], optional
+            Starting point, by default the zero vector.
+        tol : float, optional
+            Tolerance for stopping, measured by E_k in [Curtis and Overton, 2012].
+            By default DEFAULT_ARG.tol.
+        max_iter : int, optional
+            Maximum number of iterations, by default DEFAULT_ARG.max_iter.
+        verbose : bool, optional
+            Whether to print status updates, by default DEFAULT_ARG.verbose.
+        options : dict, optional
+            Dictionary with options, by default {}.
+            All specified entries will overwrite the default value.
+            See ``./defaults.py`` for possible key names and default values.
+        assert_tol : float, optional
+            Assertion tolerance used in mathematical checks, by default DEFAULT_ARG.assert_tol.
+            You can avoid errors raised by assertions by increasing this number.
+            Note that in that case, the algorithm might not produce correct output.
+        store_history : bool, optional
+            Whether to store the history of iterates in every iteration, by default False.
+            Only use this for small examples, as it might consume a lot of memory.
+        log_every : int, optional
+            Frequency of status updates (if ``verbose=True``), by default DEFAULT_ARG.log_every.
+
+        """
         if tol < 0:
             raise ValueError(f"Tolerance must be non-negative, but was specified as {tol}.")
         if max_iter < 0:
@@ -69,9 +112,16 @@ def __init__(
 
         # Extract dimensions
 
        # extract dimensions of constraints
+        if not hasattr(f, "dim"):
+            raise KeyError(
+                "Input dimension needs to be specified for the objective function. "
+                "Make sure to pass `dim` when initializing the "
+                "`ObjectiveOrConstraint` object."
+            )
+
         self.dim = self.f.dim
-        self.dimI = np.array([g.dimOut for g in self.gI], dtype=int)
-        self.dimE = np.array([g.dimOut for g in self.gE], dtype=int)
+        self.dimI = np.array([g.dim_out for g in self.gI], dtype=int)
+        self.dimE = np.array([g.dim_out for g in self.gE], dtype=int)
 
         self.nI_ = len(self.gI)  # number of inequality function objects
         self.nE_ = len(self.gE)  # number of equality function objects
@@ -79,6 +129,8 @@ def __init__(
         self.nI = sum(self.dimI)  # number of inequality costraints
         self.nE = sum(self.dimE)  # number of equality costraints
 
+        # Construct number of sample point arrays
+        self.p0, self.pI_, self.pE_ = self._init_sample_points()
 
         ###############################################################
         # Initialize
@@ -91,14 +143,81 @@ def __init__(
         else:
             self.x_k = x0.copy()
 
-    def solve(self):
+    def _init_sample_points(self):
+        # sample points for objective
+        p0 = self.options["num_points_obj"] if not self.f.is_differentiable else 0
+
+        pI_ = self.options["num_points_gI"] * np.ones(
+            self.nI_, dtype=int
+        )  # sample points for ineq constraint
+        pE_ = self.options["num_points_gE"] * np.ones(
+            self.nE_, dtype=int
+        )  # sample points for eq constraint
+
+        # if function is differentiable, set to zero here --> only sample at x_k
+        _is_differentiable_I = [g.is_differentiable for g in self.gI]
+        _is_differentiable_E = [g.is_differentiable for g in self.gE]
+
+        pI_[_is_differentiable_I] = 0
+        pE_[_is_differentiable_E] = 0
+
+        return p0, pI_, pE_
+
+    def plot_timings(self, ax=None):
+        """Plotting the timings of each algorithm step over the course of iterations.
+ + Parameters + ---------- + ax : matplotlib.axes._axes.Axes, optional + Axis to plot on, by default None. + If not specified, a new figure and axis will be created. + + Returns + ------- + fig: matplotlib.figure.Figure + Figure object of the plot. + ax: matplotlib.axes._axes.Axes + Axis object of the plot + """ + fig, ax = plot_timings(self.info["timings"], ax=ax) + return fig, ax + + def plot_metrics(self, ax=None): + """Plotting several metrics of each logged step, over the course of iterations. + + Parameters + ---------- + ax : matplotlib.axes._axes.Axes, optional + Axis to plot on, by default None. + If not specified, a new figure and axis will be created. + + Returns + ------- + fig: matplotlib.figure.Figure + Figure object of the plot. + ax: matplotlib.axes._axes.Axes + Axis object of the plot + """ + fig, ax = plot_metrics(self.info["metrics"], self.log_every, ax=ax) + return fig, ax + + def solve(self) -> np.ndarray: + """Method for solving the problem. + All metrics and timings will be stored in ``self.info``. + + Note that repeatedly calling this method will use the last iterate as new starting point. + + Returns + ------- + np.ndarray + The final iterate (also stored in ``self.x_k``). + """ ############################################################### # Set all hyperparameters eps = self.options["eps"] # sampling radius rho = self.options["rho"] theta = self.options["theta"] - eta = self.options["eta"] gamma = self.options["gamma"] beta_eps = self.options["beta_eps"] @@ -110,19 +229,18 @@ def solve(self): xi_sy = self.options["xi_sy"] iter_H = self.options["iter_H"] - p0 = self.options["num_points_obj"] # sample points for objective - pI_ = self.options["num_points_gI"] * np.ones( - self.nI_, dtype=int - ) # sample points for ineq constraint - pE_ = self.options["num_points_gE"] * np.ones( - self.nE_, dtype=int - ) # sample points for eq constraint - - pI = np.repeat(pI_, self.dimI) - pE = np.repeat(pE_, self.dimE) ############################################################### + # helper arrays for subproblem + p0 = self.p0 + pI = np.repeat(self.pI_, self.dimI) + pE = np.repeat(self.pE_, self.dimE) - self.SP = SubproblemSQPGS(self.dim, p0, pI, pE, self.assert_tol) + if self.options["qp_solver"] == "osqp": + self.SP = OSQPSubproblemSQPGS(self.dim, p0, pI, pE, self.assert_tol) + else: + self.SP = CVXPYSubproblemSQPGS( + self.dim, p0, pI, pE, self.assert_tol, self.options["qp_solver"] + ) E_k = np.inf # for stopping criterion x_kmin1 = None # last iterate @@ -136,9 +254,21 @@ def solve(self): do_step = False x_hist = [self.x_k] if self.store_history else None - self.timings = {"total": [], "sp_update": [], "sp_solve": []} + timings = { + "total": [], + "sample_and_grad": [], + "subproblem": [], + "step": [], + "other": [], + } + metrics = { + "objective": [], + "constraint_violation": [], + "accuracy": [], + "sampling_radius": [], + } ############################################## - # START OF LOOP + # Start of loop ############################################## for iter_k in range(self.max_iter): t0 = time.perf_counter() @@ -147,58 +277,79 @@ def solve(self): break ############################################## - # SAMPLING + # Sampling ############################################## - B_f = sample_points(self.x_k, eps, p0) - B_f = np.vstack((self.x_k, B_f)) + B_f = sample_points(self.x_k, eps, p0, stack_x=True) B_gI = list() for j in np.arange(self.nI_): - B_j = sample_points(self.x_k, eps, pI_[j]) - B_j = np.vstack((self.x_k, B_j)) + B_j = sample_points(self.x_k, eps, 
                 B_gI.append(B_j)
 
             B_gE = list()
             for j in np.arange(self.nE_):
-                B_j = sample_points(self.x_k, eps, pE_[j])
-                B_j = np.vstack((self.x_k, B_j))
+                B_j = sample_points(self.x_k, eps, self.pE_[j], stack_x=True)
                 B_gE.append(B_j)
 
             ####################################
-            # COMPUTE GRADIENTS AND EVALUATE
+            # Compute gradients and evaluate
             ###################################
-            D_f = compute_gradients(self.f, B_f)[0]  # returns list, always has one element
+            D_f, V_f = compute_value_and_jacobian(self.f, B_f, as_numpy=True, split_jac=False)
+            assert V_f.shape[1] == 1, "Objective must be a scalar function."
+            f_k = V_f[0, 0]  # convert to float
+            # squeeze output dimension
+            D_f = D_f.squeeze(axis=1)
 
-            D_gI = list()
+            D_gI, V_gI = list(), list()
             for j in np.arange(self.nI_):
-                D_gI += compute_gradients(self.gI[j], B_gI[j])
+                this_jac, this_out = compute_value_and_jacobian(
+                    self.gI[j], B_gI[j], as_numpy=True, split_jac=True
+                )
+                D_gI += this_jac
+                V_gI.append(this_out)
 
-            D_gE = list()
+            D_gE, V_gE = list(), list()
             for j in np.arange(self.nE_):
-                D_gE += compute_gradients(self.gE[j], B_gE[j])
+                this_jac, this_out = compute_value_and_jacobian(
+                    self.gE[j], B_gE[j], as_numpy=True, split_jac=True
+                )
+                D_gE += this_jac
+                V_gE.append(this_out)
 
-            f_k = self.f.eval(self.x_k)
+            # Get value at x_k
             # hstack cannot handle empty lists!
             if self.nI_ > 0:
-                gI_k = np.hstack([self.gI[j].eval(self.x_k) for j in range(self.nI_)])
+                gI_k = np.hstack([v[0] for v in V_gI])
             else:
                 gI_k = np.array([])
 
             if self.nE_ > 0:
-                gE_k = np.hstack([self.gE[j].eval(self.x_k) for j in range(self.nE_)])
+                gE_k = np.hstack([v[0] for v in V_gE])
             else:
                 gE_k = np.array([])
 
             ##############################################
-            # SUBPROBLEM
+            # Subproblem solve
             ##############################################
+            _reg_H = self.options["reg_H"]
+            t1 = time.perf_counter()
+            if isinstance(self.SP, OSQPSubproblemSQPGS):
+                self.SP.solve(H + _reg_H * np.eye(self.dim), rho, D_f, D_gI, D_gE, f_k, gI_k, gE_k)
+            else:
+                self.SP.solve(
+                    np.linalg.cholesky(H + _reg_H * np.eye(self.dim)),
+                    rho,
+                    D_f,
+                    D_gI,
+                    D_gE,
+                    f_k,
+                    gI_k,
+                    gE_k,
+                )
 
-            self.SP.solve(H, rho, D_f, D_gI, D_gE, f_k, gI_k, gE_k)
-
-            self.timings["sp_update"].append(self.SP.setup_time)
-            self.timings["sp_solve"].append(self.SP.solve_time)
+            d_k = self.SP.d.copy()
+            t2 = time.perf_counter()
 
-            d_k = self.SP.d.value.copy()
             # compute g_k from paper
             g_k = (
                 self.SP.lambda_f @ D_f
@@ -210,7 +361,6 @@ def solve(self):
             v_k = np.maximum(gI_k, 0).sum() + np.sum(np.abs(gE_k))
             phi_k = rho * f_k + v_k
             delta_q = phi_k - q_rho(d_k, rho, H, f_k, gI_k, gE_k, D_f, D_gI, D_gE)
-
             assert (
                 delta_q >= -self.assert_tol
             ), f"Value is supposed to be non-negative, but is {delta_q}."
@@ -219,25 +369,30 @@ def solve(self):
             ), f"Value is supposed to be negative, but is {np.abs(self.SP.lambda_f.sum() - rho)}."
 
-            # Logging, start after first iteration
-            if iter_k % self.log_every == 1:
+            # Logging
+            if iter_k % self.log_every == 0:
                 violI_k = np.maximum(gI_k, 0)
                 violE_k = np.abs(gE_k)
                 viol_k = np.max(np.hstack((violI_k, violE_k)))
+
                 self.logger.info(
-                    f"Iter {iter_k}, objective {f_k:.3E}, constraint violation {viol_k:.3E}, \
-                    accuracy {E_k:.3E}, \
-                    avg runtime/iter {(1e3) * np.mean(self.timings['total']):.3E} ms."
+                    f"Iter {iter_k}, objective {f_k:.3E}, constraint violation {viol_k:.3E}, "
+                    + f"accuracy {E_k:.3E}, "
+                    + f"avg runtime/iter {(1e3) * np.mean(timings['total']):.3E} ms."
+                )
+                metrics["objective"].append(f_k)
+                metrics["constraint_violation"].append(viol_k)
+                metrics["accuracy"].append(E_k)
+                metrics["sampling_radius"].append(eps)
 
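+            # Track the best (smallest) stationarity measure seen so far;
+            # the loop breaks as soon as E_k <= tol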
             new_E_k = stop_criterion(
-                self.gI, self.gE, g_k, self.SP, gI_k, gE_k, B_gI, B_gE, self.nI_, self.nE_, pI, pE
+                self.gI, self.gE, g_k, self.SP, gI_k, gE_k, V_gI, V_gE, self.nI_, self.nE_
             )
             E_k = min(E_k, new_E_k)
 
             ##############################################
-            # STEP
+            # Step
             ##############################################
-
+            t3 = time.perf_counter()
             do_step = delta_q > nu * eps**2  # Flag whether step is taken or not
             if do_step:
                 alpha = 1.0
@@ -258,7 +413,7 @@ def solve(self):
                     y_hist = np.roll(y_hist, 1, axis=1)
                     y_hist[:, 0] = y_k
 
-                    hH = np.eye(self.dim)
+                    H = np.eye(self.dim)
                     for l in np.arange(iter_H):
                         sl = s_hist[:, l]
                         yl = y_hist[:, l]
@@ -270,17 +425,15 @@ def solve(self):
                         cond = cond1 and cond2
 
                         if cond:
-                            Hs = hH @ sl
-                            hH = (
-                                hH
+                            Hs = H @ sl
+                            H = (
+                                H
                                 - np.outer(Hs, Hs) / (sl @ Hs + 1e-16)
                                 + np.outer(yl, yl) / (yl @ sl + 1e-16)
                             )
 
-                    H = hH.copy()
-
                 ####################################
-                # ACTUAL STEP
+                # Actual step
                 ###################################
                 x_kmin1 = self.x_k.copy()
                 g_kmin1 = g_k.copy()
@@ -288,7 +441,7 @@ def solve(self):
                 self.x_k = self.x_k + alpha * d_k
 
             ##############################################
-            # NO STEP
+            # No step
             ##############################################
             else:
                 if v_k <= theta:
@@ -300,14 +453,19 @@ def solve(self):
 
             if self.store_history:
                 x_hist.append(self.x_k)
-            t1 = time.perf_counter()
-            self.timings["total"].append(t1 - t0)
+
+            t4 = time.perf_counter()
+            timings["total"].append(t4 - t0)
+            timings["sample_and_grad"].append(t1 - t0)
+            timings["subproblem"].append(t2 - t1)
+            timings["other"].append(t3 - t2)
+            timings["step"].append(t4 - t3)
 
         ##############################################
-        # END OF LOOP
+        # End of loop
         ##############################################
         self.x_hist = np.vstack(x_hist) if self.store_history else None
-
+        self.info = {"timings": timings, "metrics": metrics, "log_metrics_every": self.log_every}
         if E_k > self.tol:
             self.status = "max iterations reached"
@@ -318,17 +476,80 @@ def solve(self):
         return self.x_k
 
 
-def sample_points(x, eps, N):
-    """
-    sample N points uniformly distributed in eps-ball around x
+def sample_points(x: np.ndarray, eps: float, n_points: int, stack_x: bool = True) -> torch.Tensor:
+    """Sample ``n_points`` uniformly from the ``eps``-ball around ``x``.
+
+    Parameters
+    ----------
+    x : np.ndarray
+        The centre of the ball.
+    eps : float
+        Sampling radius.
+    n_points : int
+        Number of sampled points.
+    stack_x : bool
+        Whether to stack ``x`` itself at the top. Default True.
+
+    Returns
+    -------
+    torch.Tensor
+        Shape ``(n_points, len(x))``, or ``(n_points + 1, len(x))`` if ``stack_x=True``.
     """
     dim = len(x)
-    U = np.random.randn(N, dim)
-    norm_U = np.linalg.norm(U, axis=1)
-    R = np.random.rand(N) ** (1 / dim)
-    Z = eps * (R / norm_U)[:, np.newaxis] * U
+    if n_points == 0:
+        # return only x
+        X = torch.zeros(1, dim)
+    else:
+        U = torch.randn(n_points, dim)
+        norm_U = torch.linalg.norm(U, axis=1)
+        R = torch.rand(n_points) ** (1 / dim)
+        X = eps * (R / norm_U).reshape(-1, 1) * U
 
-    return x + Z
+        if stack_x:
+            X = torch.vstack((torch.zeros(1, dim), X))
+
+    X += torch.from_numpy(x).reshape(1, -1)
+    return X
+
+
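+# NOTE: Quick shape check for sample_points (illustrative values):
+#
+#     B = sample_points(np.zeros(3), eps=0.1, n_points=5, stack_x=True)
+#     B.shape  # torch.Size([6, 3]): the centre x itself plus 5 points in the eps-ball
+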
+def compute_value_and_jacobian(
+    f: ObjectiveOrConstraint, X: torch.Tensor, as_numpy: bool = True, split_jac: bool = True
+) -> Tuple[Union[torch.Tensor, np.ndarray], Union[torch.Tensor, np.ndarray]]:
+    """Evaluate the function value and the Jacobian for a batched input ``X``.
+
+    Parameters
+    ----------
+    f : ObjectiveOrConstraint
+        The function to evaluate. Should map from ``dim`` to R^m (with m integer).
+    X : torch.Tensor
+        Input points. Should be of shape ``(batch_size, dim)``.
+    as_numpy : bool, optional
+        Whether to convert the Jacobian into a numpy array, by default True.
+    split_jac : bool, optional
+        Whether to split the Jacobian into a list, by default True.
+        The splitting happens wrt the output dimension.
+
+    Returns
+    -------
+    Tuple[Union[torch.Tensor, np.ndarray], Union[torch.Tensor, np.ndarray]]
+        Jacobian of shape ``(batch_size, dim_out, dim)`` (or a list of ``dim_out`` arrays
+        of shape ``(batch_size, dim)`` if ``split_jac=True``), and output values of shape
+        ``(batch_size, dim_out)``.
+    """
+    # Observation:
+    # The jacobian is still computed correctly for compute_batch_jacobian_vmap,
+    # even if the model would output a tensor of shape (b,) instead of (b,1).
+    # We could make the solver more flexible to handle this case, but then might
+    # lose the ability to switch out compute_batch_jacobian_vmap.
+
+    jac, out = compute_batch_jacobian_vmap(f, X)
+
+    if as_numpy:
+        jac = jac.detach().cpu().numpy()
+        out = out.detach().cpu().numpy()
+
+    if split_jac:
+        jac = [jac[:, j, :] for j in range(jac.shape[1])]
+
+    return jac, out
 
 
 def q_rho(d, rho, H, f_k, gI_k, gE_k, D_f, D_gI, D_gE):
@@ -347,23 +568,23 @@ def q_rho(d, rho, H, f_k, gI_k, gE_k, D_f, D_gI, D_gE):
 
 
 def phi_rho(x, f, gI, gE, rho):
-    term1 = rho * f.eval(x)
+    term1 = rho * f.single_eval(x).squeeze()  # want a float here
 
     # inequalities
     if len(gI) > 0:
-        term2 = np.sum(np.hstack([np.maximum(gI[j].eval(x), 0) for j in range(len(gI))]))
+        term2 = np.sum(np.hstack([np.maximum(gI[j].single_eval(x), 0) for j in range(len(gI))]))
     else:
        term2 = 0
 
     # equalities: max(x,0) + max(-x,0) = abs(x)
     if len(gE) > 0:
-        term3 = np.sum(np.hstack([np.abs(gE[l].eval(x)) for l in range(len(gE))]))
+        term3 = np.sum(np.hstack([np.abs(gE[l].single_eval(x)) for l in range(len(gE))]))
     else:
         term3 = 0
 
     return term1 + term2 + term3
 
 
-def stop_criterion(gI, gE, g_k, SP, gI_k, gE_k, B_gI, B_gE, nI_, nE_, pI, pE):
+def stop_criterion(gI, gE, g_k, SP, gI_k, gE_k, V_gI, V_gE, nI_, nE_):
     """
     computes E_k in the paper
     """
@@ -375,7 +596,8 @@ def stop_criterion(gI, gE, g_k, SP, gI_k, gE_k, B_gI, B_gE, nI_, nE_, pI, pE):
 
     gI_vals = list()
     for j in np.arange(nI_):
-        gI_vals += eval_ineq(gI[j], B_gI[j])
+        V = V_gI[j]
+        gI_vals += [V[:, i] for i in range(gI[j].dim_out)]
 
     val4 = -np.inf
     for j in np.arange(len(gI_vals)):
@@ -383,204 +605,11 @@
 
     gE_vals = list()
     for j in np.arange(nE_):
-        gE_vals += eval_ineq(gE[j], B_gE[j])
+        V = V_gE[j]
+        gE_vals += [V[:, i] for i in range(gE[j].dim_out)]
 
     val5 = -np.inf
     for j in np.arange(len(gE_vals)):
         val5 = np.maximum(val5, np.max(SP.lambda_gE[j] * gE_vals[j]))
 
     return np.max(np.array([val1, val2, val3, val4, val5]))
-
-
-def eval_ineq(fun, X):
-    """
-    evaluate function at multiple inputs
-    needed in stop_criterion
-
-    Returns
-    -------
-    list of array, number of entries = fun.dimOut
-    """
-    (N, _) = X.shape
-    D = np.zeros((N, fun.dimOut))
-    for i in np.arange(N):
-        D[
-            i,
-            :,
-        ] = fun.eval(X[i, :])
-
-    return [D[:, j] for j in range(fun.dimOut)]
-
-
-def compute_gradients(fun, X):
-    """
-    computes gradients of function object f at all rows of array X
-
-    Returns
-    -------
-    list of 2d-matrices, length of fun.dimOut
-    """
-    (N, dim) = X.shape
-
-    # fun.grad returns Jacobian, i.e.
dimOut x dim - D = np.zeros((N, fun.dimOut, dim)) - for i in np.arange(N): - D[i, :, :] = fun.grad(X[i, :]) - - return [D[:, j, :] for j in np.arange(fun.dimOut)] - - -# %% - - -class SubproblemSQPGS: - def __init__( - self, dim: int, p0: np.ndarray, pI: np.ndarray, pE: np.ndarray, assert_tol: float - ) -> None: - """ - dim : solution space dimension - p0 : number of sample points for f (excluding x_k itself) - pI : array, number of sample points for inequality constraint (excluding x_k itself) - pE : array, number of sample points for equality constraint (excluding x_k itself) - """ - - self.dim = dim - self.p0 = p0 - self.pI = pI - self.pE = pE - - self.assert_tol = assert_tol - - self.d = cp.Variable(self.dim) - self._problem = None - - @property - def nI(self) -> int: - return len(self.pI) - - @property - def nE(self) -> int: - return len(self.pE) - - @property - def has_ineq_constraints(self) -> bool: - return self.nI > 0 - - @property - def has_eq_constraints(self) -> bool: - return self.nE > 0 - - @property - def problem(self) -> cp.Problem: - assert self._problem is not None, "Problem not yet initialized." - return self._problem - - @property - def status(self) -> str: - return self.problem.status - - @property - def objective_val(self) -> float: - return self.problem.value - - @property - def setup_time(self) -> float: - return self.problem.solver_stats.setup_time - - @property - def solve_time(self) -> float: - return self.problem.solver_stats.solve_time - - def solve( - self, - H: np.ndarray, - rho: float, - D_f: np.ndarray, - D_gI: list[np.ndarray], - D_gE: list[np.ndarray], - f_k: float, - gI_k: np.ndarray, - gE_k: np.ndarray, - ) -> None: - """ - This solves the quadratic program - - Parameters - - H : array - Hessian approximation - rho : float - parameter - D_f : array - gradient of f at the sampled points - D_gI : list - j-th element is the gradient array of c^j at the sampled points. - D_gE : list - j-th element is the gradient array of h^j at the sampled points. - f_k : float - evaluation of f at x_k. - gI_k : array - evaluation of inequality constraints at x_k. - gE_k : array - evaluation of equality constraints at x_k. - - Updates - self.d: Variable - search direction - - self.lambda_f: array - KKT multipier for objective. - - self.lambda_gE: list - KKT multipier for equality constraints. - - self.lambda_gI: list - KKT multipier for inequality constraints. 
-
-        """
-
-        d = self.d
-        z = cp.Variable()
-        if self.has_ineq_constraints:
-            r_I = cp.Variable(gI_k.size, nonneg=True)
-        if self.has_eq_constraints:
-            r_E = cp.Variable(gE_k.size, nonneg=True)
-
-        objective = rho * z + (1 / 2) * cp.quad_form(d, H)
-
-        obj_constraint = f_k + D_f @ d <= z
-        constraints = [obj_constraint]
-
-        if self.has_ineq_constraints:
-            ineq_constraints = [gI_k[j] + D_gI[j] @ d <= r_I[j] for j in range(self.nI)]
-            constraints += ineq_constraints
-            objective = objective + cp.sum(r_I)
-
-        if self.has_eq_constraints:
-            eq_constraints_plus = [gE_k[j] + D_gE[j] @ d <= r_E[j] for j in range(self.nE)]
-            eq_constraints_neg = [gE_k[j] + D_gE[j] @ d >= r_E[j] for j in range(self.nE)]
-            constraints += eq_constraints_plus + eq_constraints_neg
-            objective = objective + cp.sum(r_E)
-
-        problem = cp.Problem(cp.Minimize(objective), constraints)
-        problem.solve(solver=cp.CLARABEL)
-
-        assert problem.status in {cp.OPTIMAL, cp.OPTIMAL_INACCURATE}
-        self._problem = problem
-
-        # Extract dual variables
-        duals = problem.solution.dual_vars
-        self.lambda_f = duals[obj_constraint.id]
-
-        if self.has_ineq_constraints:
-            self.lambda_gI = [duals[c.id] for c in ineq_constraints]
-        else:
-            self.lambda_gI = []
-
-        if self.has_eq_constraints:
-            self.lambda_gE = [
-                duals[c_plus.id] - duals[c_neg.id]
-                for c_plus, c_neg in zip(eq_constraints_plus, eq_constraints_neg)
-            ]
-        else:
-            self.lambda_gE = []
diff --git a/src/ncopt/sqpgs/osqp_subproblem.py b/src/ncopt/sqpgs/osqp_subproblem.py
new file mode 100644
index 0000000..524ae64
--- /dev/null
+++ b/src/ncopt/sqpgs/osqp_subproblem.py
@@ -0,0 +1,307 @@
+from typing import List
+
+import numpy as np
+import osqp
+from scipy import sparse
+
+# see: https://osqp.org/docs/interfaces/status_values.html
+OSQP_ALLOWED_STATUS = [
+    "solved",
+    "solved inaccurate",
+    "maximum iterations reached",
+    "run time limit reached",
+]
+
+
+class OSQPSubproblemSQPGS:
+    def __init__(
+        self,
+        dim: int,
+        p0: np.ndarray,
+        pI: np.ndarray,
+        pE: np.ndarray,
+        assert_tol: float,
+    ) -> None:
+        """
+        dim : solution space dimension
+        p0 : number of sample points for f (excluding x_k itself)
+        pI : array, number of sample points for inequality constraint (excluding x_k itself)
+        pE : array, number of sample points for equality constraint (excluding x_k itself)
+        """
+
+        self.dim = dim
+        self.p0 = p0
+        self.pI = pI
+        self.pE = pE
+
+        self.assert_tol = assert_tol
+
+        self._problem = None
+        self.P, self.q, self.inG, self.inh, self.nonnegG, self.nonnegh = self._initialize()
+
+    @property
+    def nI(self) -> int:
+        return len(self.pI)
+
+    @property
+    def nE(self) -> int:
+        return len(self.pE)
+
+    @property
+    def has_ineq_constraints(self) -> bool:
+        return self.nI > 0
+
+    @property
+    def has_eq_constraints(self) -> bool:
+        return self.nE > 0
+
+    @property
+    def problem(self) -> osqp.interface.OSQP:
+        assert self._problem is not None, "Problem not yet initialized."
+        return self._problem
+
+    @property
+    def status(self) -> str:
+        return self._res.info.status
+
+    def _initialize(self):
+        """
+        The quadratic subproblem we solve in every iteration is of the form:
+
+        min_y 1/2 * y.T @ P @ y + q.T @ y subject to G @ y <= h
+
+        variable structure: y = (d,z,rI,rE) with
+        d = search direction
+        z = helper variable for objective
+        rI = helper variable for inequality constraints
+        rE = helper variable for equality constraints
+
+        This function initializes the variables P,q,G,h.
+        The entries which change in every iteration are then updated in self._update().
+
+        G and h consist of two parts:
+        1) inG, inh: the inequalities from the paper
+        2) nonnegG, nonnegh: nonnegativity bounds rI >= 0, rE >= 0
+        """
+
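+        # NOTE: Worked example of the dimensions, using the values of the test fixture
+        # in tests/test_subproblem.py: for dim=2, p0=2, pI=[3], pE=[] we get
+        # y = (d, z, rI) with dimQP = 2 + 1 + 1 = 4, and inG has
+        # 1 + p0 + sum(1 + pI) + 2 * sum(1 + pE) = 3 + 4 + 0 = 7 rows.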
+        dimQP = self.dim + 1 + self.nI + self.nE
+
+        P = np.zeros((dimQP, dimQP))
+        q = np.zeros(dimQP)
+
+        inG = np.zeros((1 + self.p0 + np.sum(1 + self.pI) + 2 * np.sum(1 + self.pE), dimQP))
+        inh = np.zeros(1 + self.p0 + np.sum(1 + self.pI) + 2 * np.sum(1 + self.pE))
+
+        # structure of inG (p0+1, sum(1+pI), sum(1+pE), sum(1+pE))
+        inG[: self.p0 + 1, self.dim] = -1
+
+        for j in range(self.nI):
+            inG[
+                self.p0 + 1 + (1 + self.pI)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI)[:j].sum()
+                + self.pI[j]
+                + 1,
+                self.dim + 1 + j,
+            ] = -1
+
+        for j in range(self.nE):
+            inG[
+                self.p0 + 1 + (1 + self.pI).sum() + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1,
+                self.dim + 1 + self.nI + j,
+            ] = -1
+            inG[
+                self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1,
+                self.dim + 1 + self.nI + j,
+            ] = +1
+
+        # we have nI+nE r-variables
+        nonnegG = np.hstack(
+            (np.zeros((self.nI + self.nE, self.dim + 1)), -np.eye(self.nI + self.nE))
+        )
+        nonnegh = np.zeros(self.nI + self.nE)
+
+        return P, q, inG, inh, nonnegG, nonnegh
+
+    def _update(self, H, rho, D_f, D_gI, D_gE, f_k, gI_k, gE_k):
+        self.P[: self.dim, : self.dim] = H
+        self.q = np.hstack((np.zeros(self.dim), rho, np.ones(self.nI), np.ones(self.nE)))
+
+        self.inG[: self.p0 + 1, : self.dim] = D_f
+        self.inh[: self.p0 + 1] = -f_k
+
+        for j in range(self.nI):
+            self.inG[
+                self.p0 + 1 + (1 + self.pI)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI)[:j].sum()
+                + self.pI[j]
+                + 1,
+                : self.dim,
+            ] = D_gI[j]
+            self.inh[
+                self.p0 + 1 + (1 + self.pI)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI)[:j].sum()
+                + self.pI[j]
+                + 1
+            ] = -gI_k[j]
+
+        for j in range(self.nE):
+            self.inG[
+                self.p0 + 1 + (1 + self.pI).sum() + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1,
+                : self.dim,
+            ] = D_gE[j]
+            self.inG[
+                self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1,
+                : self.dim,
+            ] = -D_gE[j]
+
+            self.inh[
+                self.p0 + 1 + (1 + self.pI).sum() + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1
+            ] = -gE_k[j]
+            self.inh[
+                self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum() : self.p0
+                + 1
+                + (1 + self.pI).sum()
+                + (1 + self.pE).sum()
+                + (1 + self.pE)[:j].sum()
+                + self.pE[j]
+                + 1
+            ] = gE_k[j]
+
+        return
+
+    def solve(
+        self,
+        H: np.ndarray,
+        rho: float,
+        D_f: np.ndarray,
+        D_gI: List[np.ndarray],
+        D_gE: List[np.ndarray],
+        f_k: float,
+        gI_k: np.ndarray,
+        gE_k: np.ndarray,
+    ) -> None:
+        """
+        This solves the quadratic program.
+
+        Updates
+        self.d: array
+            search direction
+
+        self.lambda_f: array
+            KKT multiplier for objective.
+
+        self.lambda_gE: list
+            KKT multiplier for equality constraints.
+
+        self.lambda_gI: list
+            KKT multiplier for inequality constraints.
+ + """ + + # update params + self._update(H, rho, D_f, D_gI, D_gE, f_k, gI_k, gE_k) + + A = np.vstack((self.inG, self.nonnegG)) + u = np.hstack((self.inh, self.nonnegh)) + l = np.ones_like(u) * (-np.inf) + + problem = osqp.OSQP() + + problem.setup( + P=sparse.csc_matrix(self.P), + q=self.q, + A=sparse.csc_matrix(A), + l=l, + u=u, + eps_abs=1e-05, + eps_rel=1e-05, + verbose=False, + polish=1, + ) + + self._res = problem.solve() + + assert ( + self._res.info.status in OSQP_ALLOWED_STATUS + ), f"OSQP results in status {self._res.info.status}." + self._problem = problem + + primal_solution = self._res.x + + self.d = primal_solution[: self.dim] + self.z = primal_solution[self.dim] + + self.rI = primal_solution[self.dim + 1 : self.dim + 1 + self.nI] + self.rE = primal_solution[self.dim + 1 + self.nI :] + + # extract dual variables = KKT multipliers + dual_solution = self._res.y + lambda_f = dual_solution[: self.p0 + 1] + + lambda_gI = list() + for j in np.arange(self.nI): + start_ix = self.p0 + 1 + (1 + self.pI)[:j].sum() + lambda_gI.append(dual_solution[start_ix : start_ix + 1 + self.pI[j]]) + + lambda_gE = list() + for j in np.arange(self.nE): + start_ix = self.p0 + 1 + (1 + self.pI).sum() + (1 + self.pE)[:j].sum() + + # from ineq with + + vec1 = dual_solution[start_ix : start_ix + 1 + self.pE[j]] + + # from ineq with - + vec2 = dual_solution[ + start_ix + (1 + self.pE).sum() : start_ix + (1 + self.pE).sum() + 1 + self.pE[j] + ] + + # see Direction.m line 620 + lambda_gE.append(vec1 - vec2) + + self.lambda_f = lambda_f.copy() + self.lambda_gI = lambda_gI.copy() + self.lambda_gE = lambda_gE.copy() + + return diff --git a/src/ncopt/torch_obj.py b/src/ncopt/torch_obj.py deleted file mode 100755 index 9deb282..0000000 --- a/src/ncopt/torch_obj.py +++ /dev/null @@ -1,46 +0,0 @@ -""" -author: Fabian Schaipp -""" - -import torch - - -class Net: - def __init__(self, D, dimOut=None): - self.name = "pytorch_Net" - self.D = D - - self.D.zero_grad() - - self.dimIn = self.D[0].weight.shape[1] - - # set mode to evaluation - self.D.train(False) - - # if type(self.D[-1]) == torch.nn.ReLU: - if dimOut is None: - print("Caution: output dimension of Net is not specified and derived from last module!") - self.dimOut = self.D[-1].weight.shape[0] - else: - self.dimOut = dimOut - return - - def eval(self, x): - assert ( - len(x) == self.dimIn - ), f"Input for Net has wrong dimension, required dimension is {self.dimIn}." - - return self.D.forward(torch.tensor(x, dtype=torch.float32)).detach().numpy() - - def grad(self, x): - assert ( - len(x) == self.dimIn - ), f"Input for Net has wrong dimension, required dimension is {self.dimIn}." - - x_torch = torch.tensor(x, dtype=torch.float32) - x_torch.requires_grad_(True) - - y_torch = self.D(x_torch) - y_torch.backward() - - return x_torch.grad.data.numpy() diff --git a/src/ncopt/utils.py b/src/ncopt/utils.py index fb38bad..bd9b7c4 100644 --- a/src/ncopt/utils.py +++ b/src/ncopt/utils.py @@ -1,37 +1,33 @@ import logging -from typing import Optional +from typing import Optional, Tuple import torch from torch.autograd.functional import jacobian -from torch.func import functional_call, jacfwd, jacrev, vmap +from torch.func import jacfwd, jacrev, vmap # %% Computing Jacobians """ Important: jacobian and jacrev do the forward pass for each row of the input, that is, WITHOUT the batch dimension! -E.g. if input has shape (d1, d2), then normal forward pas has input +E.g. 
if input has shape (d1, d2), then normal forward pass has input (b, d1, d2); but jacobian/jacrev will do the forward pass with a (d1,d2) tensor. This becomes an issue when there are reshape/view modules, because they often will have different results when there is no batch dimension. -* If the forward method uses rehshape/view, the batch dimension should be - specified with -1, and not with x.shape[0] or similar! -* For the Jacobian, we get an extra dimension in such cases --> needs to be - removed later on +So we fix this by adding dummy dimensions! """ -""" - +""" Functions for computing the Jacobian of model(inputs) wrt. inputs. - Assume inputs has shape (batch, *d), where d itself could be a tuple. + We compute Jacobian and output simultaneously. + + Assume that a batched input has shape (b, *d), where d itself could be a tuple, + and b is the batch size. Assume that model output has shape m (with m integer). - Then the Jacobian of model has shape (m,d). - The output of the below functions should have - - either shape (b,m,*d), - - or shape (b,1,m,*d) (see above for explanation). + Then we want to compute a batched Jacobian of shape (b, m, *d). For more info see: * https://discuss.pytorch.org/t/computing-batch-jacobian-efficiently/80771/11 @@ -39,8 +35,10 @@ """ -def compute_batch_jacobian_naive(model: torch.nn.Module, inputs: torch.Tensor): - """ +def compute_batch_jacobian_naive( + model: torch.nn.Module, inputs: torch.Tensor +) -> Tuple[torch.tensor, torch.tensor]: + """Naive way for computing Jacobian. Used for testing. Parameters ---------- @@ -48,16 +46,26 @@ def compute_batch_jacobian_naive(model: torch.nn.Module, inputs: torch.Tensor): The function of which to compute the Jacobian. inputs : torch.Tensor The inputs for model. First dimension should be batch dimension. + + Returns + ------- + tuple[torch.tensor, torch.tensor] + Jacobian and output. """ b = inputs.size(0) + out = model.forward(inputs) # want to have batch dimension --> double brackets - return torch.stack([jacobian(model, inputs[i]) for i in range(b)]) + jac = torch.stack([jacobian(model, inputs[[i]]) for i in range(b)]) + # Now jac has shape [b, 1, out_dim, 1, in_dim] --> squeeze + # This only works if output shape is scalar/vector. + return jac.squeeze(dim=(1, 3)), out def compute_batch_jacobian_vmap( model: torch.nn.Module, inputs: torch.Tensor, forward: bool = False -): - """ +) -> Tuple[torch.tensor, torch.tensor]: + """Vmap over batch dimension. This has the issue that the inputs are given to + model.forward() without the first dimension. We counteract by adding a dummy dimension. Parameters ---------- @@ -67,18 +75,63 @@ def compute_batch_jacobian_vmap( The inputs for model. First dimension should be batch dimension. forward: bool. Whether to compute in forward mode (jacrev or jacfwd). By default False. + + Returns + ------- + tuple[torch.tensor, torch.tensor] + Jacobian and output. 
""" - params = dict(model.named_parameters()) - def fmodel(params, inputs): # functional version of model - return functional_call(model, params, inputs) + # functional version of model; dummy dim because vmap removes batch_dim + def fmodel(model, inputs): + out = model(inputs[None, :]) + return out, out # argnums specifies which argument to compute jacobian wrt # in_dims: dont map over params (None), map over first dim of inputs (0) if not forward: - return vmap(jacrev(fmodel, argnums=(1)), in_dims=(None, 0))(params, inputs) + jac, out = vmap(jacrev(fmodel, argnums=(1), has_aux=True), in_dims=(None, 0))(model, inputs) else: - return vmap(jacfwd(fmodel, argnums=(1)), in_dims=(None, 0))(params, inputs) + jac, out = vmap(jacfwd(fmodel, argnums=(1), has_aux=True), in_dims=(None, 0))(model, inputs) + + # now remove dummy dimension again + return jac.squeeze(dim=1), out.squeeze(dim=1) + + +def compute_batch_jacobian( + model: torch.nn.Module, inputs: torch.Tensor, forward: bool = False +) -> Tuple[torch.tensor, torch.tensor]: + """Not using vmap. This results in the Jacobian being of shape + [batch_size, dim_out, batch_size, *dim_in] + + Parameters + ---------- + model : torch.nn.Module + The function of which to compute the Jacobian. + inputs : torch.Tensor + The inputs for model. First dimension should be batch dimension. + forward: bool. + Whether to compute in forward mode (jacrev or jacfwd). By default False. + + Returns + ------- + tuple[torch.tensor, torch.tensor] + Jacobian and output. + """ + + def fmodel(model, inputs): # functional version of model + out = model(inputs) + return out, out + + # argnums specifies which argument to compute jacobian wrt + if not forward: + jac, out = jacrev(fmodel, argnums=(1), has_aux=True)(model, inputs) + else: + jac, out = jacfwd(fmodel, argnums=(1), has_aux=True)(model, inputs) + + # Now only take the "diagonal". This only works if output shape is scalar/vector. 
+ jac = torch.stack([jac[i, :, i, :] for i in range(jac.shape[0])]) + return jac, out # %% diff --git a/tests/test_functions.py b/tests/test_functions.py new file mode 100644 index 0000000..e4a66b0 --- /dev/null +++ b/tests/test_functions.py @@ -0,0 +1,49 @@ +import torch + +from ncopt.functions.quadratic import Quadratic +from ncopt.functions.rosenbrock import NonsmoothRosenbrock +from ncopt.utils import compute_batch_jacobian_vmap + +d = 10 +b = 4 + + +def test_quadratic(): + params = (torch.eye(d), torch.zeros(d), torch.tensor(1.0)) + model = Quadratic(params=params) + inputs = torch.randn(b, d) + + expected = 0.5 * (inputs * inputs).sum(1) + 1.0 + out = model(inputs) + + assert torch.allclose(out, expected[:, None]) + assert out.shape == torch.Size([b, 1]) + return + + +def test_rosenbrock(): + inputs = torch.tensor( + [ + [1.0866, 0.9348], + [0.2838, 0.1436], + [-0.4327, -1.1307], + [-0.4813, -0.9599], + [-0.4523, 0.7851], + ], + dtype=torch.float64, + ) + + expected_out = torch.tensor([[1.9746, 1.0174, 12.5960, 11.7266, 6.7533]], dtype=torch.float64).T + + expected_jac = torch.tensor( + [[17.5588, -8.0], [-5.9732, 8.0], [-9.7886, -8.0], [-10.6634, -8.0], [4.3322, 8.0]], + dtype=torch.float64, + ).view(5, 1, 2) + + model = NonsmoothRosenbrock(a=8.0) + jac, out = compute_batch_jacobian_vmap(model, inputs) + + assert torch.allclose(out, expected_out, atol=1e-3) + assert torch.allclose(jac, expected_jac) + + return diff --git a/tests/test_jacobians.py b/tests/test_jacobians.py index d5833d4..47fd76c 100644 --- a/tests/test_jacobians.py +++ b/tests/test_jacobians.py @@ -1,94 +1,126 @@ +import numpy as np +import pytest import torch -from ncopt.utils import compute_batch_jacobian_naive, compute_batch_jacobian_vmap +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.quadratic import Quadratic +from ncopt.utils import ( + compute_batch_jacobian, + compute_batch_jacobian_naive, + compute_batch_jacobian_vmap, +) d = 10 m = 3 b = 4 -class Quadratic(torch.nn.Module): - def __init__(self, input_dim): - super().__init__() - self.A = torch.nn.Parameter(torch.randn((input_dim, input_dim))) - - def forward(self, x): - return (1 / 2) * x @ self.A @ x +###################################################### +# Tests start here -class DummyNet(torch.nn.Module): - def __init__(self, d=7, C=1, num_classes=10): - super(DummyNet, self).__init__() +@pytest.mark.parametrize( + "jac_fun", [compute_batch_jacobian_naive, compute_batch_jacobian_vmap, compute_batch_jacobian] +) +def test_linear_jacobian(jac_fun): + torch.manual_seed(1) + inputs = torch.randn(b, d) + model = torch.nn.Linear(d, m) - self.conv = torch.nn.Conv2d(1, 8, kernel_size=5, stride=1, padding=2) - self.linear_input_dim = 8 * C * d * d - self.linear = torch.nn.Linear(self.linear_input_dim, num_classes) + expected = torch.stack([model.weight.data for _ in range(b)]) + expected_out = model(inputs) - def forward(self, x): - x = torch.nn.functional.relu(self.conv(x)) - # x = x.view(x.shape[0], -1) # This would result in errors when computing Jacobian. - x = x.view(-1, self.linear_input_dim) # Batch dimension specified by -1. 
- x = self.linear(x) - return x + jac, out = jac_fun(model, inputs) + assert torch.allclose(expected, jac, rtol=1e-5, atol=1e-5) + assert torch.allclose(expected_out, out) + assert jac.shape == torch.Size([b, m, d]) -###################################################### -# Tests start here + return -def test_quadratic_jacobian(): +@pytest.mark.parametrize( + "jac_fun", [compute_batch_jacobian_naive, compute_batch_jacobian_vmap, compute_batch_jacobian] +) +def test_quadratic_jacobian(jac_fun): + """Multi-dim input, scalar output""" torch.manual_seed(1) inputs = torch.randn(b, d) - model = Quadratic(d) + model = Quadratic(input_dim=d) - # f(x) = 0.5 x^T A x --> Df(x) = 0.5*(A+A.T)x - expected = 0.5 * inputs @ (model.A.T + model.A) + # f(x) = 0.5 x^T A x + b.T x + c --> Df(x) = 0.5*(A+A.T)x + b + expected = 0.5 * inputs @ (model.A.T + model.A) + model.b + expected = expected.view(b, 1, d) + expected_out = model(inputs) - jac1 = compute_batch_jacobian_naive(model, inputs) - jac2 = compute_batch_jacobian_vmap(model, inputs) + jac, out = jac_fun(model, inputs) - assert torch.allclose(expected, jac1, rtol=1e-5, atol=1e-5) - assert torch.allclose(jac1, jac2, rtol=1e-5, atol=1e-5) + assert torch.allclose(expected, jac, rtol=1e-5, atol=1e-5) + assert torch.allclose(expected_out, out) + assert jac.shape == torch.Size([b, 1, d]) return -def test_forward_backward_jacobian(): +@pytest.mark.parametrize("jac_fun", [compute_batch_jacobian_vmap, compute_batch_jacobian]) +def test_forward_backward_jacobian(jac_fun): torch.manual_seed(1) inputs = torch.randn(b, d) model = Quadratic(d) - # f(x) = 0.5 x^T A x --> Df(x) = 0.5*(A+A.T)x - expected = 0.5 * inputs @ (model.A.T + model.A) + expected = 0.5 * inputs @ (model.A.T + model.A) + model.b + expected = expected.view(b, 1, d) - jac1 = compute_batch_jacobian_vmap(model, inputs, forward=False) - jac2 = compute_batch_jacobian_vmap(model, inputs, forward=True) + jac1, out1 = jac_fun(model, inputs, forward=False) + jac2, out2 = jac_fun(model, inputs, forward=True) assert torch.allclose(expected, jac1, rtol=1e-5, atol=1e-5) assert torch.allclose(jac1, jac2, rtol=1e-5, atol=1e-5) + assert torch.allclose(out1, out2, rtol=1e-5, atol=1e-5) return def test_multidim_output(): + """Multi-dim input, multi-dim output""" model = torch.nn.Sequential(torch.nn.Linear(d, m), torch.nn.Softmax(dim=-1)) inputs = torch.randn(b, d) output = model(inputs) assert output.shape == torch.Size([b, m]) - jac1 = compute_batch_jacobian_naive(model, inputs) - jac2 = compute_batch_jacobian_vmap(model, inputs) + jac1, _ = compute_batch_jacobian_naive(model, inputs) + jac2, _ = compute_batch_jacobian_vmap(model, inputs) + jac3, _ = compute_batch_jacobian(model, inputs) assert jac1.shape == torch.Size([b, m, d]) assert jac2.shape == torch.Size([b, m, d]) + assert jac3.shape == torch.Size([b, m, d]) assert torch.allclose(jac1, jac2, rtol=1e-5, atol=1e-5) + assert torch.allclose(jac1, jac3, rtol=1e-5, atol=1e-5) return +class DummyNet(torch.nn.Module): + def __init__(self, d=7, C=1, num_classes=10): + super(DummyNet, self).__init__() + + self.conv = torch.nn.Conv2d(1, 8, kernel_size=5, stride=1, padding=2) + self.linear_input_dim = 8 * C * d * d + self.linear = torch.nn.Linear(self.linear_input_dim, num_classes) + + def forward(self, x): + x = torch.nn.functional.relu(self.conv(x)) + # x = x.view(x.shape[0], -1) # This would result in errors when computing Jacobian. + x = x.view(-1, self.linear_input_dim) # Batch dimension specified by -1. 
+ x = self.linear(x) + return x + + def test_multidim_output_multiaxis_input(): + """Multi-axis input (channel, pixel, pixel), multi-dim output""" pixel = 7 channel = 1 num_classes = 9 @@ -100,11 +132,69 @@ def test_multidim_output_multiaxis_input(): assert output.shape == torch.Size([b, num_classes]) - jac1 = compute_batch_jacobian_naive(model, inputs) - jac2 = compute_batch_jacobian_vmap(model, inputs) + jac1, _ = compute_batch_jacobian_naive(model, inputs) + jac2, _ = compute_batch_jacobian_vmap(model, inputs) + jac3, _ = compute_batch_jacobian(model, inputs) - # this case has an extra dimension, due to the view operation - assert jac1.shape == torch.Size([b, 1, num_classes, *input_dim]) - assert jac2.shape == torch.Size([b, 1, num_classes, *input_dim]) + assert jac1.shape == torch.Size([b, num_classes, *input_dim]) + assert jac2.shape == torch.Size([b, num_classes, *input_dim]) + assert jac3.shape == torch.Size([b, num_classes, *input_dim]) assert torch.allclose(jac1, jac2, rtol=1e-5, atol=1e-5) + assert torch.allclose(jac1, jac3, rtol=1e-5, atol=1e-5) + + return + + +def test_input_cropping(): + """Jacobians computed correctly after cropping the input tensor.""" + model = torch.nn.Linear(d, m) + inputs = torch.randn(b, d) + + jac, out = compute_batch_jacobian_vmap(model, inputs) + + def crop_inputs(x): + return x[:, :d] + + f = ObjectiveOrConstraint(model, prepare_inputs=crop_inputs) + + garbage = torch.randn(b, 2 * d) + inputs2 = torch.hstack((inputs, garbage)) + + jac2, out2 = compute_batch_jacobian_vmap(f, inputs2) + + assert torch.allclose(out, out2, rtol=1e-5, atol=1e-5) + assert torch.allclose(jac2[:, :, :d], jac, rtol=1e-5, atol=1e-5) + assert torch.allclose(jac2[:, :, d:], torch.zeros(b, m, 2 * d)) + + return + + +def test_input_reshape(): + """Jacobians computed correctly after reshaping the input tensor.""" + pixel = 7 + channel = 1 + num_classes = 9 + input_dim = (channel, pixel, pixel) + inputs = torch.randn(b, *input_dim) + inputs_flat = inputs.reshape(b, -1) + + model = DummyNet(d=pixel, C=channel, num_classes=num_classes) + + jac, out = compute_batch_jacobian_vmap(model, inputs) + + # it is important to use x.shape[0], as batch size can be different in vmap + def reshape_inputs(x): + return x.reshape(x.shape[0], *input_dim) + + assert torch.allclose(inputs, reshape_inputs(inputs_flat)) + + f = ObjectiveOrConstraint(model, prepare_inputs=reshape_inputs) + + jac2, out2 = compute_batch_jacobian_vmap(f, inputs_flat) + + assert torch.allclose(out, out2, rtol=1e-5, atol=1e-5) + assert jac2.shape == torch.Size([b, num_classes, np.prod(input_dim)]) + assert torch.allclose(jac2.reshape(b, num_classes, *input_dim), jac, rtol=1e-5, atol=1e-5) + + return diff --git a/tests/test_quadratic.py b/tests/test_quadratic.py new file mode 100644 index 0000000..0f9a0f9 --- /dev/null +++ b/tests/test_quadratic.py @@ -0,0 +1,32 @@ +import numpy as np +import torch + +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.quadratic import Quadratic +from ncopt.sqpgs.main import SQPGS + +d = 3 + + +def test_quadratic_problem(): + torch.manual_seed(2) + params = (torch.eye(d), torch.zeros(d), torch.tensor(0.0)) + model = Quadratic(params=params) + + f = ObjectiveOrConstraint(model, dim=d, is_differentiable=True) + g = ObjectiveOrConstraint(torch.nn.Linear(d, d), dim_out=d, is_differentiable=True) + # make zero infeasible + g.model.weight.data = -torch.eye(d) + g.model.bias.data = torch.ones(d) + + gI = [g] + gE = [] + + options = {} + x0 = (-1) * torch.ones(d).numpy() + problem = 
SQPGS( + f, gI, gE, x0=x0, tol=1e-5, max_iter=50, verbose=False, options=options, log_every=10 + ) + sol = problem.solve() + + assert np.allclose(sol, np.ones(d)) diff --git a/tests/test_rosenbrock.py b/tests/test_rosenbrock.py index 9e6620c..5682aa6 100755 --- a/tests/test_rosenbrock.py +++ b/tests/test_rosenbrock.py @@ -3,16 +3,22 @@ """ import numpy as np +import torch -from ncopt.funs import f_rosenbrock, g_linear, g_max +from ncopt.functions import ObjectiveOrConstraint +from ncopt.functions.max_linear import MaxOfLinear +from ncopt.functions.rosenbrock import NonsmoothRosenbrock from ncopt.sqpgs.main import SQPGS -f = f_rosenbrock() -g = g_max() +f = ObjectiveOrConstraint(NonsmoothRosenbrock(a=8.0), dim=2) +g = MaxOfLinear( + params=(torch.diag(torch.tensor([torch.sqrt(torch.tensor(2.0)), 2.0])), -torch.ones(2)) +) def test_rosenbrock_from_zero(): - gI = [g] + torch.manual_seed(1) + gI = [ObjectiveOrConstraint(g, dim_out=1)] gE = [] xstar = np.array([1 / np.sqrt(2), 0.5]) problem = SQPGS(f, gI, gE, tol=1e-8, max_iter=200, verbose=False) @@ -21,7 +27,8 @@ def test_rosenbrock_from_zero(): def test_rosenbrock_from_rand(): - gI = [g] + torch.manual_seed(1) + gI = [ObjectiveOrConstraint(g, dim_out=1)] gE = [] xstar = np.array([1 / np.sqrt(2), 0.5]) rng = np.random.default_rng(0) @@ -32,7 +39,10 @@ def test_rosenbrock_from_rand(): def test_rosenbrock_with_eq(): - g1 = g_linear(A=np.eye(2), b=np.ones(2)) + torch.manual_seed(12) + g1 = ObjectiveOrConstraint(torch.nn.Linear(2, 2), dim_out=2) + g1.model.weight.data = torch.eye(2) + g1.model.bias.data = -torch.ones(2) gI = [] gE = [g1] xstar = np.ones(2) diff --git a/tests/test_subproblem.py b/tests/test_subproblem.py index c8572fa..de62890 100644 --- a/tests/test_subproblem.py +++ b/tests/test_subproblem.py @@ -2,25 +2,26 @@ import numpy as np import pytest -from ncopt.sqpgs.main import SubproblemSQPGS +from ncopt.sqpgs.cvxpy_subproblem import CVXPYSubproblemSQPGS +from ncopt.sqpgs.osqp_subproblem import OSQPSubproblemSQPGS @pytest.fixture -def subproblem_ineq() -> SubproblemSQPGS: +def subproblem_ineq() -> CVXPYSubproblemSQPGS: dim = 2 p0 = 2 pI = np.array([3]) pE = np.array([], dtype=int) assert_tol = 1e-5 - subproblem = SubproblemSQPGS(dim, p0, pI, pE, assert_tol) + subproblem = CVXPYSubproblemSQPGS(dim, p0, pI, pE, assert_tol) return subproblem -def test_subproblem_ineq(subproblem_ineq: SubproblemSQPGS): +def test_subproblem_ineq(subproblem_ineq: CVXPYSubproblemSQPGS): D_f = np.array([[-2.0, 1.0], [-2.04236205, -1.0], [-1.92172864, -1.0]]) D_gI = [np.array([[0.0, 2.0], [0.0, 2.0], [1.41421356, 0.0], [1.41421356, 0.0]])] subproblem_ineq.solve( - H=np.eye(2, dtype=float), + L=np.eye(2, dtype=float), rho=0.1, D_f=D_f, D_gI=D_gI, @@ -34,23 +35,23 @@ def test_subproblem_ineq(subproblem_ineq: SubproblemSQPGS): @pytest.fixture -def subproblem_eq() -> SubproblemSQPGS: +def subproblem_eq() -> CVXPYSubproblemSQPGS: dim = 2 p0 = 2 pI = np.array([], dtype=int) pE = np.array([4, 4]) assert_tol = 1e-5 - subproblem = SubproblemSQPGS(dim, p0, pI, pE, assert_tol) + subproblem = CVXPYSubproblemSQPGS(dim, p0, pI, pE, assert_tol) return subproblem -def test_subproblem_eq(subproblem_eq: SubproblemSQPGS): +def test_subproblem_eq(subproblem_eq: CVXPYSubproblemSQPGS): D_gE = [ np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]), np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]), ] subproblem_eq.solve( - H=np.eye(2, dtype=float), + L=np.eye(2, dtype=float), rho=0.1, D_f=np.array([-2.0, 1.0]), D_gI=[], @@ -61,3 +62,104 @@ def 
test_subproblem_eq(subproblem_eq: SubproblemSQPGS): ) assert subproblem_eq.status == cp.OPTIMAL assert np.isclose(subproblem_eq.objective_val, 1.0) + + +def test_subproblem_consistency_ineq(): + dim = 2 + p0 = 2 + pI = np.array([3]) + pE = np.array([], dtype=int) + assert_tol = 1e-5 + subproblem = CVXPYSubproblemSQPGS(dim, p0, pI, pE, assert_tol) + subproblem2 = OSQPSubproblemSQPGS(dim, p0, pI, pE, assert_tol) + D_f = np.array([[-2.0, 1.0], [-2.04236205, -1.0], [-1.92172864, -1.0]]) + D_gI = [np.array([[0.0, 2.0], [0.0, 2.0], [1.41421356, 0.0], [1.41421356, 0.0]])] + + subproblem.solve( + L=np.eye(dim, dtype=float), + rho=0.1, + D_f=D_f, + D_gI=D_gI, + D_gE=[], + f_k=1.0, + gI_k=np.array([-1.0]), + gE_k=np.array([], dtype=float), + ) + + subproblem2.solve( + H=np.eye(dim, dtype=float), + rho=0.1, + D_f=D_f, + D_gI=D_gI, + D_gE=[], + f_k=1.0, + gI_k=np.array([-1.0]), + gE_k=np.array([], dtype=float), + ) + + assert np.allclose(subproblem.d, subproblem2.d) + assert np.allclose(subproblem.lambda_f, subproblem2.lambda_f) + assert np.allclose(subproblem.lambda_gI, subproblem2.lambda_gI) + assert np.allclose(subproblem.lambda_gE, subproblem2.lambda_gE) + + +def test_subproblem_consistency_ineq_eq(): + dim = 4 + p0 = 2 + pI = np.array([1]) + pE = np.array([1]) + assert_tol = 1e-5 + subproblem = CVXPYSubproblemSQPGS(dim, p0, pI, pE, assert_tol) + subproblem2 = OSQPSubproblemSQPGS(dim, p0, pI, pE, assert_tol) + D_f = np.array( + [ + [1.83529234, 2.51893663, -0.87507966, 0.53305111], + [0.7042857, -0.19426588, 1.26820232, 0.59255224], + [-0.87356341, -0.24994689, -0.82073493, -1.18734854], + ] + ) + + D_gI = [ + np.array( + [ + [-1.13249659, 0.67854141, -0.0316317, -1.37963152], + [-1.64858759, 0.65296873, -0.72343526, 0.60976315], + ] + ) + ] + + D_gE = [ + np.array( + [ + [1.05532136, -0.12589961, 0.49469938, 0.22879848], + [2.10668334, -0.00816628, -0.43333072, 0.22656999], + ] + ) + ] + + subproblem.solve( + L=np.eye(dim, dtype=float), + rho=0.1, + D_f=D_f, + D_gI=D_gI, + D_gE=D_gE, + f_k=1.0, + gI_k=np.array([-1.0]), + gE_k=np.array([-1.0]), + ) + + subproblem2.solve( + H=np.eye(dim, dtype=float), + rho=0.1, + D_f=D_f, + D_gI=D_gI, + D_gE=D_gE, + f_k=1.0, + gI_k=np.array([-1.0]), + gE_k=np.array([-1.0]), + ) + + assert np.allclose(subproblem.d, subproblem2.d) + assert np.allclose(subproblem.lambda_f, subproblem2.lambda_f) + assert np.allclose(subproblem.lambda_gI, subproblem2.lambda_gI) + assert np.allclose(subproblem.lambda_gE, subproblem2.lambda_gE)
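Closing note: the QP backend for the SQP-GS subproblems is selected through the `options` dictionary. Below is a minimal sketch reusing the quadratic test problem from `tests/test_quadratic.py`; it assumes OSQP is installed and that `"clarabel"` names an available CVXPY solver (any value other than `"osqp"` is forwarded to `CVXPYSubproblemSQPGS`, see the dispatch in `src/ncopt/sqpgs/main.py`):

```python
import numpy as np
import torch

from ncopt.functions import ObjectiveOrConstraint
from ncopt.functions.quadratic import Quadratic
from ncopt.sqpgs.main import SQPGS

d = 3
params = (torch.eye(d), torch.zeros(d), torch.tensor(0.0))
f = ObjectiveOrConstraint(Quadratic(params=params), dim=d, is_differentiable=True)
g = ObjectiveOrConstraint(torch.nn.Linear(d, d), dim_out=d, is_differentiable=True)
g.model.weight.data = -torch.eye(d)  # make zero infeasible: g(x) = 1 - x <= 0
g.model.bias.data = torch.ones(d)

x0 = -np.ones(d)
for solver in ["osqp", "clarabel"]:  # "clarabel" assumed to be an installed CVXPY solver
    problem = SQPGS(f, [g], [], x0=x0, tol=1e-5, max_iter=50, options={"qp_solver": solver})
    sol = problem.solve()
    print(solver, sol)  # both backends should converge to (approximately) np.ones(d)
```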