FAILURE with L-BFGS depending on system size and starting point #198

Liozou · 2023-04-27T10:18:37Z

Hi! I am facing a FAILURE when trying to minimize a function with 62339366 variables using L-BFGS with a box constraint. The issue does not appear when using just one less variable.

The function is $f: \vec x\mapsto \sum_i -50x_i + 300 x_i(\log(x_i) - 1)$ where $\vec x$ is a vector of 62339366 variables $x_i$. The box constraint is that the $x_i$ should be above a fixed SMALL_CONSTANT, for example 0.01, so that $(\vec\nabla f(\vec x))_i = -50 + 300\log(x_i)$ does not diverge. The exact value of SMALL_CONSTANT does not matter here, the same issue occurs using SMALL_CONSTANT = 1e-100.

The numerical solution should be a vector of ω = 1.1813604128656459, since $-50 + 300\log(\omega) \approx 0$. The initial point is a vector of numbers uniformly taken in $[(1-η)ω;\ (1+η)ω]$ where η = 0.001 for instance.

Here is the minimal reproducer (I think it requires having 16GiB RAM):

module TestNLoptLBFGSfailure
using NLopt

import LinearAlgebra: norm

const SMALL_CONSTANT = 1e-2
const ω = 1.1813604128656459
const η = 1e-3

function objective(point::Vector{Float64}, gradient::Vector{Float64})
    if length(gradient) > 0
        gradient .= -50 .+ 300 .* log.(point)
    end
    value = sum(-50*r + 300*(r*(log(r) - 1)) for r in point)
    @show value, norm(gradient)
    value
end

function test(n)
    opt = Opt(:LD_LBFGS, n)
    opt.ftol_rel = 1e-10 # or 1e-3, it does not matter
    opt.lower_bounds = fill(SMALL_CONSTANT, n)
    opt.min_objective = objective

    init = ω .* rand(1 .+ (-η:η), n)
    # init = fill(ω, n) # actual target

    minf, minx, ret = optimize(opt, init)
    ret
end

end

and the observations:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093566731912422e10, 2369.8435877962493)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365) # just 62339366 - 1
(value, norm(gradient)) = (-2.2093566377504475e10, 2369.843568788648)
(value, norm(gradient)) = (-2.143919631036808e10, 534364.1274153551)
(value, norm(gradient)) = (-2.209355118106552e10, 3646.620883772766)
(value, norm(gradient)) = (-2.209357736243934e10, 2.2617428385919083)
(value, norm(gradient)) = (-2.209357736243935e10, 0.001131058801969029)
(value, norm(gradient)) = (-2.209357736243935e10, 6.7321322189234e-10)
:SUCCESS

If I use the target value ω directly as initialization (i.e. taking η = 0), then it works:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 2.2440440909730785e-10)
:SUCCESS

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 2.2440440729744666e-10)
:SUCCESS

And if I set the starting point too close to the target, i.e. $η$ too small, for instance η = 1e-9, then it fails for both cases, but it does not perform the same number of steps:

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.2093577716847473e10, 0.0023686585521977225)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 0.0023686585331996264)
(value, norm(gradient)) = (-2.209357736243935e10, 0.5991391139757002)
(value, norm(gradient)) = (-2.209357736243935e10, 0.3982398117902327)
(value, norm(gradient)) = (-2.209357736243935e10, 0.2643059496298344)
(value, norm(gradient)) = (-2.209357736243935e10, 0.17501521403019882)
(value, norm(gradient)) = (-2.209357736243935e10, 0.1154857940136743)
(value, norm(gradient)) = (-2.209357736243935e10, 0.07579606004479056)
(value, norm(gradient)) = (-2.209357736243935e10, 0.049330911796086924)
(value, norm(gradient)) = (-2.209357736243935e10, 0.03167915082330152)
(value, norm(gradient)) = (-2.209357736243935e10, 0.01989798378814819)
(value, norm(gradient)) = (-2.209357736243935e10, 0.012021766484118854)
(value, norm(gradient)) = (-2.209357736243935e10, 0.006732064617095256)
:FAILURE

May be related to #33. The last point (optimization fails if the initial point is too close to the optimum) reflects #33 (comment), but the difference between 62339365 and 62339366 variables is new I think? My real problem is of course more complex and even larger than 62339366 variables.

The text was updated successfully, but these errors were encountered:

odow · 2024-08-19T05:25:48Z

I can confirm the failure, but I don't have any suggestions. This is likely an issue in the upstream https://github.com/stevengj/nlopt.

My real problem is of course more complex and even larger than 62339366 variables.

I will note that this is a very large problem then!

You could try relaxing the different tolerances.

odow · 2024-08-22T00:44:40Z

I've reproduced this in C, so this is not a bug in the Julia library:

#include <nlopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double callback(unsigned int n, const double *x, double *grad, void* user_data) {
    if (grad != NULL) {
        for (int i = 0; i < n; i++) {
            grad[i] = -50 + 300 * log(x[i]);
        }
    }
    double ret = 0.0;
    for (int i = 0; i < n; i++) {
        ret += -50 * x[i] + 300 * x[i] * (log(x[i]) - 1);
    }
    printf(" obj = %f\n", ret);
    return ret;
}

int test(int n) {
    nlopt_result ret;
    double *lb = (double *)malloc(n * sizeof(double));
    double *x = (double *)malloc(n * sizeof(double));
    double target = 1.1813604128656459;
    nlopt_opt opt = nlopt_create(NLOPT_LD_LBFGS, n);
    ret = nlopt_set_ftol_rel(opt, 1e-10);
    for (int i = 0; i < n; i++) {
        lb[i] = 1e-2;
        x[i] = target * (1 + 1e-3 * (2 * (double)rand() / (double)RAND_MAX - 1));
    }
    ret = nlopt_set_lower_bounds(opt, lb);
    ret = nlopt_set_min_objective(opt, callback, NULL);
    double obj_f = 0.0;
    ret = nlopt_optimize(opt, x, &obj_f);
    printf("n = %d :  nlopt_optimize: %d\n", n, ret);
    nlopt_destroy(opt);
    free(lb);
    free(x);
    return 0;
}

int main() {
    test(62339366);
    test(62339365);
    return 0;
}

(base) oscar@Oscars-MBP build % clang \
    /tmp/nlopt.c \
    -I/Users/Oscar/Documents/Code/nlopt/src/api \
    -L/Users/Oscar/Documents/Code/nlopt/build/ -lnlopt \
    -lm \
    -Wl,-rpath,/Users/Oscar/Documents/Code/nlopt/build \
    -o /Users/Oscar/Documents/Code/nlopt/build/bug

./bug
 obj = -22093574063.589386
n = 62339366 :  nlopt_optimize: -1
 obj = -22093573709.270824
 obj = -21856479388.923695
 obj = -22093568665.627117
 obj = -22093577362.439350
 obj = -22093577362.439350
 obj = -22093577362.439350
n = 62339365 :  nlopt_optimize: 1

odow · 2024-08-22T22:08:00Z

Closing because this is not a bug in NLopt.jl. It seems likely that the 62_339_366 is somewhat arbitrary and relates to some numerical error in the solver. (This is a very large problem to solve.)

There are a few open issues for LBFGS that end in FAILURE: stevengj/nlopt#416, stevengj/nlopt#40

odow added the upstream A bug or issue in the upstream library label Aug 19, 2024

odow closed this as completed Aug 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FAILURE with L-BFGS depending on system size and starting point #198

FAILURE with L-BFGS depending on system size and starting point #198

Liozou commented Apr 27, 2023

odow commented Aug 19, 2024

odow commented Aug 22, 2024

odow commented Aug 22, 2024

FAILURE with L-BFGS depending on system size and starting point #198

FAILURE with L-BFGS depending on system size and starting point #198

Comments

Liozou commented Apr 27, 2023

odow commented Aug 19, 2024

odow commented Aug 22, 2024

odow commented Aug 22, 2024