Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FAILURE with L-BFGS depending on system size and starting point #198

Closed
Liozou opened this issue Apr 27, 2023 · 3 comments
Closed

FAILURE with L-BFGS depending on system size and starting point #198

Liozou opened this issue Apr 27, 2023 · 3 comments
Labels
upstream A bug or issue in the upstream library

Comments

@Liozou
Copy link

Liozou commented Apr 27, 2023

Hi! I am facing a FAILURE when trying to minimize a function with 62339366 variables using L-BFGS with a box constraint. The issue does not appear when using just one less variable.

The function is $f: \vec x\mapsto \sum_i -50x_i + 300 x_i(\log(x_i) - 1)$ where $\vec x$ is a vector of 62339366 variables $x_i$. The box constraint is that the $x_i$ should be above a fixed SMALL_CONSTANT, for example 0.01, so that $(\vec\nabla f(\vec x))_i = -50 + 300\log(x_i)$ does not diverge. The exact value of SMALL_CONSTANT does not matter here, the same issue occurs using SMALL_CONSTANT = 1e-100.

The numerical solution should be a vector of ω = 1.1813604128656459, since $-50 + 300\log(\omega) \approx 0$. The initial point is a vector of numbers uniformly taken in $[(1-η)ω;\ (1+η)ω]$ where η = 0.001 for instance.

Here is the minimal reproducer (I think it requires having 16GiB RAM):

module TestNLoptLBFGSfailure
using NLopt

import LinearAlgebra: norm

const SMALL_CONSTANT = 1e-2
const ω = 1.1813604128656459
const η = 1e-3

function objective(point::Vector{Float64}, gradient::Vector{Float64})
    if length(gradient) > 0
        gradient .= -50 .+ 300 .* log.(point)
    end
    value = sum(-50*r + 300*(r*(log(r) - 1)) for r in point)
    @show value, norm(gradient)
    value
end

function test(n)
    opt = Opt(:LD_LBFGS, n)
    opt.ftol_rel = 1e-10 # or 1e-3, it does not matter
    opt.lower_bounds = fill(SMALL_CONSTANT, n)
    opt.min_objective = objective

    init = ω .* rand(1 .+ (-η), n)
    # init = fill(ω, n) # actual target

    minf, minx, ret = optimize(opt, init)
    ret
end

end

and the observations:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093566731912422e10, 2369.8435877962493)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365) # just 62339366 - 1
(value, norm(gradient)) = (-2.2093566377504475e10, 2369.843568788648)
(value, norm(gradient)) = (-2.143919631036808e10, 534364.1274153551)
(value, norm(gradient)) = (-2.209355118106552e10, 3646.620883772766)
(value, norm(gradient)) = (-2.209357736243934e10, 2.2617428385919083)
(value, norm(gradient)) = (-2.209357736243935e10, 0.001131058801969029)
(value, norm(gradient)) = (-2.209357736243935e10, 6.7321322189234e-10)
:SUCCESS

If I use the target value ω directly as initialization (i.e. taking η = 0), then it works:

julia> TestNLoptLBFGSfailure.test(62339366)
(value, norm(gradient)) = (-2.2093577716847473e10, 2.2440440909730785e-10)
:SUCCESS

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 2.2440440729744666e-10)
:SUCCESS

And if I set the starting point too close to the target, i.e. $η$ too small, for instance η = 1e-9, then it fails for both cases, but it does not perform the same number of steps:

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.2093577716847473e10, 0.0023686585521977225)
:FAILURE

julia> TestNLoptLBFGSfailure.test(62339365)
(value, norm(gradient)) = (-2.209357736243935e10, 0.0023686585331996264)
(value, norm(gradient)) = (-2.209357736243935e10, 0.5991391139757002)
(value, norm(gradient)) = (-2.209357736243935e10, 0.3982398117902327)
(value, norm(gradient)) = (-2.209357736243935e10, 0.2643059496298344)
(value, norm(gradient)) = (-2.209357736243935e10, 0.17501521403019882)
(value, norm(gradient)) = (-2.209357736243935e10, 0.1154857940136743)
(value, norm(gradient)) = (-2.209357736243935e10, 0.07579606004479056)
(value, norm(gradient)) = (-2.209357736243935e10, 0.049330911796086924)
(value, norm(gradient)) = (-2.209357736243935e10, 0.03167915082330152)
(value, norm(gradient)) = (-2.209357736243935e10, 0.01989798378814819)
(value, norm(gradient)) = (-2.209357736243935e10, 0.012021766484118854)
(value, norm(gradient)) = (-2.209357736243935e10, 0.006732064617095256)
:FAILURE

May be related to #33. The last point (optimization fails if the initial point is too close to the optimum) reflects #33 (comment), but the difference between 62339365 and 62339366 variables is new I think? My real problem is of course more complex and even larger than 62339366 variables.

@odow
Copy link
Member

odow commented Aug 19, 2024

I can confirm the failure, but I don't have any suggestions. This is likely an issue in the upstream https://github.com/stevengj/nlopt.

My real problem is of course more complex and even larger than 62339366 variables.

I will note that this is a very large problem then!

You could try relaxing the different tolerances.

@odow odow added the upstream A bug or issue in the upstream library label Aug 19, 2024
@odow
Copy link
Member

odow commented Aug 22, 2024

I've reproduced this in C, so this is not a bug in the Julia library:

#include <nlopt.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double callback(unsigned int n, const double *x, double *grad, void* user_data) {
    if (grad != NULL) {
        for (int i = 0; i < n; i++) {
            grad[i] = -50 + 300 * log(x[i]);
        }
    }
    double ret = 0.0;
    for (int i = 0; i < n; i++) {
        ret += -50 * x[i] + 300 * x[i] * (log(x[i]) - 1);
    }
    printf(" obj = %f\n", ret);
    return ret;
}

int test(int n) {
    nlopt_result ret;
    double *lb = (double *)malloc(n * sizeof(double));
    double *x = (double *)malloc(n * sizeof(double));
    double target = 1.1813604128656459;
    nlopt_opt opt = nlopt_create(NLOPT_LD_LBFGS, n);
    ret = nlopt_set_ftol_rel(opt, 1e-10);
    for (int i = 0; i < n; i++) {
        lb[i] = 1e-2;
        x[i] = target * (1 + 1e-3 * (2 * (double)rand() / (double)RAND_MAX - 1));
    }
    ret = nlopt_set_lower_bounds(opt, lb);
    ret = nlopt_set_min_objective(opt, callback, NULL);
    double obj_f = 0.0;
    ret = nlopt_optimize(opt, x, &obj_f);
    printf("n = %d :  nlopt_optimize: %d\n", n, ret);
    nlopt_destroy(opt);
    free(lb);
    free(x);
    return 0;
}

int main() {
    test(62339366);
    test(62339365);
    return 0;
}
(base) oscar@Oscars-MBP build % clang \
    /tmp/nlopt.c \
    -I/Users/Oscar/Documents/Code/nlopt/src/api \
    -L/Users/Oscar/Documents/Code/nlopt/build/ -lnlopt \
    -lm \
    -Wl,-rpath,/Users/Oscar/Documents/Code/nlopt/build \
    -o /Users/Oscar/Documents/Code/nlopt/build/bug

./bug
 obj = -22093574063.589386
n = 62339366 :  nlopt_optimize: -1
 obj = -22093573709.270824
 obj = -21856479388.923695
 obj = -22093568665.627117
 obj = -22093577362.439350
 obj = -22093577362.439350
 obj = -22093577362.439350
n = 62339365 :  nlopt_optimize: 1

@odow
Copy link
Member

odow commented Aug 22, 2024

Closing because this is not a bug in NLopt.jl. It seems likely that the 62_339_366 is somewhat arbitrary and relates to some numerical error in the solver. (This is a very large problem to solve.)

There are a few open issues for LBFGS that end in FAILURE: stevengj/nlopt#416, stevengj/nlopt#40

@odow odow closed this as completed Aug 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
upstream A bug or issue in the upstream library
Development

No branches or pull requests

2 participants