mlswarm is a Python package for training neural networks and minimizing functions with swarm-like optimization algorithms. It includes a gradient-free variant that eliminates the need for backpropagation, offering notable gains in computational efficiency and flexibility.
In benchmarks against traditional methods such as Nelder-Mead, Differential Evolution, Mesh Search, and Simulated Annealing, our algorithm reached optimal solutions with 10x to 50x fewer function evaluations.
Install mlswarm easily using pip:
```bash
pip install mlswarm
```
Unlike traditional gradient descent, which follows a single path through the parameter space, swarm-like optimization algorithms employ a group of "particles" that explore the parameter space collectively.
*Figure: three plots showing the Ackley function, particle-swarm optimization, and gradient descent.*
This method is particularly adept at handling non-convex functions, where gradient descent may fail by getting stuck in local minima. The second plot shows a cloud of 25 particles starting in the top right and its evolution through the iterations until it reaches the global minimum at the origin. The third plot shows gradient descent (a single particle), which gets stuck in a local minimum.
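For reference, a standard form of the two-dimensional Ackley function (the exact constants used in the plots are an assumption) can be written as:

```python
import numpy as np

def ackley(x, y):
    """Standard 2-D Ackley function: global minimum f(0, 0) = 0,
    surrounded by many shallow local minima."""
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x**2 + y**2)))
            - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
            + np.e + 20.0)
```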
The core strength of our approach lies in the particles' ability to "communicate" and share information, significantly enhancing the optimization process. We replace the problem of minimizing a function $f$ with the relaxed problem of minimizing

$$\mu \mapsto \int f \, d\mu,$$

where $\mu$ ranges over probability measures on the parameter space. In practice we consider a discrete measure

$$\mu = \frac{1}{N} \sum_{i=1}^{N} \delta_{x_i},$$

supported on the positions $x_1, \ldots, x_N$ of the $N$ particles in the cloud.
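Concretely, under the discrete measure the relaxed objective is just the average of $f$ over the particle positions. A minimal numpy sketch (the `N x d` cloud layout is illustrative, not mlswarm's internal representation):

```python
import numpy as np

f = lambda x: np.sum(x**2)          # any objective; the sphere function here

# A cloud of N particles in d dimensions (illustrative N x d layout)
N, d = 25, 2
cloud = np.random.randn(N, d)

# Relaxed objective under the empirical measure mu = (1/N) * sum_i delta_{x_i}:
# the integral of f against mu is simply the average of f over the cloud.
relaxed_objective = np.mean([f(x) for x in cloud])
```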
An implementation based on Nesterov's accelerated method is also available.
By flattening the weights of a neural network, network training can be seen as the problem of directly minimizing a multivariate cost function. Particle swarm optimization algorithms can then be used to minimize this function, with each particle carrying its own set of neural network weights (see the sketch below).
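As an illustration of this reduction (all names and shapes below are hypothetical, not mlswarm internals), the weights of a small network can be flattened into one vector so that the training loss becomes an ordinary multivariate function:

```python
import numpy as np

# A tiny 1-hidden-layer network; the shapes are illustrative.
shapes = [(2, 8), (8, 1)]                      # weight matrices W1, W2
sizes = [int(np.prod(s)) for s in shapes]

def unflatten(theta):
    """Rebuild the weight matrices from a single flat parameter vector."""
    Ws, i = [], 0
    for s, n in zip(shapes, sizes):
        Ws.append(theta[i:i + n].reshape(s))
        i += n
    return Ws

def loss(theta, X, y):
    """Training loss as a plain multivariate function of the flat vector."""
    W1, W2 = unflatten(theta)
    pred = np.tanh(X @ W1) @ W2
    return np.mean((pred - y.reshape(-1, 1)) ** 2)

theta = np.random.randn(sum(sizes))            # one particle's parameters
X, y = np.random.randn(50, 2), np.random.randn(50)
print(loss(theta, X, y))
```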
Gomes, Alexandra A., and Diogo A. Gomes. "Derivative-Free Global Minimization in One Dimension: Relaxation, Monte Carlo, and Sampling." arXiv preprint (2023).
mlswarm provides two main classes:
- neuralnet - train neural networks
- function - minimize functions
For a function object there are three main methods (a usage sketch follows the list; see also the examples):
- func = function(lambda x: ...) - create a function object
- func.init_cloud(...) - define the array of initial particle positions
- func.minimize(...) - set the algorithm's parameters and start the optimization
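A minimal usage sketch, assuming the signatures above (the argument values are illustrative guesses; the example notebooks show the exact API):

```python
import numpy as np
from mlswarm import function

# Argument values below are illustrative guesses; consult the example
# notebooks for the exact signatures.
func = function(lambda x: x**4 - 3 * x**2 + x)   # a non-convex objective

func.init_cloud(np.linspace(-3, 3, 25))   # 25 initial particle positions
func.minimize()                           # run with (assumed) default parameters
```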
For a neuralnet object there are three main methods (a usage sketch follows the list; see also the examples):
- nn = neuralnet(...) - define the network architecture and create the neural network
- nn.init_cloud(N) - initialize the cloud with N particles
- nn.train(...) - set the training data and algorithm parameters and start training
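A minimal usage sketch under the same caveat (the constructor and train arguments are illustrative guesses, not the documented API):

```python
import numpy as np
from mlswarm import neuralnet

# The architecture argument is an illustrative guess; consult the
# example notebooks for the exact constructor signature.
nn = neuralnet([2, 4, 4, 1])     # e.g. layer sizes of a small network

nn.init_cloud(25)                # a cloud of 25 particles (weight sets)

X = np.random.randn(100, 2)                  # toy training data
y = (X[:, 0] * X[:, 1] > 0).astype(float)
nn.train(X, y)                               # arguments are illustrative
```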
There are three available optimization algorithms:
- gradient - swarm-like optimization algorithm
- gradient_free - similar to the former but derivative-free
- gradient_descent - gradient descent optimization
There are four ways of updating the particle cloud (sketched after this list):
- euler - explicit step: `new_cloud = old_cloud - dt * gradient`
- euler_adaptive - same as euler but with an adaptive step size (dt)
- nesterov - Nesterov update
- nesterov_adaptive - Nesterov update with adaptive restart
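In array form, the two basic updates look roughly like this (a sketch: `grad` and `grad_fn` stand for whichever update direction the chosen algorithm produces, and the Nesterov variant shown is the textbook look-ahead form, not necessarily mlswarm's exact implementation):

```python
import numpy as np

def euler_step(cloud, grad, dt):
    """euler: explicit step applied to every particle at once."""
    return cloud - dt * grad

def nesterov_step(cloud, velocity, grad_fn, dt, momentum=0.9):
    """nesterov (textbook form): evaluate the update direction at the
    look-ahead point, then move with momentum."""
    lookahead = cloud + momentum * velocity
    velocity = momentum * velocity - dt * grad_fn(lookahead)
    return cloud + velocity, velocity
```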
Jupyter notebook examples on the GitHub page cover:
- Minimization of univariate and multivariate non-convex functions
- Linear Regression
- Logistic Regression
- Binary classification with 4-Layer Neural Network
- Binary classification with 4-Layer Neural Network using step activation functions