Algorithm Details
This section provides detailed explanations of the optimization algorithms implemented in the Minion library, along with their key parameters and how to configure them.
Each optimization algorithm has a set of parameters that can be adjusted to influence the optimization process. These parameters can be obtained and modified as follows:
C++ Example:
In C++, you can retrieve the default settings for an algorithm and adjust the parameters as needed:
minion::DefaultSettings settings;
// Retrieve the default settings for the LSHADE algorithm
std::map<std::string, minion::ConfigValue> options = settings.getDefaultSettings("LSHADE");
// Override default parameters
options["population_size"] = 50;
// Initialize the minimizer with custom settings
auto min = minion::Minimizer(rosenbrock_vect, bounds, {}, nullptr, nullptr, "LSHADE", 0.0, max_evals, -1, options);
// Perform optimization
auto min_result = min.optimize();
Python Example:
In Python, you can pass the parameters via a dictionary when creating the Minimizer object. Here’s an example for configuring and using the ARRDE algorithm:
import minionpy as mpy
options = {
"population_size" : 0, # Auto sizing based on dimension and budget when set to 0
"minimum_population_size" : 4, # Terminal population size (cannot be below 4)
"bound_strategy" : "reflect-random" # Boundary handling policy
}
# Use ARRDE algorithm for optimization
min = mpy.Minimizer(func=objective_function,
x0=x0,
bounds=[(-10, 10)] * dimension,
algo="ARRDE",
relTol=0.0,
maxevals=10000,
callback=None,
seed=None,
options=options)
# Run optimization
result = min.optimize()
Note
The optional x0 argument can contain multiple initial guesses.
Population-based algorithms, such as Differential Evolution variants, LSHADE, and PSO, use these samples to initialise their populations.
For algorithms that evolve a single incumbent solution (e.g., CMA-ES, Nelder-Mead, L-BFGS, L-BFGS-B), Minion evaluates the provided guesses first and starts from the best-performing one.
Each algorithm comes with a set of configurable parameters that affect its behavior during the optimization process. For example, the population_size parameter controls the number of candidate solutions in the population, while the mutation_rate defines the probability of modifying a candidate solution during the mutation process.
Please refer to the respective algorithm section for detailed descriptions of each parameter.
Differential Evolution (DE)
Differential Evolution (DE) is a population-based stochastic optimization algorithm that applies mutation, crossover, and selection to evolve a population of candidate solutions.
Reference : Storn, R and Price, K, Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 1997, 11, 341 - 359.
Algorithm name : "DE"
Particle Swarm Optimization (PSO)
Minion implements several particle swarm variants. The canonical PSO uses a global-best topology and the standard inertia / acceleration update.
Reference: Kennedy, J. and Eberhart, R., “Particle Swarm Optimization,” Proc. IEEE International Conference on Neural Networks, 1995.
Algorithm name : "PSO"
Parameters :
population_size: 0Note
Swarm size. If set to
0, it defaults to5 * DwhereDis the problem dimension.inertia_weight: 0.7Note
Inertia weight \(\omega\) that controls how much momentum each particle carries between steps.
cognitive_coefficient: 1.5Note
Personal acceleration constant \(c_1\) steering particles toward their own best position.
social_coefficient: 1.5Note
Social acceleration constant \(c_2\) that pulls particles toward the global best.
velocity_clamp: 0.2Note
Fraction of the search range used as a velocity limit.
0disables clamping.bound_strategy:reflect-randomNote
Boundary handling policy. Available strategies:
"random","reflect-random","clip".
SPSO-2011
This implementation mirrors the stochastic PSO 2011 variant proposed by Clerc and Bratton, including adaptive informant topologies and hypersphere sampling.
Reference: M. Zambrano-Bigiarini, M. Clerc and R. Rojas, “Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements,” 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 2013, pp. 2337-2344, doi: 10.1109/CEC.2013.6557848.
Algorithm name : "SPSO2011"
Parameters :
population_size: 0Note
Swarm size.
0selects the heuristic default of5 * D.inertia_weight: 0.729844Note
Inertia parameter \(\omega\) controlling exploration versus exploitation.
cognitive_coefficient: 1.49618Note
Cognitive acceleration constant influencing attraction toward each particle’s personal best.
social_coefficient: 1.49618Note
Social acceleration constant drawing particles toward informants and the global best.
phi_personal: 1.49618Note
Parameter used in the stochastic hypersphere sampling for the personal component.
phi_social: 1.49618Note
Parameter used in the stochastic hypersphere sampling for the social component.
neighborhood_size: 3Note
Number of neighbors considered when building the adaptive informant topology.
informant_degree: 3Note
Degree of the communication graph; controls how many informants each particle has.
velocity_clamp: 0.0Note
Maximum velocity as a fraction of the search range.
0disables the clamp.normalize: FalseNote
Whether to transform search vectors into a normalised unit hypercube before mapping back to bounds.
bound_strategy:reflect-randomNote
Boundary handling policy. Available strategies:
"random","reflect-random","clip".
Dynamic Multi-Swarm PSO (DMSPSO)
Dynamic Multi-Swarm PSO partitions the swarm into co-operative sub-swarms that periodically regroup, blending global and local attraction.
Algorithm name : "DMSPSO"
Parameters :
population_size: 0Note
Swarm size.
0enables the adaptive default of5 * Dparticles.inertia_weight: 0.7Note
Inertia term balancing exploration and exploitation across sub-swarms.
cognitive_coefficient: 1.2Note
Personal attraction strength toward each particle’s historical best.
social_coefficient: 1.0Note
Global attraction weight toward the swarm-wide best.
local_coefficient: 1.4Note
Influence of the current sub-swarm best position.
global_coefficient: 0.8Note
Influence of the overall global best retained across regroupings.
subswarm_count: 4Note
Number of co-operative sub-swarms maintained before regrouping.
regroup_period: 5Note
Iterations between sub-swarm regrouping events.
velocity_clamp: 0.2Note
Fraction of the search range that caps particle velocities.
0disables clamping.bound_strategy:reflect-randomNote
Boundary handling policy. Available strategies:
"random","reflect-random","clip".
LSHADE-cnEpSin
Ensemble Sinusoidal LSHADE with covariance learning augments LSHADE with an ensemble of sinusoidal F-adaptation schemes and a local covariance matrix for crossover.
Reference: N. H. Awad, M. Z. Ali and P. N. Suganthan, “Ensemble sinusoidal differential covariance matrix adaptation with Euclidean neighborhood for solving CEC2017 benchmark problems,” 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 2017, pp. 372-379, doi: 10.1109/CEC.2017.7969336.
Algorithm name : "LSHADE_cnEpSin"
Parameters :
population_size: 0Note
Initial population size.
0chooses the default18 * Dheuristic.memory_size: 5Note
Number of entries stored in the shared scaling-factor, crossover-rate, and frequency memories.
archive_rate: 1.4Note
Ratio between archive size and current population size.
minimum_population_size: 4Note
Final population size reached by linear population reduction.
p_best_fraction: 0.11Note
Fraction of top individuals eligible for the current-to-
p-best mutation.rotation_probability: 0.4Note
Probability of switching to eigen-space crossover with covariance learning.
neighborhood_fraction: 0.5Note
Fraction of the population used to estimate the covariance matrix.
freq_init: 0.5Note
Initial value for the sinusoidal frequency adaptation.
learning_period: 20Note
Window size for tracking ensemble success statistics.
sin_freq_base: 0.5Note
Base frequency for the first sinusoidal strategy.
epsilon: 1e-8Note
Small constant used to stabilise probability updates.
bound_strategy:reflect-randomNote
Boundary handling policy. Available strategies:
"random","reflect-random","clip".
CMA-ES
Covariance Matrix Adaptation Evolution Strategy samples offspring from an evolving multivariate normal distribution whose covariance is adapted from successful steps.
Reference: N. Hansen and A. Ostermeier, “Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation,” Proceedings of IEEE International Conference on Evolutionary Computation, Nagoya, Japan, 1996, pp. 312-317, doi: 10.1109/ICEC.1996.542381.
Algorithm name : "CMAES"
Parameters :
population_size: 0Note
Number of offspring sampled each generation.
0derives the default4 + 3 \log(D)value.mu: 0Note
Number of parents contributing to the new mean.
0sets it topopulation_size / 2.initial_step: 0.3Note
Initial global step size, scaled by the average bound width.
cc: 0.0Note
Cumulation parameter for the covariance evolution path.
cs: 0.0Note
Cumulation parameter for the step-size control path.
c1: 0.0Note
Learning rate for rank-one covariance updates.
cmu: 0.0Note
Learning rate for the rank-
\mucovariance update.damps: 0.0Note
Damping parameter controlling how aggressively the step size adapts.
bound_strategy:reflect-randomNote
Boundary handling policy for infeasible samples. Available strategies:
"random","reflect-random","clip".
BIPOP aCMA-ES
BIPOP aCMA-ES is an adaptive bi-population variant of CMA-ES that alternates between runs with small and large population sizes to balance local refinement and global exploration. It includes an adaptive restart mechanism to allocate budget across these runs.
Reference: Nikolaus Hansen. 2009. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed. In Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers (GECCO ‘09). Association for Computing Machinery, New York, NY, USA, 2389–2396. https://doi.org/10.1145/1570256.1570333.
Algorithm name : "BIPOP_aCMAES"
Parameters :
population_size: 0Note
Number of offspring sampled each generation.
0derives the default value based on the dimensionality of the problem.max_restarts: 8Note
Maximum number of adaptive restarts the algorithm will perform.
max_iterations: 5000Note
Maximum number of iterations (generations) allowed per run.
initial_step: 0.3Note
Initial global step size (sigma), scaled by the average bound width.
bound_strategy:reflect-randomNote
Boundary handling policy for infeasible samples. Available strategies:
"random","reflect-random","clip".
Adaptive Restart-Refine Differential Evolution (ARRDE)
ARRDE is an extension of the Differential Evolution algorithm with adaptive restart and refinement strategies.
Algorithm name : "ARRDE"
Parameters :
population_size: 0Note
Initial population size. If set to
0, ARRDE computes\[N = \max\!\left(2D,\; \operatorname{clip}\big(D \cdot m(\eta), 4, 3000\big)\right)\]where \(D\) is dimensionality, \(\eta = \tfrac{N_{\text{maxevals}}}{D}\), and
\[\begin{split}m(\eta) = \begin{cases} 2.0, & \log_{10} \eta \le 2,\\ 2.0 + 5.756 (\log_{10} \eta - 2)^{1.609}, & \text{otherwise.} \end{cases}\end{split}\]\(N_{\text{maxevals}}\) is the maximum number of function evaluations.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect","reflect-random","clip","periodic","none".
Grey Wolf Optimizer Differential Evolution (GWO-DE)
GWO-DE combines Differential Evolution with the Grey Wolf Optimizer, leveraging the social structure of wolves for optimization.
Algorithm name : "GWO_DE"
Parameters :
population_size: 0Note
The number of individuals in the population. If set to 0, a default size based on problem dimension is used.
mutation_rate: 0.5Note
The probability of mutating a parameter value during evolution.
crossover_rate: 0.7Note
The probability of recombining parent and mutant vectors.
elimination_prob: 0.1Note
The probability of eliminating a wolf from the population.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip".
j2020 Algorithm
The j2020 algorithm is a variation of Differential Evolution that incorporates parameter-specific strategies for mutation and recombination.
Reference : J. Brest, M. S. Maučec and B. Bošković, “Differential Evolution Algorithm for Single Objective Bound-Constrained Optimization: Algorithm j2020,” 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/CEC48606.2020.9185551.
Algorithm name : "j2020"
Parameters :
population_size: 0Note
Initial population size (N). If set to
0, it will be automatically determined as follows:\[N = 8 \cdot D\]tau1: 0.1Note
The value of tau1 variable. The value must be between 0 and 1.
tau2: 0.1Note
The value of tau1 variable. The value must be between 0 and 1.
myEqs: 0.25Note
The value of myEqs variable. The value must be between 0 and 1.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip".
LSRTDE Algorithm
The LSRTDE algorithm. Designed to solve CEC2024.
Reference : V. Stanovov and E. Semenkin, “Success Rate-based Adaptive Differential Evolution L-SRTDE for CEC 2024 Competition,” 2024 IEEE Congress on Evolutionary Computation (CEC), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/CEC60901.2024.10611907.
Algorithm name : "LSRTDE"
Parameters :
population_size: 0Note
nitial population size (N). If set to
0, it will be automatically determined as follows:\[N = 20 \cdot D\]where D is the dimensionality of the problem.
memory_size: 5Note
Memory size for storing the values of
CRandF.success_rate: 0.5Note
The success rate required for selecting an individual for the next generation.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip".
NLSHADE-RSP Algorithm
NLSHADE-RSP is an extension of the SHADE algorithm designed to solve CEC2021 problems.
Reference : V. Stanovov, S. Akhmedova and E. Semenkin, “NL-SHADE-RSP Algorithm with Adaptive Archive and Selective Pressure for CEC 2021 Numerical Optimization,” 2021 IEEE Congress on Evolutionary Computation (CEC), Kraków, Poland, 2021, pp. 809-816, doi: 10.1109/CEC45853.2021.9504959.
Algorithm name : "NLSHADE_RSP"
Parameters :
population_size: 0Note
Initial population size (N). If set to 0, it will be automatically determined.
\[N = 30 \cdot D\]where D is the dimensionality of the problem.
memory_size: 100Note
Memory size for storing the values of
CRandFarchive_size_ratio: 2.6Note
The ratio of the archive size relative to the population size.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
JADE Algorithm
JADE is a variant of Differential Evolution that introduces adaptive strategies for mutation and selection.
Reference : J. Zhang and A. C. Sanderson, “JADE: Adaptive Differential Evolution With Optional External Archive,” in IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 945-958, Oct. 2009, doi: 10.1109/TEVC.2009.2014613.
Algorithm name : "JADE"
Parameters :
population_size: 0Note
Initial population size (N). If set to
0, it will be automatically determined as follows:If the dimensionality \(D\) of the problem is \(D < 10\), then \(N = 30\).
If \(10 \leq D \leq 30\), then \(N = 100\).
If \(30 < D \leq 50\), then \(N = 200\).
If \(50 < D \leq 70\), then \(N = 300\).
Else, \(N = 400\).
c: 0.1Note
The value of c variable. The value must be between 0 and 1.
mutation_strategy:current_to_pbest_A_1binNote
Mutation strategy used in the optimization process. Available strategies:
"best1bin","best1exp","rand1bin","rand1exp","current_to_pbest1bin","current_to_pbest1exp","current_to_pbest_A_1bin","current_to_pbest_A_1exp".archive_size_ratio: 1.0Note
The ratio of the archive size relative to the population size.
minimum_population_size: 4Note
The minimum population size.
reduction_strategy:linearNote
Strategy used to reduce the population size. Available strategies:
"linear","exponential","agsk".bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
jSO Algorithm
The jSO algorithm is a variant of LSHADE, designed to solve CEC2017 problems.
Reference : J. Brest, M. S. Maučec and B. Bošković, “Single objective real-parameter optimization: Algorithm jSO,” 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 2017, pp. 1311-1318, doi: 10.1109/CEC.2017.7969456.
Algorithm name : "jSO"
Parameters :
population_size: 0Note
Initial population size (N). If set to 0, it will be automatically determined as:
\[N = 25 \cdot \log(D) \cdot \sqrt{D}\]where D is the dimensionality of the problem.
memory_size: 5Note
Memory size for storing the values of
CRandF.archive_size_ratio: 1.0Note
The ratio of the archive size relative to the population size.
minimum_population_size: 4Note
The minimum population size.
reduction_strategy:linearNote
Strategy used to reduce the population size. Available strategies:
"linear","exponential","agsk".bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
LSHADE Algorithm
Linear Population Reduction - Success History Adaptive Differential Evolution (LSHADE) algorithm. Originally designed to solve CEC2014.
Reference : R. Tanabe and A. S. Fukunaga, “Improving the search performance of SHADE using linear population size reduction,” 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 2014, pp. 1658-1665, doi: 10.1109/CEC.2014.6900380.
Algorithm name : "LSHADE"
Parameters :
population_size: 0Note
Initial population size (N). If set to 0, it will be automatically determined.
\[N = 18 \cdot D\]where D is the dimensionality of the problem.
memory_size: 6Note
Memory size for storing the values of
CRandF.mutation_strategy:current_to_pbest_A_1binNote
Mutation strategy used in the optimization process. Available strategies:
"best1bin","best1exp","rand1bin","rand1exp","current_to_pbest1bin","current_to_pbest1exp","current_to_pbest_A_1bin","current_to_pbest_A_1exp".archive_size_ratio: 2.6Note
The ratio of the archive size relative to the population size.
minimum_population_size: 4Note
The minimum population size.
reduction_strategy:linearNote
The strategy for reducing the population size over time. Options include linear, exponential, or agsk.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
IMODE Algorithm
Improved Multi-Operator Differential Evolution (IMODE) blends multiple mutation strategies with adaptive control parameters and a linear population reduction schedule to balance exploration and exploitation on challenging multimodal problems.
Reference: Karam M. Sallam, Saber M. Elsayed, Ripon K. Chakrabortty, and Michael J. Ryan. 2020. Improved Multi-operator Differential Evolution Algorithm for Solving Unconstrained Problems. In 2020 IEEE Congress on Evolutionary Computation (CEC). IEEE Press, 1–8. https://doi.org/10.1109/CEC48606.2020.9185577
Algorithm name : "IMODE"
Parameters :
population_size: 0Note
Initial population size. If set to
0, it follows the IMODE rule\[N = \min(5000, \max(18 \cdot D, 6 \cdot D^2))\]where D is the dimensionality of the problem.
minimum_population_size: 4Note
Terminal population size used during linear reduction. Must be at least 4.
memory_size: 0Note
Size of the success-history memory for adaptive parameters.
0applies the heuristic20 \cdot D.archive_size_ratio: 2.6Note
Archive size relative to the population size.
bound_strategy:reflect-randomNote
Boundary handling policy. Available strategies:
"random","reflect","reflect-random","clip","periodic","none".
AGSK Algorithm
Adaptive Gaining-Sharing Knowledge-based (AGSK) algorithm. Designed for the CEC 2020 benchmark suite, AGSK relies on junior/senior knowledge sharing phases, adaptive knowledge pools, and the AGSK population-size reduction policy.
Reference: A. W. Mohamed, A. A. Hadi, A. K. Mohamed and N. H. Awad, “Evaluating the Performance of Adaptive GainingSharing Knowledge Based Algorithm on CEC 2020 Benchmark Problems,” 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/CEC48606.2020.9185901.
Algorithm name : "AGSK"
Parameters :
population_size: 0Note
Initial population size (N). If set to 0, it follows the reference rule
\[\begin{split}N = \begin{cases} 40 \cdot D, & D > 5,\\ 100, & \text{otherwise} \end{cases}\end{split}\]where D is the dimensionality of the problem.
minimum_population_size: 12Note
Minimum population size allowed during AGSK’s nonlinear reduction schedule. Must be >= 4.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect","reflect-random","clip","periodic","none".
Artificial Bee Colony (ABC)
The Artificial Bee Colony (ABC) algorithm is a swarm intelligence-based optimization algorithm inspired by the foraging behavior of honey bees.
Algorithm name : "ABC"
Parameters :
population_size: 0Note
Initial population size (N). If set to 0, it will be automatically determined.
\[N = 5 \cdot D\]where D is the dimensionality of the problem.
mutation_strategy:rand1Note
Mutation strategy, default is “rand1”, available :
"rand1","best1".bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
Dual Annealing (DA)
Dual Annealing combines simulated annealing with local search to provide a flexible and robust global optimization algorithm.
Reference : Tsallis C, Stariolo DA. Generalized Simulated Annealing. Physica A, 233, 395-406 (1996).
Algorithm name : "DA"
Parameters :
acceptance_par: -5.0Note
The acceptance parameter controlling the probability of accepting worse solutions. The value must be between -1.0e+4 and -5.
visit_par: 2.67Note
The parameter controlling the annealing rate during the search. The value must be between 1.0 and 3.0.
initial_temp: 5230.0Note
The initial temperature for the annealing process. The value must be between 0.01 and 5.0e+4.
restart_temp_ratio: 2e-5Note
The temperature ratio for restart condition. The value must be between 0 and 1.
use_local_search: trueNote
Whether to use local search (e.g., L-BFGS-B) to refine the solutions.
local_search_algo:L_BFGS_BNote
The local search algorithm to be used. Available :
"NelderMead"and"L_BFGS_B"finite_diff_rel_step: 1e-10Note
The relative step size for finite difference computations. The default value 0.0 means that the relative step is given by the square root of machine epsilon.
bound_strategy:periodicNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
Nelder-Mead Algorithm
The Nelder-Mead algorithm is a derivative-free optimization method that relies on reflection, expansion, contraction, and shrinkage to search for an optimum.
Reference : Nelder, John A.; R. Mead (1965). “A simplex method for function minimization”. Computer Journal. 7 (4): 308–313. doi:10.1093/comjnl/7.4.308.
Algorithm name : "NelderMead"
Parameters :
locality_factor: 1.0Note
The factor controlling the step size for reflection and expansion during optimization.
bound_strategy:reflect-randomNote
Method for handling boundary violations. Available strategies:
"random","reflect-random","clip","periodic".
L-BFGS-B Algorithm
L-BFGS-B is a quasi-Newton method that approximates the Hessian matrix while handling bound constraints. The implementation here utilizes the back-end code from the LBFGSpp library (https://github.com/yixuan/LBFGSpp), which provides L-BFGS-B updates and Hessian approximation functionality.
Minion implements a customized derivative calculation method that ensures both vectorization and noise robustness. To improve stability under noise, the derivative is computed using an adaptive step size, and a noise-robust Lanczos derivative is employed.
Additionally, function calls are vectorized, meaning the objective function and its derivative can be evaluated in a single batch. This batch execution can be further parallelized using multithreading or multiprocessing, leading to significant computational efficiency improvements.
Reference : Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. (1995). “A Limited Memory Algorithm for Bound Constrained Optimization”, SIAM J. Sci. Comput. 16 (5): 1190–1208.
Algorithm Name : "L_BFGS_B"
Parameters
``max_iterations``: 15000
Note
The maximum number of iterations allowed for the algorithm.
``m``: 15
Note
The number of previous iterations used to approximate the Hessian matrix.
``g_epsilon``: 1e-8
Note
The absolute gradient convergence tolerance.
``g_epsilon_rel``: 0.0
Note
The relative gradient convergence tolerance.
``f_reltol``: 1e-8
Note
The function value convergence tolerance.
``max_linesearch``: 20
Note
The maximum number of iterations allowed during line search.
``c_1``: 1e-3
Note
The first Wolfe condition parameter for line search.
``c_2``: 0.9
Note
The second Wolfe condition parameter for line search.
``func_noise_ratio``: 1e-16
Note
Noise level (ratio), defined as the deviation of the function value from its ideal smooth counterpart, relative to the function value. If the function is smooth, set this to zero.
``N_points_derivative``: 3
Note
The number of sample points used for derivative calculations. If set to an even number, it is automatically increased by 1 to make it odd. Given
N, the total function call batch size for one function evaluation and derivative calculation is computed as: 1 + D * (N - 1), whereDis the dimensionality of the problem.
L-BFGS Algorithm
L-BFGS is a quasi-Newton method that approximates the Hessian matrix for unconstrained optimization rpoblem. The implementation here utilizes the back-end code from the LBFGSpp library (https://github.com/yixuan/LBFGSpp), which provides L-BFGS updates and Hessian approximation functionality.
Minion implements a customized derivative calculation method that ensures both vectorization and noise robustness. To improve stability under noise, the derivative is computed using an adaptive step size, and a noise-robust Lanczos derivative is employed.
Additionally, function calls are vectorized, meaning the objective function and its derivative can be evaluated in a single batch. This batch execution can be further parallelized using multithreading or multiprocessing, leading to significant computational efficiency improvements.
Reference: Liu, D. C.; Nocedal, J. (1989). “On the Limited Memory Method for Large Scale Optimization”. *Mathematical Programming B, 45(3): 503-528.*
Algorithm Name : "L_BFGS"
Parameters
``max_iterations``: 15000
Note
The maximum number of iterations allowed for the algorithm.
``m``: 15
Note
The number of previous iterations used to approximate the Hessian matrix.
``g_epsilon``: 1e-8
Note
The absolute gradient convergence tolerance.
``g_epsilon_rel``: 0.0
Note
The relative gradient convergence tolerance.
``f_reltol``: 1e-8
Note
The function value convergence tolerance.
``max_linesearch``: 20
Note
The maximum number of iterations allowed during line search.
``c_1``: 1e-3
Note
The first Wolfe condition parameter for line search.
``c_2``: 0.9
Note
The second Wolfe condition parameter for line search.
``func_noise_ratio``: 1e-16
Note
Noise level (ratio), defined as the deviation of the function value from its ideal smooth counterpart, relative to the function value. If the function is smooth, set this to zero.
``N_points_derivative``: 3
Note
The number of sample points used for derivative calculations. If set to an even number, it is automatically increased by 1 to make it odd. Given
N, the total function call batch size for one function evaluation and derivative calculation is computed as: 1 + D * (N - 1), whereDis the dimensionality of the problem.