Algorithm Details

This section provides detailed explanations of the optimization algorithms implemented in the Minion library, along with their key parameters and how to configure them.

Each optimization algorithm has a set of parameters that can be adjusted to influence the optimization process. These parameters can be obtained and modified as follows:

C++ Example:

In C++, you can retrieve the default settings for an algorithm and adjust the parameters as needed:

minion::DefaultSettings settings;
// Retrieve the default settings for the LSHADE algorithm
std::map<std::string, minion::ConfigValue> options = settings.getDefaultSettings("LSHADE");

// Override default parameters
options["population_size"] = 50;

// Initialize the minimizer with custom settings
auto min = minion::Minimizer(rosenbrock_vect, bounds, {}, nullptr, nullptr, "LSHADE", max_evals, -1, options);

// Perform optimization
auto min_result = min.optimize();

Python Example:

In Python, you can pass the parameters via a dictionary when creating the Minimizer object. Here’s an example for configuring and using the ARRDE algorithm:

import minionpy as mpy

options = {
    "population_size"           : 0,           # Auto sizing based on dimension and budget when set to 0
    "minimum_population_size"   : 4,           # Terminal population size (cannot be below 4)
    "bound_strategy"            : "reflect-random"  # Boundary handling policy
}

# Use ARRDE algorithm for optimization
min = mpy.Minimizer(func=objective_function,
                    x0=x0,
                    bounds=[(-10, 10)] * dimension,
                    algo="ARRDE",
                    maxevals=10000,
                    callback=None,
                    seed=None,
                    options=options)

# Run optimization
result = min.optimize()

Note

The optional x0 argument can contain multiple initial guesses. Population-based algorithms, such as Differential Evolution variants, LSHADE, and PSO, use these samples to initialise their populations. For algorithms that evolve a single incumbent solution (e.g., CMA-ES, Nelder-Mead, L-BFGS, L-BFGS-B), Minion evaluates the provided guesses first and starts from the best-performing one.

Each algorithm comes with a set of configurable parameters that affect its behavior during the optimization process. For example, the population_size parameter controls the number of candidate solutions in the population, while the mutation_rate defines the probability of modifying a candidate solution during the mutation process.

Please refer to the respective algorithm section for detailed descriptions of each parameter.

Differential Evolution (DE)

Differential Evolution (DE) is a population-based stochastic optimization algorithm that applies mutation, crossover, and selection to evolve a population of candidate solutions.

Reference : Storn, R and Price, K, Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 1997, 11, 341 - 359.

Algorithm name : "DE"

Parameters :

population_size: 0

Note

Population size. 0 lets the solver choose its heuristic default from problem dimension and budget.
mutation_rate: 0.5

Note

Differential weight \(F\) used to scale mutation steps.
crossover_rate: 0.8

Note

Binomial or exponential crossover probability \(CR\).
mutation_strategy: best1bin

Note

Mutation / crossover strategy. Available strategies include "best1bin", "best1exp", "rand1bin", "rand1exp", "current_to_best1bin", "current_to_best1exp", "current_to_pbest1bin", "current_to_pbest1exp", "current_to_pbest_A_1bin", "current_to_pbest_A_1exp", "current_to_pbest_AW_1bin", and "current_to_pbest_AW_1exp".
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance. Optimization stops when the population fitness spread falls below this threshold.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect-random", "clip".

Particle Swarm Optimization (PSO)

Minion implements several particle swarm variants. The canonical PSO uses a global-best topology and the standard inertia / acceleration update.

Reference: Kennedy, J. and Eberhart, R., “Particle Swarm Optimization,” Proc. IEEE International Conference on Neural Networks, 1995.

Algorithm name : "PSO"

Parameters :

population_size: 0

Note

Swarm size. If set to 0, it defaults to 5 * D where D is the problem dimension.
inertia_weight: 0.7

Note

Inertia weight \(\omega\) that controls how much momentum each particle carries between steps.
cognitive_coefficient: 1.5

Note

Personal acceleration constant \(c_1\) steering particles toward their own best position.
social_coefficient: 1.5

Note

Social acceleration constant \(c_2\) that pulls particles toward the global best.
velocity_clamp: 0.2

Note

Fraction of the search range used as a velocity limit. 0 disables clamping.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on swarm fitness values. Smaller values require the swarm to contract more before stopping.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect-random", "clip".

SPSO-2011

This implementation mirrors the stochastic PSO 2011 variant proposed by Clerc and Bratton, including adaptive informant topologies and hypersphere sampling.

Reference: M. Zambrano-Bigiarini, M. Clerc and R. Rojas, “Standard Particle Swarm Optimisation 2011 at CEC-2013: A baseline for future PSO improvements,” 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 2013, pp. 2337-2344, doi: 10.1109/CEC.2013.6557848.

Algorithm name : "SPSO2011"

Parameters :

population_size: 0

Note

Swarm size. 0 selects the heuristic default of 5 * D.
inertia_weight: 0.729844

Note

Inertia parameter \(\omega\) controlling exploration versus exploitation.
cognitive_coefficient: 1.49618

Note

Cognitive acceleration constant influencing attraction toward each particle’s personal best.
social_coefficient: 1.49618

Note

Social acceleration constant drawing particles toward informants and the global best.
phi_personal: 1.49618

Note

Parameter used in the stochastic hypersphere sampling for the personal component.
phi_social: 1.49618

Note

Parameter used in the stochastic hypersphere sampling for the social component.
neighborhood_size: 3

Note

Number of neighbors considered when building the adaptive informant topology.
informant_degree: 3

Note

Degree of the communication graph; controls how many informants each particle has.
velocity_clamp: 0.0

Note

Maximum velocity as a fraction of the search range. 0 disables the clamp.
normalize: False

Note

Whether to transform search vectors into a normalised unit hypercube before mapping back to bounds.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on swarm fitness values.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect-random", "clip".

Dynamic Multi-Swarm PSO (DMSPSO)

Dynamic Multi-Swarm PSO partitions the swarm into co-operative sub-swarms that periodically regroup, blending global and local attraction.

Algorithm name : "DMSPSO"

Parameters :

population_size: 0

Note

Swarm size. 0 enables the adaptive default of 5 * D particles.
inertia_weight: 0.7

Note

Inertia term balancing exploration and exploitation across sub-swarms.
cognitive_coefficient: 1.2

Note

Personal attraction strength toward each particle’s historical best.
social_coefficient: 1.0

Note

Global attraction weight toward the swarm-wide best.
local_coefficient: 1.4

Note

Influence of the current sub-swarm best position.
global_coefficient: 0.8

Note

Influence of the overall global best retained across regroupings.
subswarm_count: 4

Note

Number of co-operative sub-swarms maintained before regrouping.
regroup_period: 5

Note

Iterations between sub-swarm regrouping events.
velocity_clamp: 0.2

Note

Fraction of the search range that caps particle velocities. 0 disables clamping.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on swarm fitness values.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect-random", "clip".

LSHADE-cnEpSin

Ensemble Sinusoidal LSHADE with covariance learning augments LSHADE with an ensemble of sinusoidal F-adaptation schemes and a local covariance matrix for crossover.

Reference: N. H. Awad, M. Z. Ali and P. N. Suganthan, “Ensemble sinusoidal differential covariance matrix adaptation with Euclidean neighborhood for solving CEC2017 benchmark problems,” 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 2017, pp. 372-379, doi: 10.1109/CEC.2017.7969336.

Algorithm name : "LSHADE_cnEpSin"

Parameters :

population_size: 0

Note

Initial population size. 0 chooses the default 18 * D heuristic.
memory_size: 5

Note

Number of entries stored in the shared scaling-factor, crossover-rate, and frequency memories.
archive_rate: 1.4

Note

Ratio between archive size and current population size.
minimum_population_size: 4

Note

Final population size reached by linear population reduction.
p_best_fraction: 0.11

Note

Fraction of top individuals eligible for the current-to-p-best mutation.
rotation_probability: 0.4

Note

Probability of switching to eigen-space crossover with covariance learning.
neighborhood_fraction: 0.5

Note

Fraction of the population used to estimate the covariance matrix.
freq_init: 0.5

Note

Initial value for the sinusoidal frequency adaptation.
learning_period: 20

Note

Window size for tracking ensemble success statistics.
sin_freq_base: 0.5

Note

Base frequency for the first sinusoidal strategy.
epsilon: 1e-8

Note

Small constant used to stabilise probability updates.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on population fitness values.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect-random", "clip".

CMA-ES

Covariance Matrix Adaptation Evolution Strategy samples offspring from an evolving multivariate normal distribution whose covariance is adapted from successful steps.

Reference: N. Hansen and A. Ostermeier, “Adapting arbitrary normal mutation distributions in evolution strategies: the covariance matrix adaptation,” Proceedings of IEEE International Conference on Evolutionary Computation, Nagoya, Japan, 1996, pp. 312-317, doi: 10.1109/ICEC.1996.542381.

Algorithm name : "CMAES"

Parameters :

population_size: 0

Note

Number of offspring sampled each generation. 0 derives the default 4 + 3 \log(D) value.
mu: 0

Note

Number of parents contributing to the new mean. 0 sets it to population_size / 2.
initial_step: 0.3

Note

Initial global step size, scaled by the average bound width.
cc: 0.0

Note

Cumulation parameter for the covariance evolution path.
cs: 0.0

Note

Cumulation parameter for the step-size control path.
c1: 0.0

Note

Learning rate for rank-one covariance updates.
cmu: 0.0

Note

Learning rate for the rank-\mu covariance update.
damps: 0.0

Note

Damping parameter controlling how aggressively the step size adapts.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on offspring fitness values.
bound_strategy: reflect-random

Note

Boundary handling policy for infeasible samples. Available strategies: "random", "reflect-random", "clip".

BIPOP aCMA-ES

BIPOP aCMA-ES is an adaptive bi-population variant of CMA-ES that alternates between runs with small and large population sizes to balance local refinement and global exploration. It includes an adaptive restart mechanism to allocate budget across these runs.

Reference: Nikolaus Hansen. 2009. Benchmarking a BI-population CMA-ES on the BBOB-2009 function testbed. In Proceedings of the 11th Annual Conference Companion on Genetic and Evolutionary Computation Conference: Late Breaking Papers (GECCO ‘09). Association for Computing Machinery, New York, NY, USA, 2389–2396. https://doi.org/10.1145/1570256.1570333.

Algorithm name : "BIPOP_aCMAES"

Parameters :

population_size: 0

Note

Number of offspring sampled each generation. 0 derives the default value based on the dimensionality of the problem.
max_iterations: 100000

Note

Maximum number of iterations (generations) allowed per run.
initial_step: 0.3

Note

Initial global step size (sigma), scaled by the average bound width.
bound_strategy: reflect-random

Note

Boundary handling policy for infeasible samples. Available strategies: "random", "reflect-random", "clip".

Adaptive Restart-Refine Differential Evolution (ARRDE)

ARRDE is an extension of the Differential Evolution algorithm with adaptive restart and refinement strategies.

Algorithm name : "ARRDE"

Parameters :

population_size: 0

Note

Initial population size. If set to 0, ARRDE computes

\[N = \max\!\left(2D,\; \operatorname{clip}\big(D \cdot m(\eta), 4, 3000\big)\right)\]

where \(D\) is dimensionality, \(\eta = \tfrac{N_{\text{maxevals}}}{D}\), and

\[\begin{split}m(\eta) = \begin{cases} 2.0, & \log_{10} \eta \le 2,\\ 2.0 + 5.756 (\log_{10} \eta - 2)^{1.609}, & \text{otherwise.} \end{cases}\end{split}\]

\(N_{\text{maxevals}}\) is the maximum number of function evaluations.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect", "reflect-random", "clip", "periodic", "none".

Grey Wolf Optimizer Differential Evolution (GWO-DE)

GWO-DE combines Differential Evolution with the Grey Wolf Optimizer, leveraging the social structure of wolves for optimization.

Algorithm name : "GWO_DE"

Parameters :

population_size: 0

Note

The number of individuals in the population. If set to 0, a default size based on problem dimension is used.
mutation_rate: 0.5

Note

The probability of mutating a parameter value during evolution.
crossover_rate: 0.7

Note

The probability of recombining parent and mutant vectors.
elimination_prob: 0.1

Note

The probability of eliminating a wolf from the population.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip".

j2020 Algorithm

The j2020 algorithm is a variation of Differential Evolution that incorporates parameter-specific strategies for mutation and recombination.

Reference : J. Brest, M. S. Maučec and B. Bošković, “Differential Evolution Algorithm for Single Objective Bound-Constrained Optimization: Algorithm j2020,” 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/CEC48606.2020.9185551.

Algorithm name : "j2020"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it will be automatically determined as follows:

\[N = 8 \cdot D\]
tau1: 0.1

Note

The value of tau1 variable. The value must be between 0 and 1.
tau2: 0.1

Note

The value of tau1 variable. The value must be between 0 and 1.
myEqs: 0.4

Note

The value of myEqs variable. The value must be between 0 and 1.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip".

LSRTDE Algorithm

The LSRTDE algorithm. Designed to solve CEC2024.

Reference : V. Stanovov and E. Semenkin, “Success Rate-based Adaptive Differential Evolution L-SRTDE for CEC 2024 Competition,” 2024 IEEE Congress on Evolutionary Computation (CEC), Yokohama, Japan, 2024, pp. 1-8, doi: 10.1109/CEC60901.2024.10611907.

Algorithm name : "LSRTDE"

Parameters :

population_size: 0

Note

nitial population size (N). If set to 0, it will be automatically determined as follows:

\[N = 20 \cdot D\]

where D is the dimensionality of the problem.
memory_size: 5

Note

Memory size for storing the values of CR and F.
success_rate: 0.5

Note

The success rate required for selecting an individual for the next generation.
bound_strategy: random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip".

NLSHADE-RSP Algorithm

NLSHADE-RSP is an extension of the SHADE algorithm designed to solve CEC2021 problems.

Reference : V. Stanovov, S. Akhmedova and E. Semenkin, “NL-SHADE-RSP Algorithm with Adaptive Archive and Selective Pressure for CEC 2021 Numerical Optimization,” 2021 IEEE Congress on Evolutionary Computation (CEC), Kraków, Poland, 2021, pp. 809-816, doi: 10.1109/CEC45853.2021.9504959.

Algorithm name : "NLSHADE_RSP"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it will be automatically determined.

\[N = 30 \cdot D\]

where D is the dimensionality of the problem.
memory_size: 100

Note

Memory size for storing the values of CR and F
archive_size_ratio: 2.6

Note

The ratio of the archive size relative to the population size.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

JADE Algorithm

JADE is a variant of Differential Evolution that introduces adaptive strategies for mutation and selection.

Reference : J. Zhang and A. C. Sanderson, “JADE: Adaptive Differential Evolution With Optional External Archive,” in IEEE Transactions on Evolutionary Computation, vol. 13, no. 5, pp. 945-958, Oct. 2009, doi: 10.1109/TEVC.2009.2014613.

Algorithm name : "JADE"

Parameters :

population_size: 0
Note

Initial population size (N). If set to 0, it will be automatically determined as follows:
- If the dimensionality \(D\) of the problem is \(D < 10\), then \(N = 30\).
- If \(10 \leq D \leq 30\), then \(N = 100\).
- If \(30 < D \leq 50\), then \(N = 200\).
- If \(50 < D \leq 70\), then \(N = 300\).
- Else, \(N = 400\).
c: 0.1

Note

The value of c variable. The value must be between 0 and 1.
mutation_strategy: current_to_pbest_A_1bin

Note

Mutation strategy used in the optimization process. Available strategies: "best1bin", "best1exp", "rand1bin", "rand1exp", "current_to_pbest1bin", "current_to_pbest1exp", "current_to_pbest_A_1bin", "current_to_pbest_A_1exp".
archive_size_ratio: 1.0

Note

The ratio of the archive size relative to the population size.
minimum_population_size: 4

Note

The minimum population size.
reduction_strategy: linear

Note

Strategy used to reduce the population size. Available strategies: "linear", "exponential", "agsk".
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on population fitness values.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

jSO Algorithm

The jSO algorithm is a variant of LSHADE, designed to solve CEC2017 problems.

Reference : J. Brest, M. S. Maučec and B. Bošković, “Single objective real-parameter optimization: Algorithm jSO,” 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 2017, pp. 1311-1318, doi: 10.1109/CEC.2017.7969456.

Algorithm name : "jSO"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it will be automatically determined as:

\[N = 25 \cdot \log(D) \cdot \sqrt{D}\]

where D is the dimensionality of the problem.
memory_size: 5

Note

Memory size for storing the values of CR and F.
archive_size_ratio: 1.0

Note

The ratio of the archive size relative to the population size.
minimum_population_size: 4

Note

The minimum population size.
reduction_strategy: linear

Note

Strategy used to reduce the population size. Available strategies: "linear", "exponential", "agsk".
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on population fitness values.
bound_strategy: random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

LSHADE Algorithm

Linear Population Reduction - Success History Adaptive Differential Evolution (LSHADE) algorithm. Originally designed to solve CEC2014.

Reference : R. Tanabe and A. S. Fukunaga, “Improving the search performance of SHADE using linear population size reduction,” 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 2014, pp. 1658-1665, doi: 10.1109/CEC.2014.6900380.

Algorithm name : "LSHADE"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it will be automatically determined.

\[N = 18 \cdot D\]

where D is the dimensionality of the problem.
memory_size: 6

Note

Memory size for storing the values of CR and F.
mutation_strategy: current_to_pbest_A_1bin

Note

Mutation strategy used in the optimization process. Available strategies: "best1bin", "best1exp", "rand1bin", "rand1exp", "current_to_pbest1bin", "current_to_pbest1exp", "current_to_pbest_A_1bin", "current_to_pbest_A_1exp".
archive_size_ratio: 2.6

Note

The ratio of the archive size relative to the population size.
minimum_population_size: 4

Note

The minimum population size.
reduction_strategy: linear

Note

The strategy for reducing the population size over time. Options include linear, exponential, or agsk.
bound_strategy: random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

IMODE Algorithm

Improved Multi-Operator Differential Evolution (IMODE) blends multiple mutation strategies with adaptive control parameters and a linear population reduction schedule to balance exploration and exploitation on challenging multimodal problems.

Reference: Karam M. Sallam, Saber M. Elsayed, Ripon K. Chakrabortty, and Michael J. Ryan. 2020. Improved Multi-operator Differential Evolution Algorithm for Solving Unconstrained Problems. In 2020 IEEE Congress on Evolutionary Computation (CEC). IEEE Press, 1–8. https://doi.org/10.1109/CEC48606.2020.9185577

Algorithm name : "IMODE"

Parameters :

population_size: 0

Note

Initial population size. If set to 0, it follows the IMODE rule

\[N = \min(5000, \max(18 \cdot D, 6 \cdot D^2))\]

where D is the dimensionality of the problem.
minimum_population_size: 4

Note

Terminal population size used during linear reduction. Must be at least 4.
memory_size: 0

Note

Size of the success-history memory for adaptive parameters. 0 applies the heuristic 20 \cdot D.
archive_size_ratio: 2.6

Note

Archive size relative to the population size.
bound_strategy: reflect-random

Note

Boundary handling policy. Available strategies: "random", "reflect", "reflect-random", "clip", "periodic", "none".

AGSK Algorithm

Adaptive Gaining-Sharing Knowledge-based (AGSK) algorithm. Designed for the CEC 2020 benchmark suite, AGSK relies on junior/senior knowledge sharing phases, adaptive knowledge pools, and the AGSK population-size reduction policy.

Reference: A. W. Mohamed, A. A. Hadi, A. K. Mohamed and N. H. Awad, “Evaluating the Performance of Adaptive GainingSharing Knowledge Based Algorithm on CEC 2020 Benchmark Problems,” 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 2020, pp. 1-8, doi: 10.1109/CEC48606.2020.9185901.

Algorithm name : "AGSK"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it follows the reference rule

\[\begin{split}N = \begin{cases} 40 \cdot D, & D > 5,\\ 100, & \text{otherwise} \end{cases}\end{split}\]

where D is the dimensionality of the problem.
minimum_population_size: 12

Note

Minimum population size allowed during AGSK’s nonlinear reduction schedule. Must be >= 4.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect", "reflect-random", "clip", "periodic", "none".

Artificial Bee Colony (ABC)

The Artificial Bee Colony (ABC) algorithm is a swarm intelligence-based optimization algorithm inspired by the foraging behavior of honey bees.

Algorithm name : "ABC"

Parameters :

population_size: 0

Note

Initial population size (N). If set to 0, it will be automatically determined.

\[N = 5 \cdot D\]

where D is the dimensionality of the problem.
limit: 100

Note

Scout trigger threshold. A food source is reinitialised after this many unsuccessful trials.
convergence_tol: 1e-4

Note

Diversity-based convergence tolerance on colony fitness values.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

Dual Annealing (DA)

Dual Annealing combines simulated annealing with local search to provide a flexible and robust global optimization algorithm.

Reference : Tsallis C, Stariolo DA. Generalized Simulated Annealing. Physica A, 233, 395-406 (1996).

Algorithm name : "DA"

Parameters :

acceptance_par: -5.0

Note

The acceptance parameter controlling the probability of accepting worse solutions. The value must be between -1.0e+4 and -5.
visit_par: 2.67

Note

The parameter controlling the annealing rate during the search. The value must be between 1.0 and 3.0.
initial_temp: 5230.0

Note

The initial temperature for the annealing process. The value must be between 0.01 and 5.0e+4.
restart_temp_ratio: 2e-5

Note

The temperature ratio for restart condition. The value must be between 0 and 1.
use_local_search: true

Note

Whether to use local search (e.g., L-BFGS-B) to refine the solutions.
local_search_algo: L_BFGS_B

Note

The local search algorithm to be used. Available : "NelderMead" and "L_BFGS_B"
func_noise_ratio: 1e-16

Note

Relative noise level used by the local-search derivative approximation.
N_points_derivative: 3

Note

Number of sample points used by the local-search derivative approximation. Even values are rounded up internally.
convergence_tol: 1e-4

Note

Objective-space convergence tolerance used by the annealing routine.
bound_strategy: periodic

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

Nelder-Mead Algorithm

The Nelder-Mead algorithm is a derivative-free optimization method that relies on reflection, expansion, contraction, and shrinkage to search for an optimum.

Reference : Nelder, John A.; R. Mead (1965). “A simplex method for function minimization”. Computer Journal. 7 (4): 308–313. doi:10.1093/comjnl/7.4.308.

Algorithm name : "NelderMead"

Parameters :

locality_factor: 0.05

Note

The factor controlling the step size for reflection and expansion during optimization.
convergence_tol: 1e-4

Note

Simplex-size / function-spread convergence tolerance used by the Nelder-Mead stopping test.
bound_strategy: reflect-random

Note

Method for handling boundary violations. Available strategies: "random", "reflect-random", "clip", "periodic".

L-BFGS-B Algorithm

L-BFGS-B is a quasi-Newton method that approximates the Hessian matrix while handling bound constraints. The implementation here utilizes the back-end code from the LBFGSpp library (https://github.com/yixuan/LBFGSpp), which provides L-BFGS-B updates and Hessian approximation functionality.

Minion implements a customized derivative calculation method that ensures both vectorization and noise robustness. To improve stability under noise, the derivative is computed using an adaptive step size, and a noise-robust Lanczos derivative is employed.

Additionally, function calls are vectorized, meaning the objective function and its derivative can be evaluated in a single batch. This batch execution can be further parallelized using multithreading or multiprocessing, leading to significant computational efficiency improvements.

Reference : Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. (1995). “A Limited Memory Algorithm for Bound Constrained Optimization”, SIAM J. Sci. Comput. 16 (5): 1190–1208.

Algorithm Name : "L_BFGS_B"

Parameters

``max_iterations``: 100000

Note

The maximum number of iterations allowed for the algorithm.
``m``: 10

Note

The number of previous iterations used to approximate the Hessian matrix.
``g_epsilon``: 1e-5

Note

The absolute gradient convergence tolerance.
``g_epsilon_rel``: 0.0

Note

The relative gradient convergence tolerance.
``f_reltol``: 1e-9

Note

The function value convergence tolerance.
``max_linesearch``: 20

Note

The maximum number of iterations allowed during line search.
``c_1``: 1e-3

Note

The first Wolfe condition parameter for line search.
``c_2``: 0.9

Note

The second Wolfe condition parameter for line search.
``func_noise_ratio``: 0.0

Note

Noise level (ratio), defined as the deviation of the function value from its ideal smooth counterpart, relative to the function value. If the function is smooth, set this to zero.
``N_points_derivative``: 3

Note

The number of sample points used for derivative calculations. If set to an even number, it is automatically increased by 1 to make it odd. Given N, the total function call batch size for one function evaluation and derivative calculation is computed as: 1 + D * (N - 1), where D is the dimensionality of the problem.

L-BFGS Algorithm

L-BFGS is a quasi-Newton method that approximates the Hessian matrix for unconstrained optimization rpoblem. The implementation here utilizes the back-end code from the LBFGSpp library (https://github.com/yixuan/LBFGSpp), which provides L-BFGS updates and Hessian approximation functionality.

Minion implements a customized derivative calculation method that ensures both vectorization and noise robustness. To improve stability under noise, the derivative is computed using an adaptive step size, and a noise-robust Lanczos derivative is employed.

Additionally, function calls are vectorized, meaning the objective function and its derivative can be evaluated in a single batch. This batch execution can be further parallelized using multithreading or multiprocessing, leading to significant computational efficiency improvements.

Reference: Liu, D. C.; Nocedal, J. (1989). “On the Limited Memory Method for Large Scale Optimization”. *Mathematical Programming B, 45(3): 503-528.*

Algorithm Name : "L_BFGS"

Parameters

``max_iterations``: 100000

Note

The maximum number of iterations allowed for the algorithm.
``m``: 10

Note

The number of previous iterations used to approximate the Hessian matrix.
``g_epsilon``: 1e-5

Note

The absolute gradient convergence tolerance.
``g_epsilon_rel``: 0.0

Note

The relative gradient convergence tolerance.
``f_reltol``: 1e-9

Note

The function value convergence tolerance.
``max_linesearch``: 20

Note

The maximum number of iterations allowed during line search.
``c_1``: 1e-3

Note

The first Wolfe condition parameter for line search.
``c_2``: 0.9

Note

The second Wolfe condition parameter for line search.
``func_noise_ratio``: 0.0

Note

Noise level (ratio), defined as the deviation of the function value from its ideal smooth counterpart, relative to the function value. If the function is smooth, set this to zero.
``N_points_derivative``: 3

Note

The number of sample points used for derivative calculations. If set to an even number, it is automatically increased by 1 to make it odd. Given N, the total function call batch size for one function evaluation and derivative calculation is computed as: 1 + D * (N - 1), where D is the dimensionality of the problem.