Welcome to Optimithon’s documentation!

Introduction

This code aims to provide flexible and extendable implementation of various standard and experimental optimization methods. The development is gradual and priority will be given to those methods that seem to be more exciting to implement (based on personal interests).

This project was mainly motivated by my lake of knowledge in numerical optimization, otherwise there are excellent (open source) tools for almost every task that a machine can do. This occurred while I was working on an other scientific project on global optimization irene.rtfd.io, which tends to provide a tool to find a lower bound for the global minimum of a function when the function and constraints are (algebraically) well presented.

Requirements

Dependencies are minimal as the code is mainly written in python:

Download

Optimithon can be obtained from https://github.com/mghasemi/optimithon.

Installation

To install Optimithon, run the following in terminal:

sudo python setup.py install

Documentation

The documentation is produced by Sphinx and is intended to cover code usage as well as a bit of theory to explain each method briefly. For more details refer to the documentation at optimithon.rtfd.io.

License

This code is distributed under MIT license:

MIT License

Copyright (c) 2018 Mehdi Ghasemi

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Optimization Problem

A typical optimization problem can be formulated as

(1)\[\begin{split}\left\lbrace \begin{array}{lll} \min_x & f(x) & \\ \textrm{subject to} & & \\ & g_i(x)\ge0 & i=1,\dots,m\\ & \textrm{and} & \\ & h_j(x)=0 & j=1,\dots,k \end{array} \right.\end{split}\]

The problem (1) is called a constrained optimization. If the constraints \(g_i(x)\ge0\) and \(h_j(x)=0\) are absent then the (1) is called an unconstrained optimization problem.

This package attempts to implement various methods to solve the unconstrained optimization problem. Then by employing the barrier functions method, the resulted code for unconstrained problem is modified to solve the general form of (1).

The user interface for both unconstrained and constrained optimization problems is the same, but some parameters are ignored for unconstrained problems. A minimal code to solve (1) would look like the following:

from Optimithon import Base, QuasiNewton # import the essentials
f = # definition of the objective function
ineqs = [g_i for i in range(m)] # the list of inequality constraints: g_i >= 0.
eqs = [h_j for j in range(k)] # the list of equality constraints: h_j == 0.
OPTIM = Base(f, # the objective function (mandatory)
         ineq=ineqs, # inequality constraints
         eq = eqs, # equality constraints
         x0=x0, # an initial point, a numpy array
         )
OPTIM() # run the optimization procedure
print(OPTIM.solution) # show the outcome

Iterative Optimization Methods

The iterative (unconstrained) optimization methods are the most popular optimization methods to approximate a (local) minimum of a given function. Generally, an iterative method uses a patter like the following:

  • With the objective function \(f\) and an initial guess for the minimum \(x=x_0\):
  • Repeat:
    • Find a descent direction \(p_n\) at point \(x_n\),
    • Find a positive value \(\alpha\) such that \(f(x_n+\alpha p_n)\) is a reasonable decrease compare to \(f(x_n)\),
    • Update \(x_{n+1}=x_{n}+\alpha p_n\),
  • Until a termination criterion is satisfied.
  • Return \(x_n\) as an approximation for a local minimum of \(f\).

Variations of the iterative methods focus on finding a suitable descent direction \(p_n\), as well as suitable value for \(\alpha\) and a termination strategy.

To determine a suitable direction some methods use first or second (or even higher orders) derivatives of \(f\). Those methods that do not use derivatives are called derivative free methods.

The derivative base methods are implemented in QuasiNewton module.

Derivative Based Methods

Derivative Based Methods are a class of iterative optimization methods that calculate a descent direction using gradient and/or hessian of the objective function.

For more details on the methods introduced in this chapter refer to [JNSJW].

[JNSJW](1, 2) J. Nocedal, S. J. Wright, Numerical Optimization, 2nd ed., Springer, New York, NY, USA (2006).

Descent Direction

The following is the list of implemented methods to find a descent direction.

Gradient descent direction

This method chooses the backward gradient direction to achieve the direction which results in steepest decrease in the values of the objective function:

\[p_n=-\nabla f(x_n)\]

Newton Conjugate Gradient method

The Newton conjugate gradient method uses the followin descent direction:

\[p_{n+1}=\nabla^2f(x_n)^{-1}\nabla f(x_n),\]

where \(\nabla^2f(x_n)\) is the Hessian of \(f\) at \(x_n\).

Fletcher-Reeves method

The direction suggested by R. Fletcher and C. M. Reeves in 1964 [RFCMR]. Let \(p_0=-\nabla f(x_0)\) and

\[p_n=\frac{\nabla f(x_n)^T\nabla f(x_n)}{\|\nabla f(x_{n-1})\|^2}p_{n-1}-\nabla f(x_n).\]
[RFCMR]R. Fletcher and C. M. Reeves, Function minimization by conjugate gradients, Comput. J. 7 (1964), 149–154.

Polak–Ribiere method

Suggested by E. Polak and G. Ribiere in 1969 [EPGR]. Let \(p_0=-\nabla f(x_0)\) and

\[p_n=\frac{\nabla f(x_n)^T(\nabla f(x_n)-\nabla f(x_{n-1}))}{\|\nabla f(x_{n-1})\|^2}p_{n-1}-\nabla f(x_n).\]
[EPGR]E. Polak and G. Ribiere, Note sur la convergence de directions conjuguee, Rev. Francaise Informat Recherche Operationelle, 3e Annee 16 (1969), 35–43.

Hestenes-Stiefel method

Suggested by M. R. Hestenes and E. Stiefel in 1953 [MRHES]. Let \(p_0=-\nabla f(x_0)\) and

\[p_n=\frac{\nabla f(x_n)^T(\nabla f(x_n)-\nabla f(x_{n-1}))}{(\nabla f(x_n)-\nabla f(x_{n-1}))^Tp_{n-1}}p_{n-1}-\nabla f(x_n).\]
[MRHES]M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).

Dai-Yuan method

Suggested by Y.-H. Dai and Y. Yuan in 1999 [YHDYY]. Let \(p_0=-\nabla f(x_0)\) and

\[p_n=\frac{\|\nabla f(x_n)\|^2}{(\nabla f(x_n)-\nabla f(x_{n-1}))p_{n-1}}p_{n-1}-\nabla f(x_n).\]
[YHDYY]Y.-H. Dai and Y. Yuan, A nonlinear conjugate gradient method with a strong global convergence property, SIAM J. Optim. 10 (1999), no. 1, 177–182.

Davidon-Fletcher-Powell method

Let \(H_0=\nabla^2f(x_0)^{-1}\) and

\[\begin{split}\begin{array}{lcl} H_n & = & H_{n-1}+\frac{(x_n - x_{n-1})^T(x_n - x_{n-1})}{(x_n - x_{n-1})^T(\nabla f(x_n)-\nabla f(x_{n-1}))}\\ & - & \frac{H_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))(\nabla f(x_n)-\nabla f(x_{n-1}))^TH_{n-1}}{(\nabla f(x_n)- \nabla f(x_{n-1}))^TH_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))}, \end{array}\end{split}\]

then \(p_n=-H_n\nabla f(x_n)\) (see [WCD] and [RF]).

[WCD]W. C. Davidon, Variable metric method for minimization, SIAM Journal on Optimization, 1: 1–17 (1991).
[RF](1, 2) R. Fletcher, Practical methods of optimization (2nd ed.), New York: John Wiley & Sons (1987).

Broyden-Fletcher-Goldfarb-Shanno method

Let \(H_0=\nabla^2f(x_0)^{-1}\) and

\[\begin{split}\begin{array}{lcl} H_n & = & \left(I-\frac{(x_n - x_{n-1})(\nabla f(x_n)-\nabla f(x_{n-1}))^T}{(\nabla f(x_n)-\nabla f(x_{n-1}))^T(x_n - x_{n-1})}\right)\\ & \times & H_{n-1}\\ & \times & \left(I-\frac{(\nabla f(x_n)-\nabla f(x_{n-1}))(x_n - x_{n-1})^T}{(\nabla f(x_n)-\nabla f(x_{n-1}))^T(x_n - x_{n-1})}\right)\\ & + & \frac{(x_n - x_{n-1})(x_n - x_{n-1})^T}{(\nabla f(x_n)-\nabla f(x_{n-1}))^T(x_n - x_{n-1})}, \end{array}\end{split}\]

then \(p_n=-H_n\nabla f(x_n)\) (see [RF]).

Broyden’s method

Let \(H_0=\nabla^2f(x_0)^{-1}\) and

\[\begin{split}\begin{array}{lcl} H_n & = & H_{n-1}\\ & + & \frac{((x_n- x_{n-1})-H_{n-1}(\nabla f(x_n)-f(x_{n-1})))(x_n- x_{n-1})^TH_{n-1}} {(x_n- x_{n-1})^TH_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))}, \end{array}\end{split}\]

then \(p_n=-H_n\nabla f(x_n)\) (see [CGB]).

[CGB]C. G. Broyden, A Class of Methods for Solving Nonlinear Simultaneous Equations. Math. of Comput. AMS. 19 (92): 577–593 (1965).

Symmetric Rank-One (SR1) method

Let \(H_0=\nabla^2f(x_0)^{-1}\) and

\[\begin{split}\begin{array}{lcl} H_n & = & H_{n-1}\\ & + & \frac{((x_n- x_{n-1})-H_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))((x_n- x_{n-1})-H_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))^T} {((x_n- x_{n-1})-H_{n-1}(\nabla f(x_n)-\nabla f(x_{n-1}))^T(\nabla f(x_n)-\nabla f(x_{n-1}))}, \end{array}\end{split}\]

then \(p_n=-H_n\nabla f(x_n)\) (see [RHB]).

[RHB]R. H. Byrd Analysis of a Symmetric Rank-One Trust Region Method, SIAM J. Optim 6(4) (1996).

Line Search methods

In every iteration, beside finding a descent direction, the algorithm also requires the magnitude of the descent, denoted by \(\alpha\) in the algorithm. One popular method to find \(\alpha\) is called line search. The following is the list of line search methods implemented.

Barzilai-Borwein method

The length of the descent direction suggested by Barzilai-Borwein method [JBJMB] is calculated with the following formula:

\[\alpha=\frac{(x_n- x_{n-1})(\nabla f(x_n)-\nabla f(x_{n-1}))^T}{\|\nabla f(x_n)-\nabla f(x_{n-1})\|^2}.\]
[JBJMB]J. Barzilai, J. M. Borwein. Two-point step size gradient methods, IMA J. Numerical Analysis, 8(1):141–148 (1988).

Backtrack line search method

Backtrack line search is a generic algorithm relying in various conditions to approximate a suitable magnitude for the descent direction [JNSJW].

Starting with a maximum candidate step size value \(\alpha_0>0\), using search control parameters \(\tau\in(0,1)\) and \(c\in(0,1)\), the backtracking line search algorithm can be expressed as follows:

  • Set \(t=-cp_n\cdot\nabla f(x_n)\) and iteration counter \(j=0\).
  • Until a condition \(\dagger(\alpha_j, t)\) is satisfied, repeatedly increment \(j\) and set \(\alpha_j=\tau\alpha_{j-1}\).
  • Return \(\alpha_j\) as the solution.

The \(\dagger\) condition is usually one of the following:

  • Wolfe condition: \(p_n\cdot\nabla f(x_n+\alpha_j p_n)\ge t\)
  • Armijo condition: \(\alpha_jt\ge f(x_n+\alpha_jp_n)-f(x_n)\)
  • Goldstein condition:
    • \(f(x_n)+(1-c)\alpha_jt\leq f(x_n+\alpha_jp_n)\) and
    • \(f(x_n+\alpha_jp_n)\leq f(x_n)+\alpha_jt\)
  • Strong Wolfe condition:
    • \(f(x_n+\alpha_jp_n)\leq f(x_n)+c_1\alpha_jt\) and
    • \(|p_n\nabla f(x_n+\alpha_jp_n)|\leq c_2|t|\) for \(0<c_1<c_2<1\)
  • Binary Search method: \(f(x_n+\alpha_jp_n)<f(x_n)\)

Termination criterion

At the end of every iteration a termination criterion is evaluated to decide continuation or break of the loop. The following is a list of implemented methods:

Cauchy condition

Given the sequence of calculated points \((x_n)\), this condition checks whether the values of the objective are making enough progress or reached a limit point. In symbols, for \(\varepsilon>0\),

\[|f(x_n)-f(x_{n-1})|<\varepsilon.\]

Cauchy_x condition

Given the sequence of calculated points \((x_n)\), this condition checks whether this sequence is making enough progress or reached an approximate limit point. In symbols, for \(\varepsilon>0\),

\[\|x_n - x_{n+1}\|<\varepsilon.\]

ZeroGradient condition

This condition checks the size of gradient vector at each point found at the end of iteration. If the gradient vector is close enough to zero, then it means that the values of the objective will not make significant progress at any direction. In symbols, for \(\varepsilon>0\),

\[\|\nabla f(x_n)\|<\varepsilon.\]

Note that this condition may not be suitable to solve constrained optimization problems.

Constrained Optimization

It is possible to transform a constrained optimization problem (1) to an unconstrained one with some conditions on the solutions. A popular method to do so is known as barrier function where the objective function is modified to penalize the search method if any of the constraints is violated.

Barrier Function Method

Let \(\phi\) be a function that takes over relatively small values (compare to values of \(f\)) over positive reals and big values for negative ones. Then the function \(f(x)+\sum_{i=1}^m\phi(g_i(x))\) is fairly large values if \(x\) is outside of the feasibility region. Therefore, it is likely that a search method tends to focus on the values inside the feasibility region. Similarly, let \(\psi\) be a function that returns large positive values apart from \(0\). Then \(f(x)+\sum_{j=1}^k\psi(h_j(x))\) is large outside of the feasibility region and rather small on the feasibility region.

Since the values of continuous functions changes gradually, it seems implausible to be able to choose a function that successfully accomplish the above task. So, we employ an increasing sequence of positive numbers \((\sigma_n)\) that approaches to \(+\infty\) and find the optimum value for the function

\[\Lambda_n(x)=f(x)+\frac{1}{\sigma_n}\sum_{i=1}^m\phi(g_i(x))+\sigma_n\sum_{j=1}^k\psi(h_j(x)),\]

then the optimum values of \(\Lambda_n\) approaches the optimum value of \(f\) inside the feasibility region. Note that if the optimum value is located on the boundary of the feasibility region, this method would only produce an approximate interior substitute. This is why the method is referred to as interior point method.

The following options for the barrier function are implemented:

  • For the inequality conditions:
    • Carrol: is the standard barrier function defined by \(-\frac{1}{g_i(x)}\)
    • Logarithmic: is the standard barrier function defined by \(-\log(g_i(x))\)
    • Expn: is the standard barrier function defined by \(e^{-g_i(x)+\epsilon}\)
  • For the equality condition:
    • Courant: function which is simply defined by \(h_j^2(x)\).

Note

The value of \(\epsilon\) in the code is taken to be equal to \(\epsilon=\frac{1}{\sigma_n}\).

Benchmark Problems

We employ the QuasiNewton class to solve a few benchmark optimization problems. These benchmarks that are mainly taken from [MJXY].

[MJXY]M. Jamil, Xin-She Yang, A literature survey of benchmark functions for global optimization problems, IJMMNO, Vol. 4(2) (2013).

Rosenbrock Function

The original Rosenbrock function is \(f(x, y)=(1-x)^2 + 100(y-x^2)^2\) which is a sums of squares and attains its minimum at \((1, 1)\). The global minimum is inside a long, narrow, parabolic shaped flat valley. To find the valley is trivial. To converge to the global minimum, however, is difficult. The same holds for a generalized form of Rosenbrock function which is defined as:

\[f(x_1,\dots,x_n) = \sum_{i=1}^{n-1} 100(x_{i+1} - x_i^2)^2+(1-x_i)^2.\]

Since \(f\) is a sum of squares, and \(f(1,\dots,1)=0\), the global minimum is equal to 0. The following code optimizes the Rosenbrock function over \(9-x_i^2\ge0\) for \(i=1,\dots,9\):

from Optimithon import Base
from Optimithon import QuasiNewton
from numpy import array

NumVars = 9
fun = lambda x: sum([100 * (x[i + 1] - x[i]**2)**2 +
                     (1 - x[i])**2 for i in range(NumVars - 1)])
x0 = array([0 for _ in range(NumVars)])

OPTIM = Base(fun, ineq=[lambda x: 9 - x[i]**2 for i in range(NumVars)],
             br_func='Carrol',
             penalty=1.e6,
             method=QuasiNewton, x0=x0,
             t_method='Cauchy',
             dd_method='BFGS',
             ls_method='Backtrack',
             ls_bt_method='Armijo',
             )
OPTIM.Verbose = False
OPTIM.MaxIteration = 1500
OPTIM()
print(OPTIM.solution)

The result is:

objective: 1.55528520803e-11
x: [ 1.0000001   1.00000013  1.00000013  1.00000005  0.99999996  0.99999993
    1.00000007  0.99999997  0.99999989]
NumIteration: 55
NumFuncEval: 256
success: True
message: Progress in objective values less than error tolerance (Cauchy condition)
RunTime: 0.0378611087799
Family: Quasi-Newton method
Step Size: Backtrack
Backtrack Stop Criterion: Armijo
Descent Direction: BFGS
Termination Criterion: Cauchy
Barrier Function: Carrol
Penalty Factor: 1000000.0

Giunta Function

Giunta is an example of continuous, differentiable, separable, scalable, multimodal function defined by:

\[\begin{split}\begin{array}{lcl} f(x_1, x_2) & = & \frac{3}{5} + \sum_{i=1}^2[\sin(\frac{16}{15}x_i-1)\\ & + & \sin^2(\frac{16}{15}x_i-1)\\ & + & \frac{1}{50}\sin(4(\frac{16}{15}x_i-1))]. \end{array}\end{split}\]

The following code optimizes \(f\) when \(1-x_i^2\ge0\):

from Optimithon import Base
from Optimithon import QuasiNewton
from numpy import array, sin

fun = lambda x: .6 + (sin((16. / 15.) * x[0] - 1) + (sin((16. / 15.) * x[0] - 1))**2 + .02 * sin(4 * ((16. / 15.) * x[0] - 1))) + (
    sin((16. / 15.) * x[1] - 1) + (sin((16. / 15.) * x[1] - 1))**2 + .02 * sin(4 * ((16. / 15.) * x[1] - 1)))
x0 = array([0 for _ in range(2)])

OPTIM = Base(fun, ineq=[lambda x: 1 - x[i]**2 for i in range(2)],
             br_func='Carrol',
             penalty=1.e6,
             method=QuasiNewton, x0=x0,
             t_method='Cauchy_x',
             dd_method='BFGS',
             ls_method='Backtrack',
             ls_bt_method='Armijo',
             )
OPTIM.Verbose = False
OPTIM.MaxIteration = 1500
OPTIM()
print(OPTIM.solution)

The output looks like:

objective: 0.0644704205391
x: [ 0.46732003  0.46731857]
NumIteration: 8
NumFuncEval: 20
success: True
message: The progress in values of points is less than error tolerance (0.000000)
RunTime: 0.0011510848999
Family: Quasi-Newton method
Step Size: Backtrack
Backtrack Stop Criterion: Armijo
Descent Direction: BFGS
Termination Criterion: Cauchy_x
Barrier Function: Carrol
Penalty Factor: 1000000.0

Parsopoulos Function

Parsopoulos is defined as \(f(x,y)=\cos^2(x)+\sin^2(y)\). The following code computes its minimum where \(-5\leq x,y\leq5\):

from Optimithon import Base
from Optimithon import QuasiNewton
from numpy import array, sin, cos

fun = lambda x: cos(x[0])**2 + sin(x[1])**2
x0 = array((1., -2.))

OPTIM = Base(fun, ineq=[lambda x: 25. - x[j]**2 for j in range(2)],
             br_func='Carrol',
             penalty=1.e6,
             method=QuasiNewton, x0=x0,
             t_method='Cauchy_x',
             dd_method='BFGS',
             ls_method='BarzilaiBorwein',
             )
OPTIM.Verbose = False
OPTIM.MaxIteration = 1500
OPTIM()
print(OPTIM.solution)

The solution is the following:

objective: 7.48150734385e-16
x: [ 1.57079633 -3.14159263]
NumIteration: 33
NumFuncEval: 36
success: True
message: The progress in values of points is less than error tolerance (0.000000)
RunTime: 0.00488901138306
Family: Quasi-Newton method
Step Size: BarzilaiBorwein
Descent Direction: BFGS
Termination Criterion: Cauchy_x
Barrier Function: Carrol
Penalty Factor: 1000000.0

Shubert Function

Shubert function is defined by:

\[f(x_1,\dots,x_n) = \prod_{i=1}^n\left(\sum_{j=1}^5\cos((j+1)x_i+i)\right).\]

It is a continuous, differentiable, separable, non-scalable, multimodal function. The following code compares the result of five optimizers when \(-10\leq x_i\leq10\) and \(n=2\):

from Optimithon import Base
from Optimithon import QuasiNewton
from numpy import array, cos

fun = lambda x: sum([cos((j + 1) * x[0] + j) for j in range(1, 6)]) * \
    sum([cos((j + 1) * x[1] + j) for j in range(1, 6)])
x0 = array((1., -1.))

OPTIM = Base(fun, ineq=[lambda x: 100. - x[i]**2 for i in range(2)],
             br_func='Carrol',
             penalty=1.e6,
             method=QuasiNewton, x0=x0,
             t_method='Cauchy',
             dd_method='Gradient',
             ls_method='Backtrack',
             ls_bt_method='Armijo',
             )
OPTIM.Verbose = False
OPTIM.MaxIteration = 1500
OPTIM()
print(OPTIM.solution)

which results in:

objective: -18.09556507
x: [-7.06139727 -1.47136939]
NumIteration: 51
NumFuncEval: 1021
success: True
message: Progress in objective values less than error tolerance (Cauchy condition)
RunTime: 2.48312807083
Family: Quasi-Newton method
Step Size: Backtrack
Backtrack Stop Criterion: Armijo
Descent Direction: Gradient
Termination Criterion: Cauchy
Barrier Function: Carrol
Penalty Factor: 1000000.0

McCormick Function

McCormick function is defined by

\[f(x, y) = \sin(x+y) + (x-y)^2-1.5x+2.5y+1.\]

Attains its minimum at \(f(-.54719, -1.54719)\approx-1.9133\):

from Optimithon import Base
from Optimithon import QuasiNewton
from numpy import array, sin
from scipy.optimize import minimize

fun = lambda x: sin(x[0] + x[1]) + (x[0] - x[1]) ** 2 - 1.5 * x[0] + 2.5 * x[1] + 1.
x0 = array((0., 0.))

OPTIM = Base(fun,
             method=QuasiNewton, x0=x0,  # max_lngth=100.,
             t_method='Cauchy_x',  # 'Cauchy_x', 'ZeroGradient',
             dd_method='BFGS',
             # 'Newton', 'SR1', 'HestenesStiefel', 'PolakRibiere', 'FletcherReeves', 'Gradient', 'DFP', 'BFGS', 'Broyden', 'DaiYuan'
             ls_method='Backtrack',  # 'BarzilaiBorwein', 'Backtrack',
             ls_bt_method='Armijo',  # 'Armijo', 'Goldstein', 'Wolfe', 'BinarySearch'
             )
OPTIM.Verbose = False
OPTIM.MaxIteration = 1500
OPTIM()
print(OPTIM.solution)

The output will be:

objective: -1.91322295498
x: [-0.54719755 -1.54719755]
NumIteration: 9
NumFuncEval: 23
success: True
message: The progress in values of points is less than error tolerance (0.000000)
RunTime: 0.000735998153687
Family: Quasi-Newton method
Step Size: Backtrack
Backtrack Stop Criterion: Armijo
Descent Direction: BFGS
Termination Criterion: Cauchy_x
Barrier Function: Carrol
Penalty Factor: 100000.0

Code Documentation

‘base’ Module

This module implements general containers for typical optimization methods.

class base.Base(obj, **kwargs)[source]

This is the base class that serves all the iterative optimization methods. An object derived from Base requires the following parameters:

Parameters:
  • objMANDATORY- is a real valued function to be minimized
  • x0 – an initial guess of the optimal point
  • method – the optimization class which implements iterate and terminate procedures (default: OptimTemplate that returns the value of the function at the initial point x0)
  • VerboseBoolean- If True prompts messages at every stage of the iteration as well as termination
  • kwargs – the rest of parameters that will ba passed to method

The object then passes all other given parameters to the method class for further processes. When a termination condition is satisfied, the object fills the results in the solution attribute which is an instance of Solution class. The given class method can pass arbitrary pieces of information to the solution by modifying its MetaData dictionary.

When an object optim of type Base initiated, the optimization process can be invoked by calling the object itself like a function:

optim = Base(f, method=QuasiNewton, x0=init_point)
optim()
print(optim.solution)
class base.OptimTemplate(obj, **kwargs)[source]

Provides a template for an iterative optimization method.

Parameters:
  • obj – a real valued function (objective function)
  • x0 – an initial guess for a (local) minimum
  • jac – a vector calculating the gradient of the objective function (optional, if not given will be numerically approximated)
  • difftool – an object to calculate Gradient and Hessian of the objective (optional, default NumericDiff.Simple)
class base.Solution[source]

A class to keep outcome and details of the optimization run.

‘QuasiNewton’ Module

This module contains implementations of variations of unconstrained optimization methods known as Quasi-Newton methods. A Quasi-Newton method is an iterative algorithm that approximates a local minima of the objective function. Starting with a given initial point x0, each iteration consists of three major subprocedures:

  • Finding a descent direction (via DescentDirection class)
  • Finding the length of descent (using LineSearch class)
  • Check the termination condition (Termination class).

The right strategy to attempt an optimization problem must be pre-determined, otherwise, it uses a default set up to solve the problem.

class QuasiNewton.Barrier(QNRef, **kwargs)[source]

Implementation of some barrier functions to be used for constrained optimization problems. Three barrier functions are implemented:

  • Carrol: is the standard barrier function defined by \(-\frac{1}{g_i(x)}\)
  • Logarithmic: is the standard barrier function defined by \(-\log(g_i(x))\)
  • Expn: is the standard barrier function defined by \(e^{-g_i(x)+\epsilon}\)

The only barrier function implemented for the equality is the typical function known as Courant function which is simply \(h_j^2(x)\).

The default barrier function is Carrol and the default penalty factor is 10^{-5}. To specify the barrier function and penalty factor initiate the optimizer with keywords br_func that accepts one of the above three values and penalty that must be a positive real number.

class QuasiNewton.DescentDirection(QNRef, **kwargs)[source]

Implements various descent direction methods for Quasi-Newton methods. The descent method can be determined at initiation using dd_method parameter. The following values are acceptable:

  • ‘Gradient’: (default) The steepest descent direction.
  • ‘Newton’: Newton Conjugate Gradient method.
  • ‘FletcherReeves’: Fletcher-Reeves method.
  • ‘PolakRibiere’: Polak-Ribiere method.
  • ‘HestenesStiefel’: Hestenes-Stiefel method.
  • ‘DaiYuan’: Dai-Yuan method
  • ‘DFP’: Davidon-Fletcher-Powell formula.
  • ‘BFGS’: Broyden-Fletcher-Goldfarb-Shanno algorithm.
  • ‘Broyden’: Broyden’s method.
  • ‘SR1’: Symmetric rank-one method.

To calculate derivatives, the QuasiNewton class uses the object provided as the value of the difftool variable at initiation.

BFGS()[source]
Returns:the descent direction determined by Broyden-Fletcher-Goldfarb-Shanno algorithm
Broyden()[source]
Returns:the descent direction determined by Broyden’s method
DFP()[source]
Returns:the descent direction determined by Davidon-Fletcher-Powell formula
DaiYuan()[source]
Returns:the descent direction determined by Dai-Yuan method
FletcherReeves()[source]
Returns:the descent direction determined by Fletcher-Reeves method
Gradient()[source]
Returns:the gradient at current point
HestenesStiefel()[source]
Returns:the descent direction determined by Hestenes-Stiefel method
Newton()[source]
Returns:the descent direction determined by Newton Conjugate Gradient method
PolakRibiere()[source]
Returns:the descent direction determined by Polak-Ribiere method
SR1()[source]
Returns:the descent direction determined by Symmetric rank-one method
class QuasiNewton.LineSearch(QNref, **kwargs)[source]

This class provides the step length at each iteration. The value of ls_bt_method at initiation determines whether to use ‘BarzilaiBorwein’ method or a variation of ‘Backtrack’ (default). It accepts a control parameter tau and max_lngth at initiation. tau must be a positive real less than 1. The variation of the backtrack is then determined by the value of ls_bt_method which can be selected among the following:

  • ‘Armijo’: indicates Armijo condition and the parameter c1 can be modified at initiation as well.
  • ‘Wolfe’: indicates Wolfe condition and the parameters c1 and c2 can be modified at initiation.
  • ‘StrongWolfe’: indicates StrongWolfe condition and the parameters c1 and c2 can be modified at initiation.
  • ‘Goldstein’: indicates Goldstein condition and the parameter c1 can be modified at initiation.
Armijo(alpha, ft_x, tx)[source]

Implementation of Armijo.

Parameters:
  • alpha – current candidate for step length
  • ft_x – value of the objective at the candidate point
  • tx – the candidate point
Returns:

True or False

Backtrack()[source]

A generic implementation of Backtrack.

Returns:step length
BarzilaiBorwein()[source]

Implementation of Barzilai-Borwein.

Returns:step length
Goldstein(alpha, ft_x, tx)[source]

Implementation of Goldstein.

Parameters:
  • alpha – current candidate for step length
  • ft_x – value of the objective at the candidate point
  • tx – the candidate point
Returns:

True or False

StrongWolfe(alpha, ft_x, tx)[source]

Implementation of Strong Wolfe.

Parameters:
  • alpha – current candidate for step length
  • ft_x – value of the objective at the candidate point
  • tx – the candidate point
Returns:

True or False

Wolfe(alpha, ft_x, tx)[source]

Implementation of Wolfe.

Parameters:
  • alpha – current candidate for step length
  • ft_x – value of the objective at the candidate point
  • tx – the candidate point
Returns:

True or False

class QuasiNewton.QuasiNewton(obj, **kwargs)[source]
Parameters:
  • obj – the objective function
  • ineq(optional) list of inequality constraints, default: []
  • eq(optional) list of equality constraints, default: []
  • ls_method(optional) the line search strategy, default: Backtrack
  • ls_bt_method(optional) the backtrack termination condition, default: Armijo
  • dd_method(optional) the descent direction method, default: Gradient
  • t_method(optional) termination condition, default: Cauchy
  • br_func(optional) barrier function family, default: Carrol
  • penalty(optional) penalty factor for the barrier function, default: 1.e5
  • max_iter(optional) maximum number of iterations, default: 100

This class hosts a family of first and second order iterative methods to solve an unconstrained optimization problem. The general schema follows the following steps:

  • Given the point \(x\), find a suitable descent direction \(p\).
  • Find a suitable length \(\alpha\) for the direction \(p\) such that \(x+\alpha p\) results in an appropriate decrease in values of the objective.
  • Update \(x\) to \(x+\alpha p\) and repeat the above steps until a termination condition is satisfied.

The initial value for x can be set at initiation by passing x0=init_point to the Base instance. There are various methods to determine the descent direction p at each step. The DescentDirection class implements a variety of these methods. To choose one of these methods one should pass the method by its known name at initiation simply by setting dd_method=’method name’. This parameter will be passed to DescentDirection class (see the documentation for DescentDirection). Also, to determine a suitable value for \(alpha\) various options are available the class LineSearch is responsible for handling the computation for \(alpha\). The parameters ls_method and ls_bt_method can be set at initiation to determine the details for line search. The termination condition also can vary and the desired condition can be determined by setting t_method at initiation which will be passed to the Termination class. Each of these classes may accept other parameters that can be set at initiation. To find out about those parameter see the corresponding documentation.

iterate()[source]

This method updates the iterate method of the OptimTemplate by customizing the descent direction method as well as finding the descent step length. These method can be determined by the user.

Returns:None
terminate()[source]

This method updates the terminate method of the OptimTemplate which is given by user.

Returns:True or False
class QuasiNewton.Termination(QNRef, **kwargs)[source]

Implements various termination criteria for Quasi-Newton loop. A particular termination method can be selected at initiation of the Base object by setting t_method with the name of the method as an string. The following termination criteria are implemented:

  • ‘Cauchy’: Checks of the changes in the values of the objective function are significant enough or not.
  • ‘ZeroGradient’: Checks if the gradient of the objective is close to zero or not.

The value of the tolerated error is a property of OptimTemplate and hence can be modified as desired.

Cauchy()[source]

Checks if the values of the objective function form a Cauchy sequence or not.

Returns:True or False
Cauchy_x()[source]

Checks if the sequence of points form a Cauchy sequence or not.

Returns:True or False
ZeroGradient()[source]

Checks if the gradient vector is small enough or not.

Returns:True or False

‘NumericDiff’ Module

This module provides very basic way to numerically approximate partial derivatives, gradient and Hessian of a function.

class NumericDiff.Simple(**kwargs)[source]

A simple class to calculate partial derivatives of a given function. Passing a value for Infinitesimal forces the calculations to be done according to the infinitesimal value provided by user. Otherwise, the default value (1.e-7) is used.

Diff(f, i=0)[source]
Parameters:
  • f – a real valued function
  • i – the index variable for differentiation
Returns:

partial derivative of \(f\) with respect to :math:’i^{th}` variable as a function.

Gradient(f)[source]
Parameters:f – a real valued function
Returns:a vector function that returns the gradient vector of f at each point.
Hessian(f)[source]
Parameters:f – a real valued function
Returns:the Hessian matrix of \(f\) at each point

‘excpt’ Module

This module provides descriptive exception for the other modules.

exception excpt.DiffEror(*args)[source]

Errors that may have happened while calculating derivatives of functions.

exception excpt.DirectionError(*args)[source]

Handles the errors caused during the computation of a descent direction.

exception excpt.Error(*args)[source]

Generic errors that may occur in the course of a run.

exception excpt.MaxIterations(*args)[source]

Errors caused by reaching the preset maximum number of iterations.

exception excpt.Undeclared(*args)[source]

Raised when an undeclared function is used.

exception excpt.ValueRange(*args)[source]

Errors resulted from computations over unauthorized regions.

Indices and tables