# How to choose the right optimization algorithm?

I need to find the minimum of a function. Reading the docs at http://docs.scipy.org/doc/scipy/reference/optimize.html I see that there are several algorithms that do the same thing, i.e. find the minimum. How do I know which one I should choose?

some of the algorithm listed

• Minimize a function using the downhill simplex algorithm.
• Minimize a function using the BFGS algorithm.
• Minimize a function with nonlinear conjugate gradient algorithm.
• Minimize the function f using the Newton-CG method.
• Minimize a function using modified Powell’s method.

My function is linear. dimensionality is around 232750(this is how many different gradients I have to compute each time), it takes about 2 mins to compute the gradient and the cost once, so not cheap. I don’t think I have constraints. it is deterministic and continuous.

Based on what you said: I assume you have to optimize for 50 variables; I also assume that you are having a situation that it is very expensive to find analytical derivatives (let alone get numericals out) and that your optimization is unconstrained.

Let me stress, you are a bit unluckily cause between 25-30 and 100 variables it is a bit of twilight zone when it comes to choosing between large or small scale optimization routines. Having said that though, nothing is lost.

Given that even first order derivative are expensive to get that kind off kills the idea of Newton’s method. You might have some luck with Quasi-Newton (BFGS) though if your Hessian is slightly diagonal like to start with. C-G is usually a bit slower than BFGS so probably that won’t be improve things a lot; use it if memory is also an issue (or just go for L-BFGS in that case).
Additionally given how slow it is to evaluate your function, a simple steepest descent/line search algorithm would be tortuously slow; same things goes with Simulated Annealing and other random search variants (I assume you don’t have access to HMC and all that jazz).

So, when you need the best bang for your buck when it comes to a single function evaluation: Go with Powell’s method and also test COBYLA; despite being a constrained optimization algorithm because it will internally linear approximate your function’s gradient to speed up things, it will be able to take advantage of your function’s linearity. Also definitely try NLopt for Python. They have a lot of gradient-free optimizers; try UOBYQA; it’s Powell’s brainchild also (Unconstrained Optimization BY Quadratic Approximations).

Very briefly : The N-CG algorithms depends on computing the Hessian, and your Hessian seems very expensive to compute. NLCG and BFGS don’t require it although the might try to try compute it once in their first step.

I left out the simplex algorithm on purpose because it’s a totally different beast; nothing to do with gradients as such. Try it but I can’t really comment on it; it is really dependent on your problem’s nature.

For a first good reference on numerical optimization C.T.Kelly’s book Iterative Methods for Optimization will get you quite far, quite nicely.