minFunc

Mark Schmidt (2005-2012)

minFunc is a Matlab function for unconstrained optimization of differentiable real-valued multivariate functions using line-search methods. It uses an interface very similar to the Matlab Optimization Toolbox function fminunc, and can be called as a replacement for this function. On many problems, minFunc requires fewer function evaluations to converge than fminunc (or minimize.m). Further it can optimize problems with a much larger number of variables (fminunc is restricted to several thousand variables), and uses a line search that is robust to several common function pathologies.

The default parameters of minFunc call a quasi-Newton strategy, where limited-memory BFGS updates with Shanno-Phua scaling are used in computing the step direction, and a bracketing line-search for a point satisfying the strong Wolfe conditions is used to compute the step direction. In the line search, (safeguarded) cubic interpolation is used to generate trial values, and the method switches to an Armijo back-tracking line search on iterations where the objective function enters a region where the parameters do not produce a real valued output (i.e. complex, NaN, or Inf).

Features

Some highlights of the non-default features present in minFunc:

Usage

minFunc uses an interface very similar to Matlab's fminunc. If you currently call 'fminunc(@myFunc,x0,options,myFuncArg1,myFuncArg2)', then you can use minFunc instead by simply replacing 'fminunc' with 'minFunc'. Note that by default minFunc assumes that the gradient is supplied, unless the 'numDiff' option is set to 1 (for forward-differencing) or 2 (for central-differencing). minFunc supports many of the same parameters as fminunc (but not all), but has some differences in naming and also has many parameters that are not available for fminunc. 'help minFunc' will give a list of parameters and their explanation.

Example

'example_minFunc' gives an example of running the various limited-memory solvers in minFunc with default options. It requires rosenbrock.m (the 2D Rosenbrock "banana" function) from the minimize.m webpage (minimize.m is also included in the demo if it is found on the path). To run the example, in Matlab type:
>> cd minFunc_2012
>> addpath(genpath(pwd))
>> example_minFunc
Running the example should produce the following output:
Result after 25 evaluations of limited-memory solvers on 2D rosenbrock:
---------------------------------------
x1 = 0.0000, x2 = 0.0000 (starting point)
x1 = 1.0000, x2 = 1.0000 (optimal solution)
---------------------------------------
x1 = 0.8725, x2 = 0.7569 (minimize.m by C. Rasmussen)
x1 = 0.3654, x2 = 0.1230 (minFunc with steepest descent)
x1 = 0.4974, x2 = 0.2452 (minFunc with cyclic steepest descent)
x1 = 0.8756, x2 = 0.7661 (minFunc with spectral gradient descent)
x1 = 0.5840, x2 = 0.3169 (minFunc with Hessian-free Newton)
x1 = 0.7478, x2 = 0.5559 (minFunc with preconditioned Hessian-free Newton)
x1 = 1.0010, x2 = 1.0020 (minFunc with conjugate gradient)
x1 = 0.7907, x2 = 0.6256 (minFunc with scaled conjugate gradient)
x1 = 0.9794, x2 = 0.9491 (minFunc with preconditioned conjugate gradient)
x1 = 1.0000, x2 = 1.0000 (minFunc with limited-memory BFGS - default)
A note on the above performance results: The default parameters of minFunc were tuned based on log-linear statistical models (not on standard optimization problems like Rosenbrock's function).

To illustrate the use of Hessian-vector products and preconditioning, running example_minFunc_LR will show how to train a logistic regression classifier under various configurations of the Hessian-vector product and preconditioning function.

A very useful function that comes with the minFunc package is the derivativeCheck function that numerically checks the user-supplied gradient and/or Hessian, as well as the fastDerivativeCheck function which is more suitable for high-dimensional problems. Running example_derivativeCheck will show how to call these functions.

I have also prepared a more extensive set of examples of using minFunc on the webpage Examples of using minFunc.

Download

The complete set of .m and .mex files for the current version of minFunc are available here

Updates

2013: Two minor updates for using minFunc in Octave (thanks to Jean-Francois Cardoso): 2012: A bunch of minor changes in response to feedback and some other changes that I've been wanting to make for a while. Most notably, the core L-BFGS code is now faster, and several nice enhancements to the derivative-checking functionality were added. 2009: Another set of minor changes in response to feedback (including mex files for some 64-bit architectures), but also some non-trivial changes to the conjugate gradient and dense quasi-Newton methods (this version is available here). Some highlights include: 2008: I put an updated version of minFunc on this webpage (this version is available here). Most of the changes made were minor and were made in response to user feedback. Here are some highlights: 2007: Since a bunch of people have been using it, I put a version of minFunc online at this webpage (this first version is still available here).

Common Problems

There are several situations that account for the majority of the unsatisfactory behaviour of minFunc (with the first being the most common):
  1. The user-supplied gradient code is wrong. It is recommended to use the derivativeCheck options (or function) to make sure that the gradient code roughly matches numerically computed derivatives.
  2. minFunc runs out of memory. By default, minFunc uses a large number of corrections in the L-BFGS method. For problems with a very large number of variables, the 'Corr' parameter should be decreased (i.e. if you run out of memory on the 10th iteration, try setting 'Corr' to 9 or lower).
  3. minFunc returns before a solution of satisfactory accuracy is achieved. In this case, decreasing optTol and/or progTol will typically fix the problem, but in some cases the objective function or variables need to be re-scaled.
  4. Each iteration requires many function evaluations. In this case, changing the line search initialization (options.LS_init) from its default value can sometimes improve performance (i.e. setting it to 2 will use a quadratic approximation to initialize the line search).
  5. You get the error "??? Undefined function or method 'lbfgsAddC'". This occurs if Matlab can't find the mex file for your operating system. If this occurs, you can either: (i) set 'options.useMex' to 0, or (ii) compile the mex file for your operating system by running the mexAll function. If you end up compiling a file in minFunc for your operating system, please send it to me and I will add to the distribution.
  6. The objective function is not appropriate (this is a modeling issue, not an optimization problem). You need to make sure that a minimum of the function is actually the result you want, and that you don't need constraints.
  7. When using the pure Newton method, the default options in minFunc assume that the Hessian is positive-definite or close to positive-definite. If this is not the case, the 'hessianModify' option may need to be changed (to 1, 2, or 3, for example) in order to obtain good performance.

Constraints, L1-Regularization, Derivative-Free Methods, Trust Region, Coordinate Descent

Previous versions of minFunc included extra functions that solved certain types of constrained optimization problems (such as 'minFuncBC', that optimized with lower and/or upper bounds on the variables). These functions are no longer included but several constrained optimization methods are available in minConf. Alternately, Michael Zibulevsky has written a penalty/barrier multiplier method for large-scale constrained optimization that uses minFunc as a sub-routine, available here (I have not used this code). For non-smooth optimization where the only source of the non-differentiability is due to the presence of an L1-regularization term, see L1General.

I have also written trust-region, derivative-free, and coordinate descent codes. In some scenarios, the second-order trust-region code requires fewer function evaluations than minFunc (based on line-searches), while some of the derivative-free methods can be applied to problems that are continuous but not necessarily differentiable (i.e. if the function uses absolute values). The coordinate descent method can use additional structure present in the problem to (for very structured problems) give better performance. These methods are not available in the on-line package, contact me if you want these additional methods.

Running on Graphics Cards

Rainer Heintzmann has a written package that allows some Matlab operations to be performed very efficiently on graphics cards. This package and a discussion of modifying minFunc to use this package are available here (I have not used this code).

Other Issues

Sometimes, we want to minimize a function of x that has the interface myFunc(y,x,arg1,arg2) instead of myFunc(x,arg1,arg2), such as when using Matlab's object oriented programming functionality. To switch the argument order, you can use an anonymous function:
myObj = @(x)f(y,x,arg1,arg2);
minFunc(myObj,x_init);
To reference minFunc in a publication, please include my name and a link to this website. You may also want to include the date, since I may update the software in the future.


Mark Schmidt > Software > minFunc