[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

ProfJCNash profjcnash at gmail.com
Sun Nov 15 18:21:21 CET 2015


Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is
"bad" per se. Its issues are that it assumes the parameters are all on
the same scale, and that the termination (not convergence) test can't use
gradients, so it tends to get "near" the optimum very quickly -- say
with only 10% of the computational effort -- then spends an awful amount
of effort deciding it has got there. It often does poorly when the
function has nearly "flat" zones, e.g., a long valley with very low slope.
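
To illustrate the scaling point, here is a minimal sketch. The test
function fscale and the parscale values are made up for this example
and are not from the original thread. It runs optim()'s Nelder-Mead
with and without the documented control$parscale setting on a function
whose two parameters differ in size by a factor of 1000.

```r
# Hypothetical test function: the minimum is at (1, 1000), so the two
# parameters differ in scale by a factor of 1000.
fscale <- function(p) (p[1] - 1)^2 + (p[2] / 1000 - 1)^2

start <- c(0, 0)

# Default Nelder-Mead with no scaling information.
raw <- optim(start, fscale, method = "Nelder-Mead")

# Same run, but telling optim() the typical size of each parameter,
# so that it works internally on par / parscale.
scaled <- optim(start, fscale, method = "Nelder-Mead",
                control = list(parscale = c(1, 1000)))

# Compare the final function value and the number of function evaluations.
print(rbind(
  raw    = c(value = raw$value,    fevals = raw$counts[["function"]]),
  scaled = c(value = scaled$value, fevals = scaled$counts[["function"]])
))
```

How much the scaled run helps depends on the function and the start,
so treat the comparison as something to inspect, not a guaranteed win.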

So my message is still that Nelder-Mead is an unfortunate default -- it
was chosen, I believe, because it is generally robust and doesn't
need gradients. BFGS really should use accurate gradients, preferably
computed analytically, so it would only be a good default in that case
or with very good approximate gradients (which are computationally
costly).
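
As a sketch of what "accurate gradients" means in practice, the code
below uses the Rosenbrock function and its analytic gradient (the same
test problem as in the optim() help page; the names fr, grr and the
step h are just choices for this illustration). It first checks the
analytic gradient against a central-difference approximation, then runs
BFGS with and without it.

```r
# Rosenbrock "banana" function and its analytic gradient.
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                      200 * (x[2] - x[1]^2))

start <- c(-1.2, 1)

# Sanity check: compare the analytic gradient with central differences.
h <- 1e-6
num_grad <- sapply(1:2, function(i) {
  e <- replace(numeric(2), i, h)
  (fr(start + e) - fr(start - e)) / (2 * h)
})
grad_err <- max(abs(num_grad - grr(start)))
print(grad_err)

# BFGS with the analytic gradient ...
with_grad <- optim(start, fr, grr, method = "BFGS")
# ... and with optim()'s internal finite-difference approximation.
no_grad <- optim(start, fr, method = "BFGS")

print(rbind(analytic = with_grad$par, approximate = no_grad$par))
print(rbind(analytic = with_grad$counts, approximate = no_grad$counts))
```

Checking a hand-coded gradient this way before trusting it is cheap,
and a wrong gradient is a common cause of poor BFGS results.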

However, if you understand what NM is doing, and use it accordingly, it
is a valuable tool. I generally use it as a first try BUT turn on the
trace to watch what it is doing as a way to learn a bit about the
function I am minimizing. Rarely would I use it as a production minimizer.
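
A minimal sketch of that "first try with trace" workflow, assuming the
Rosenbrock function as a stand-in for the function of interest (the
name fr and the start values are illustrative only):

```r
# Stand-in objective: the Rosenbrock function, which has a long curved valley.
fr <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2

# trace = 1 makes optim() report the progress of the Nelder-Mead simplex.
first_try <- optim(c(-1.2, 1), fr, method = "Nelder-Mead",
                   control = list(trace = 1))

# How many function evaluations were used, and did it stop normally?
# convergence == 0 only means the termination test was satisfied.
print(first_try$counts)
print(first_try$convergence)
print(first_try$par)
```

Watching how quickly the reported function value stops changing gives a
rough idea of how much of the effort goes into termination rather than
progress.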

Best, JN

On 15-11-15 12:02 PM, Ravi Varadhan wrote:
> Hi,
> 
>  
> 
> While I agree with the comments about paying attention to parameter
> scaling, a major issue here is that the default optimization algorithm,
> Nelder-Mead, is not very good.  It is unfortunate that the optim
> implementation chose this as the "default" algorithm.  I have several
> instances where people have come to me with poor results from using
> optim(), because they did not realize that the default algorithm is
> bad.  We (John Nash and I) have pointed this out before, but the R core
> has not addressed this issue due to backward compatibility reasons. 
>
> There is a better implementation of Nelder-Mead in the "dfoptim" package.
>
> require(dfoptim)
> mm_def1 <- nmk(par = par_ini1, min.perc_error, data = data)
> mm_def2 <- nmk(par = par_ini2, min.perc_error, data = data)
> mm_def3 <- nmk(par = par_ini3, min.perc_error, data = data)
> print(mm_def1$par)
> print(mm_def2$par)
> print(mm_def3$par)
>
> 
> In general, better implementations of optimization algorithms are
> available in packages such as "optimx" and "nloptr".  It is unfortunate
> that most naïve users of optimization in R do not recognize this.
> Perhaps there should be a "message" in the optim help file that points
> this out to users.
> 
>  
> 
> Hope this is helpful,
> 
> Ravi
> 
>


