[R] Cautioning optim() users about "Nelder-Mead" default - (originally) Optim instability

John C Frain frainj at gmail.com
Sun Nov 15 22:31:33 CET 2015


In econometrics it was common to start an optimization with Nelder-Mead
and then switch to one of the other algorithms to finish it. As John
Nash states, NM gets one close; switching then speeds up the final
solution.
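
A minimal sketch of that two-stage pattern (hypothetical objective f
and start p0, not from this thread):

f  <- function(p) (p[1] - 1)^2 + 100 * (p[2] - p[1]^2)^2  # Rosenbrock
p0 <- c(-1.2, 1)
s1 <- optim(p0, f, method = "Nelder-Mead")  # NM gets close, cheaply
s2 <- optim(s1$par, f, method = "BFGS")     # polish from NM's answer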

John

John C Frain
3 Aranleigh Park
Rathfarnham
Dublin 14
Ireland
www.tcd.ie/Economics/staff/frainj/home.html
mailto:frainj at tcd.ie
mailto:frainj at gmail.com

On 15 November 2015 at 20:05, Mark Leeds <markleeds2 at gmail.com> wrote:

> And just to add to John's comments, since he's too modest: in my
> experience, the algorithm in the Rvmmin package (written by John) shows
> great improvement over the L-BFGS-B algorithm, so I don't use L-BFGS-B
> anymore. L-BFGS-B often has a dangerous convergence issue in that it
> can claim to converge when it hasn't, which, to me, is worse than not
> converging. Most likely it has to do with the link below, which was
> pointed out to me by John a while back.
>
> http://www.ece.northwestern.edu/~morales/PSfiles/acm-remark.pdf
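>
> A minimal guard against that failure mode (a sketch, not Mark's code):
> always inspect the convergence code and message that optim() returns
> rather than trusting the answer.
>
> res <- optim(c(1, 1), function(p) sum((p - 3)^2), method = "L-BFGS-B")
> res$convergence  # 0 claims success; nonzero flags trouble
> res$message      # any message from the underlying L-BFGS-B code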
>
>
> On Sun, Nov 15, 2015 at 2:41 PM, ProfJCNash <profjcnash at gmail.com> wrote:
>
> > Agreed on the default algorithm issue. That is important for users to
> > know, and I'm happy to underline it. Also that CG (which is based on one
> > of my codes) should be deprecated. BFGS (also based on one of my codes
> > from long ago) does much better than I would ever have expected.
> >
> > Over the years I've tried different Nelder-Mead implementations. I
> > cannot say I've found any that is always better than the one in
> > optim() (also based on an old code of mine), though nmkb() from the
> > dfoptim package seems to do better a lot of the time. It also has a
> > transformation method for bounds, which may be useful, but it does
> > have the awkwardness that one cannot start on a bound. For testing a
> > function, I don't think it makes a lot of difference which variant of
> > NM one uses, provided the trace is on to catch never-ending runs. For
> > production use, it is a really good idea to try different methods on
> > a sample of likely cases and choose a method that does well. That is
> > the motivation for the optimx package and the opm() function of the
> > newer optimz (on R-forge) that I'm still polishing. optimz has a
> > function optimr() that takes the same call as optim() but
> > incorporates over a dozen optimizers via method = "(selected
> > method)".
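> >
> > A minimal nmkb() sketch (hypothetical quadratic objective; note the
> > start must be strictly inside the bounds, not on one of them):
> >
> > require(dfoptim)
> > fq <- function(p) sum((p - c(2, 3))^2)  # minimum at (2, 3)
> > nmkb(par = c(0.5, 0.5), fn = fq, lower = 0, upper = 5)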
> >
> > As a gradient-free choice, the Powell codes from minqa or other packages
> > (there are several implementations) can sometimes have spectacular
> > performance, but they also flub rather more regularly than Nelder-Mead
> > in my experience. That is, when they are good, they are very very good,
> > and when they are not they are horrid. (Plagiarism!)
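> >
> > For instance, bobyqa() from minqa is one such Powell code (a sketch
> > on a hypothetical quadratic, not a benchmark):
> >
> > require(minqa)
> > fq <- function(p) sum((p - c(2, 3))^2)  # minimum at (2, 3)
> > bobyqa(par = c(0.5, 0.5), fn = fq, lower = 0, upper = 5)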
> >
> > JN
> >
> > On 15-11-15 12:46 PM, Ravi Varadhan wrote:
> > > Hi John,
> > > My main point is not about Nelder-Mead per se.  It is *primarily* about
> > the Nelder-Mead implementation in optim().
> > >
> > > The users of optim() should be cautioned regarding the default
> > > algorithm, and they should consider alternatives such as "BFGS" in
> > > optim(), or other implementations of Nelder-Mead.
> > >
> > > Best regards,
> > > Ravi
> > > ________________________________________
> > > From: ProfJCNash <profjcnash at gmail.com>
> > > Sent: Sunday, November 15, 2015 12:21 PM
> > > To: Ravi Varadhan; 'r-help at r-project.org'; lorenzo.isella at gmail.com
> > > Cc: bhh at xs4all.nl; Gabor Grothendieck
> > > Subject: Re: Cautioning optim() users about "Nelder-Mead" default -
> > > (originally) Optim instability
> > >
> > > Not contradicting Ravi's message, but I wouldn't say Nelder-Mead is
> > > "bad" per se. Its issues are that it assumes the parameters are all
> > > on the same scale, and the termination (not convergence) test can't
> > > use gradients, so it tends to get "near" the optimum very quickly --
> > > in, say, only 10% of the computational effort -- and then spends an
> > > awful amount of effort deciding it's got there. It often does poorly
> > > when the function has nearly "flat" zones, e.g., a long valley with
> > > very low slope.
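> > >
> > > A quick way to see the scaling issue (hypothetical badly scaled
> > > objective; parscale is a standard optim() control):
> > >
> > > g <- function(p) (p[1] - 1)^2 + (1e4 * p[2] - 1)^2
> > > optim(c(0, 0), g)  # default Nelder-Mead on the raw scale
> > > optim(c(0, 0), g, control = list(parscale = c(1, 1e-4)))  # rescaled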
> > >
> > > So my message is still that Nelder-Mead is an unfortunate default --
> > > it was chosen, I believe, because it is generally robust and doesn't
> > > need gradients. BFGS really should use accurate gradients, preferably
> > > computed analytically, so it would only be a good default in that
> > > case or with very good approximate gradients (which are
> > > computationally costly).
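> > >
> > > For instance, supplying an analytic gradient to BFGS (a sketch with
> > > a hypothetical objective):
> > >
> > > f  <- function(p) sum((p - 1)^2)
> > > gf <- function(p) 2 * (p - 1)  # exact gradient of f
> > > optim(c(0, 0), f, gr = gf, method = "BFGS")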
> > >
> > > However, if you understand what NM is doing and use it accordingly,
> > > it is a valuable tool. I generally use it as a first try, BUT I turn
> > > on the trace to watch what it is doing, as a way to learn a bit
> > > about the function I am minimizing. Rarely would I use it as a
> > > production minimizer.
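> > >
> > > That first-try pattern as a sketch (f and p0 hypothetical):
> > >
> > > s <- optim(p0, f, method = "Nelder-Mead",
> > >            control = list(trace = 1, maxit = 500))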
> > >
> > > Best, JN
> > >
> > > On 15-11-15 12:02 PM, Ravi Varadhan wrote:
> > >> Hi,
> > >>
> > >>
> > >>
> > >> While I agree with the comments about paying attention to parameter
> > >> scaling, a major issue here is that the default optimization
> > >> algorithm, Nelder-Mead, is not very good. It is unfortunate that
> > >> the optim() implementation chose this as the "default" algorithm.
> > >> I have seen several instances where people have come to me with
> > >> poor results from using optim(), because they did not realize that
> > >> the default algorithm is bad. We (John Nash and I) have pointed
> > >> this out before, but R core has not addressed this issue, for
> > >> backward-compatibility reasons.
> > >>
> > >>
> > >>
> > >> There is a better implementation of Nelder-Mead in the "dfoptim"
> > >> package:
> > >>
> > >>
> > >>
> > >> require(dfoptim)
> > >> # par_ini1, par_ini2, par_ini3, min.perc_error, and data are from
> > >> # the original poster's example earlier in this thread
> > >> mm_def1 <- nmk(par = par_ini1, fn = min.perc_error, data = data)
> > >> mm_def2 <- nmk(par = par_ini2, fn = min.perc_error, data = data)
> > >> mm_def3 <- nmk(par = par_ini3, fn = min.perc_error, data = data)
> > >> print(mm_def1$par)
> > >> print(mm_def2$par)
> > >> print(mm_def3$par)
> > >>
> > >>
> > >>
> > >> In general, better implementations of optimization algorithms are
> > >> available in packages such as "optimx" and "nloptr". It is
> > >> unfortunate that most naïve users of optimization in R do not
> > >> recognize this. Perhaps there should be a "message" in the optim()
> > >> help file that points this out to the users.
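> > >>
> > >> As a sketch of that comparative approach (hypothetical objective;
> > >> optimx() runs several methods side by side for comparison):
> > >>
> > >> require(optimx)
> > >> fr <- function(p) sum((p - c(1, 2))^2)
> > >> optimx(par = c(0, 0), fn = fr,
> > >>        method = c("Nelder-Mead", "BFGS", "nlminb"))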
> > >>
> > >>
> > >>
> > >> Hope this is helpful,
> > >>
> > >> Ravi
> > >>
> > >>
> > >
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>