[R] Subject: Re: ZINB by Newton Raphson??

(Ted Harding) Ted.Harding at manchester.ac.uk
Wed Jun 23 00:21:30 CEST 2010


Regarding Newton-Raphson, I would add that there are certain kinds
of function for which the method *cannot* work.

N-R is essentially a root-finding technique, and when used for
finding a minimum (or maximum) it is applied to finding a root
of the derivative.

As an example of a class of functions for which N-R will fail
to find a root, consider

  f(x) = -(-x)^a  for x < 0
  f(x) =  x^a     for x > 0
  f(0) =  0

where 0 < a <= 1/2.

Since f'(x) = a*|x|^(a-1) for x != 0, the N-R step from x[n] is

  x[n+1] = x[n] - f(x[n])/f'(x[n]) = -x[n]*(1-a)/a

With x[1] as starting point, if a = 1/2 then (1-a)/a = 1 and the
iteration simply alternates between +x[1] and -x[1].

If a < 1/2, then (1-a)/a > 1, so the iterates alternate in sign
and move away from the root x = 0 by that factor at every step.

Hence in all these cases N-R will not converge; only for a > 1/2
(outside the range above) does the step contract towards the root.
And the above is only one special class of functions for which,
for similar reasons, N-R will not converge.
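
A minimal R sketch of this behaviour (the function names are mine,
for illustration only):

  ## Newton-Raphson on f(x) = sign(x)*|x|^a, whose only root is x = 0
  f  <- function(x, a) sign(x) * abs(x)^a
  fp <- function(x, a) a * abs(x)^(a - 1)     # f'(x), valid for x != 0

  newton <- function(x, a, iters = 6) {
    for (i in seq_len(iters)) {
      x <- x - f(x, a) / fp(x, a)             # N-R step
      cat(sprintf("iter %d: x = %g\n", i, x))
    }
  }

  newton(1, a = 0.5)   # oscillates: -1, 1, -1, 1, ...
  newton(1, a = 0.4)   # diverges: |x| grows by (1-a)/a = 1.5 each step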

The corresponding class of functions for which N-R would not find
a minimum are such that f(x) above is proportional to the derivative,
e.g.

  F(x) = (-x)^(a+1)  for x < 0
  F(x) = x^(a+1)     for x > 0
  F(0) = 0
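
Applying Newton's minimization step x - F'(x)/F''(x) to this F(x)
gives exactly the same recurrence, as a quick R check confirms
(again, the names are only illustrative):

  ## Newton for minimization on F(x) = |x|^(a+1), with a = 0.4
  Fp  <- function(x, a) (a + 1) * sign(x) * abs(x)^a    # F'(x)
  Fpp <- function(x, a) (a + 1) * a * abs(x)^(a - 1)    # F''(x)
  x <- 1
  for (i in 1:5) {
    x <- x - Fp(x, a = 0.4) / Fpp(x, a = 0.4)   # same -x*(1-a)/a step
    cat(sprintf("iter %d: x = %g\n", i, x))
  }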

Ted.

On 22-Jun-10 20:52:26, Achim Zeileis wrote:
> John,
> 
> thanks for the comments, very useful.
> 
> Just three short additions specific to the ZINB case:
>    1. zeroinfl() uses BFGS (by default) and not Newton-Raphson
>       because we have analytical gradients but not an analytical
>       Hessian.
>    2. If the default starting values do not work, zeroinfl() offers EM
>       estimation of the starting values (which is typically followed by
>       a single iteration of BFGS only). EM is usually much more robust
>       but slower, hence it's not the default in zeroinfl().
>    3. I pointed the original poster to this information but he still
>       insisted on Newton-Raphson for no obvious reason. As I didn't
>       want to WTFM again on the list, I stopped using up bandwidth.
> 
> thx,
> Z
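
For concreteness, a minimal sketch of points 1 and 2 above, using
the bioChemists data shipped with pscl; the argument names follow
?zeroinfl and ?zeroinfl.control, so check them against your
installed version:

  library(pscl)
  data("bioChemists", package = "pscl")

  ## Default: BFGS via optim(), using analytical gradients
  fm1 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin")

  ## If the default starting values fail: EM-estimated starts,
  ## typically followed by a single BFGS iteration
  fm2 <- zeroinfl(art ~ ., data = bioChemists, dist = "negbin",
                  control = zeroinfl.control(EM = TRUE))

  cbind(coef(fm1), coef(fm2))   # the two fits should agree closely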
> 
> On Tue, 22 Jun 2010, Prof. John C Nash wrote:
> 
>> I have not included the previous postings because they came out very
>> strangely on my mail reader. However, the question concerned the
>> choice of minimizer for the zeroinfl() function, which apparently
>> allows any of the current 6 methods of optim() for this purpose. The
>> original poster wanted to use Newton-Raphson.
>>
>> Newton-Raphson (or just Newton for simplicity) is commonly thought
>> to be the "best" way to approach optimization problems. I've had
>> several people ask me why the optimx() package (see OptimizeR
>> project on r-forge -- probably soon on CRAN, we're just tidying up)
>> does not have such a choice. Since the question comes up fairly
>> frequently, here is a response. I caution that it is based on my
>> experience and others may get different mileage.
>>
>> My reasons for being cautious about Newton are as follows:
>> 1) Newton's method needs a number of safeguards to avoid singular or
>> indefinite Hessian issues. These can be tricky to implement well,
>> and to do so in a way that does not hinder the progress of the
>> optimization (one common safeguard is sketched after this list).
>> 2) One needs both gradient and Hessian information, and it needs to
>> be accurate. Numerical approximations are slow and often inadequate
>> for tough problems.
>> 3) Far from a solution, Newton is often not very good, likely
>> because the Hessian is not like a nice quadratic over the whole
>> space.
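
To illustrate point 1 only (this is not the optimx prototype), here
is a toy safeguarded Newton in R: it damps the Hessian Levenberg-style
until a Cholesky factorization succeeds, then backtracks on the step
length. All names are made up for the sketch.

  newton_safe <- function(par, fn, gr, he, maxit = 100, tol = 1e-8) {
    for (k in seq_len(maxit)) {
      g <- gr(par)
      if (sqrt(sum(g^2)) < tol) break       # gradient small: done
      H <- he(par)
      lambda <- 0
      repeat {                              # add lambda*I until pos. def.
        R <- try(chol(H + lambda * diag(length(par))), silent = TRUE)
        if (!inherits(R, "try-error")) break
        lambda <- max(2 * lambda, 1e-4)
      }
      ## solve (H + lambda*I) step = -g via the Cholesky factor
      step <- -backsolve(R, forwardsolve(t(R), g))
      alpha <- 1                            # simple backtracking search
      while (fn(par + alpha * step) > fn(par) && alpha > 1e-10)
        alpha <- alpha / 2
      par <- par + alpha * step
    }
    par
  }

  ## Rosenbrock test function with analytic gradient and Hessian
  fn <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2
  gr <- function(p) c(-2 * (1 - p[1]) - 400 * p[1] * (p[2] - p[1]^2),
                      200 * (p[2] - p[1]^2))
  he <- function(p) matrix(c(2 - 400 * p[2] + 1200 * p[1]^2, -400 * p[1],
                             -400 * p[1], 200), 2, 2)

  newton_safe(c(-1.2, 1), fn, gr, he)       # converges to c(1, 1)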
>>
>> Newton does VERY well at converging when it has a "close enough"
>> start. If you can find an operationally useful way to generate such
>> starts, you deserve awards like the Fields.
>>
>> We have in our optimx work (Ravi Varadhan and I) developed a
>> prototype safeguarded Newton. As yet we have not included it in
>> optimx(), but probably will do so in a later version, after we
>> figure out what advice to give on where it is appropriate to apply
>> it.
>>
>> In the meantime, I would suggest that BFGS or L-BFGS-B are the
>> closest options in optim() and generally perform quite well. There
>> are updates to BFGS and CG on CRAN in the form of Rvmmin and Rcgmin,
>> which are all-R implementations with box constraints too. UCMINF is
>> a very similar implementation of the unconstrained algorithm that
>> seems to have the details done rather well -- though BFGS in optim()
>> is based on my work, I actually find UCMINF often does better.
>> There's also nlm and nlminb.
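
For instance (a toy call, reusing the Rosenbrock fn and gr defined in
the sketch above):

  ## BFGS in optim() with an analytic gradient
  optim(c(-1.2, 1), fn, gr, method = "BFGS")

  ## L-BFGS-B additionally accepts box constraints
  optim(c(-1.2, 1), fn, gr, method = "L-BFGS-B",
        lower = c(-2, -2), upper = c(2, 2))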
>>
>> Via optimx() one can call these, and also some other minimizers, or
>> even "all.methods", though that is meant for learning about methods
>> rather than solving individual problems.
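
A sketch of such a call, assuming the optimx interface as described
on the OptimizeR r-forge pages (the package was not yet on CRAN when
this was written):

  library(optimx)   # from the OptimizeR project on r-forge
  ## compare many methods on the same problem (a learning tool only)
  optimx(c(-1.2, 1), fn, gr, control = list(all.methods = TRUE))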
>>
>> JN
>>

--------------------------------------------------------------------
E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 22-Jun-10                                       Time: 23:21:27
------------------------------ XFMail ------------------------------


