summary: [R] numerical differentiation in R? (for optim "SANN" parscale)

Fri Jul 18 12:53:37 CEST 2003

Dear Wayne Jones, Ravi Varadhan, Roger D. Peng and Jerome Asselin,

Thank you for the helpful answers! I summarise them below and add my
own experience:

The numerical differentiation
-----------------------------
 > Check out ?fdHess and run the example!
This was the solution. help(fdHess, package="nlme")

 > help(numericDeriv,package="nls")
This is a good solution, too. For my case, the above was more practical.
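
For readers who want to try fdHess quickly, here is a minimal sketch. The
test function f and the point p0 are made up for illustration (they are not
my real model), chosen so that the analytic gradient and Hessian are known.

```r
library(nlme)  # provides fdHess()

# Hypothetical test function with a known gradient and Hessian
f <- function(p) (p[1] - 1)^2 + 3 * (p[2] + 2)^2 + p[1] * p[2]

p0 <- c(2, 1)        # point at which to differentiate
fd <- fdHess(p0, f)  # list with components mean, gradient, Hessian

fd$gradient  # analytic gradient at p0 is c(3, 20)
fd$Hessian   # analytic Hessian is matrix(c(2, 1, 1, 6), 2)
```

As far as I can tell, fdHess takes its step size relative to abs(pars), so a
parameter that is exactly zero probably needs the minAbsPar argument set to
a positive value.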

Running optim(..., method="SANN")
---------------------------------
 > You don't need to do any numerical differentiation in "optim", by
 > default it will automatically compute the derivatives via numerical
 > differentiation.
I was experimenting with optim a lot, and I found that "SANN" does not 
calculate derivatives.

 > For the other four methods 'optim' will do
 > numerical differentiation for you if a gradient is not provided.
This agrees with my observations.

 > 'optim' does not require any differentiation of the objective function
 > for the "SANN" method.
True. However, providing a 'parscale' based on the derivatives for
"SANN" vastly accelerated its convergence. See below.
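
One way to check this behaviour is to look at the 'counts' component that
optim returns. The sketch below uses the Rosenbrock function from the ?optim
examples rather than my own model; the seed and the reduced maxit are
arbitrary choices made only to keep the run short and repeatable.

```r
# Rosenbrock test function and its analytic gradient, as in the ?optim examples
fr  <- function(x) 100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
grr <- function(x) c(-400 * x[1] * (x[2] - x[1]^2) - 2 * (1 - x[1]),
                      200 * (x[2] - x[1]^2))
start <- c(-1.2, 1)

# BFGS without a gradient: optim falls back on finite differences
bfgs.num <- optim(start, fr, method = "BFGS")
# BFGS with the analytic gradient supplied
bfgs.ana <- optim(start, fr, grr, method = "BFGS")

# SANN: no gradient is used at all
set.seed(1)
sann <- optim(start, fr, method = "SANN", control = list(maxit = 2000))

bfgs.num$counts  # both function and gradient evaluations are counted
bfgs.ana$counts
sann$counts      # the gradient count is NA
```

The two BFGS runs should end up at nearly the same parameters, while the NA
gradient count for "SANN" shows that no derivatives were evaluated.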

Role of 'parscale' optim(..., control=list(parscale=g, ...))
------------------------------------------------------------

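For the function I wanted to optimise, this was the solution:
library(nlme)
# finite-difference gradient (and Hessian) of modell.2 at the starting values
fd <- fdHess(start.values, modell.2)
# reciprocal of each partial derivative, used as the scale of that parameter
g <- 1/fd$gradient
out <- optim(start.values, modell.2, method="SANN", hessian=TRUE,
             control=list(trace=2, parscale=g))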

 > the 'parscale' argument has nothing to do with
 > differentiation.  As far as I know, it is used to scale the values of
 > the parameters before choosing candidates (so that they are roughly
 > comparable).
Differentiation was useful for examining the scales.

 > The help says:
 > `parscale' A vector of scaling values for the parameters.
 >           Optimization is performed on `par/parscale' and these should
 >           be comparable in the sense that a unit change in any element
 >           produces about a unit change in the scaled value.

So, yes, 'parscale' is used to scale the parameters before choosing
candidates. But choosing candidates seems to be critical: setting
parscale to the reciprocals of the gradient values calculated at a good
guess of the optimal parameters accelerated the convergence immensely.

In my case the parscale values were very diverse, ranging from 1e-07 to
1e+05. When the optimisation procedure was not told about these
differences in scale, it generated poor candidates.
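
The effect can be reproduced on a toy problem. The sketch below is not my
real model: the objective f, the starting point and the seed are all
hypothetical, chosen so that the two parameters live on scales of about 1e-3
and 1e3. Note that I take abs() of the gradient here so that the scales are
positive; that is an assumption added for this sketch.

```r
library(nlme)  # provides fdHess()

# Hypothetical badly scaled objective with its optimum at c(1e-3, 1e3)
f <- function(p) (p[1] / 1e-3 - 1)^2 + (p[2] / 1e3 - 1)^2
start <- c(2e-3, 5e2)  # a "good guess", within a factor of two of the optimum

# Scales from the reciprocal of the finite-difference gradient at the guess
g <- 1 / abs(fdHess(start, f)$gradient)

set.seed(42)
raw <- optim(start, f, method = "SANN")
set.seed(42)
scaled <- optim(start, f, method = "SANN", control = list(parscale = g))

# Best objective value found by each run
c(raw = raw$value, scaled = scaled$value)
```

With parscale set, the candidate steps are taken on par/parscale, so they
have a sensible size in both directions, and the run should end much closer
to the optimum than the unscaled one.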

Thank you once more, and I hope you found my experiences useful.

Gábor

-- 
Gabor BORGULYA MD MSc
Semmelweis University of Budapest, 2nd Dept of Paediatrics
Hungarian Paediatric Cancer Registry