[Rd] Using the nls package

Douglas Bates bates@stat.wisc.edu
01 Aug 2000 10:03:44 -0500

Telford Tendys <telford@progsoc.uts.edu.au> writes:

> On Mon, Jul 31, 2000 at 07:20:32AM +0100, Prof Brian D Ripley wrote:
> > 
> > How can you be sure?  What are `the same numbers'?  The parameters
> > unchanged to machine precision, yes, but that's not what you print.
> The only safe way I can think of would be to go for machine precision
> as the default and then have some user-settable threshold for:
> 	( current residual / previous residual )

A more common way of defining a criterion like that would be
    (previous RSS - current RSS)/(previous RSS)
if you have ensured that your algorithm cannot increase the RSS
(residual sum-of-squares) from one iteration to the next.
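
For concreteness, such a relative-decrease test could be sketched as
below (the function name is illustrative; this is not what nls
actually does):

```r
## Illustrative relative-decrease stopping rule, not the nls internals:
## stop when the relative improvement in the RSS falls below a tolerance.
rel_decrease_converged <- function(prev_rss, curr_rss, tol = 1e-8) {
  (prev_rss - curr_rss) / prev_rss < tol
}
rel_decrease_converged(100, 99.9999999)  # TRUE: negligible improvement
rel_decrease_converged(100, 50)          # FALSE: still making progress
```

Note that an iteration making slow progress far from the optimum also
makes this quantity small, which is exactly the danger with
progress-based criteria.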

However, any such criterion that is based on the progress of the
iterations, instead of being based solely on the current position as
the orthogonality criterion is, exposes you to the danger of declaring
convergence when the algorithm has not reached the optimum.
B. D. McCullough has written several papers comparing the performance
of different statistical and econometrics packages on test problems,
frequently the test problems called the StRD prepared by NIST, the
National Institute for Standards and Technology in the USA.  (There is
a package called NISTnls on CRAN that contains the data sets for the
nonlinear least squares problems in the StRD.  You can run all those
tests with the shell command 'R CMD check NISTnls' after installing
that package.)  Some of the references by McCullough are

B. D. McCullough (1998), "Assessing the Reliability of Statistical
  Software: Part I", The American Statistician, 52(4), 358-366
B. D. McCullough (1999), "Assessing the Reliability of Statistical
  Software: Part II", The American Statistician, 53(2), 149-159
B. D. McCullough (1999), "Econometric Software Reliability: E-VIEWS,
  LIMDEP, SHAZAM, and TSP", Journal of Applied Econometrics, 14(2).
B. D. McCullough and Berry Wilson (1999), "On the Accuracy of
  Statistical Procedures in Microsoft EXCEL 97", Computational
  Statistics and Data Analysis, 31(1), 27-37.
B. D. McCullough (1999), "Experiences with the StRD: Application and
  Interpretation", in _Computing Science and Statistics_, 31, 16-21.
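
The StRD fits exercised by that check can also be run by hand.  For
instance, here is the Misra1a problem (data and the first NIST
starting values transcribed from the StRD; treat the transcription,
not the method, as the assumption here):

```r
## Misra1a, one of the StRD nonlinear least-squares problems.
## NIST-certified values are approximately b1 = 238.94, b2 = 5.5016e-4.
x <- c(77.6, 114.9, 141.1, 190.8, 239.9, 289.0, 332.8,
       378.4, 434.8, 477.3, 536.8, 593.1, 689.1, 760.0)
y <- c(10.07, 14.73, 17.94, 23.93, 29.61, 35.18, 40.02,
       44.82, 50.76, 55.05, 61.01, 66.40, 75.47, 81.78)
fm <- nls(y ~ b1 * (1 - exp(-b2 * x)), start = c(b1 = 500, b2 = 1e-4))
coef(fm)
```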

In that last paper listed above, McCullough writes:
 The StRD has been applied both formally and informally to over a
 dozen of the most popular statistics and econometrics packages, and
 only two of them have not produced nonlinear "solutions" with zero
 digits of accuracy: the statistical package S-PLUS (with analytic
 derivatives) and the econometrics package TSP (which has automatic
 analytic differentiation).

As I had a hand in writing the nls functions for both S-PLUS and R, I
happen to know that the algorithms are similar although the
implementations are quite different.  These functions are not fooled
into declaring convergence prematurely because they use the
orthogonality convergence criterion.
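
The criterion itself is easy to sketch: decompose the residual into
components tangential and orthogonal to the expectation surface and
compare their sizes.  A rough illustration of the Bates-Watts
relative-offset idea (not the exact nls code; the tolerance is only
indicative of nls.control's default):

```r
## Sketch of the relative-offset (orthogonality) convergence criterion.
## V: n x p gradient matrix at the current parameters; r: current residuals.
rel_offset_converged <- function(V, r, tol = 1e-5) {
  QR  <- qr(V)
  p   <- QR$rank
  n   <- length(r)
  qty <- qr.qty(QR, r)       # Q'r: rows 1..p tangential, the rest orthogonal
  tangential <- sum(qty[1:p]^2)
  orthogonal <- sum(qty[-(1:p)]^2)
  sqrt((tangential / p) / (orthogonal / (n - p))) < tol
}

## At a least-squares solution the residual is (numerically) orthogonal
## to the tangent plane, regardless of how the iterates got there:
set.seed(42)
X  <- cbind(1, 1:10)
yy <- 2 + 3 * (1:10) + rnorm(10)
r  <- residuals(lm(yy ~ X - 1))
rel_offset_converged(X, r)      # TRUE at the optimum
rel_offset_converged(X, yy)     # FALSE away from it
```

Because it looks only at the current position, a stalled iteration far
from the optimum cannot fool it.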

If we were to start introducing all kinds of ad-hoc rules for
declaring convergence in nls, simply to handle those cases of
artificial data that are generated without noise, we would run the
risk of ruining the good behavior of an algorithm that gives a
reliable indication of convergence.

> > > Yes, this is understandable but if most people are like me (questionable
> > > assumption I admit) they try a few artificial examples first to get
> > > the general idea of how these things work. I guess there is a certain
> > > illogical assumption that if I try what seems to me to be an obviously
> > > easy problem and it cannot handle it then something must be wrong and
> > > trying harder problems is not going to help.
> > 
> > Perhaps you need to read up on the algorithmic theory so you are not so
> > easily confused.  NLS problems with an exact-fit solution are a different
> > sub-class with different effective algorithms.
> Undoubtedly there is a lot more reading that I can do to improve
> my understanding. I'm just making a suggestion based on the observation
> that the algorithm actually found the right answer very quickly.

> I searched a few of the local bookshops (over the internet) and
> Diana Bates (with her children's stories) came up infinitely more often
> than Douglas Bates and his regression. I will have to search the
> University Co-op bookshop with my hands because the recent introduction of
> GST has blown up their webpage.
> Douglas, if you are reading this, what say you help me understand the
> code in nls.R and nls.c and I'll promise to buy your book (though it could
> take me a while to mail-order it in) ?
> > A pretty reasonable assumption is that statistics does not work
> > in complex numbers unless explicitly stated.
> Consider also the case for fitting higher dimensional REAL data.
> Admittedly, this is a contrived example but you may have an experiment
> with one input variable and three output variables. You know from theory
> that the output variables are all related to each other by some function
> but you need to find particular parameters in that function:
> > d <- data.frame( x=1:10 )
> > d$y <- eval( expression( outer( x, 1:3 )), d )
> > d$y <- d$y + rnorm( length( d$y ))
> > library( nls )
> > w <- nls( y ~ I( outer( x, P1 * 1:3 ) + P2 ), d, start=list( P1=3, P2=3 ))
> Error in qr.qty(QR, resid) : qr and y must have the same number of rows
> If the nls system supported these `wide' problems properly then complex
> number support could just be recast as a special case of the same.
> From a mathematical point of view, this sort of regression can be recast
> into a normal regression but with more sample points (and each sample
> point repeated N times) and with a bogus input variable that gives the
> column number. Thus, the same fundamental algorithm must work for both
> cases if the structure of the data is suitably munged around.
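
That stacking can indeed be written out for the example above (purely
illustrative; as generated, the true values are P1 = 1, P2 = 0):

```r
## "Munging" the wide example above into a univariate nls fit:
## stack the response columns and carry the column number along.
set.seed(123)
x <- 1:10
Y <- outer(x, 1:3) + rnorm(30)            # 10 x 3 "wide" response with noise
long <- data.frame(x   = rep(x, times = 3),
                   col = rep(1:3, each = length(x)),
                   y   = as.vector(Y))
## y[i, j] = P1 * j * x[i] + P2 is, column by column, the model
## I(outer(x, P1 * 1:3) + P2) from the failing call above.
fm <- nls(y ~ P1 * col * x + P2, data = long, start = list(P1 = 3, P2 = 3))
coef(fm)   # P1 close to 1, P2 close to 0
```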

Such "wide" data are also the subject of a considerable amount of
research.  Usually the criterion used for parameter estimation is the
product of the singular values of the residual matrix, not simply the
sum of the squares of the residuals.  There is an entire chapter in
Bates and Watts (1988) devoted to the analysis of such multiresponse
data, including discussion of algorithms.  At present these algorithms
are not implemented in R.
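
The criterion is nevertheless simple to write down.  For the wide
example above, a determinant-style criterion in the spirit of Bates
and Watts (the function and the use of a general optimizer are purely
illustrative) would be:

```r
## Multiresponse criterion sketched for the wide example above:
## the product of the singular values of the n x m residual matrix
## Z(theta), equivalently det(Z'Z)^(1/2), rather than sum(Z^2).
set.seed(123)
x <- 1:10
Y <- outer(x, 1:3) + rnorm(30)
multiresponse_crit <- function(theta, x, Y) {
  Z <- Y - (outer(x, theta[1] * 1:3) + theta[2])   # residual matrix
  prod(svd(Z, nu = 0, nv = 0)$d)
}
## Smaller near the truth (P1 = 1, P2 = 0) than at a poor guess:
multiresponse_crit(c(1, 0), x, Y) < multiresponse_crit(c(3, 3), x, Y)  # TRUE
## A general optimizer could minimize it:
## optim(c(3, 3), multiresponse_crit, x = x, Y = Y)
```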

> I've been looking through the code in nls.c and nls.R quite closely and
> I can see a lot of stuff which seems to be catering to cases of various
> dimensional objects. I cannot fully comprehend what it does and I cannot
> get a high dimensional case (such as the above) to work properly.
> The crux of it seems to be that qr() needs a matrix to work on
> so it uses as.matrix() to coerce its input. If you feed it a vector
> you get a column matrix (sounds good), if you feed it a matrix you
> have no problem, if you feed it a high dimensional tensor you get
> a column vector (which surprised me), and qr() goes with that:
> > dim( qr( 1:10 )$qr )
> [1] 10  1
> > dim( qr( outer( 1:3, 1:5 ))$qr )
> [1] 3 5
> > dim( qr( outer( 1:2, outer( 1:3, 1:4 )))$qr )
> [1] 24  1
> The result is that the nls model comes to grief when it feeds
> the three dimensional tensor into qr() then what it gets back
> (i.e. a long, thin column matrix) does not match up with anything.
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html