[R] Nonlinear regression question

Douglas Bates bates at stat.wisc.edu
Sat Aug 12 19:33:22 CEST 2000

"Albertus J Smit" <asmit at cyllene.uwa.edu.au> writes:

> Dear R users
> I recently migrated from Statistica/SigmaPlot (Windows) to R (Linux), so
> please excuse if this may sound 'basic'.
> When running a nonlinear regression (V = Vmax * conc / (Ks + conc), i.e.
> Michaelis-Menten) on SigmaPlot, I get the output listed below:
> >>>Begin SigmaPlot Output<<<
> R = 0.94860969	Rsqr = 0.89986035	Adj Rsqr = 0.89458984
> Standard Error of Estimate = 4.1543
>  	Coefficient	Std. Error	t	P
> a	34.5878	2.7690	12.4910	<0.0001
> b	6.8553	2.2949	2.9872	0.0076
> Analysis of Variance:
>  	DF	SS	MS	F	P
> Regression	1	2946.6381	2946.6381	170.7350	<0.0001
> Residual	19	327.9123	17.2585
> Total	20	3274.5504	163.7275
> >>>End SigmaPlot Output<<<
> (In the above output, a is Vmax and b is Ks)
> I have no problem performing the regression using R, and I successfully
> obtain the parameter estimates using the function nls().  However, how do I
> obtain the ANOVA output, r, r^2 and adj. r^2?

As one of the developers of the nls function I would like to state
that the lack of automatic ANOVA, R^2 and adj. R^2 from nls is a
feature, not a bug :-)

More seriously, it is possible to do some computations with the
components of the object produced by nls and obtain numbers similar to
what Statistica produces, but it is not easy to decide how these
numbers should be interpreted and why you would want to use them.

Please allow me to discuss some background on the methods.  I don't
want to get too professorial about this but I would like to emphasize
that there are good reasons for not doing such calculations.  You may
also want to read Bill Venables' excellent "Exegeses on Linear Models"
for more background on silly things done with linear and nonlinear
model fits.  (Bill, where can your exegeses be found?)

The ANOVA methods were possibly the greatest achievement of 20th
century statistics, involving very sophisticated concepts of
n-dimensional geometry and subtle computational methods.  Because the
computation was formidable for the resources available when the
methods were developed, these methods were often taught in terms of
short-cut computational methods and people came to think of the
computation as the justification of the method.  It's not.

A cleaner way of thinking of ANOVA is as a method of comparing nested
models fit to the same data.  That is why the R anova function can
take several fitted models as arguments for which it will return
statistics assessing the comparison of these models.  There is an
anova.nls that does this for nonlinear model fits.  It produces F
statistics and p-values but only when you explicitly fit the models of
interest.  If you want to compare two nls models for the same data,
they you just need to fit them and compare them with anova.

In the case of linear models there are some computational shortcuts
that can be used for ANOVA.  They are based on decomposing the
response space into linear subspaces defined by terms in the model.
For nonlinear models this type of decomposition doesn't apply.  After
all - they're not linear and the terms in the model generally don't
define linear subspaces of the response space.

(Even when I teach linear models these days, I encourage students to
perform ANOVA by considering what models they wish to compare, fitting
those models, and comparing the fits.  The computation is usually
trivial so why perform all sorts of algebraic gymnastics to create a
shortcut method?  I am amazed at the number of texts on linear models
that still discuss these issues in terms of constrained optimization,
Lagrange multipliers, etc. which no one in their right mind would ever

If one looked at the ANOVA from Statistica as a comparison of two
models then it would presumably be comparing the fit from the trivial
model (y_i = \mu + \epsilon_i) to the fit from the full
Michaelis-Menten model.  The first question should be whether these
models are nested.  In this case they are because the Michaelis-Menten
is reduced to the trivial model when Ks = 0 but you wouldn't
understand that from the output here.  (Many other, perhaps most
other, nonlinear models cannot be reduced to a trivial model by
constraints on the parameters.  One would then be at a total loss as
to how to interpret such ANOVA output.)

Notice that the p-value for the ANOVA is different from the p-value
for the t-test of b = 0.  This is because of nonlinearity.  The
p-value from the F-test would be preferred to that from the t-test
because the former is based on two model fits and is not influenced by
parameter-effects nonlinearity while the latter, based on a single
model fit, is influenced by this nonlinearity.

So, if your purpose in doing the ANOVA is to compare the
Michaelis-Menten model to the trivial model then I encourage you to
use the p-value from the F-test.  If your purpose is to "assess the
model" then you will need to think more carefully of what you mean by
"assess the model".

R^2 is even more problematic than ANOVA.  For R^2 to be meaningful
there should be certain orthogonality relationships among components
of the residual vector.  These will almost certainly not occur in a
nonlinear model fit.  My best advice regarding R^2 statistics with
nonlinear models is, as Nancy Reagan suggested, "Just say no.".  If a
journal editor states that you must include R^2 to show "the
proportion of variation accounted for by the model", try to persuade
that person that it is not meaningful.  Adjusted R^2 is even less
likely to be useful and the R produced in the Statistica output is
probably the most meaningless statistic there.  They seem to have
calculated R by taking the square root of the R^2.  Such a value would
definitely not represent any kind of correlation coefficient for the

In summary, those statistics are tied to particular properties of
linear models and calculating them from a nonlinear model fit is, at
best, risky and, at worse, nonsense.

Sorry for answering a perfectly legitimate question with a diatribe.
As you might guess, I feel strongly about these issues.  I recently
attended a Ph.D. prelim exam where the candidate had proposed research
on various ways of defining an R^2 statistic in the original data
scale from a linear model fit to data in a transformed scale
determined by the Box-Cox method.  There were seven different possible
definitions for R^2, all of which, as acknowledged by the candidate
and by the thesis advisor, were incorrect.  The purpose of this
path-breaking study is to determine in some way which of these seven
possible incorrect definitions should be used.  My suggestion that the
obvious answer was "none of them" was regarded as somewhat heretical.
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

More information about the R-help mailing list