[R] Lack of Fit test

Thu Feb 24 00:01:02 CET 2000

Alan T. Arnholt writes:

> I guess my question was not adequately stated when I sent it to
> the list.  I was inquiring to see if anyone had written code to
> perform a lack of fit test in the special case when you have
> replicate predictors.  If your predictors contain replicates
> (repeated x values with one predictor or repeated combinations
> of x values with multiple predictors), you can easily calculate
> a pure error test for lack of fit.  The error term will be
> partitioned into pure error (error within replicates) and
> lack-of-fit error, and the F-test can be used to test whether you
> have chosen an adequate regression model.  See Neter, Kutner,
> Nachtsheim, and Wasserman, fourth edition, page 115, or Draper
> and Smith.

This is precisely a special case of what I was saying.  You fit a
larger model that contains the given model as a special case and
test one within the other.  In this case the larger model is
pretty obvious (a single classification model with the repeated
combinations defining the classes), but even here some care is
necessary if too few degrees of freedom are available for
estimating sigma^2.

Moreover, no special software is needed.  Suppose you have only
one predictor with repeated values, say x, and you are testing a
simple linear regression model.  Then you can do the test using

inner.mod <- lm(y ~ x, dat)          # the straight line being tested
outer.mod <- lm(y ~ factor(x), dat)  # one mean per distinct x value
anova(inner.mod, outer.mod)          # F-test of inner within outer

Test for lack of fit done.  If you have several predictors
defining the repeated combinations, all you need do is paste them
together, for example, and make a factor from the result.
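To make the several-predictor recipe concrete, here is a minimal sketch on simulated data. The data frame dat, the predictors x1 and x2, the replication layout and the grouping factor cell are all hypothetical; the only point is that the lack-of-fit test is again just anova() on two nested lm() fits, with the outer model built from a factor of the pasted predictor values.

```r
# Hypothetical data: two predictors observed on a replicated grid,
# so each (x1, x2) combination occurs three times (36 rows, 12 cells).
set.seed(1)
dat <- expand.grid(x1 = 1:4, x2 = c(10, 20, 30), rep = 1:3)
dat$y <- 2 + 0.5 * dat$x1 + 0.1 * dat$x2 + rnorm(nrow(dat))

# Model being tested: additive and linear in x1 and x2.
inner.mod <- lm(y ~ x1 + x2, dat)

# Outer model: one mean per distinct (x1, x2) combination,
# obtained by pasting the predictors together and making a factor.
dat$cell <- factor(paste(dat$x1, dat$x2))
outer.mod <- lm(y ~ cell, dat)

# The lack-of-fit F-test is the nested-model comparison.
lof <- anova(inner.mod, outer.mod)
lof
```

interaction(dat$x1, dat$x2, drop = TRUE) builds an equivalent grouping factor and may be tidier when there are many predictors.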

> Bill Venables wrote "...It makes it impossible to write code to
> do it automatically, but if you know what you are doing, the
> procedure is simple with the software you have.  As with so
> many things in statistics, it is not a matter of good software
> so much as of having a good understanding of the problem in
> hand."  I guess I am not sure what "if you know what you are
> doing the procedure is simple..." means, since I clearly know
> what I am doing in reference to the statistical procedure.
> Where I need help is not with the statistics, but rather with
> automating the procedure in R.

You are testing "lack of fit" by testing one linear model within
another and that procedure is pretty well automated within R
already (see above).  What the textbooks don't always tell you is
that that is all you are doing when you test for lack of fit, so
it can look like a special procedure that needs some new software
to perform.  It isn't.
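As a check that the textbook partition and the nested-model comparison really are the same test, the sketch below computes the pure error and lack-of-fit sums of squares by hand on simulated data (the data, the curved mean and the variable names are assumptions made for illustration). With n observations, c distinct x values and p coefficients in the inner model, the textbook statistic is F = [SSLF/(c - p)] / [SSPE/(n - c)], and it agrees with the F reported by anova().

```r
# Hypothetical data: six x values, each replicated four times, with a
# curved mean so that a straight line shows some lack of fit.
set.seed(2)
dat <- data.frame(x = rep(1:6, each = 4))
dat$y <- 1 + 0.8 * dat$x - 0.15 * dat$x^2 + rnorm(nrow(dat), sd = 0.3)

inner.mod <- lm(y ~ x, dat)
outer.mod <- lm(y ~ factor(x), dat)

n       <- nrow(dat)               # 24 observations
n.cells <- length(unique(dat$x))   # 6 distinct x values (c)
p       <- length(coef(inner.mod)) # 2 coefficients in the straight line

# Textbook partition of the residual sum of squares: SSE = SSPE + SSLF.
SSE  <- sum(residuals(inner.mod)^2)
SSPE <- sum((dat$y - ave(dat$y, dat$x))^2)  # deviations from group means
SSLF <- SSE - SSPE
F.lof <- (SSLF / (n.cells - p)) / (SSPE / (n - n.cells))

# The same statistic from the nested-model comparison.
F.anova <- anova(inner.mod, outer.mod)$F[2]
c(F.lof = F.lof, F.anova = F.anova)
```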

What I was also referring to was the fact that the textbook test
for lack of fit is not always appropriate, even in this special
situation of repeated combinations of predictors.  If you have
too few degrees of freedom left over for estimating sigma^2
within repeated combinations you may need to make do with a
smaller outer model.  From a statistical point of view, unless
you are in a very special context, that is not a decision you can
afford to delegate to an automatic procedure.  You need a good
understanding of the problem in hand to make it sensibly.
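The degrees-of-freedom concern can be seen in a small simulated example (hypothetical data in which only two of twelve x values are replicated). The saturated outer model factor(x) leaves just 2 degrees of freedom for pure error; a cubic in x is shown as one possible smaller outer model that still contains the straight line. That particular choice is only an illustration, not a rule: which outer model is sensible depends on the problem in hand.

```r
# Hypothetical data: 12 distinct x values, only x = 3 and x = 9 replicated.
set.seed(3)
dat <- data.frame(x = c(1:12, 3, 9))
dat$y <- 0.5 + 0.3 * dat$x + rnorm(nrow(dat), sd = 0.5)

n       <- nrow(dat)                    # 14 observations
df.pure <- n - length(unique(dat$x))    # only 2 df for pure error

inner.mod <- lm(y ~ x, dat)

# Saturated outer model: valid, but the F-test has 10 and 2 df.
full.outer <- lm(y ~ factor(x), dat)

# Illustrative smaller outer model: a cubic in x contains the straight
# line as a special case and leaves 10 df for estimating sigma^2.
small.outer <- lm(y ~ poly(x, 3), dat)

a.full  <- anova(inner.mod, full.outer)
a.small <- anova(inner.mod, small.outer)
a.full
a.small
```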

-- 
Bill Venables,      Statistician,     CMIS Environmetrics Project
CSIRO Marine Labs, PO Box 120, Cleveland, Qld,  AUSTRALIA.   4163
Tel: +61 7 3826 7251           Email: Bill.Venables at cmis.csiro.au    
Fax: +61 7 3826 7304      http://www.cmis.csiro.au/bill.venables/

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._