[R] Lack of Fit test

Wed Feb 23 16:05:44 CET 2000

> From: "Alan T. Arnholt" <arnholt at math.appstate.edu>
> To: Bill Venables <William.Venables at cmis.CSIRO.AU>
> Cc: r-help at stat.math.ethz.ch, arnholt at math.appstate.edu
> Subject: Re: [R] Lack of Fit test
> Date: Wed, 23 Feb 2000 09:40:21 -0500 (EST)
> X-Authentication: none
> 
> 
> I guess my question was not adequately stated when I sent it to the list.
> I was inquiring to see if anyone had written code to perform a lack of fit
> test in the special case when you have replicate predictors.  If your
> predictors contain replicates (repeated x values with one predictor or
> repeated combinations of x values with multiple predictors), you can
> easily calculate a pure error test for lack of fit.  The error term will be
> partitioned into pure error (error within replicates) and a lack of fit
> error, and the F-test can be used to test if you have chosen an adequate
> regression model.  See Neter, Kutner, Nachtsheim, and Wasserman, fourth
> edition, page 115, or Draper and Smith.  Bill Venables wrote "...It makes it
> impossible to write code to do it automatically, but if you know
> what you are doing, the procedure is simple with the software you
> have.  As with so many things in statistics, it is not a matter
> of good software so much as of having a good understanding of the
> problem in hand."  I guess I am not sure what "if you know what you are
> doing the procedure is simple..." means since I clearly know what I am
> doing in reference to the statistical procedure.  Where I need help is not
> with the statistics, but rather with automating the procedure in R.

That's easy. Suppose your data frame df has a column, say ID, which
identifies the distinct cases (each set of replicates shares one ID),
and you fitted

fit1 <- lm(y ~ rhs, data=df)

Now do 

fit2 <- lm(y ~ factor(ID), data=df)
anova(fit1, fit2, test="F")

e.g.

set.seed(123)
df <- data.frame(x = rnorm(10), ID=1:10)[rep(1:10, 1+rpois(10, 3)), ]
df$y <- 3*df$x+rnorm(nrow(df))
fit1 <- lm(y ~ x, data=df)
fit2 <- lm(y ~ factor(ID), data=df)
anova(fit1, fit2, test="F")

  Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1     23     26.101                         
2     15     15.222  8 10.878  1.3399 0.2975
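
For anyone who wants to see the partition behind that table, here is a
minimal sketch that computes the pure-error and lack-of-fit sums of squares
by hand for the same simulated data. The names ss_pe, ss_lof, f_stat and
p_val are illustrative only, and the exact numbers depend on the R version's
random number generator, so none are quoted here.

```r
# Sketch: pure-error / lack-of-fit decomposition by hand.
# The example data are rebuilt so that the block runs on its own.
set.seed(123)
df <- data.frame(x = rnorm(10), ID = 1:10)[rep(1:10, 1 + rpois(10, 3)), ]
df$y <- 3 * df$x + rnorm(nrow(df))

fit1 <- lm(y ~ x, data = df)

# Pure error: squared deviations of y from its mean within each replicate set
group_mean <- ave(df$y, df$ID)
ss_pe <- sum((df$y - group_mean)^2)
df_pe <- nrow(df) - length(unique(df$ID))

# Lack of fit: the rest of the regression's residual sum of squares
ss_res <- sum(resid(fit1)^2)
ss_lof <- ss_res - ss_pe
df_lof <- df.residual(fit1) - df_pe

# F statistic and p-value for lack of fit
f_stat <- (ss_lof / df_lof) / (ss_pe / df_pe)
p_val  <- pf(f_stat, df_lof, df_pe, lower.tail = FALSE)

# The same test via the nested-model comparison
tab <- anova(fit1, lm(y ~ factor(ID), data = df), test = "F")
```

f_stat and p_val should agree with the F value and Pr(>F) in the second
row of tab, since the factor(ID) model's residual sum of squares is exactly
the pure error.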

Despite Bill's sound comments, there is an R package lmtest on CRAN,
which is full of tests for linear models.
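
If there is no ready-made ID column, one can be built from the distinct
combinations of predictor values. The following is only a sketch under
stated assumptions: lof_test and rep_cell are hypothetical names, not part
of R or of any package, and the function assumes the model was fitted by lm
with a data argument and that replicates have exactly equal predictor
values.

```r
# Hypothetical helper (not part of any package): lack-of-fit F test for an
# lm fit, with replicate sets defined by the columns named in 'vars'.
lof_test <- function(fit, data, vars) {
  # One factor level per distinct combination of predictor values
  data$rep_cell <- interaction(data[vars], drop = TRUE)
  # Cell-means model: same response, one mean per replicate set
  full <- update(fit, . ~ rep_cell, data = data)
  anova(fit, full, test = "F")
}

# Usage on the simulated example above (rebuilt so the block is standalone)
set.seed(123)
df <- data.frame(x = rnorm(10), ID = 1:10)[rep(1:10, 1 + rpois(10, 3)), ]
df$y <- 3 * df$x + rnorm(nrow(df))
fit1 <- lm(y ~ x, data = df)

res <- lof_test(fit1, df, "x")
ref <- anova(fit1, lm(y ~ factor(ID), data = df), test = "F")
```

Here res should match ref, because each ID corresponds to exactly one
value of x. With several predictors, pass all of their names in vars.
Predictor values that differ only by rounding error will land in different
cells, so such data would need rounding first.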

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
