[R] confidence interval in "predict.lm"

Fri Nov 15 17:43:28 CET 2002

I am studying statistics using R and the book "Understandable Statistics",
by Brase and Brase.  The book has two worked examples for calculating a
confidence interval around a predicted value from a linear model.  The
answers to both examples in the book differ from those I get from R.  The
regression line, the standard error, and the predicted value all agree
between R and the book.  Hence I gather that R and the book use different
formulas to calculate the confidence interval.  Could someone explain why
the difference exists, which function(s) in R I might use to get the
answers in the book, and (perhaps) which method to use in various
situations?

The example:

> x<-c(10,20,30,40,50,60,70)
> y<-c(17,21,25,28,33,40,49)
> dat <- data.frame(temp=x,amnt=y)
> dat
  temp amnt
1   10   17
2   20   21
3   30   25
4   40   28
5   50   33
6   60   40
7   70   49

This is a table of temperatures (temp) and the corresponding amounts (amnt)
of copper sulfate that dissolve in 100 g of water at each temperature.

The regression line:

> mod <- lm(amnt ~ temp,dat)
> summary(mod)

Call:
lm(formula = amnt ~ temp, data = dat)

Residuals:
      1       2       3       4       5       6       7
 1.7857  0.7143 -0.3571 -2.4286 -2.5000 -0.5714  3.3571

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.14286    1.98463   5.111  0.00374 **
temp         0.50714    0.04438  11.428 8.98e-05 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

Residual standard error: 2.348 on 5 degrees of freedom
Multiple R-Squared: 0.9631,     Adjusted R-squared: 0.9558
F-statistic: 130.6 on 1 and 5 DF,  p-value: 8.985e-05

The .95 confidence interval for a temperature of 45 degrees:

> foo <- predict(mod, data.frame(temp=45), level=.95, interval="confidence", se.fit=TRUE)
> foo
$fit
          fit      lwr      upr
[1,] 32.96429 30.61253 35.31604

$se.fit
[1] 0.9148715

$df
[1] 5

$residual.scale
[1] 2.348252
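
As a check on how R arrives at these bounds, the pieces of the output above
appear to combine as fit -/+ tc * se.fit.  A minimal sketch, assuming a
two-sided 95% interval so that tc = qt(0.975, df); the variable names are
mine and the numbers are copied from the output above:

```r
# Values copied from the predict() output above
fit    <- 32.96429
se_fit <- 0.9148715
df     <- 5

# Two-sided 95% critical value from Student's t (assumed convention)
tc <- qt(0.975, df)

# Rebuild the interval by hand
lwr <- fit - tc * se_fit
upr <- fit + tc * se_fit
round(c(lwr = lwr, upr = upr), 5)
```

This should agree with the lwr and upr columns that predict() printed, up to
rounding of the copied inputs.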

The book gives the confidence interval as 26.5 <= y <= 39.5.  The book
defines the confidence interval calculation thus:

  yp - E <= y <= yp + E

  where
   E = tc*sC*sqrt(1 + 1/n + (x-xBar)^2/SSx)
   yp is the predicted value from the regression line
   tc is the value from Student's t distribution for a confidence
    level, c, using n-2 degrees of freedom,
   sC is the standard error of estimate
   x is the value at which the prediction is made
   xBar is the mean of the x values
   SSx is Sum(x^2)-[Sum(x)]^2/n
   n is the number of data pairs.
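
The book's formula can be evaluated by hand in R.  This is only a sketch
under my own assumptions: tc is taken as the two-sided value
qt(1 - (1 - c)/2, n - 2), sC is taken to be the residual standard error that
summary() reports, and the names x0, yp, E and se_mean are mine, not the
book's.

```r
x <- c(10, 20, 30, 40, 50, 60, 70)
y <- c(17, 21, 25, 28, 33, 40, 49)
mod <- lm(y ~ x)

n   <- length(x)
x0  <- 45                                        # temperature of interest
yp  <- unname(predict(mod, data.frame(x = x0)))  # predicted value
sC  <- summary(mod)$sigma                        # standard error of estimate
tc  <- qt(1 - (1 - 0.95) / 2, df = n - 2)        # two-sided 95% t value
SSx <- sum(x^2) - sum(x)^2 / n

# Book's margin of error, including the leading 1 under the square root
E <- tc * sC * sqrt(1 + 1/n + (x0 - mean(x))^2 / SSx)
c(lower = yp - E, upper = yp + E)

# The same expression without the leading 1
se_mean <- sC * sqrt(1/n + (x0 - mean(x))^2 / SSx)
se_mean
```

The first result comes out close to the book's 26.5 <= y <= 39.5, while
se_mean reproduces the se.fit value of 0.9148715 that R reports, so the two
intervals appear to differ only by the leading 1 under the square root.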

So even though the model, the predicted value, and the standard error all
agree, R gives a much narrower confidence interval than the book does.

Thanks for any advice/help.
