[R] Unexpected behavior of predict and interval="confidence"

Gavin Simpson gavin.simpson at ucl.ac.uk
Mon Oct 30 09:07:35 CET 2006


On Sun, 2006-10-29 at 12:54 -0500, Kevin Middleton wrote:
> Based on some recent r-help discussions, I have been trying out  
> plotting confidence intervals using predict and matplot. Matplot  
> appeared to not be plotting the linear regression when using the  
> default column names generated by read.table (V1, V2, etc). On  
> further inspection, the error seemed to be with predict and vector  
> names (V1, V2) rather than with matplot. I was using some textbook  
> data sets that were read with read.table, so my data.frame columns  
> were named by default. The problem seems to be related to the name of  
> the vector only (though I may be wrong).
> 
> The example below, based on that in ?predict.lm, better describes the  
> problem. Using predict with V2~V1 & y~x produces identical output  
> (when x<-V1 and y<-V2). However, using predict with  
> interval="confidence" results in different output from the same data.  
> That with y~x is correct, and V2~V1 is "incorrect".

This is because you have been caught out by the way newdata is handled
in predict.

This is your problem:

new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
names(new)

You have created new, a 1 column data frame with the colname "x". As "x"
doesn't exist in the model formula (V2 ~ V1), what the predict line
actually means, is to predict value for the observed data used to fit
them model, i.e. the fitted values:

pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
fit <- fitted(lm(V2 ~ V1))
all.equal(fit, pred.w.clim[, 1])

When you do predict(lm(y ~ x), new, interval="confidence"), "x" is now
in the model formula *and* in "new" so you get predictions for the x
values in "new".

If you'd done this:

new <- data.frame(V1 = seq(min(V1), max(V1), length.out = length(V1)))
predict(lm(V2 ~ V1), new, interval="confidence")

Then you'd have gotten what you wanted.

HTH

G

> 
> This may be related to a previous post on r-help:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html
> 
> I can't figure out why there would be a difference in predict when  
> only the vector name seemingly changes. Granted, I could just read  
> the data.frame is as "x" and "y."
> 
> Thanks
> Kevin
> 
> ######################
> 
> set.seed(10)
> 
> # For example purposes, plot side by side
> par(mfrow=c(1,2))
> 
> V1 <- rnorm(15)
> V2 <- V1 + rnorm(15)
> 
> new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
> 
> pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
> matplot(new$x, pred.w.clim,
>          lty=c(1,2,2), type="l",
>          col=c("black", "red", "red"),
>          ylab="predicted y")
> points(V1,V2)
> 
> 
> # Create x & y equal to V1 & V2
> x <- V1
> y <- V2
> 
> pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence")
> matplot(new$x, pred.w.clim2,
>          lty=c(1,2,2), type="l",
>          col=c("black", "red", "red"),
>          ylab="predicted y")
> points(x,y)
> 
> # Test if V1=x, V2=y
> all.equal(x,V1)
> all.equal(y,V2)
> 
> # Same output with predict
> predict(lm(V2 ~ V1))
> predict(lm(y ~ x))
> all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x)))
> 
> # Different output with interval="confidence"
> pred.w.clim
> pred.w.clim2
> all.equal(pred.w.clim, pred.w.clim2)
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson                     [t] +44 (0)20 7679 0522
ECRC                              [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building                  [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK                        [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT                          [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



More information about the R-help mailing list