[R] Unexpected behavior of predict and interval="confidence"
Gavin Simpson
gavin.simpson at ucl.ac.uk
Mon Oct 30 09:07:35 CET 2006
On Sun, 2006-10-29 at 12:54 -0500, Kevin Middleton wrote:
> Based on some recent r-help discussions, I have been trying out
> plotting confidence intervals using predict and matplot. Matplot
> appeared to not be plotting the linear regression when using the
> default column names generated by read.table (V1, V2, etc). On
> further inspection, the error seemed to be with predict and vector
> names (V1, V2) rather than with matplot. I was using some textbook
> data sets that were read with read.table, so my data.frame columns
> were named by default. The problem seems to be related to the name of
> the vector only (though I may be wrong).
>
> The example below, based on that in ?predict.lm, better describes the
> problem. Using predict with V2~V1 & y~x produces identical output
> (when x<-V1 and y<-V2). However, using predict with
> interval="confidence" results in different output from the same data.
> That with y~x is correct, and V2~V1 is "incorrect".
This is because you have been caught out by the way newdata is handled
in predict.
This is your problem:
new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
names(new)
You have created new, a 1 column data frame with the colname "x". As "x"
doesn't exist in the model formula (V2 ~ V1), what the predict line
actually means, is to predict value for the observed data used to fit
them model, i.e. the fitted values:
pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
fit <- fitted(lm(V2 ~ V1))
all.equal(fit, pred.w.clim[, 1])
When you do predict(lm(y ~ x), new, interval="confidence"), "x" is now
in the model formula *and* in "new" so you get predictions for the x
values in "new".
If you'd done this:
new <- data.frame(V1 = seq(min(V1), max(V1), length.out = length(V1)))
predict(lm(V2 ~ V1), new, interval="confidence")
Then you'd have gotten what you wanted.
HTH
G
>
> This may be related to a previous post on r-help:
> http://finzi.psych.upenn.edu/R/Rhelp02a/archive/56982.html
>
> I can't figure out why there would be a difference in predict when
> only the vector name seemingly changes. Granted, I could just read
> the data.frame is as "x" and "y."
>
> Thanks
> Kevin
>
> ######################
>
> set.seed(10)
>
> # For example purposes, plot side by side
> par(mfrow=c(1,2))
>
> V1 <- rnorm(15)
> V2 <- V1 + rnorm(15)
>
> new <- data.frame(x = seq(min(V1), max(V1), length.out = length(V1)))
>
> pred.w.clim <- predict(lm(V2 ~ V1), new, interval="confidence")
> matplot(new$x, pred.w.clim,
> lty=c(1,2,2), type="l",
> col=c("black", "red", "red"),
> ylab="predicted y")
> points(V1,V2)
>
>
> # Create x & y equal to V1 & V2
> x <- V1
> y <- V2
>
> pred.w.clim2 <- predict(lm(y ~ x), new, interval="confidence")
> matplot(new$x, pred.w.clim2,
> lty=c(1,2,2), type="l",
> col=c("black", "red", "red"),
> ylab="predicted y")
> points(x,y)
>
> # Test if V1=x, V2=y
> all.equal(x,V1)
> all.equal(y,V2)
>
> # Same output with predict
> predict(lm(V2 ~ V1))
> predict(lm(y ~ x))
> all.equal(predict(lm(V2 ~ V1)), predict(lm(y ~ x)))
>
> # Different output with interval="confidence"
> pred.w.clim
> pred.w.clim2
> all.equal(pred.w.clim, pred.w.clim2)
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Gavin Simpson [t] +44 (0)20 7679 0522
ECRC [f] +44 (0)20 7679 0565
UCL Department of Geography
Pearson Building [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street
London, UK [w] http://www.ucl.ac.uk/~ucfagls/
WC1E 6BT [w] http://www.freshwaters.org.uk/
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
More information about the R-help
mailing list