[R] problem with predict()
ripley at stats.ox.ac.uk
Mon Jul 1 16:31:21 CEST 2002
On Mon, 1 Jul 2002, Prof Brian D Ripley wrote:
> I should point out that there is (as I thought) nothing wrong with predict.lm
> on a rank-degenerate problem, e.g.
>
> x1 <- rnorm(100)
> x3 <- rnorm(100)
> y <- rnorm(100)
> train <- data.frame(y=y, x1=x1, x2=x1, x3=x3)
> fit <- lm(y ~ ., train)
> stopifnot(all.equal(predict(fit), predict(fit, train)))
>
> although as Thomas points out a warning would be useful.
>
> The problem here is that model.matrix is (for me) adding 13 duplicate
> columns in lm and not in predict.lm. That's a bug unrelated to predict().
Follow up: the data file posted contains illegal variable names. These
are remapped by make.names into valid names, and the remapping creates
duplicated names. terms.formula then builds a formula containing these
duplicated names, with a column in model.matrix for each of the
duplicates. However, as the formula is invalid, it gets corrected in
predict.lm by delete.response(), so the model matrix built for prediction
lacks the duplicate columns that lm saw.
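
A minimal sketch of how the remapping can produce duplicates. The column
names here are hypothetical and are not those in the posted data file;
they only illustrate that distinct illegal names can collapse to one
valid name.

```r
# Hypothetical raw column names (not from the posted file): two are
# illegal as R variable names, and all three differ before remapping.
raw_names <- c("a b", "a-b", "a.b")

# make.names replaces each invalid character with ".", so the three
# distinct inputs are mapped to the same valid name.
valid_names <- make.names(raw_names)
print(valid_names)

# anyDuplicated() returns the index of the first duplicate, or 0 if none.
has_duplicates <- anyDuplicated(valid_names) > 0
print(has_duplicates)
```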
So the user's error is attempting to use a data frame with invalid names,
and the bug in R is that it did not detect the duplicates.
read.table should call make.names(col.name, unique=TRUE) to avoid this,
and terms.formula needs to check for duplicates too.
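
As a sketch of the suggested remedy, assuming hypothetical column names:
make.names with unique=TRUE disambiguates names that collide after
remapping, and a small guard (check_unique_names is an illustrative
helper, not part of R) can refuse a data frame with duplicated names
before it reaches lm.

```r
# Hypothetical raw column names that collide once made syntactically valid.
raw_names <- c("a b", "a-b", "a.b")

# unique = TRUE makes the valid names distinct as well.
unique_names <- make.names(raw_names, unique = TRUE)
print(unique_names)

# Illustrative helper (an assumption, not an R function): stop early if a
# data frame carries duplicated column names.
check_unique_names <- function(df) {
  dups <- unique(names(df)[duplicated(names(df))])
  if (length(dups) > 0) {
    stop("duplicated column names: ", paste(dups, collapse = ", "))
  }
  invisible(df)
}

# A data frame whose names have been forced to collide.
bad <- data.frame(1:3, 4:6)
names(bad) <- c("x", "x")
result <- try(check_unique_names(bad), silent = TRUE)
print(inherits(result, "try-error"))
```

Calling check_unique_names(train) before lm(y ~ ., train) would turn the
silent mismatch into an immediate error.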
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595