[R] problem with predict()
Czerminski, Ryszard
ryszard at arqule.com
Thu Jun 27 21:29:23 CEST 2002
# Yes. You are *still* using a matrix in a data frame. Please do read more
# carefully.
I have read some more R documentation, trying to understand the difference
between matrices and data frames etc., and I repeat my example, this time
executing EXACTLY the same code. The only difference is that in one case
I use smaller data sets ({train,test}-small.csv) and in the second case I
use larger data sets ({train,test}.csv), and I get different behaviour:
the small case (10*4) runs fine, the larger case (164*119) fails.
Any ideas why this happens?
R
> rm(list=ls())
> train.data <- read.csv("train-small.csv", header = TRUE, row.names = "mol", comment.char="")
> test.data <- read.csv("test-small.csv", header = TRUE, row.names = "mol", comment.char="")
> yr <- train.data[,1]; xr <- train.data[,-1]
> xr <- scale(xr)
> x.center <- attr(xr, "scaled:center"); x.scale <- attr(xr, "scaled:scale")
> mask <- apply(xr, 2, function(x) any(is.na(x)))
> xr <- xr[,!mask] # rm NA's
> ys <- test.data[,1]; xs <- test.data[,-1]
> xs <- scale(xs, center = x.center, scale = x.scale)
> xs <- xs[,!mask]
> train <- data.frame(y = yr, x = xr)
> test <- data.frame(y = ys, x = xs)
> model <- lm(y~., train)
> cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
dim(train) = 10 4 ; dim(test) = 10 4
> length(predict(model, test))
[1] 10
> train.data <- read.csv("train.csv", header = TRUE, row.names = "mol", comment.char="")
> test.data <- read.csv("test.csv", header = TRUE, row.names = "mol", comment.char="")
[snip...]
> cat("dim(train) =", dim(train), "; dim(test) =", dim(test), "\n")
dim(train) = 164 119 ; dim(test) = 35 119
> length(predict(model, test))
Error in drop(X[, piv, drop = FALSE] %*% beta[piv]) :
subscript out of bounds
>
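
The thread does not establish what triggers this error. As a sketch of things
worth checking when identical code behaves differently on a larger data set,
the lines below compare the number of requested coefficients with the rank of
the fit and compare column names between the two data frames. The data here
are made-up stand-ins (the real train.csv and test.csv are not available), and
the column d is deliberately built as an exact linear combination so that the
fit is rank-deficient.

```r
# Hypothetical stand-in data, NOT the poster's files.
set.seed(1)
x <- matrix(rnorm(20 * 3), nrow = 20, dimnames = list(NULL, c("a", "b", "c")))
x <- cbind(x, d = x[, "a"] + x[, "b"])   # d is exactly a + b (collinear)

train <- data.frame(y = rnorm(20), x)    # columns: y, a, b, c, d
test  <- train[1:5, ]                    # stand-in test set, same columns

model <- lm(y ~ ., train)

n_coef    <- length(coef(model))               # coefficients requested (incl. intercept)
fit_rank  <- model$rank                        # coefficients actually estimable
aliased   <- names(which(is.na(coef(model))))  # terms lm() could not estimate
name_diff <- setdiff(names(train), names(test))  # columns missing from test

print(c(n_coef = n_coef, rank = fit_rank))
print(aliased)
print(name_diff)
```

If the rank is smaller than the number of coefficients, some columns are
linear combinations of others, which is more likely with 118 predictors
than with 3. Whether that is what happens with the real data is an assumption
that only the real files can confirm.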
Ryszard Czerminski phone: (781)994-0479
ArQule, Inc. email:ryszard at arqule.com
19 Presidential Way http://www.arqule.com
Woburn, MA 01801 fax: (781)994-0679
-----Original Message-----
From: ripley at stats.ox.ac.uk [mailto:ripley at stats.ox.ac.uk]
Sent: Friday, June 21, 2002 3:41 PM
To: Czerminski, Ryszard
Cc: r-help at stat.math.ethz.ch
Subject: RE: [R] problem with predict()
On Fri, 21 Jun 2002, Czerminski, Ryszard wrote:
> --- first problem
>
> If I store 'simulated' data in data frames:
> # train.data <- data.frame(matrix(rnorm(164*119), nrow = 164))
> # test.data <- data.frame(matrix(rnorm(35*119), nrow = 35))
> it still works the same way i.e. the code below works fine
> for simulated data and fails for 'real' data the only
> difference being in actual numeric values stored in data
> structures of the same shape and type.
>
> Any suggestions why this happens ?
Yes. You are *still* using a matrix in a data frame. Please do read more
carefully.
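
A minimal sketch of what happens when a matrix is handed to data.frame(),
assuming a current version of R (behaviour in the 2002 release may differ in
detail). The objects m, y, d1, d2 and d3 are illustrative names, not taken
from the code in this thread.

```r
# Small illustrative matrix with column names.
m <- matrix(1:6, nrow = 3, dimnames = list(NULL, c("a", "b")))
y <- c(10, 20, 30)

# Matrix passed under the argument name x: columns are expanded and
# prefixed, so there is no column called "x".
d1 <- data.frame(y = y, x = m)
print(names(d1))

# Matrix passed without a name: columns keep their own names.
d2 <- data.frame(y = y, m)
print(names(d2))

# Matrix protected with I(): it is kept as ONE matrix-valued column "x".
d3 <- data.frame(y = y, x = I(m))
print(names(d3))
print(is.matrix(d3$x))
```

Only in the last form does the data frame have a column named x, and that
column is itself a matrix rather than an ordinary vector.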
> --- second problem
>
> > As Andy Liaw pointed out, xr is a matrix. Take a look at the names of
> > train. Hint: they do not contain `x'.
>
> Following your hint I am guessing that the fact that names do not contain
> 'x' explains why lm(y~., train) form works and lm(y~x, train) fails
> and "lm(y~., train)" means roughly: correlate column "y" to all other
> columns
No, it means regress y on all the remaining columns in the data argument.
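
To illustrate the point about the dot, here is a small sketch with made-up
data (the data frame d and its columns a and b are hypothetical). It fits the
same model once with the dot and once with the predictors written out.

```r
# Made-up data: response y and two predictors a and b.
d <- data.frame(y = c(1.2, 2.3, 2.9, 4.1, 5.2),
                a = c(1, 2, 3, 4, 5),
                b = c(2, 1, 4, 3, 6))

fit_dot      <- lm(y ~ ., data = d)      # "." = every column of d except y
fit_explicit <- lm(y ~ a + b, data = d)  # the same model spelled out

print(names(coef(fit_dot)))
print(all.equal(coef(fit_dot), coef(fit_explicit)))
```

The dot is expanded using the columns of the data argument, so the model
depends on exactly which columns that data frame has.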
>
> Where can I find a more detailed specification of this syntax?
> In help(lm) I find this paragraph:
>
> Models for `lm' are specified symbolically. A typical model has
> the form `response ~ terms' where `response' is the (numeric)...
>
> which does not quite cover this case.
In any good book on the subject.
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.
-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !) To: r-help-request at stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._.
_._