[R] question about predictions with linear models
Martin Maechler
maechler at stat.math.ethz.ch
Thu Oct 2 11:37:26 CEST 2003
>>>>> "Rajarshi" == Rajarshi Guha <rxg218 at psu.edu>
>>>>> on 01 Oct 2003 12:59:16 -0400 writes:
Rajarshi> Hi,
Rajarshi> this question is probably very obvious but I just can't see where I
Rajarshi> might be going wrong.
Rajarshi> I'm using the lm() function to generate a linear
Rajarshi> model and then make predictions using a different
Rajarshi> set of data.
Rajarshi> To generate the model I do (tdata & pdata are
Rajarshi> matrices of observations and parameters, tdepv,
Rajarshi> pdepv are response vectors)
This is not reproducible for us, since we don't have your data,
and part of what follows is definitely not the R code you used.
(You didn't type "lnegth()", nor did you *set* "length(pred) = 140",
did you?)
All we can guess is that you had missing values in your data,
and you did *not* consider setting options(na.action = "na.exclude")
{read ?na.exclude first}.
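
To make the effect of na.exclude concrete, here is a minimal sketch with
simulated data. The data frame d and the names fit_omit and fit_excl are
made up for illustration, and na.action is passed per call here instead of
being set through options(); it assumes nothing about your actual data.

```r
# Simulated data; all names here are illustrative only.
set.seed(1)
d <- data.frame(V1 = rnorm(10), V2 = rnorm(10))
d$resp <- 1 + 2 * d$V1 - d$V2 + rnorm(10, sd = 0.1)
d$V1[3] <- NA   # introduce one missing predictor value

# na.omit (the usual default): the incomplete row is dropped entirely,
# so fitted values are shorter than the data.
fit_omit <- lm(resp ~ V1 + V2, data = d, na.action = na.omit)
length(fitted(fit_omit))        # 9

# na.exclude: the row is still left out of the fit, but fitted(),
# residuals() and predict() pad the result with NA at its position.
fit_excl <- lm(resp ~ V1 + V2, data = d, na.action = na.exclude)
length(fitted(fit_excl))        # 10
which(is.na(fitted(fit_excl)))  # position 3
```

With na.exclude the results line up row by row with the original data,
which is usually what one wants when comparing predictions to a response
vector.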
Rajarshi> x <- as.data.frame(tdata)
Rajarshi> x$tdepv <- tdepv
Rajarshi> lnegth(tdepv) = 140
Rajarshi> model <- lm(x$tdepv ~ x$V1 + x$V2 + x$V3 + x$V4, x)
Rajarshi> pred <- predict(model, x)
Rajarshi> length(pred) = 140
Rajarshi> y <- as.data.frame(pdata)
Rajarshi> y$pdepv <- pdepv
Rajarshi> length(pdepv) = 16
Rajarshi> pred <- predict(model, y)
Rajarshi> length(pred) = 140
Rajarshi> But I expect that length(pred) = 16
Rajarshi> Why do I get a different length? Furthermore, the original formula
Rajarshi> specified the variable tdepv which is not in the dataframe that I send
Rajarshi> to predict() - should I also make a variable called tdepv in the
Rajarshi> dataframe y?
Rajarshi> Thanks,
Rajarshi> -------------------------------------------------------------------
Rajarshi> Rajarshi Guha <rxg218 at psu.edu> <http://jijo.cjb.net>
Rajarshi> GPG Fingerprint: 0CCA 8EE2 2EEB 25E2 AB04 06F7 1BB9 E634 9B87 56EE
Rajarshi> -------------------------------------------------------------------
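
A further possibility, judging only from the quoted code and not from the
actual data: the formula names its terms as x$V1, x$V2, ..., so predict()
may look up x itself instead of the columns of the new data frame and
return predictions for the original 140 rows. The sketch below uses
simulated data and hypothetical names (train, new, fit_dollar, fit_plain)
to compare the two ways of writing the formula. It also suggests that the
response column need not be present in the new data frame.

```r
# Simulated stand-ins for the training and prediction sets (illustrative names).
set.seed(2)
train <- data.frame(V1 = rnorm(140), V2 = rnorm(140))
train$tdepv <- 1 + train$V1 - 0.5 * train$V2 + rnorm(140, sd = 0.1)
new <- data.frame(V1 = rnorm(16), V2 = rnorm(16))   # no response column

# Formula written with train$...: the terms refer to the object 'train',
# so the columns of 'new' are never consulted. Recent versions of R warn
# about the row mismatch; the warning is suppressed here for brevity.
fit_dollar <- lm(train$tdepv ~ train$V1 + train$V2, data = train)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = new))
length(p_dollar)   # 140

# Formula written with bare column names: the variables are looked up
# in whichever data frame is supplied.
fit_plain <- lm(tdepv ~ V1 + V2, data = train)
p_plain <- predict(fit_plain, newdata = new)
length(p_plain)    # 16
```

If this is what happened, rewriting the model as lm(tdepv ~ V1 + V2 + V3 + V4,
data = x) and calling predict(model, newdata = y) should give 16 values.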