[R] How consistent is predict() syntax?

Ajay Shah ajayshah at mayin.org
Fri Apr 13 19:45:31 CEST 2007


I have a situation where lagged values of a time-series are used to
predict future values. I have packed together the time-series and the
lagged values into a data frame:

> str(D)
'data.frame':   191 obs. of  13 variables:
 $ y    : num  -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 4.79 ...
 $ y.l1 : num  NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 -0.11 ...
 $ y.l2 : num  NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 -0.45 ...
 $ y.l3 : num  NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 2.63 ...
 $ y.l4 : num  NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 1.71 ...
 $ y.l5 : num  NA NA NA NA NA -0.21 -2.28 -2.71 2.26 -1.11 ...
 $ y.l6 : num  NA NA NA NA NA NA -0.21 -2.28 -2.71 2.26 ...
 $ y.l7 : num  NA NA NA NA NA NA NA -0.21 -2.28 -2.71 ...
 $ y.l8 : num  NA NA NA NA NA NA NA NA -0.21 -2.28 ...
 $ y.l9 : num  NA NA NA NA NA NA NA NA NA -0.21 ...
 $ y.l10: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l11: num  NA NA NA NA NA NA NA NA NA NA ...
 $ y.l12: num  NA NA NA NA NA NA NA NA NA NA ...
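
For reference, a frame shaped like D can be built along these lines. This
is only a minimal sketch: make_lags is a hypothetical helper (not from any
package), and the six values are just the first six entries of y shown
above.

```r
# Hypothetical helper: build a data frame holding a series and its first
# k lags, padding with NA where a lag is not yet available (needs k < n).
make_lags <- function(y, k) {
  n <- length(y)
  # Lag j shifts the series down by j positions.
  lags <- lapply(seq_len(k), function(j) c(rep(NA, j), y[seq_len(n - j)]))
  names(lags) <- paste0("y.l", seq_len(k))
  data.frame(y = y, lags)
}

y <- c(-0.21, -2.28, -2.71, 2.26, -1.11, 1.71)
D_toy <- make_lags(y, 3)
str(D_toy)
```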

I have:

> insample <- 1:179
> outsample <- 180:191

To help you see what is going on:

> D[outsample,]
     y y.l1 y.l2 y.l3 y.l4 y.l5 y.l6 y.l7 y.l8  y.l9 y.l10 y.l11 y.l12
180 NA 8.81 8.53 5.68 5.97 9.75 7.20 7.63 4.73 12.24 10.76  8.13  9.82
181 NA   NA 8.81 8.53 5.68 5.97 9.75 7.20 7.63  4.73 12.24 10.76  8.13
182 NA   NA   NA 8.81 8.53 5.68 5.97 9.75 7.20  7.63  4.73 12.24 10.76
183 NA   NA   NA   NA 8.81 8.53 5.68 5.97 9.75  7.20  7.63  4.73 12.24
184 NA   NA   NA   NA   NA 8.81 8.53 5.68 5.97  9.75  7.20  7.63  4.73
185 NA   NA   NA   NA   NA   NA 8.81 8.53 5.68  5.97  9.75  7.20  7.63
186 NA   NA   NA   NA   NA   NA   NA 8.81 8.53  5.68  5.97  9.75  7.20
187 NA   NA   NA   NA   NA   NA   NA   NA 8.81  8.53  5.68  5.97  9.75
188 NA   NA   NA   NA   NA   NA   NA   NA   NA  8.81  8.53  5.68  5.97
189 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA  8.81  8.53  5.68
190 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA  8.81  8.53
191 NA   NA   NA   NA   NA   NA   NA   NA   NA    NA    NA    NA  8.81

Now this works nicely:

> library(rpart)
> predict(rpart(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
     180      181      182      183      184      185      186      187 
7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 7.551724 6.057636 
     188      189      190      191 
6.057636 6.057636 6.057636 6.057636 

But when I try to do:

> library(randomForest)
> predict(randomForest(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
[1] 7.71523

I don't seem to get a vector of twelve predictions; I only get one
prediction. Is it the case that randomForest doesn't like missing
data? Is there anything I can do about it?

Further, when I try to do this:

> library(e1071)
> predict(svm(y ~ ., D[insample,], na.action=na.omit), newdata=D[outsample,])
Error in `names<-.default`(`*tmp*`, value = c("180", "181", "182", "183",  : 
      'names' attribute [12] must be the same length as the vector [0]

Any idea how I should approach this? Is there a generic interface for
prediction across the wide range of statistical tools?
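
One possible workaround, sketched under assumptions: in D[outsample,] only
row 180 has all twelve lags, so a single prediction is what one would
expect if rows with missing predictors are dropped, whereas rpart can still
predict for them through surrogate splits. The wrapper below predicts only
on rows whose predictors are complete and fills the rest with NA, so the
result always has one entry per row of newdata. predict_padded is a
hypothetical helper, not part of any package; it is shown with lm() on
made-up data so that it runs in base R, and it has not been checked here
against randomForest or svm.

```r
# Hypothetical helper: predict on rows with complete predictors only and
# return NA for the others, keeping one entry per row of newdata.
predict_padded <- function(fit, newdata, ...) {
  # Predictor names from the fitted model's terms (response removed).
  vars <- all.vars(delete.response(terms(fit)))
  ok <- complete.cases(newdata[, vars, drop = FALSE])
  out <- rep(NA_real_, nrow(newdata))
  names(out) <- rownames(newdata)
  if (any(ok)) {
    out[ok] <- predict(fit, newdata = newdata[ok, , drop = FALSE], ...)
  }
  out
}

# Made-up data: a random walk with two lags (illustration only).
set.seed(1)
n <- 60
y <- cumsum(rnorm(n))
toy <- data.frame(y = y,
                  y.l1 = c(NA, y[-n]),
                  y.l2 = c(NA, NA, y[seq_len(n - 2)]))
train <- toy[1:50, ]
test <- toy[51:60, ]
test$y <- NA            # the future values are unknown
test$y.l1[2:10] <- NA   # so are most of their lags
test$y.l2[3:10] <- NA

fit <- lm(y ~ ., data = train, na.action = na.omit)
p <- predict_padded(fit, test)
print(p)
```

This makes the "how many predictions come back" behaviour explicit rather
than relying on each package's own handling of missing values in newdata.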

-- 
Ajay Shah                                      http://www.mayin.org/ajayshah  
ajayshah at mayin.org                             http://ajayshahblog.blogspot.com
<*(:-? - wizard who doesn't know the answer.


