[R] "subscript out of bounds" Error in predict.naivebayes

Stephen Weigand weigand.stephen at gmail.com
Wed Aug 29 04:09:43 CEST 2007


On 8/22/07, Polly He <biyuhe at gmail.com> wrote:
> I'm trying to fit a naive Bayes model and predict on a new data set using
> the functions naiveBayes and predict (package e1071).
>
> R version 2.5.1 on a Linux machine
>
> My data set looks like this. "class" is the response and k1 - k3 are the
> independent variables. All of them are factors. The response has 52 levels
> and k1 - k3 have 2-6 levels. I have about 9,300 independent variables but
> omit the long list here for simple demonstration. There are no missing
> values in the observations.
>
>    class k1 k2 k3
>       1  0  0  1
>       8  0  0  0
>
> # model fitting; I also tried setting laplace=0 but it didn't help
>  nbmodel <- naiveBayes(class~., data=train, laplace=1)
>
> # predict
>  nb.fit <- predict(nbmodel, x.test[,-1])
>
> Fitting the model never gave me any trouble, and for some of my large data
> sets R also returned the predictions. But for other data sets, although R
> fits the model (no error message, nbmodel$tables looks ok), the predict
> function keeps giving me the following message:
>
> # my data set has 1 response variable and 9318 independent variables
> Error in FUN(1:9319[[4L]], ...) : subscript out of bounds
[...]

In my experience, some predict methods have trouble when
newdata does not have all levels of a factor. This seems
to be the case with predict.naiveBayes:

example(naiveBayes)
predict(model, subset(HouseVotes84, V1 == "n"))

gives

Error in object$tables[[v]] : subscript out of bounds
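To check whether missing factor levels are what is going on in your own data, something like the following base R sketch might help. The data frames train_df and new_df are made up for illustration and stand in for your training and test predictors. I have not confirmed that realigning the levels cures the predict error in every case.

```r
# Hypothetical training and new data; names and values are invented.
train_df <- data.frame(k1 = factor(c("0", "1", "2", "1")),
                       k2 = factor(c("0", "0", "1", "1")))
new_df   <- data.frame(k1 = factor(c("0", "0")),
                       k2 = factor(c("1", "0")))

# For each column, list levels present in training but absent in new data.
missing_levels <- lapply(names(train_df), function(v)
  setdiff(levels(train_df[[v]]), levels(new_df[[v]])))
names(missing_levels) <- names(train_df)
print(missing_levels)

# Re-declare each factor in the new data using the training levels,
# so both data frames carry identical level sets.
for (v in names(train_df)) {
  new_df[[v]] <- factor(as.character(new_df[[v]]),
                        levels = levels(train_df[[v]]))
}
```

Note that a value in the new data that never occurred in training becomes NA under this re-declaration, so it is worth checking for new NAs afterwards.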

One workaround is to predict for a "bigger" data set
and retain a subset of the predictions.
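A rough sketch of that workaround, using the HouseVotes84 data from the naiveBayes example (this assumes the e1071 and mlbench packages are installed; the names keep, pred_all and pred_sub are my own):

```r
library(e1071)
data(HouseVotes84, package = "mlbench")

# Fit on the full data, as in example(naiveBayes).
model <- naiveBayes(Class ~ ., data = HouseVotes84)

# Rows of interest; which() drops rows where V1 is NA,
# matching what subset() would select.
keep <- which(HouseVotes84$V1 == "n")

# Predict for the whole ("bigger") data set, without the response column...
pred_all <- predict(model, HouseVotes84[, -1])

# ...then retain only the predictions for the rows of interest.
pred_sub <- pred_all[keep]
table(pred_sub)
```

The predictions come back in row order, so indexing pred_all by row position picks out the same observations that subset() would have.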

Hope this helps,

Stephen


-- 
Rochester, Minn. USA


