[R] predict.naiveBayes() bug in e1071 package

David Winsemius dwinsemius at comcast.net
Tue Feb 7 19:09:40 CET 2012


On Feb 7, 2012, at 12:43 PM, Ali Tofigh wrote:

> Hi,
>
> I'm currently using the R package e1071 to train naive bayes
> classifiers and came across a bug: when the posterior probabilities of
> all classes are small, the result from the predict.naiveBayes function
> becomes NaN.

This should be sent to the maintainer of the package. The name of the  
maintainer can always be found in the DESCRIPTION file.  Several of  
the authors are regular readers of rhelp, but I do not know whether  
David Meyer is. I'm sure a well-documented bug report, as this appears  
to be, will be welcomed.
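
For an installed package, the Maintainer field of the DESCRIPTION file can
also be read from within R. A minimal sketch, assuming e1071 may or may not
be installed locally (the guard and the fallback message are illustrative
only):

```r
# utils::maintainer() returns the Maintainer field of an installed
# package's DESCRIPTION file as a character string.
if (requireNamespace("e1071", quietly = TRUE)) {
  print(utils::maintainer("e1071"))
} else {
  message("e1071 is not installed; see the base-package call below.")
}

# Base packages are always available, so this call works anywhere.
base_maintainer <- utils::maintainer("stats")
print(base_maintainer)
```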

-- 
David.
> This is an issue with the treatment of the
> log-transformed probabilities inside the predict.naiveBayes function.
> Here is an example to demonstrate the problem (you might need to
> increase 'nvar' depending on your machine):
>
> -------------------- 8< --------------------
> library(e1071)
> N <- 100
> nvar <- 60
> varnames <- paste("v", 1:nvar, sep="")
>
> dat <- sapply(1:nvar, function(dummy) {
>   c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))
> })
> colnames(dat) <- varnames
>
> out <- rep(c("a","b"), each=N/2)
> names(dat) <- varnames
>
> nb <- naiveBayes(x=dat, y=out)
>
> new.dat <- t(rnorm(nvar, 5, 0.1))
> colnames(new.dat) <- varnames
>
> predict(nb, new.dat, type="raw")
> -------------------- 8< --------------------
>
> The result of the last line is usually NaN. As for the solution:
>
> To protect against very small numbers, the e1071:::predict.naiveBayes
> function takes the probabilities into log-space and adds instead of
> multiplying probabilities. However, when calculating the posterior
> probabilities of each class (when type = "raw"), the log-probabilities
> are exponentiated again, which defeats the purpose of the log-space
> transformation. I suggest the following change to the code:
>
> Towards the end of the predict.naiveBayes function, you currently do:
>
> L <- exp(L)
> L / sum(L)   # this is what is returned
>
> you can instead use
>
> sapply(L, function(lp) {1 / sum(exp(L - lp))})
>
> the above comes from the following equality:
>
> x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))
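
The difference between the two normalizations can be seen without e1071 at
all. The sketch below uses made-up log-scale class scores (the values in L
are hypothetical, chosen only so that exp() underflows in double precision)
and compares the current computation with the proposed one. The third
variant, which subtracts max(L) before exponentiating, is an assumption
added here for comparison and is not part of the original suggestion.

```r
# Hypothetical log-scale scores for two classes; exp() of either value
# underflows to 0 in double precision.
L <- c(a = -1200, b = -1203)

# Current approach: both terms underflow, so the result is 0 / 0 = NaN.
naive <- exp(L) / sum(exp(L))

# Proposed approach: only differences of log values are exponentiated.
stable <- sapply(L, function(lp) 1 / sum(exp(L - lp)))

# Alternative with the same effect (assumption, not from the post):
# shift by the maximum before exponentiating.
shifted <- exp(L - max(L)) / sum(exp(L - max(L)))

print(naive)
print(stable)
print(shifted)
```

Since only L - lp is ever exponentiated, the proposed form depends on the
gaps between the class scores and not on how small the scores themselves
are.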
>
> Best wishes,
> /Ali Tofigh
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT


