[R] predict.naiveBayes() bug in e1071 package
Ali Tofigh
alix.tofigh at gmail.com
Tue Feb 7 18:43:14 CET 2012
Hi,
I'm currently using the R package e1071 to train naive Bayes
classifiers and came across a bug: when the unnormalised posterior
probabilities of all classes are small enough to underflow, the result
of the predict.naiveBayes function becomes NaN. This is an issue with
the treatment of the log-transformed probabilities inside the
predict.naiveBayes function.
Here is an example to demonstrate the problem (you might need to
increase 'nvar' depending on your machine):
-------------------- 8< --------------------
library(e1071)
N <- 100
nvar <- 60
varnames <- paste("v", 1:nvar, sep="")
# two well-separated classes: means 0 and 10 in every variable
dat <- sapply(1:nvar, function(dummy) {c(rnorm(N/2, 0, 1), rnorm(N/2, 10, 1))})
colnames(dat) <- varnames
out <- rep(c("a","b"), each=N/2)
nb <- naiveBayes(x=dat, y=out)
# a new observation halfway between the classes, unlikely under both
new.dat <- t(rnorm(nvar, 5, 0.1))
colnames(new.dat) <- varnames
predict(nb, new.dat, type="raw")
-------------------- 8< --------------------
The result of the last line is usually NaN. As for the solution:
To protect against very small numbers, the e1071:::predict.naiveBayes
function takes the probabilities into log-space and adds the logs
instead of multiplying the probabilities. However, when calculating the
posterior probabilities of each class (when type = "raw"), the
log-probabilities are exponentiated before they are normalised, which
defeats the purpose of the log-space transformation. I suggest the
following change to the code:
Towards the end of the predict.naiveBayes function, you currently do:
L <- exp(L)
L / sum(L) # this is what is returned
You can instead use
sapply(L, function(lp) {1 / sum(exp(L - lp))})
The above follows from the following identity:
x / (x + y + z) = 1 / (1 + exp(log(y) - log(x)) + exp(log(z) - log(x)))
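To make the difference concrete, here is a minimal sketch that compares
the two normalisations on a made-up vector of log posteriors. The values
in L are hypothetical and chosen only so that exp() underflows in double
precision; they are not taken from the example above. The last variant,
which subtracts max(L) before exponentiating, is not part of my
suggestion; it is just another common way to write the same thing.

```r
# Hypothetical unnormalised log posteriors for three classes.
# exp(-800) is below the smallest representable double, so it becomes 0.
L <- c(a = -800, b = -802, c = -805)

# Current approach: every exp(L) underflows to 0, and 0 / 0 gives NaN.
naive <- exp(L) / sum(exp(L))

# Suggested approach: only differences of log values are exponentiated.
stable <- sapply(L, function(lp) 1 / sum(exp(L - lp)))

# Alternative (an assumption, not the suggested patch): shift by the maximum.
shifted <- exp(L - max(L))
alt <- shifted / sum(shifted)

print(naive)
print(stable)
print(alt)
```

With these inputs the first result is NaN for every class, while the
other two agree and sum to 1. If one class is so much more likely that
exp(L - lp) overflows to Inf, the suggested form gives 1 / Inf = 0 for
the unlikely class rather than NaN.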
Best wishes,
/Ali Tofigh