[R] naiveBayes other than e1071

Kuhn, Max Max.Kuhn at pfizer.com
Tue Jun 5 22:56:48 CEST 2007


Saeed and Uwe,

The underlying problem is the distribution of the data. For example:

> table(x.x[,91], y.y)
             y.y
                 0    1
  0.000675027 2412    0
  0.002184892    0  481

When the function tries to estimate the distribution of this feature for
each class, it gets:

> nb$tables[[91]]
         [,1] [,2]
0 0.000675027    0
1 0.002184892    0

(Saeed - column 1 contains the mean for each class and column 2
contains the standard deviation)
 
For class 0, if a new data point for this variable has a value of
0.000675027, then dnorm(0.000675027, 0.000675027, 0) = Inf (any other
value has a density of zero). When the densities are normalized by
p(x), the result is Inf/Inf, which is NaN. A few of the predictors have
this problem.
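
The arithmetic can be reproduced in base R. This is only a sketch: the
means and standard deviations are copied from the output above, the
priors are assumed to be the class frequencies from the table, and the
variable names are made up for illustration. It is not the klaR code.

```r
# Illustration only: reproduce the Inf / NaN arithmetic for variable 91.
# 'mu' and 'sdev' mimic the two columns of nb$tables[[91]] shown above.
mu    <- c("0" = 0.000675027, "1" = 0.002184892)
sdev  <- c("0" = 0, "1" = 0)
# Assumption: priors equal the observed class frequencies.
prior <- c("0" = 2412, "1" = 481) / (2412 + 481)

x_new <- 0.000675027   # a new value equal to the class-0 mean

# With sd = 0 the normal density is a point mass:
# Inf at the mean, 0 everywhere else.
dens <- dnorm(x_new, mean = mu, sd = sdev)
print(dens)

joint <- prior * dens        # prior * p(x | class)
post  <- joint / sum(joint)  # normalize by p(x); class 0 becomes Inf / Inf
print(post)
```

The class-0 entry of post is NaN, which matches the NaN predictions
reported earlier in the thread.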

There should probably be some sort of check for this, but that might be
hard to do when usekernel = TRUE. Uwe - do you agree and/or have ideas? 
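
For the normal-density case only (usekernel = FALSE), such a check
might look roughly like the sketch below. find_degenerate is a
hypothetical helper, not a function in klaR or e1071, and it does not
address the kernel case.

```r
# Hypothetical helper (not part of klaR or e1071): flag numeric predictors
# whose standard deviation is zero (or undefined) within at least one class.
find_degenerate <- function(x, grouping) {
  x <- as.data.frame(x)
  grouping <- as.factor(grouping)
  bad <- vapply(x, function(col) {
    s <- tapply(col, grouping, sd)  # per-class standard deviation
    any(is.na(s) | s == 0)          # NA arises when a class has < 2 rows
  }, logical(1))
  names(x)[bad]
}

# Toy data: 'v1' is constant within each class, 'v2' is not.
toy <- data.frame(v1 = c(1, 1, 1, 2, 2, 2),
                  v2 = c(0.1, 0.4, 0.3, 0.9, 0.7, 0.8))
cls <- c(0, 0, 0, 1, 1, 1)
flagged <- find_degenerate(toy, cls)
print(flagged)
```

Predictors flagged this way could be dropped or handled separately
before fitting.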

Good news, Saeed! Just use variable 91 and you don't need a model.
Seriously, you might want to think about these data a bit. Many of them
are highly skewed and have a large point mass at zero. Modeling the
conditional probabilities using a normal distribution may not be the
best idea.
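
One rough way to look at this is to compute, for each predictor, the
fraction of exact zeros and a simple moment-based skewness. The sketch
below uses simulated toy data, and summarize_predictors is a made-up
name rather than an existing function.

```r
# Illustration only: per-predictor fraction of exact zeros and a simple
# moment-based skewness. 'summarize_predictors' is a made-up name.
summarize_predictors <- function(x) {
  x <- as.data.frame(x)
  skew <- function(v) {
    d <- v - mean(v)
    mean(d^3) / mean(d^2)^1.5   # NaN for a constant column
  }
  data.frame(
    prop_zero = vapply(x, function(v) mean(v == 0), numeric(1)),
    skewness  = vapply(x, skew, numeric(1)),
    row.names = names(x)
  )
}

set.seed(1)
toy <- data.frame(
  spiky = c(rep(0, 80), rexp(20)),  # 80% point mass at zero, long right tail
  bell  = rnorm(100)                # roughly symmetric, no exact zeros
)
res <- summarize_predictors(toy)
print(res)
```

A large prop_zero together with strong positive skewness suggests that
a single normal density per class is a poor description of that
predictor.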

Max


-----Original Message-----
From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de] 
Sent: Tuesday, June 05, 2007 3:56 PM
To: Saeed Abu Nimeh
Cc: Kuhn, Max; r-help at stat.math.ethz.ch
Subject: Re: [R] naiveBayes other than e1071



Saeed Abu Nimeh wrote:
> Max,
> Thanks. I have tried it, but I keep getting an error:
> Error in as.vector(x, mode) : invalid argument 'mode'
> Do I have to do something specific when using the class column? I
> tried both y.y<-as.vector and y.y<-as.factor.
> 
> dread<-read.table('dataset.csv',sep=",")
> x.x<-as.matrix(dread[,2:256])
> y.y<-as.vector(dread[,1])
> nb<- NaiveBayes(x=x.x,grouping=y.y)
> pred.nb<-predict(nb)
> 
> Error in as.vector(x, mode) : invalid argument 'mode'



Please tell us (according to the posting guide): Which version of R? 
Which version of klaR? Example data that reproduce the error?

Uwe Ligges



> Thanks,
> Saeed
> 
> Kuhn, Max wrote:
>> Saeed,
>>
>> There is a version in the klaR package. I recently submitted a change
>> to the predict function that may be related to your problem.
>>
>> If:
>>
>>   1. the posterior probabilities (apart from the prior) are being
>> approximated by the product of the p(x_i|y_j) and
>>
>>   2. a lot of predictors are being used
>>
>> then posterior probabilities may have values of absolute zero. 
>>
>> When the approximation is used, the approximate posterior
>> probabilities are normalized by their sum (which is zero in such
>> cases).
>>
>> The patch in klaR uses the product of the conditional divided by the
>> marginal of x_i (per the true formula). I haven't seen the problem
>> occur with this patch.
>>
>> HTH,
>>
>> Max
>>
>> -----Original Message-----
>> From: r-help-bounces at stat.math.ethz.ch
>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Saeed Abu
>> Nimeh
>> Sent: Monday, June 04, 2007 2:45 PM
>> To: r-help at stat.math.ethz.ch
>> Subject: [R] naiveBayes other than e1071
>>
>> Hi List,
>> Is there a naiveBayes interface other than the one in the e1071
>> package? For some reason, on certain datasets all predicted values
>> are NaN, but it predicts well on others.
>> Thanks,
>> Saeed
>> ---
>> model <- naiveBayes(x.train, y.train, laplace = 3)
>> pred <- predict(model,x.test,type="raw")
>>
>> ______________________________________________
>> R-help at stat.math.ethz.ch mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.