[R] naiveBayes other than e1071

Uwe Ligges ligges at statistik.uni-dortmund.de
Wed Jun 6 11:03:05 CEST 2007


Dear Max,

thanks for your work on this!
I agree on all points and have added a check for zero variances to my
working copy of NaiveBayes.default(), which will be published in the
next klaR release.

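   if(!usekernel){
     # column 2 of each table holds the per-class variance estimates;
     # flag every variable whose variance is zero in at least one class
     temp <- apply(sapply(tables, function(x) x[,2]), 2,
                   function(x) any(!x))
     if(any(temp))
       stop("Zero variances for at least one class in variables: ",
            paste(names(tables)[temp], collapse=", "))
   }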

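To illustrate what the check flags, here is a minimal sketch on made-up
tables. The list `tables`, the variable names V1 and V91, and the numbers
for V1 are assumptions for illustration only; the layout (one row per
class, means in column 1, variances in column 2) follows Max's description
quoted below.

```r
# Hypothetical per-variable tables: rows are classes, column 1 = mean,
# column 2 = variance. Only the V91 values come from the thread.
tables <- list(
  V1  = matrix(c(0.5, 1.2, 0.04, 0.09), nrow = 2,
               dimnames = list(c("0", "1"), NULL)),
  V91 = matrix(c(0.000675027, 0.002184892, 0, 0), nrow = 2,
               dimnames = list(c("0", "1"), NULL))
)

# Same test as in the check: TRUE for each variable whose variance is
# zero in at least one class.
zero_var <- apply(sapply(tables, function(x) x[, 2]), 2,
                  function(x) any(!x))
flagged <- names(tables)[zero_var]
print(flagged)
```

Only V91 is flagged here, so the error message would name that variable
alone.
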

Thanks again,
Uwe




Kuhn, Max wrote:
> Saeed and Uwe,
> 
> The underlying problem is the distribution of the data. For example:
> 
>> table(x.x[,91], y.y)
>              y.y
>                  0    1
>   0.000675027 2412    0
>   0.002184892    0  481
> 
> When the function tries to estimate the distribution of this feature for
> each class, it gets:
> 
>    nb$tables[[91]]
>             [,1] [,2]
>    0 0.000675027    0
>    1 0.002184892    0
> 
> (Saeed - column 1 contains the means for each class and column 2
> contains the variances)
>  
> For class 0, if a new data point for this variable has a value of
> 0.000675027, then dnorm(0.000675027, 0.000675027, 0) = Inf (all other
> points have density values of zero). When the data are normalized by
> p(x), this produces a NaN. A few of the predictors have this problem.
> 
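The arithmetic described above can be reproduced in a few lines. This is
only a sketch for a single predictor: it assumes equal priors and uses the
two class means from the table above, and the names `num` and `post` are
made up for illustration.

```r
m0 <- 0.000675027   # class 0 mean of variable 91
m1 <- 0.002184892   # class 1 mean of variable 91

# With sd = 0 the normal density is a point mass at the mean.
d_at_mean   <- dnorm(m0, mean = m0, sd = 0)   # Inf
d_elsewhere <- dnorm(m1, mean = m0, sd = 0)   # 0

# Class-conditional densities of the value m0 under each class,
# then normalization by their sum (equal priors assumed).
num  <- c("0" = dnorm(m0, mean = m0, sd = 0),
          "1" = dnorm(m0, mean = m1, sd = 0))
post <- num / sum(num)                        # Inf / Inf gives NaN
print(post)
```
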
> There should probably be some sort of check for this, but that might be
> hard to do when usekernel = TRUE. Uwe - do you agree and/or have ideas? 
> 
> Good news Saeed! Just use variable 91 and you don't need a model.
> Seriously, you might want to think about these data a bit. Many of them
> are highly skewed and have a large point mass at zero. Modeling the
> conditional probabilities using a normal distribution may not be the
> best idea.
> 
> Max
> 
> 
> -----Original Message-----
> From: Uwe Ligges [mailto:ligges at statistik.uni-dortmund.de] 
> Sent: Tuesday, June 05, 2007 3:56 PM
> To: Saeed Abu Nimeh
> Cc: Kuhn, Max; r-help at stat.math.ethz.ch
> Subject: Re: [R] naiveBayes other than e1071
> 
> 
> 
> Saeed Abu Nimeh wrote:
>> Max,
>> Thanks. I have tried it but I keep getting an error:
>> Error in as.vector(x, mode) : invalid argument 'mode'
>> Do I have to do something specific when using the class column? I
>> tried both y.y<-as.vector and y.y<-as.factor.
>>
>> dread<-read.table('dataset.csv',sep=",")
>> x.x<-as.matrix(dread[,2:256])
>> y.y<-as.vector(dread[,1])
>> nb<- NaiveBayes(x=x.x,grouping=y.y)
>> pred.nb<-predict(nb)
>>
>> Error in as.vector(x, mode) : invalid argument 'mode'
> 
> 
> 
> Please tell us (according to the posting guide): Which version of R? 
> Which version of klaR? Example data that reproduce the error?
> 
> Uwe Ligges
> 
> 
> 
>> Thanks,
>> Saeed
>>
>> Kuhn, Max wrote:
>>> Saeed,
>>>
>>> There is a version in the klaR package. I recently submitted a change
>>> to the predict function that may be related to your problem.
>>>
>>> If:
>>>
>>>   1. the posterior probabilities (apart from the prior) are being
>>> approximated by the product of the p(x_i|y_j) and
>>>
>>>   2. a lot of predictors are being used
>>>
>>> then the posterior probabilities may evaluate to exactly zero.
>>>
>>> When the approximation is used, the approximate posterior
>>> probabilities are normalized by their sum (which is zero in such
>>> cases, so the result is NaN).
>>>
>>> The patch in klaR uses the product of the conditionals divided by the
>>> marginal of x_i (per the true formula). I haven't seen the problem
>>> occur with this patch.
>>>
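The underflow described above is easy to reproduce. The sketch below uses
made-up densities (two classes, 400 predictors, equal priors assumed) and
then shows a log-space normalization as one common workaround. Note that
this log-space variant is not the klaR patch mentioned above, which divides
by the marginal instead.

```r
# Hypothetical conditional densities p(x_i | y_j): one row per class.
p    <- 400
dens <- rbind("0" = rep(1e-3, p), "1" = rep(2e-3, p))

# Naive approach: the products underflow to 0 in double precision,
# so normalizing by their sum computes 0 / 0.
prod_raw <- apply(dens, 1, prod)
post_raw <- prod_raw / sum(prod_raw)

# Log-space alternative: sum the logs and subtract the maximum before
# exponentiating, so the largest term becomes exp(0) = 1.
log_joint <- rowSums(log(dens))
post_log  <- exp(log_joint - max(log_joint))
post_log  <- post_log / sum(post_log)

print(post_raw)
print(post_log)
```
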
>>> HTH,
>>>
>>> Max
>>>
>>> -----Original Message-----
>>> From: r-help-bounces at stat.math.ethz.ch
>>> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Saeed Abu Nimeh
>>> Sent: Monday, June 04, 2007 2:45 PM
>>> To: r-help at stat.math.ethz.ch
>>> Subject: [R] naiveBayes other than e1071
>>>
>>> Hi List,
>>> Is there a naiveBayes interface other than the one in the e1071
>>> package? For some reason, on certain datasets all predicted values
>>> are NaN, but it predicts well on others.
>>> Thanks,
>>> Saeed
>>> ---
>>> model <- naiveBayes(x.train, y.train, laplace = 3)
>>> pred <- predict(model,x.test,type="raw")
>>>
>>> ______________________________________________
>>> R-help at stat.math.ethz.ch mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.


