[R] Question regarding Naive Bayes

PHILIP GLADWIN philipgladwin at btinternet.com
Wed Nov 30 17:59:46 CET 2016


Hello,


 
I am working with the naïve Bayes function in library(e1071).


 
The function calls are:

transactions.train.nb = naiveBayes(as.factor(DealerID) ~
                                     as.factor(Manufacturer)
                                     + as.factor(RangeDesc)
                                     + as.factor(BodyType)
                                     + as.factor(FuelType)
                                     + as.factor(PaintColour)
                                     + as.factor(TransmissionType)
                                     + as.factor(Mileage)
                                     + as.factor(Registration),
                                   data = transactions.train,
                                   na.action = na.omit)


 
where transactions.train is a data frame with 2032 rows and 14 columns.


 
and


 
transactions.test.nb = predict(transactions.train.nb, transactions.test[, -1], type = 'raw')
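One thing that may be worth checking with a call like this is whether the predictor names stored in the fitted object match the column names of the data passed to predict(). The sketch below assumes that predict() looks predictors up in newdata by name, and that a term written as as.factor(X) in the formula is stored under that literal name. The data frame toy and its values are made up for illustration and are not the real transactions data. Columns are converted to factors up front so the formula can use bare column names.

```r
# Toy stand-in for transactions.train; all values are made up for illustration.
toy <- data.frame(
  DealerID     = c(188, 188, 225, 229, 273, 273),
  Manufacturer = c("Ford", "Ford", "BMW", "Audi", "BMW", "Ford"),
  FuelType     = c("Petrol", "Diesel", "Diesel", "Petrol", "Diesel", "Petrol"),
  stringsAsFactors = FALSE
)

# Convert every column to a factor up front, so the formula needs no as.factor().
toy[] <- lapply(toy, as.factor)

# The model part only runs if e1071 is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  fit <- e1071::naiveBayes(DealerID ~ Manufacturer + FuelType, data = toy)

  # Names under which the model stores its conditional probability tables ...
  model_vars <- names(fit$tables)
  print(model_vars)

  # ... compared against the column names of the data handed to predict().
  print(all(model_vars %in% names(toy)))

  pred <- predict(fit, toy[, -1], type = "raw")
  print(round(pred, 3))
}
```

If the names printed for the model do not appear among the columns of the test data, that mismatch would be the first thing to rule out.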


 
An example of the results:

View(transactions.test.nb)


 
Reduced results shown:

     188       225       229       270       273
1    0.000984  0.000492  0.000492  0.000492  0.001476
2    0.000984  0.000492  0.000492  0.000492  0.001476
3    0.000984  0.000492  0.000492  0.000492  0.001476
4    0.000984  0.000492  0.000492  0.000492  0.001476
5    0.000984  0.000492  0.000492  0.000492  0.001476


 
I am struggling to understand why the returned probabilities within each column are the same for every row, as I was hoping for them to be different.

A given DealerID should have a different probability in row 1 than in row 2. Each row does sum to 1.
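A quick sanity check on the numbers above: with 2032 training rows, 1/2032, 2/2032 and 3/2032 round to 0.000492, 0.000984 and 0.001476. That suggests the output may simply be the prior frequency of each DealerID in the training data rather than a per-row posterior. A rough sketch of the check follows, assuming na.omit dropped no rows. The per-dealer counts are inferred from the rounded values shown, not taken from the data.

```r
n_train <- 2032   # rows in transactions.train, as stated in the post

# Probabilities copied from the reduced results, named by DealerID.
shown <- c(`188` = 0.000984, `225` = 0.000492, `229` = 0.000492,
           `270` = 0.000492, `273` = 0.001476)

# Implied number of training rows per DealerID if the values are class priors.
implied_counts <- shown * n_train
print(round(implied_counts, 2))

# Priors rebuilt from whole-number counts, rounded like the displayed values.
rebuilt <- round(round(implied_counts) / n_train, 6)
print(all.equal(unname(rebuilt), unname(shown)))

# On the real data, the comparison would be against:
# prop.table(table(transactions.train$DealerID))
```

If whole-number counts reproduce the displayed values, the predictors are probably not contributing at prediction time.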


 
transactions.train represents 67% of the full data set.

I’ve tried introducing Laplace smoothing, and experimented with increasing and decreasing the number of predictors used to generate the training naiveBayes object, but as yet I can’t figure it out. Could anybody help?


 
Kind regards,

Phil

