[R] Question regarding Naive Bayes

Wed Nov 30 17:59:46 CET 2016

Hello,

I am working with the naiveBayes() function in library(e1071).

The function calls are:

transactions.train.nb = naiveBayes(as.factor(DealerID) ~
                                     as.factor(Manufacturer)
                                     + as.factor(RangeDesc)
                                     + as.factor(BodyType)
                                     + as.factor(FuelType)
                                     + as.factor(PaintColour)
                                     + as.factor(TransmissionType)
                                     + as.factor(Mileage)
                                     + as.factor(Registration),
                                   data = transactions.train,
                                   na.action = na.omit)

where transactions.train is a data frame with 2032 rows and 14 columns.

and

transactions.test.nb = predict(transactions.train.nb, transactions.test[, -1], type = 'raw')

An example of the result is:

View(transactions.test.nb)

Reduced results shown (columns are DealerID values, rows are test observations):

          188       225       229       270       273
1    0.000984  0.000492  0.000492  0.000492  0.001476
2    0.000984  0.000492  0.000492  0.000492  0.001476
3    0.000984  0.000492  0.000492  0.000492  0.001476
4    0.000984  0.000492  0.000492  0.000492  0.001476
5    0.000984  0.000492  0.000492  0.000492  0.001476

I am struggling to understand why the returned probabilities are identical in every row, i.e. each column holds the same value for all test observations. I was hoping for them to be different.

Each DealerID should have a different probability in row 1 than in row 2. Each row does sum to 1.
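
For what it is worth, one way to confirm that the rows really are identical (and not just similar after rounding in View()) might be a check along the lines below. This is only a sketch: `probs` is a hypothetical stand-in built from the reduced results above, not the real predict() output, and 2032 is the training row count before na.omit, so the scaling is approximate.

# Stand-in for the first rows of the type = 'raw' output shown above.
# `probs` is a hypothetical name; values are copied from the reduced results.
probs <- matrix(
  rep(c(0.000984, 0.000492, 0.000492, 0.000492, 0.001476), times = 5),
  nrow = 5, byrow = TRUE,
  dimnames = list(as.character(1:5), c("188", "225", "229", "270", "273"))
)

# TRUE if every row equals the first row, i.e. the predictor values
# made no difference to the returned probabilities.
rows_identical <- all(apply(probs, 1, function(r) isTRUE(all.equal(r, probs[1, ]))))

# Scale by the number of training rows (2032, an approximation if na.omit
# dropped rows) to see whether the values look like class counts / n.
implied_counts <- round(probs[1, ] * 2032, 2)

print(rows_identical)
print(implied_counts)

On these five columns the scaled values come out very close to whole numbers (about 2, 1, 1, 1 and 3), which might suggest that the output is just the relative frequency of each DealerID in the training data, but I have not been able to confirm that.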

transactions.train represents 67% of the full data set.

I’ve tried introducing Laplace smoothing, and experimented with increasing and decreasing the number of predictors used to build the naiveBayes object, but as of yet I can’t figure it out. Could anybody help?

Kind regards,

Phil
