[R] naiveBayes: slow predict, weird results

Uwe Ligges ligges at statistik.tu-dortmund.de
Sat Feb 11 16:51:27 CET 2012


We don't have the data, but my guess is that some of your predictors 
should be factors, yet were integers when you ran the code below.
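
A minimal sketch of that suggestion, assuming naiveBayes() here is the one from the e1071 package and that the integer columns are really category codes. The objects users_demo and platform_demo and their columns are invented for illustration, since the real data is not available:

```r
# Hypothetical stand-in for the poster's data: integer-coded predictors.
set.seed(1)
users_demo <- data.frame(
  browser = sample(0:3, 200, replace = TRUE),
  visits  = sample(0:2, 200, replace = TRUE)
)
platform_demo <- factor(sample(c("mac", "windows"), 200, replace = TRUE))

# Convert every integer column to a factor so it is treated as categorical
# rather than as a numeric predictor.
is_int <- vapply(users_demo, is.integer, logical(1))
users_fac <- users_demo
users_fac[is_int] <- lapply(users_demo[is_int], factor)

str(users_fac)

# Fit and predict only if e1071 is installed (assumed source of naiveBayes()).
if (requireNamespace("e1071", quietly = TRUE)) {
  nb_demo <- e1071::naiveBayes(users_fac, platform_demo)
  pl_demo <- predict(nb_demo, users_fac)
  print(table(predicted = pl_demo, actual = platform_demo))
}
```

Whether a given column should become a factor depends on what it encodes; a genuine count with many distinct values may be better left numeric or binned first.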

Uwe Ligges


On 10.02.2012 03:43, Sam Steingold wrote:
> I did this:
> nb <- naiveBayes(users, platform)
> pl <- predict(nb, users)
> nrow(users) ==>  314781
> ncol(users) ==>  109
>
> 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
> (tens of minutes). Why?
>
> 2. the predict results were completely off the mark (quite the opposite
> of the expected overfitting).  suffice it to show the tables:
>
> pl:
>
>     android blackberry       ipad     iphone         lg      linux        mac
>           3          5         11         14     312723          5         11
>      mobile      nokia    samsung    symbian    unknown    windows
>        1864         17         16        112          0          0
>
> platform:
>     android blackberry       ipad     iphone         lg      linux        mac
>       18013       1221       2647       1328          4       2936      34336
>      mobile      nokia    samsung    symbian    unknown    windows
>          18         88         39        103       2660     251388
>
> i.e., nb classified nearly everything as "lg" while in the actual data
> "lg" is virtually nonexistent.
>
> 3. when I print "nb", I see "A-priori probabilities" (which are what I
> expected) and "Conditional probabilities" which are confusing because
> there are only two of them, e.g.:
>
>               android    0.048464998 0.43946764
>               blackberry 0.001638002 0.04045564
>               ipad       0.322251606 1.84940588
>               iphone     0.030873494 0.23250250
>               lg         0.000000000 0.00000000
>               linux      0.023501362 0.34698919
>               mac        0.082653774 1.22535027
>               mobile     0.000000000 0.00000000
>               nokia      0.000000000 0.00000000
>               samsung    0.000000000 0.00000000
>               symbian    0.000000000 0.00000000
>               unknown    0.003759398 0.08219078
>               windows    0.021158528 0.32916970
>
> The predictors are integers.
> Is the first column for the 0 predictors and the second for all non-0?
> Is there a way to ask naiveBayes to differentiate between non-0 values?
>
> thanks!
>
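
On question 3: for a numeric predictor, the e1071 documentation describes the conditional table as the per-class mean and standard deviation, not as counts of zero versus non-zero values. The sketch below builds such a table in base R on invented data (x_demo and cls_demo are hypothetical). It also shows one possible reason a class with zero spread could dominate the predictions, under the assumption that a zero standard deviation is replaced by a small positive value (0.001 here, an assumed setting) before the Gaussian density is evaluated:

```r
# Invented data: one integer predictor and a class label.
x_demo   <- c(0, 0, 0, 0, 0, 1, 0, 3, 0, 2)
cls_demo <- factor(c("lg", "lg", "lg", "lg",
                     "windows", "windows", "windows",
                     "windows", "windows", "windows"))

# Per-class mean and standard deviation: two columns per numeric predictor.
cond <- cbind(mean = tapply(x_demo, cls_demo, mean),
              sd   = tapply(x_demo, cls_demo, sd))
print(cond)

# Assumed handling of degenerate classes: swap sd == 0 for a small threshold.
threshold <- 0.001
sd_used <- ifelse(cond[, "sd"] == 0, threshold, cond[, "sd"])

# Gaussian density of a new observation x = 0 under each class.
dens <- dnorm(0, mean = cond[, "mean"], sd = sd_used)
names(dens) <- rownames(cond)
print(dens)
```

With these invented numbers the density at x = 0 is far larger for "lg" than for "windows". That would be consistent with nearly every row being assigned to a rare class whose predictors are all zero, but it cannot be confirmed without the real data.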


