[R] naiveBayes: slow predict, weird results

Sam Steingold sds at gnu.org
Fri Feb 10 03:43:30 CET 2012


I did this:

library(e1071)   # naiveBayes() comes from the e1071 package
nb <- naiveBayes(users, platform)
pl <- predict(nb, users)
nrow(users)      # 314781
ncol(users)      # 109

1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
(tens of minutes).  Why?
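
To put numbers on the difference, fitting and prediction can be timed
separately on a small sample. The following is only a sketch: toy_x and
toy_y are synthetic stand-ins (not the real users/platform data), and the
row counts are arbitrary.

```r
library(e1071)  # provides naiveBayes() and its predict() method

set.seed(1)
# Synthetic stand-in for the real data (hypothetical): integer predictors, factor class
n <- 2000
toy_x <- as.data.frame(matrix(rpois(n * 10, lambda = 1), nrow = n))
toy_y <- factor(sample(c("android", "mac", "windows"), n, replace = TRUE))

# Time the fit
fit_time <- system.time(toy_nb <- naiveBayes(toy_x, toy_y))

# Time predict() on increasing row counts to see how it scales
sizes <- c(500, 1000, 2000)
pred_time <- sapply(sizes, function(k) {
  system.time(predict(toy_nb, toy_x[seq_len(k), ]))[["elapsed"]]
})

print(fit_time[["elapsed"]])
print(data.frame(rows = sizes, predict_seconds = pred_time))
```

If predict_seconds grows roughly in proportion to rows, the same
measurement on a sample of the real data gives an estimate for all
314781 rows.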

2. The predict() results were completely off the mark (quite the opposite
of the overfitting I expected when predicting on the training data).  The
tables of predicted and actual classes show it:

pl (predicted):
   android blackberry       ipad     iphone         lg      linux        mac 
         3          5         11         14     312723          5         11 
    mobile      nokia    samsung    symbian    unknown    windows 
      1864         17         16        112          0          0 

platform (actual):
   android blackberry       ipad     iphone         lg      linux        mac 
     18013       1221       2647       1328          4       2936      34336 
    mobile      nokia    samsung    symbian    unknown    windows 
        18         88         39        103       2660     251388 

I.e., nb classified nearly everything as "lg" (312723 of 314781 rows),
while in the actual data "lg" is virtually nonexistent (4 rows).

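The two marginal tables do not show which actual platforms end up
predicted as "lg"; a cross-tabulation of predicted against actual classes
would. A minimal sketch in base R, using made-up vectors (actual and
predicted here are hypothetical stand-ins for platform and pl):

```r
# Toy stand-ins (hypothetical) for the real `platform` and `pl` factors
lv <- c("lg", "mac", "windows")
actual    <- factor(c("windows", "windows", "mac", "windows", "mac", "lg"), levels = lv)
predicted <- factor(c("lg", "lg", "lg", "windows", "mac", "lg"), levels = lv)

# Rows are predicted classes, columns are actual classes
confusion <- table(predicted = predicted, actual = actual)
print(confusion)

# Share of rows on the diagonal, i.e. classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
print(accuracy)
```

On the real data the equivalent call would be
table(predicted = pl, actual = platform).
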
3. When I print "nb", I see "A-priori probabilities" (which are what I
expected) and "Conditional probabilities", which are confusing because
each predictor has only two columns, e.g.:

             android    0.048464998 0.43946764
             blackberry 0.001638002 0.04045564
             ipad       0.322251606 1.84940588
             iphone     0.030873494 0.23250250
             lg         0.000000000 0.00000000
             linux      0.023501362 0.34698919
             mac        0.082653774 1.22535027
             mobile     0.000000000 0.00000000
             nokia      0.000000000 0.00000000
             samsung    0.000000000 0.00000000
             symbian    0.000000000 0.00000000
             unknown    0.003759398 0.08219078
             windows    0.021158528 0.32916970

The predictors are integers.
Is the first column for the 0 predictors and the second for all non-0?
Is there a way to ask naiveBayes to differentiate between non-0 values?
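
One way to check what the two columns hold is to compare them with
per-class summaries on a tiny example. The sketch below assumes e1071's
naiveBayes(); the data frame toy, its column clicks, and the class vector
cls are made up for illustration. In e1071 a numeric (including integer)
predictor is summarised by its per-class mean and standard deviation,
while a factor predictor gets one column per level.

```r
library(e1071)

# Toy data (hypothetical): one integer predictor `clicks`, two classes
toy <- data.frame(clicks = c(0L, 0L, 1L, 3L, 0L, 2L, 2L, 5L))
cls <- factor(c("mac", "mac", "mac", "mac",
                "windows", "windows", "windows", "windows"))

# Integer predictor: the table has two columns per class
nb_num <- naiveBayes(toy, cls)
print(nb_num$tables$clicks)

# Per-class mean and standard deviation computed directly, for comparison
print(cbind(mean = tapply(toy$clicks, cls, mean),
            sd   = tapply(toy$clicks, cls, sd)))

# Same predictor converted to a factor: one column per observed value
toy_f <- data.frame(clicks = factor(toy$clicks))
nb_fac <- naiveBayes(toy_f, cls)
print(nb_fac$tables$clicks)
```

Whether treating counts as factors is appropriate for 109 predictors is
a separate question; with many distinct values the per-level tables
become sparse, and the laplace argument of naiveBayes() may be worth
considering.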

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
http://www.childpsy.net/ http://ffii.org http://www.PetitionOnline.com/tap12009/
http://mideasttruth.com http://iris.org.il http://openvotingconsortium.org
The program isn't debugged until the last user is dead.


