Sam Steingold sds at gnu.org
Fri Feb 10 16:01:54 CET 2012

When I tried to run svm on the same data frame, memory usage as reported
by top(1) doubled to 4GB almost right away and the function never
returned (has been running for ~15 hours now). ^C does not stop it.
This is most unusual; libsvm has always seemed very fast.

This is R version 2.13.1 (2011-07-08) (as distributed with Ubuntu).

> * Sam Steingold <fqf at tah.bet> [2012-02-09 21:43:30 -0500]:
> I did this:
> nb <- naiveBayes(users, platform)
> pl <- predict(nb,users)
> nrow(users) ==> 314781
> ncol(users) ==> 109
> 1. naiveBayes() was quite fast (~20 seconds), while predict() was slow
> (tens of minutes).  Why?
> 2. The predict() results were completely off the mark (quite the opposite
> of the expected overfitting).  Suffice it to show the tables:
> pl:
>    android blackberry       ipad     iphone         lg      linux        mac 
>          3          5         11         14     312723          5         11 
>     mobile      nokia    samsung    symbian    unknown    windows 
>       1864         17         16        112          0          0 
> platform:
>    android blackberry       ipad     iphone         lg      linux        mac 
>      18013       1221       2647       1328          4       2936      34336 
>     mobile      nokia    samsung    symbian    unknown    windows 
>         18         88         39        103       2660     251388 
> i.e., nb classified nearly everything as "lg" while in the actual data
> "lg" is virtually nonexistent.
> 3. when I print "nb", I see "A-priori probabilities" (which are what I
> expected) and "Conditional probabilities" which are confusing because
> there are only two of them, e.g.:
>              android    0.048464998 0.43946764
>              blackberry 0.001638002 0.04045564
>              ipad       0.322251606 1.84940588
>              iphone     0.030873494 0.23250250
>              lg         0.000000000 0.00000000
>              linux      0.023501362 0.34698919
>              mac        0.082653774 1.22535027
>              mobile     0.000000000 0.00000000
>              nokia      0.000000000 0.00000000
>              samsung    0.000000000 0.00000000
>              symbian    0.000000000 0.00000000
>              unknown    0.003759398 0.08219078
>              windows    0.021158528 0.32916970
> The predictors are integers.
> Is the first column for the 0 predictors and the second for all non-0?
> Is there a way to ask naiveBayes to differentiate between non-0 values?
> Thanks!
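
A minimal sketch related to question 3 above, using base R only and made-up
toy data (the vectors `platform` and `clicks` below are hypothetical stand-ins,
not the real `users` columns). As far as the e1071 documentation goes,
naiveBayes() stores, for each numeric predictor, the per-class mean and
standard deviation (a Gaussian model), and stores per-level conditional
probabilities only for factor predictors. That would be consistent with a
second column holding values above 1. The sketch computes both kinds of table
by hand so they can be compared with the printed "Conditional probabilities".

```r
# Hypothetical toy data: a class label and one integer predictor.
platform <- factor(c("windows", "windows", "windows", "mac", "mac", "lg", "lg"))
clicks   <- c(0L, 2L, 5L, 0L, 1L, 0L, 0L)

# Numeric view: per-class mean and standard deviation of the predictor.
# This mirrors a two-column (mean, sd) table, one row per class.
cond <- cbind(mean = tapply(clicks, platform, mean),
              sd   = tapply(clicks, platform, sd))
print(cond)

# Categorical view: per-class relative frequency of each distinct value.
# Treating the predictor as a factor distinguishes between non-0 values.
tab <- prop.table(table(platform, factor(clicks)), margin = 1)
print(tab)
```

If the per-value view is what is wanted, converting the integer columns to
factors before calling naiveBayes() is one option to try, at the cost of much
larger tables when a column has many distinct values. Rows with a standard
deviation of 0 (lg, mobile, nokia, ...) describe a degenerate Gaussian, which
may be worth checking in connection with point 2; that is a guess, not
something verified here.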

