[R] randomForest vs. NaiveBayes & SVM classification performance

Eleni Rapsomaniki e.rapsomaniki at mail.cryst.bbk.ac.uk
Mon Jul 24 19:59:31 CEST 2006


This is a question regarding classification performance using different methods.
So far I've tried NaiveBayes (klaR package), svm (e1071 package) and
randomForest (randomForest package). What has puzzled me is that randomForest
seems to perform far better (32% classification error) than svm and NaiveBayes,
which have similar classification errors (45% and 48%, respectively). A similar
difference in performance is observed across different combinations of
parameters, priors and sizes of training data.

Because I was expecting to see little difference in the performance of these
methods, I am worried that I may have made a mistake in my randomForest call:

my.rf <- randomForest(x = train.df[, -response_index], y = train.df[, response_index],
                      xtest = test.df[, -response_index], ytest = test.df[, response_index],
                      importance = TRUE, proximity = FALSE, keep.forest = FALSE)

(where train.df and test.df are my training and test data.frames and
response_index is the column number specifying the class)
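For reference, a minimal sketch of how I am comparing the three classifiers on the same train/test split (train.df, test.df and response_index as above; the fitted-object names are just placeholders):

```r
library(randomForest)
library(e1071)   # svm
library(klaR)    # NaiveBayes

set.seed(1)  # make the forest reproducible

x.train <- train.df[, -response_index]
y.train <- train.df[,  response_index]
x.test  <- test.df[,  -response_index]
y.test  <- test.df[,   response_index]

rf.fit  <- randomForest(x = x.train, y = y.train)
svm.fit <- svm(x = x.train, y = y.train)           # note: svm scales inputs by default
nb.fit  <- NaiveBayes(x = x.train, grouping = y.train)

# test-set classification error for each model
err <- function(pred) mean(pred != y.test)
err(predict(rf.fit,  x.test))
err(predict(svm.fit, x.test))
err(predict(nb.fit,  x.test)$class)  # predict.NaiveBayes returns a list
```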

My main question is: could there be a legitimate reason why random forests would
outperform the other two models (e.g. maybe one method is more reliable with
Gaussian data, handles categorical data better, etc.)? Also, is there a way of
evaluating the predictive ability of each variable in the Bayesian model, as can
be done for randomForest through the importance table?

I would appreciate any comments and suggestions on these points.

Many thanks
Eleni Rapsomaniki
