[R] randomForest parameters for image classification

Liaw, Andy andy_liaw at merck.com
Thu Nov 11 13:02:09 CET 2010


Please show us the code you used to run randomForest, the output, as
well as what you get with other algorithms (on the same random subset
for comparison).  I have yet to see a dataset where randomForest does
_far_ worse than other methods.
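
A minimal sketch of the kind of reproducible comparison meant here,
under stated assumptions: the data below are simulated stand-ins (not
the poster's imagery), rpart is picked as the "other algorithm" only
for illustration, and the ntree, sampsize, and nodesize values are
arbitrary. Lowering sampsize and raising nodesize grows smaller trees,
which may help with the out-of-memory error.

```r
library(randomForest)  # provides randomForest()
library(rpart)         # a single decision tree, used here for comparison

set.seed(42)

# Simulated stand-in for the real data (hypothetical): 26 numeric
# predictors and a factor response with 4 classes.
n <- 5000
p <- 26
x <- as.data.frame(matrix(rnorm(n * p), nrow = n))  # columns V1..V26

# The class depends on the first two predictors, plus noise.
score <- x$V1 + x$V2 + rnorm(n, sd = 0.5)
dat <- data.frame(
  class = cut(score, breaks = quantile(score, 0:4 / 4),
              include.lowest = TRUE, labels = letters[1:4]),
  x
)

# One random split, reused for every algorithm so results are comparable.
train_idx <- sample(n, 3000)
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# sampsize limits the cases drawn for each tree; nodesize > 1 stops
# splitting earlier. Both values are illustrative, not recommendations.
rf <- randomForest(class ~ ., data = train, ntree = 200,
                   sampsize = 1000, nodesize = 5)
print(rf)  # out-of-bag error estimate and confusion matrix

tree <- rpart(class ~ ., data = train, method = "class")

# Accuracy of both models on the same held-out cases.
rf_acc   <- mean(predict(rf, test) == test$class)
tree_acc <- mean(predict(tree, test, type = "class") == test$class)
c(randomForest = rf_acc, rpart = tree_acc)
```

Posting the call, the printed output, and the side-by-side accuracies
in this form makes it possible for others to see where the results
diverge.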

Andy 

> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Deschamps, Benjamin
> Sent: Tuesday, November 09, 2010 10:52 AM
> To: r-help at r-project.org
> Subject: [R] randomForest parameters for image classification
> 
> I am implementing an image classification algorithm using the
> randomForest package. The training data consists of 31000+ training
> cases over 26 variables, plus one factor response variable (the
> training class). The main issue I am encountering is very low overall
> classification accuracy (a lot of confusion between classes). However,
> I know from other classifications (including a regular decision tree
> classifier) that the training and validation data are sound and capable
> of producing good accuracies.
> 
> Currently, I am using the default parameters (500 trees, mtry not set
> (default), nodesize = 1, replace = TRUE). Does anyone have experience
> using this with large datasets? I currently need to randomly sample my
> training data because giving it the full 31000+ cases returns an
> out-of-memory error; the same thing happens with large numbers of
> trees. From what I read in the documentation, perhaps I do not have
> enough trees to fully capture the training data?
> 
> Any suggestions or ideas will be greatly appreciated.
> 
> Benjamin