[R] randomForest importance problem with combine [Broadcast]

Liaw, Andy andy_liaw at merck.com
Tue Jul 24 17:21:52 CEST 2007


I've been fixing some problems in the combine() function, but that's
only for regression data.  Looks like you are doing classification, and
I don't see the problem:

R> library(randomForest)
randomForest 4.5-19 
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1)
R> rflist <- replicate(50, randomForest(iris[-5], iris[[5]], ntree=50,
                                        importance=TRUE), simplify=FALSE)
R> rfall <- do.call(combine, rflist)
R> importance(rfall)
                setosa versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.4457861 0.53883425 0.5580657            0.4120840
Sepal.Width  0.3266790 0.07652383 0.3620240            0.2128450
Petal.Length 1.1950989 1.42014628 1.3220471            0.7989841
Petal.Width  1.1986973 1.40855969 1.3640620            0.7951053
             MeanDecreaseGini
Sepal.Length         9.578580
Sepal.Width          2.301172
Petal.Length        42.935832
Petal.Width         44.409058
R> importance(rflist[[1]])
               setosa  versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.401714  0.71583422 0.4946420            0.4166555
Sepal.Width  0.000000 -0.03155946 0.6829287            0.2317111
Petal.Length 1.290430  1.47915219 1.3456770            0.8219003
Petal.Width  1.110142  1.44996777 1.3584799            0.7881210
             MeanDecreaseGini
Sepal.Length         6.168439
Sepal.Width          2.240723
Petal.Length        48.821726
Petal.Width         42.059112

Please provide a reproducible example.
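
For illustration only, a reproducible example could be as small as the
sketch below. It assumes the iris data as a stand-in for the real data,
uses norm.votes=FALSE as in the original post, and compares unscaled
permutation importance (type=1, scale=FALSE) across two separately grown
forests and their combination. The object names rf1, rf2, rfboth and imp
are placeholders, not anything taken from the original poster's code.

```r
library(randomForest)

set.seed(1)
x <- iris[-5]     # predictors (stand-in for the real data)
y <- iris[[5]]    # class labels

# Two small forests grown separately, mimicking the batch-wise workflow.
rf1 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)
rf2 <- randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)

# Merge the two forests into a single 100-tree forest.
rfboth <- combine(rf1, rf2)

# Unscaled mean decrease in accuracy for each forest, side by side.
imp <- cbind(rf1      = importance(rf1,    type = 1, scale = FALSE)[, 1],
             rf2      = importance(rf2,    type = 1, scale = FALSE)[, 1],
             combined = importance(rfboth, type = 1, scale = FALSE)[, 1])
print(imp)

# Rank order within each column (1 = most important).
print(apply(-imp, 2, rank))
```

If the combined column differs sharply in rank or scale from the other
two on the real data, posting that output along with the code would make
the problem much easier to track down.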

Andy
 

From: Joseph Retzer
> 
> My apologies, subject corrected.
> 
> 
> I'm building an RF 50 trees at a time due to memory limitations (I have
>  roughly 0.5 million observations and around 20 variables). I thought I
>  could combine some or all of my forests later and look at global
>  importance.
> 
> If I have, say, 2 forests, tree1 and tree2, they have similar Gini and
>  Raw importances and, additionally, are similar to one another. After
>  combining the trees into one (using the combine command), however, the
>  Raw importances of the combined tree have changed in rank order rather
>  dramatically (e.g. the most important variable becomes the least
>  important, although it is not a completely reversed ordering). In
>  addition, the scale of both the Raw and Gini importances is orders of
>  magnitude smaller for the combined tree.
> 
> Note that the combined tree Gini importance looks roughly similar to
>  the individual tree Gini (and Raw) importance, at least in terms of
>  rank ordering.
> 
> I'm using the non-formula randomForest specification along with
>  norm.votes=FALSE to facilitate large sample estimation and tree
>  combining.
> 
> I'm using R 2.5.0 on a Windows XP machine with 2 GB of RAM. I'm also
>  using randomForest 4.5-18.
> 
> Any advice is appreciated,
> Many thanks,
> Joe
> 
> 

