[R] randomForest importance problem with combine [Broadcast]
Liaw, Andy
andy_liaw at merck.com
Tue Jul 24 17:21:52 CEST 2007
I've been fixing some problems in the combine() function, but those
affect only regression. It looks like you are doing classification,
and I can't reproduce the problem there:
R> library(randomForest)
randomForest 4.5-19
Type rfNews() to see new features/changes/bug fixes.
R> set.seed(1)
R> rflist <- replicate(50, randomForest(iris[-5], iris[[5]], ntree=50,
importance=TRUE), simplify=FALSE)
R> rfall <- do.call(combine, rflist)
R> importance(rfall)
                setosa versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.4457861 0.53883425 0.5580657            0.4120840
Sepal.Width  0.3266790 0.07652383 0.3620240            0.2128450
Petal.Length 1.1950989 1.42014628 1.3220471            0.7989841
Petal.Width  1.1986973 1.40855969 1.3640620            0.7951053
             MeanDecreaseGini
Sepal.Length         9.578580
Sepal.Width          2.301172
Petal.Length        42.935832
Petal.Width         44.409058
R> importance(rflist[[1]])
               setosa  versicolor virginica MeanDecreaseAccuracy
Sepal.Length 0.401714  0.71583422 0.4946420            0.4166555
Sepal.Width  0.000000 -0.03155946 0.6829287            0.2317111
Petal.Length 1.290430  1.47915219 1.3456770            0.8219003
Petal.Width  1.110142  1.44996777 1.3584799            0.7881210
             MeanDecreaseGini
Sepal.Length         6.168439
Sepal.Width          2.240723
Petal.Length        48.821726
Petal.Width         42.059112
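Whether the combined forest ranks the variables the same way as a single forest can be checked directly from the two importance() outputs above. A minimal base-R sketch: the numbers are copied from that output, and rank_agreement() is a hypothetical helper for illustration, not part of randomForest.

```r
# Importance values copied from importance(rfall) and importance(rflist[[1]]) above.
vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")

combined_mda <- setNames(c(0.4120840, 0.2128450, 0.7989841, 0.7951053), vars)
single_mda   <- setNames(c(0.4166555, 0.2317111, 0.8219003, 0.7881210), vars)

combined_gini <- setNames(c(9.578580, 2.301172, 42.935832, 44.409058), vars)
single_gini   <- setNames(c(6.168439, 2.240723, 48.821726, 42.059112), vars)

# Hypothetical helper: Spearman correlation of two importance vectors.
# 1 means identical variable rankings; a large reordering gives a value well below 1.
rank_agreement <- function(a, b) cor(a, b, method = "spearman")

rank_agreement(combined_mda, single_mda)    # 1
rank_agreement(combined_gini, single_gini)  # 0.8

# Ratios near 1 mean the magnitudes agree (no orders-of-magnitude shrinkage).
round(combined_mda / single_mda, 2)
```

For these values the accuracy-based rankings agree exactly and the Gini rankings differ only by swapping Petal.Length and Petal.Width, which is far from the reordering and shrinkage Joe describes.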
Please provide a reproducible example.
Andy
From: Joseph Retzer
>
> My apologies, subject corrected.
>
>
> I'm building a random forest 50 trees at a time because of memory
> limitations (I have roughly 0.5 million observations and about 20
> variables). I thought I could combine some or all of the forests later
> and look at overall importance.
>
> If I have, say, two forests, tree1 and tree2, they have similar Gini
> and raw importances, and the two forests are also similar to one
> another. However, after I merge them into one forest with combine(),
> the raw importances of the combined forest change in rank order rather
> dramatically (e.g., the most important variable becomes the least
> important, although the ordering is not completely reversed). In
> addition, both the raw and Gini importances of the combined forest are
> orders of magnitude smaller.
>
> Note that the Gini importance of the combined forest looks roughly
> similar to the Gini (and raw) importance of the individual forests, at
> least in rank order.
>
> I'm using the non-formula randomForest() interface with
> norm.votes=FALSE to make large-sample estimation and combining forests
> easier.
>
> I'm using R 2.5.0 on a Windows XP machine with 2 GB of RAM, and
> randomForest 4.5-18.
>
> Any advice is appreciated,
> Many thanks,
> Joe
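A reproducible example of the kind Andy asks for could follow the setup Joe describes: several 50-tree forests grown with the non-formula interface and norm.votes=FALSE, merged with combine(), and their importances compared side by side. This is a minimal sketch under stated assumptions: iris stands in for the ~0.5 million-row data set, and n_chunks and mda_of() are illustrative names chosen here, not anything from the original posts.

```r
# Sketch only: iris stands in for the large data set described above;
# n_chunks is an arbitrary illustrative value.
library(randomForest)

set.seed(1)
x <- iris[, -5]    # predictors (non-formula interface)
y <- iris[[5]]     # factor response, so this is a classification forest
n_chunks <- 4

# Grow several 50-tree forests instead of one large forest.
forests <- lapply(seq_len(n_chunks), function(i) {
  randomForest(x, y, ntree = 50, importance = TRUE, norm.votes = FALSE)
})

# Merge the pieces into one forest of n_chunks * 50 trees.
combined <- do.call(combine, forests)

# Hypothetical helper: unscaled mean decrease in accuracy as a named vector.
mda_of <- function(f) importance(f, type = 1, scale = FALSE)[, 1]

mda_table <- sapply(forests, mda_of)
colnames(mda_table) <- paste0("forest", seq_len(n_chunks))
mda_table <- cbind(mda_table, combined = mda_of(combined))
print(mda_table)

# Variable ranking (most to least important) in each column.
print(apply(mda_table, 2, function(v) names(sort(v, decreasing = TRUE))))
```

If the combined column ranks the variables very differently from the individual columns, or its values are orders of magnitude smaller, that would reproduce the reported problem; Andy's transcript suggests that for classification they should agree.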