[R] Random Forests Variable Importance Question
Paul Fisch
fischp at gmail.com
Mon Apr 13 11:04:03 CEST 2009
I am trying to use the randomForest package for classification in R.
The Variable Importance Measures listed are:
-mean raw importance score of variable x for class 0
-mean raw importance score of variable x for class 1
-MeanDecreaseAccuracy
-MeanDecreaseGini
I know what these "mean" in the sense that I know their definitions.
What I want to know is how to use them.
What I am trying to figure out is what these values mean purely in
practical terms: how accurate they are, what is a good value, what is
a bad value, what the maximums and minimums are, etc.
If a variable has a high MeanDecreaseAccuracy or MeanDecreaseGini,
does that mean it is important or unimportant? Also, any information
on the raw scores would be really helpful. I want to know everything
about these numbers that is relevant to applying them.
I don't really want a technical explanation that uses words like
'error', 'summation', or 'permuted', but rather a simpler
explanation that doesn't involve any discussion of how random forests
work (I have read all about that and didn't find it very helpful).
If I wanted someone to explain to me how to use a radio, I wouldn't
expect the explanation to cover how a radio converts radio waves
into sound.
If anyone can help me out at all, it would be really great. I have
read many, many lectures on random forests and other data mining
topics, but I have never found simple answers about how to read the
variable importance measures.
Thanks,
Paul Fisch