[R] Variable Importance - Random Forest
Henric Nilsson (Public)
nilsson.henric at gmail.com
Sun Aug 26 00:32:37 CEST 2007
On 2007-08-24 21:13, Mathe, Ewy (NIH/NCI) [F] wrote:
> Hello,
>
> I am trying to explore the use of random forests for classification and
> am uncertain about the interpretation of the importance measurements.
In case you haven't already done so, you probably want to read
@ARTICLE{Strobl+Boulesteix+Zeileis+Hothorn:2007,
  author  = {Carolin Strobl and Anne-Laure Boulesteix and Achim Zeileis
             and Torsten Hothorn},
  title   = {Bias in Random Forest Variable Importance Measures:
             Illustrations, Sources and a Solution},
  journal = {{BMC} Bioinformatics},
  year    = {2007},
  volume  = {8},
  number  = {25},
  url     = {http://www.biomedcentral.com/1471-2105/8/25/}
}
HTH,
Henric
>
> When having the option "importance = T" in the randomForest call, the
> resulting 'importance' element matrix has four columns with the
> following headings:
>
> 0 - mean raw importance score of variable x for class 0 (where
> importance is the difference between the permuted data error and the
> original test set error)
>
> 1 - mean raw importance score of variable x for class 1
>
> MeanDecreaseAccuracy : average lowering of the margin across all cases
> (where margin is the proportion of votes for the true class - the
> maximum proportion of votes for the other classes)
>
> MeanDecreaseGini : summation of the gini decreases over all trees in the
> forest
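The margin as described in the quoted text can be illustrated with a small base R sketch. The vote proportions and class labels below are made-up numbers for illustration, not output from randomForest, and margin_of is a hypothetical helper name of my own.

```r
# Hypothetical vote proportions for 4 cases and 2 classes (made-up numbers).
votes <- matrix(c(0.90, 0.10,
                  0.40, 0.60,
                  0.30, 0.70,
                  0.55, 0.45),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))
truth <- c("0", "0", "1", "1")

# Margin = proportion of votes for the true class minus the largest
# proportion of votes among the other classes.
margin_of <- function(votes, truth) {
  sapply(seq_len(nrow(votes)), function(i) {
    true_col <- colnames(votes) == truth[i]
    votes[i, true_col] - max(votes[i, !true_col])
  })
}

m <- margin_of(votes, truth)
print(m)
```

A negative margin means that some other class received more votes than the true one, i.e. the case is misclassified by the forest.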
>
> Are these definitions correct? Why is the raw importance score
> calculated for each class? Could one just average the raw importance
> scores for class 0 and 1 to get a composite importance score?
>
> Now, when having the option "importance = F" in the randomForest call,
> the 'importance' element is now a vector. What values are those?
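One way to check what the 'importance' element holds in each case is to fit both variants on a built-in data set and compare the column names. This is only a sketch, assuming the randomForest package is installed; the iris data and the object names rf_imp and rf_noimp are my own choices, not from the original post, and the exact layout may differ between package versions.

```r
library(randomForest)

set.seed(1)  # forests are random; fix the seed for repeatability

# importance = TRUE: the permutation-based measures are computed as well
rf_imp <- randomForest(Species ~ ., data = iris, importance = TRUE)

# importance = FALSE (the default): according to the package documentation,
# the Gini-based measure is still returned
rf_noimp <- randomForest(Species ~ ., data = iris, importance = FALSE)

print(colnames(rf_imp$importance))
print(rf_noimp$importance)

# The extractor function can pick a single measure: type = 1 is the
# permutation (accuracy) measure, scale = FALSE returns it unscaled.
print(importance(rf_imp, type = 1, scale = FALSE))
```

With three classes in iris, the first fit should give one column per class plus MeanDecreaseAccuracy and MeanDecreaseGini, while the second should carry one value per predictor.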
>
> Thank you in advance for any input you may have.
>
> Best,
> Ewy
>
> Ewy Mathe, Ph.D.
> Laboratory of Human Carcinogenesis
> National Cancer Institute, NIH
> 37 Convent Drive
> Building 37, Room 3068
> Bethesda, MD 20892-4255
> Tel: 301-496-5835
> Fax: 301-496-0497