[R] Question on: Random Forest Variable Importance for Regression Problems

Mareike Ließ mareike.liess at gmx.de
Wed Apr 28 20:34:41 CEST 2010


Well, the explanation under "importance" says that for regression the
first column ("%IncMSE") is the mean decrease in accuracy and the second
("IncNodePurity") the mean decrease in MSE.
That does not make much sense to me.
I do not know what "%IncMSE" stands for. Alright, MSE = "mean square
error", but of what exactly, since it is listed next to the variables
used for prediction?
And what does "%Inc" refer to? Percent increase? But again, a percent
increase in the mean square error?
Why would I want to increase the mean square error?
If, as I assume, "IncNodePurity" stands for increase in node
impurity... why would I want to increase the node impurity?
It would really help a lot to know exactly how these two values are
calculated and what they stand for. Neither is clear.
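
For reference, here is a minimal example that should reproduce the two
columns I am asking about. The built-in mtcars data set is only a
stand-in for my own data, and the seed and the choice of mpg as the
response are arbitrary assumptions made for illustration.

```r
library(randomForest)

set.seed(42)  # arbitrary seed, only so the run is repeatable

# mtcars stands in for my own data (assumption). mpg is numeric,
# so randomForest() fits a regression forest.
rf <- randomForest(mpg ~ ., data = mtcars, importance = TRUE)

# One row per predictor; the columns are the two measures in question.
imp <- importance(rf)
print(colnames(imp))
print(imp)
```

Without importance = TRUE in the call to randomForest(), only the
"IncNodePurity" column is reported.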

Thanks
Mareike



Liaw, Andy wrote:
> I would have thought that the help page for importance() is an (the?) obvious place to look...
>
> If that description is not clear, please let me know which part isn't clear to you.
>
> Andy
>
> From: Mareike Lies
>   
>> I am trying to use the randomForest package to perform regression.
>> The variable importance estimates are given as "%IncMSE" and
>> "IncNodePurity".
>> Can anyone explain to me what these refer to and how they are calculated?
>> I found a lot of information on variable importance measures for 
>> classification problems, but nothing on regression.
>>
>> Thanks a lot.
>> Mareike


