[R] Question on: Random Forest Variable Importance for Regression Problems

Greg Snow Greg.Snow at imail.org
Wed Apr 28 21:28:08 CEST 2010


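The %IncMSE measure shows how much the mean squared error (MSE) of the predictions increases when that variable is randomly permuted. If you randomly permute a variable that does not gain you anything in prediction, then the predictions won't change much and you will only see a small change in MSE. On the other hand, permuting an important variable will change the predictions by quite a bit, so you will see a bigger change. Turn this around and you see that big changes indicate important variables.

IncNodePurity is computed differently: according to the help page for importance(), it is the total decrease in node impurity (for regression, the residual sum of squares) from splitting on that variable, averaged over all trees. The reading is the same, though: nobody is trying to increase the error, and a larger value simply marks a more important variable.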
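
To make the permutation idea concrete, here is a minimal sketch, written in plain Python rather than R so that it does not lean on any package. It is not how randomForest computes %IncMSE (as I understand the help page, the package permutes the out-of-bag data of each tree and averages over trees). The toy data, the stand-in `model` function, and the helper names `mse` and `permutation_importance` are all made up for illustration.

```python
import random

def mse(model, X, y):
    """Mean squared error of the model's predictions on rows X against targets y."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, col, rng):
    """Increase in MSE after shuffling column `col` of X (other columns untouched)."""
    baseline = mse(model, X, y)
    shuffled = [row[col] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
    return mse(model, X_perm, y) - baseline

rng = random.Random(0)

# Toy data (hypothetical): y depends on x0 only; x1 is pure noise.
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
y = [3.0 * row[0] + rng.gauss(0, 0.1) for row in X]

# Stand-in for a fitted forest: a fixed predictor that uses x0 and ignores x1.
def model(row):
    return 3.0 * row[0]

imp_x0 = permutation_importance(model, X, y, 0, rng)
imp_x1 = permutation_importance(model, X, y, 1, rng)
print(f"increase in MSE when permuting x0: {imp_x0:.3f}")
print(f"increase in MSE when permuting x1: {imp_x1:.3f}")
```

The unused variable x1 comes out at exactly zero because the stand-in model never looks at it, while shuffling x0 raises the MSE substantially. With a real forest the unimportant variables would show small, noisy values rather than exact zeros.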

-- 
Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at imail.org
801.408.8111


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org]
> On Behalf Of Mareike Ließ
> Sent: Wednesday, April 28, 2010 12:35 PM
> To: Liaw, Andy
> Cc: r-help at r-project.org
> Subject: Re: [R] Question on: Random Forest Variable Importance for
> Regression Problems
> 
> Well, the help page for "importance" says that for regression the first
> column ("%IncMSE") is the mean decrease in accuracy and the second
> ("IncNodePurity") the mean decrease in MSE.
> That does not make much sense at all.
> I do not know what "%IncMSE" stands for. Alright, MSE = "mean square
> error", but of what exactly, since it is shown next to the variables
> used for prediction?
> And what does "%Inc" refer to? Percent increase? But again, percent
> increase in the mean square error?
> Why would I want to increase the mean square error?
> If, as I assume, "IncNodePurity" stands for increase in node
> impurity... why would I want to increase the node impurity?
> It would really help a lot to know exactly how these two values are
> calculated and what they stand for. Neither is clear.
> 
> Thanks
> Mareike
> 
> 
> 
> Liaw, Andy wrote:
> > I would have thought that the help page for importance() is an (the?)
> > obvious place to look...
> >
> > If that description is not clear, please let me know which part isn't
> > clear to you.
> >
> > Andy
> >
> > From: Mareike Ließ
> >
> >> I am trying to use the randomForest package to perform regression.
> >> The variable importance estimates are given as:  "%IncMSE"
> >>  and
> >> "IncNodePurity"
> >> Can anyone explain to me what these refer to and how they are
> >> calculated?
> >> I found a lot of information on variable importance measures for
> >> classification problems, but nothing on regression.
> >>
> >> Thanks a lot.
> >> Mareike
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> 


