[R] Random Forest AUC

Sun Oct 24 16:40:04 CEST 2010

The OOB error estimate in RF is one really nifty feature that alleviates
the need for additional cross-validation or resampling.  I've done some
empirical comparisons between OOB estimates and 10-fold CV estimates, and
they are basically the same.
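
For anyone who wants to see the effect without installing anything, here is a
minimal Python sketch. It is not randomForest: as a stand-in it bags 1-nearest-
neighbour learners (which, like fully grown trees, reproduce their own training
sample) on made-up 1-D data with two overlapping Gaussian classes. All function
names, sample sizes and the seed are illustrative assumptions.

```python
import bisect
import random

def make_data(n, rng):
    """Two overlapping 1-D Gaussian classes, so no classifier can be perfect."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        data.append((rng.gauss(1.5 * label, 1.0), label))
    return data

def fit_1nn(sample):
    """A 'fully grown' base learner: memorise the sample, sorted by x."""
    pts = sorted(sample)
    return [p[0] for p in pts], [p[1] for p in pts]

def predict_1nn(model, x):
    """Label of the stored point nearest to x."""
    xs, ys = model
    i = bisect.bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    return ys[i] if xs[i] - x < x - xs[i - 1] else ys[i - 1]

def vote(models, x):
    """Majority vote over the given learners; ties go to class 0."""
    ones = sum(predict_1nn(m, x) for m in models)
    return 1 if 2 * ones > len(models) else 0

def error_rate(pairs):
    pairs = list(pairs)
    return sum(pred != y for pred, y in pairs) / len(pairs)

def run(n_train=500, n_test=2000, n_learners=101, seed=0):
    rng = random.Random(seed)
    train = make_data(n_train, rng)
    test = make_data(n_test, rng)

    # One bootstrap sample (drawn with replacement) per learner.
    models, in_bag = [], []
    for _ in range(n_learners):
        idx = [rng.randrange(n_train) for _ in range(n_train)]
        models.append(fit_1nn([train[i] for i in idx]))
        in_bag.append(set(idx))

    # Error on the training set itself, voting over all learners.
    resub = error_rate((vote(models, x), y) for x, y in train)
    # Error on fresh data, voting over all learners.
    holdout = error_rate((vote(models, x), y) for x, y in test)

    # OOB error: each training point is judged only by the learners
    # whose bootstrap sample did not contain it.
    oob_pairs = []
    for i, (x, y) in enumerate(train):
        oob_models = [m for m, bag in zip(models, in_bag) if i not in bag]
        if oob_models:  # skip the (very unlikely) point that is never out-of-bag
            oob_pairs.append((vote(oob_models, x), y))
    oob = error_rate(oob_pairs)
    return resub, oob, holdout

if __name__ == "__main__":
    resub, oob, holdout = run()
    print("training-set error:", round(resub, 3))
    print("OOB error         :", round(oob, 3))
    print("hold-out error    :", round(holdout, 3))
```

On this setup I would expect the training-set error to come out close to zero,
while the OOB error lands near the hold-out error. That matches the
comparison described above, although a toy like this proves nothing about
real data.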

Andy

> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Claudia Beleites
> Sent: Saturday, October 23, 2010 3:39 PM
> To: r-help at r-project.org
> Subject: Re: [R] Random Forest AUC
> 
> Dear List,
> 
> Just out of curiosity (disclaimer: so far I have never used random forests
> for more than a little playing around):
> 
> Is there no out-of-bag estimate available?
> I mean, for a (one) given sample there is already a fraction of ca. 1/e of
> the trees where that sample is out-of-bag, as Andy explained. If the voting
> is now done only over the oob trees, I should get a classical oob
> performance measure.
> Or is the oob estimate internally used up by some kind of optimization
> (what would that be, given that the trees are grown till the end?)?
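
The 1/e figure is easy to check numerically. A small Python sketch, assuming
the usual bootstrap (n draws with replacement from n samples); the function
names and simulation settings are illustrative and not taken from any package:

```python
import math
import random

def oob_probability(n):
    """Exact probability that one given sample is missing from a bootstrap
    sample of size n drawn with replacement from n samples."""
    return (1.0 - 1.0 / n) ** n

def simulated_oob_fraction(n, n_trees, seed=0):
    """Fraction of n_trees bootstrap samples that do not contain sample 0."""
    rng = random.Random(seed)
    oob = 0
    for _ in range(n_trees):
        drawn = {rng.randrange(n) for _ in range(n)}
        if 0 not in drawn:
            oob += 1
    return oob / n_trees

if __name__ == "__main__":
    print("1/e:", round(math.exp(-1), 4))
    for n in (10, 100, 1000):
        print("n =", n, "->", round(oob_probability(n), 4))
    print("simulated, n = 200, 2000 trees:", simulated_oob_fraction(200, 2000))
```

As n grows, (1 - 1/n)^n approaches exp(-1), roughly 0.368, so each sample is
out-of-bag for a bit more than a third of the trees.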
> 
> Hoping that I do not spoil the pedagogic efforts of the list in teaching
> Ravishankar to do his homework reasoning himself...
> 
> Claudia
> 
> On 23.10.2010 20:49, Changbin Du wrote:
> > I think you should use 10-fold cross-validation to judge your performance
> > on the validation parts. What you did will be overfitted for sure, since
> > you test on the same training set used for your model building.
> >
> >
> > On Sat, Oct 23, 2010 at 6:39 AM, mxkuhn<mxkuhn at gmail.com>  wrote:
> >
> >> I think the issue is that you really can't use the training set to judge
> >> this (without resampling).
> >>
> >> For example, k nearest neighbors are not known to overfit, but a 1nn
> >> model will always perfectly predict the training data.
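
The 1nn point can be shown in a few lines. A minimal Python sketch on made-up
data where the labels are coin flips, so there is nothing to learn; the helper
names, sample sizes and seeds are illustrative assumptions:

```python
import random

def predict_1nn(train_x, train_y, query):
    """Label of the training point closest to query (squared Euclidean)."""
    best = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return train_y[best]

def accuracy(train_x, train_y, eval_x, eval_y):
    """Share of evaluation points whose 1nn prediction matches the label."""
    hits = sum(predict_1nn(train_x, train_y, x) == y
               for x, y in zip(eval_x, eval_y))
    return hits / len(eval_x)

def make_noise_data(n, seed):
    """Random 2-D points with labels that are independent of the features."""
    rng = random.Random(seed)
    x = [(rng.random(), rng.random()) for _ in range(n)]
    y = [rng.randrange(2) for _ in range(n)]
    return x, y

train_x, train_y = make_noise_data(200, seed=1)
test_x, test_y = make_noise_data(200, seed=2)

if __name__ == "__main__":
    # Every training point is its own nearest neighbour, so this is perfect.
    print("training accuracy:", accuracy(train_x, train_y, train_x, train_y))
    # On fresh data the model can only guess.
    print("test accuracy    :", accuracy(train_x, train_y, test_x, test_y))
```

Training accuracy is 1.0 even though the labels are pure noise, while test
accuracy should hover around chance. A perfect score on the training set
therefore says nothing about overfitting either way.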
> >>
> >> Max
> >>
> >> On Oct 23, 2010, at 9:05 AM, "Liaw, Andy"<andy_liaw at merck.com>  wrote:
> >>
> >>> What Breiman meant is that as the model gets more complex (i.e., as the
> >>> number of trees tends to infinity) the generalization error (test set
> >>> error) does not increase.  This does not hold for boosting, for example;
> >>> i.e., you can't "boost forever", which makes it necessary to find the
> >>> optimal number of iterations.  You don't need that with RF.
> >>>
> >>>> -----Original Message-----
> >>>> From: r-help-bounces at r-project.org
> >>>> [mailto:r-help-bounces at r-project.org] On Behalf Of vioravis
> >>>> Sent: Saturday, October 23, 2010 12:15 AM
> >>>> To: r-help at r-project.org
> >>>> Subject: Re: [R] Random Forest AUC
> >>>>
> >>>>
> >>>> Thanks Max and Andy. If the Random Forest is always giving an AUC of 1,
> >>>> isn't it overfitting??? If not, how do you differentiate this from
> >>>> overfitting??? I believe Random Forests are claimed to never overfit
> >>>> (from the following link).
> >>>>
> >>>> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features
> >>>>
> >>>>
> >>>> Ravishankar R
> >>>> --
> >>>> View this message in context:
> >>>> 
> http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
> >>>> Sent from the R help mailing list archive at Nabble.com.
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, 
> reproducible code.
> >>>>
> >>> Notice:  This e-mail message, together with any 
> attachme...{{dropped:11}}
> >>>
> >>
> >>
> >
> >
> >
> 
> 