[R] Random Forest AUC
Liaw, Andy
andy_liaw at merck.com
Sun Oct 24 16:40:04 CEST 2010
The OOB error estimate in RF is one really nifty feature that alleviates
the need for additional cross-validation or resampling. I've done some
empirical comparisons between OOB estimates and 10-fold CV estimates, and
they are basically the same.
Andy
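
As an illustration of why an OOB estimate is available at all, here is a
minimal sketch of the bootstrap step only. It is written in Python with the
standard library, not R; the function name oob_fraction and the sizes used
are assumptions made up for this example, and it does not reflect how the
randomForest package is implemented. It simulates how often one given sample
is left out of a tree's bootstrap draw, which should come out near 1/e.

```python
import math
import random


def oob_fraction(n_samples, n_trees, seed=0):
    """Fraction of simulated trees for which sample 0 is out-of-bag.

    Hypothetical helper for illustration: each "tree" is represented only
    by its bootstrap draw of n_samples indices taken with replacement.
    """
    rng = random.Random(seed)
    oob_count = 0
    for _ in range(n_trees):
        # Bootstrap draw: n_samples indices sampled with replacement.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        # Sample 0 is out-of-bag for this tree if it was never drawn.
        if 0 not in drawn:
            oob_count += 1
    return oob_count / n_trees


frac = oob_fraction(n_samples=200, n_trees=2000)
# Exact probability that one sample is missed by n draws with replacement.
expected = (1 - 1 / 200) ** 200
print(f"simulated: {frac:.3f}  exact: {expected:.3f}  1/e: {math.exp(-1):.3f}")
```

Voting for each sample over only the trees that left it out is what gives an
error estimate that does not reuse the data the tree was grown on.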
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Claudia Beleites
> Sent: Saturday, October 23, 2010 3:39 PM
> To: r-help at r-project.org
> Subject: Re: [R] Random Forest AUC
>
> Dear List,
>
> Just out of curiosity (disclaimer: so far I have never used random forests
> for more than a little playing around):
>
> Is there no out-of-bag estimate available?
> I mean, a given sample is already out-of-bag for a fraction of roughly 1/e
> (about 37%) of the trees, as Andy explained. If the voting is done only
> over those OOB trees, I should get a classical OOB performance measure.
> Or is the OOB estimate used up internally by some kind of optimization
> (what would that be, given that the trees are fully grown)?
>
> Hoping that I do not spoil the pedagogic efforts of the list in teaching
> Ravishankar to do his homework by reasoning it out himself...
>
> Claudia
>
> On 23.10.2010 20:49, Changbin Du wrote:
> > I think you should use 10-fold cross-validation to judge your performance
> > on the validation parts. What you did will be overfitted for sure: you
> > tested on the same training set used for your model building.
> >
> >
> > On Sat, Oct 23, 2010 at 6:39 AM, mxkuhn<mxkuhn at gmail.com> wrote:
> >
> >> I think the issue is that you really can't use the training set to judge
> >> this (without resampling).
> >>
> >> For example, k-nearest neighbors is not known to overfit, but a 1-NN
> >> model will always perfectly predict the training data.
> >>
> >> Max
> >>
> >> On Oct 23, 2010, at 9:05 AM, "Liaw, Andy" <andy_liaw at merck.com> wrote:
> >>
> >>> What Breiman meant is that as the model gets more complex (i.e., as the
> >>> number of trees tends to infinity) the generalization error (test set
> >>> error) does not increase. This does not hold for boosting, for example;
> >>> i.e., you can't "boost forever", which makes it necessary to find the
> >>> optimal number of iterations. You don't need that with RF.
> >>>
> >>>> -----Original Message-----
> >>>> From: r-help-bounces at r-project.org
> >>>> [mailto:r-help-bounces at r-project.org] On Behalf Of vioravis
> >>>> Sent: Saturday, October 23, 2010 12:15 AM
> >>>> To: r-help at r-project.org
> >>>> Subject: Re: [R] Random Forest AUC
> >>>>
> >>>>
> >>>> Thanks Max and Andy. If the random forest is always giving an AUC of 1,
> >>>> isn't it overfitting? If not, how do you differentiate this from
> >>>> overfitting? I believe random forests are claimed to never overfit
> >>>> (from the following link).
> >>>>
> >>>>
> >>>> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features
> >>>>
> >>>>
> >>>> Ravishankar R
> >>>> --
> >>>> View this message in context:
> >>>>
> http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
> >>>> Sent from the R help mailing list archive at Nabble.com.
> >>>>
> >>>> ______________________________________________
> >>>> R-help at r-project.org mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>
> >
> >
> >
>