[R] Random Forest AUC

Claudia Beleites cbeleites at units.it
Sat Oct 23 21:38:46 CEST 2010


Dear List,

Just out of curiosity (disclaimer: so far I have never used random forests for 
more than a little playing around):

Is there no out-of-bag estimate available?
I mean, for any one given sample there is already a fraction of about 1/e 
(roughly 37 %) of the trees for which that sample is out-of-bag, as Andy 
explained. If the voting is now done only over those oob trees, I should get a 
classical oob performance measure.
Or is the oob estimate used up internally by some kind of optimization 
(what would that be, given that the trees are grown to the end?)?
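
To make the 1/e figure concrete, here is a minimal sketch that simulates 
bootstrap resampling and counts how often one fixed sample is left out. It uses 
only the Python standard library, not the randomForest package; the function 
name oob_fraction, the sample size and the number of resamples are made up for 
illustration.

```python
import math
import random


def oob_fraction(n_samples, n_trees, seed=0):
    """Fraction of bootstrap resamples ("trees") that leave sample 0 out."""
    rng = random.Random(seed)
    oob = 0
    for _ in range(n_trees):
        # One bootstrap resample: n_samples draws with replacement.
        drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
        if 0 not in drawn:
            oob += 1
    return oob / n_trees


n = 200
simulated = oob_fraction(n_samples=n, n_trees=5000)
exact = (1 - 1 / n) ** n  # probability that one resample misses sample 0
print(f"simulated {simulated:.3f}, exact {exact:.3f}, 1/e {math.exp(-1):.3f}")
```

With a few hundred trees, each sample should therefore still have on the order 
of a hundred oob trees to vote over.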

I hope I am not spoiling the list's pedagogic efforts to teach Ravishankar 
to do his homework reasoning himself...

Claudia

On 23.10.2010 20:49, Changbin Du wrote:
> I think you should use 10-fold cross-validation and judge your performance on
> the validation parts. What you did will look overfitted for sure, because you
> tested on the same training set that you used for building your model.
>
>
> On Sat, Oct 23, 2010 at 6:39 AM, mxkuhn<mxkuhn at gmail.com>  wrote:
>
>> I think the issue is that you really can't use the training set to judge
>> this (without resampling).
>>
>> For example, k-nearest neighbors is not known to overfit, but a 1-NN
>> model will always perfectly predict the training data.
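
The 1-NN point can be checked with a small sketch. It uses only the Python 
standard library; the helper nn_predict, the single random feature and the 
pure-noise labels are assumptions chosen for illustration, and leave-one-out 
stands in here for "resampling".

```python
import random


def nn_predict(train_x, train_y, x, skip=None):
    """Label of the training point nearest to x, optionally ignoring index `skip`."""
    candidates = [j for j in range(len(train_x)) if j != skip]
    nearest = min(candidates, key=lambda j: abs(train_x[j] - x))
    return train_y[nearest]


rng = random.Random(1)
n = 300
xs = [rng.random() for _ in range(n)]      # one feature, values assumed distinct
ys = [rng.randrange(2) for _ in range(n)]  # labels are pure noise, unrelated to xs

# Resubstitution: every point is its own nearest neighbor.
train_acc = sum(nn_predict(xs, ys, xs[i]) == ys[i] for i in range(n)) / n
# Leave-one-out: the point being predicted is excluded from the neighbors.
loo_acc = sum(nn_predict(xs, ys, xs[i], skip=i) == ys[i] for i in range(n)) / n
print(f"training accuracy {train_acc:.2f}, leave-one-out accuracy {loo_acc:.2f}")
```

The training accuracy is perfect even though there is nothing to learn, while 
the leave-one-out figure stays near chance level.
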
>>
>> Max
>>
>> On Oct 23, 2010, at 9:05 AM, "Liaw, Andy"<andy_liaw at merck.com>  wrote:
>>
>>> What Breiman meant is that as the model gets more complex (i.e., as the
>>> number of trees tends to infinity) the generalization error (test set
>>> error) does not increase.  This does not hold for boosting, for example;
>>> i.e., you can't "boost forever", which necessitates finding the
>>> optimal number of iterations.  You don't need that with RF.
>>>
>>>> -----Original Message-----
>>>> From: r-help-bounces at r-project.org
>>>> [mailto:r-help-bounces at r-project.org] On Behalf Of vioravis
>>>> Sent: Saturday, October 23, 2010 12:15 AM
>>>> To: r-help at r-project.org
>>>> Subject: Re: [R] Random Forest AUC
>>>>
>>>>
>>>> Thanks Max and Andy. If the Random Forest is always giving an AUC of 1,
>>>> isn't it overfitting? If not, how do you differentiate this from
>>>> overfitting? I believe random forests are claimed to never overfit (from
>>>> the following link).
>>>>
>>>> http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#features
>>>>
>>>>
>>>> Ravishankar R
>>>> --
>>>> View this message in context:
>>>> http://r.789695.n4.nabble.com/Random-Forest-AUC-tp3006649p3008157.html
>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>
>
>
>


