[R] randomForest

Uwe Ligges ligges at statistik.tu-dortmund.de
Fri Mar 20 19:04:32 CET 2009



Uwe Ligges wrote:
> 
> 
> Anirudh Kondaveeti wrote:
>> To be more clear,
>>
>> My data set contains two classes.. Class 1 and Class 2
>> Class 1 has original data with 300 rows
>> Class 2 is randomly generated data with 1500 rows.
>>
>> I want to sample a new data set with
>> Class 1 - all the rows
>> Class 2 - only 300 rows out of 1500 rows
>>
>> and then use it in random forest with 500 trees.
>>
>> Also the Class 2 should have different 300 rows for different trees in
>> the forest. Thanks!
> 
> 
> Ah, in that case (stratified sampling) combine arguments "strata" and 
> "sampsize", in principle, but you cannot select ALL rows of one class: 
> you somehow ignore one of the main ideas of randomForests to bootstrap 
> observations - and randomForest will certainly bootstrap for you.

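In fact, you can use replace = FALSE as well (sampling without 
replacement, so a sampsize equal to the class size picks every row of 
that class), but then, as I said, one of the main ideas of randomForest, 
bootstrapping the observations, is ignored.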
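
A minimal sketch of how "strata", "sampsize" and "replace = FALSE" might 
be combined for the case described above. The data frame x, the response 
y, and the predictor names a and b are made-up stand-ins (300 rows of 
class1, 1500 rows of class2), not the poster's data; keep.inbag = TRUE is 
only there so the per-tree samples can be inspected afterwards.

```r
library(randomForest)

set.seed(1)

# Hypothetical stand-in data: 300 "original" rows and 1500 generated rows.
n1 <- 300
n2 <- 1500
x <- data.frame(a = c(rnorm(n1, mean = 1), rnorm(n2)),
                b = c(rnorm(n1, mean = 1), rnorm(n2)))
y <- factor(rep(c("class1", "class2"), times = c(n1, n2)))

fit <- randomForest(x, y,
                    ntree      = 500,
                    strata     = y,            # sample within each class
                    sampsize   = c(300, 300),  # per class, in the order of levels(y)
                    replace    = FALSE,        # without replacement: no bootstrap
                    keep.inbag = TRUE)         # record which rows each tree used

# Rows x trees matrix; a non-zero entry means the row was in-bag for that tree.
inbag <- fit$inbag > 0

# Number of in-bag rows per class for the first few trees.
head(colSums(inbag[y == "class1", ]))
head(colSums(inbag[y == "class2", ]))
```

Since sampsize for class1 equals the number of class1 rows and sampling is 
without replacement, every tree should see all of class1, while the 300 
class2 rows are drawn anew for each tree.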

Uwe Ligges





> Uwe Ligges
> 
> 
> 
>> Anirudh Kondaveeti
>> ----------------------------
>>
>>
>> On Fri, Mar 20, 2009 at 1:45 PM, Anirudh Kondaveeti <
>> anirudh.kondaveeti at gmail.com> wrote:
>>
>>> sampsize uses the same sample for all the trees in the random Forest.
>>>
>>> But I want to use different sample for each tree of the 500 trees in the
>>> random Forest. Thanks!
>>>
>>>
>>> Anirudh Kondaveeti
>>> ----------------------------
>>>
>>>
>>> 2009/3/20 Uwe Ligges <ligges at statistik.tu-dortmund.de>
>>>
>>>
>>>> Anirudh Kondaveeti wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> I am dealing with random forest using R.
>>>>>
>>>>> Is there a way to sample a fixed no. of rows from a dataset for use
>>>>> with different trees in random Forest?
>>>>> To be more clear, my data set contains 1500 rows, and I am growing 500
>>>>> trees in Random Forest.
>>>>> Is it possible to sample only 500 rows of data from the data set and
>>>>> use it for different trees in the forest? I mean each tree of the
>>>>> forest should use a different 500 rows from the data set.
>>>>>
>>>>
>>>> See ?randomForest and the argument sampsize.
>>>>
>>>> Uwe Ligges
>>>>
>>>>
>>>>
>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> Anirudh Kondaveeti
>>>>> ----------------------------
>>>>>
>>
>



