[R] cforest sampling methods

Max Kuhn mxkuhn at gmail.com
Wed Mar 19 21:12:17 CET 2014


You might look at the 'bag' function in the caret package. It will not
subsample the variables at each split, but you can bag a tree and
down-sample the data at each iteration. The help page has an example of
bagging ctree (although you might want to play with the tree depth a
little).
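
As a rough sketch of that idea (not a tested recipe): the code below bags
conditional inference trees with caret's bag() and sets downSample = TRUE in
bagControl() so each bootstrap sample is reduced to equal class sizes before
the tree is fit. The simulated data, the helper name shallowFit and the
choices maxdepth = 4 and B = 25 are assumptions made up for illustration;
the predict and aggregate pieces are taken from caret's ctreeBag.

```r
library(caret)   # bag(), bagControl(), ctreeBag
library(party)   # ctree(), ctree_control()

set.seed(1)

# Hypothetical imbalanced data: positives are rare
n <- 2000
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
y <- factor(ifelse(runif(n) < plogis(-4 + 2 * x$x1), "pos", "neg"),
            levels = c("neg", "pos"))

# Hypothetical fit function: same shape as ctreeBag$fit, but with a
# depth limit passed through party::ctree_control()
shallowFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$y <- y
  party::ctree(y ~ ., data = dat,
               controls = party::ctree_control(maxdepth = 4))
}

ctrl <- bagControl(fit = shallowFit,
                   predict = ctreeBag$pred,
                   aggregate = ctreeBag$aggregate,
                   downSample = TRUE)  # balance classes in each iteration

fit <- bag(x, y, B = 25, bagControl = ctrl)

# Class predictions aggregated over the bagged trees
pred <- predict(fit, x)
table(pred, y)
```

Because the down-sampling happens inside each bagging iteration, every tree
sees a different balanced subset, which is close in spirit to sampsize in
randomForest, minus the random selection of variables at each split.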

Max

On Wed, Mar 19, 2014 at 3:32 PM, Maggie Makar <maggieymakar at gmail.com> wrote:
> Hi all,
>
> I've been using the randomForest package and I'm trying to make the switch
> over to party. My problem is that I have an extremely unbalanced outcome
> (only 1% of the data has a positive outcome) which makes resampling methods
> necessary.
>
> randomForest has a very useful argument, sampsize, which allows me to
> use a balanced subsample to build each tree in my forest. Let's say the
> number of positive cases is 100; my forest would look something like this:
>
> rf <- randomForest(y ~ ., data = train, ntree = 800, replace = TRUE,
>                    sampsize = c(100, 100))
>
> So I use 100 cases and 100 controls to build each individual tree. Can I do
> the same for cforests? I know I can always upsample but I'd rather not.
>
> I've tried playing around with the weights argument but I'm either not
> getting it right or it's just the wrong thing to use.
>
> Any advice on how to adapt cforests to datasets with imbalanced outcomes is
> greatly appreciated...
>
>
>
> Thanks!