[R] Random Forest - Strata

mxkuhn mxkuhn at gmail.com
Wed Jul 21 14:21:22 CEST 2010


If you use the index argument of the trainControl() function in the caret package (a list giving the training rows for each resample), the train() function can be used for this type of resampling, and you'll get some decent summaries and visualizations to boot.
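
A minimal sketch of that idea, using the toy data from the original post. The object names (dat, train_idx, ctrl) and the "HoldOut_" labels are made up for illustration, and fixing mtry at 1 is only an assumption to keep the toy run simple. Ten rows are far too few for meaningful accuracy figures, so treat this as a template rather than a result.

```r
# Toy data copied from the original post (10 rows, 3 sites).
dat <- data.frame(
  Site     = factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C")),
  Presence = factor(c("Yes", "No", "No", "Yes", "No", "No", "No",
                      "Yes", "No", "No")),
  Length   = c(3.50, 3.90, 3.60, 2.60, 2.20, 2.60, 3.00, 3.50, 3.40, 3.10),
  Sulphur  = c(19.42, 51.09, 26.75, 9.71, 9.77, 8.60, 35.59,
               16.07, 49.96, 35.35)
)

# One resample per site. Each list element holds the row numbers used for
# TRAINING; the rows of the named site are the ones held out.
sites <- levels(dat$Site)
train_idx <- lapply(sites, function(s) which(dat$Site != s))
names(train_idx) <- paste0("HoldOut_", sites)

# The model fit is skipped if caret or randomForest is not installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  # Passing 'index' makes train() use these site-wise splits
  # instead of drawing random folds.
  ctrl <- caret::trainControl(method = "cv", index = train_idx)
  fit <- caret::train(Presence ~ Length + Sulphur, data = dat,
                      method = "rf", trControl = ctrl,
                      tuneGrid = data.frame(mtry = 1))
  print(fit$resample)  # one row of performance metrics per held-out site
}
```

Each row of fit$resample corresponds to one held-out site, so per-site and averaged performance can be read from the same object.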

Max

On Jul 21, 2010, at 7:11 AM, "Tim Howard" <tghoward at gw.dec.state.ny.us> wrote:

> Coll,
> 
> An alternative approach is to do that subsetting yourself before sending it to RF and treat each group as an external validation group, as follows:
> - extract Site A, build a RF model (Model 1) on sites B and C
> - validate this model by running predict() on Site A, then use ROCR or other evaluation metrics to assess the effectiveness of Model 1.
> - extract Site B, build a RF model (Model 2) on sites A and C.
> - validate this model by trying to predict presence in Site B using Model 2.
> - continue through all your sites.
> 
> This is called 'leave-one-out' (here, leaving out one site at a time) and is used in some fields for model validation.  Your final accuracy estimates for the model could be based on the averages of the values obtained for each model.
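
The steps Tim lists can be sketched as a short loop. This is only an illustration on the toy data from the original post: the names dat and acc are made up, and plain classification accuracy stands in for the ROCR-style metrics he mentions. With so few rows the numbers themselves mean little.

```r
library(randomForest)

# Toy data copied from the original post (10 rows, 3 sites).
dat <- data.frame(
  Site     = factor(c("A", "A", "A", "B", "B", "B", "B", "C", "C", "C")),
  Presence = factor(c("Yes", "No", "No", "Yes", "No", "No", "No",
                      "Yes", "No", "No")),
  Length   = c(3.50, 3.90, 3.60, 2.60, 2.20, 2.60, 3.00, 3.50, 3.40, 3.10),
  Sulphur  = c(19.42, 51.09, 26.75, 9.71, 9.77, 8.60, 35.59,
               16.07, 49.96, 35.35)
)

set.seed(1)  # forests are random; fix the seed so reruns match

# For each site: fit on the other sites, predict the held-out site,
# and record the fraction of correct predictions.
sites <- levels(dat$Site)
acc <- sapply(sites, function(s) {
  train <- dat[dat$Site != s, ]
  test  <- dat[dat$Site == s, ]
  rf    <- randomForest(Presence ~ Length + Sulphur, data = train)
  pred  <- predict(rf, newdata = test)
  mean(as.character(pred) == as.character(test$Presence))
})

print(acc)        # accuracy for each held-out site
print(mean(acc))  # average over sites, as suggested above
```

Note that every site is held out exactly once, so each row of the data is predicted by a model that never saw its site.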
> 
> Hope that Helps. 
> Tim
> 
> 
> 
> ------------------------------
> 
> Date: Tue, 20 Jul 2010 08:48:04 -0700 (PDT)
> From: Coll <gbcoll2 at gmail.com>
> To: r-help at r-project.org 
> Subject: [R] Random Forest - Strata
> 
> 
> Hi all,
> 
> I have been struggling to get the "strata" argument in randomForest to work for this.
> 
> Can I make randomForest, for each of its TREES, use ALL samples from some
> strata to build the tree, while leaving other strata TOTALLY untouched as oob?
> 
> e.g. with the data below, how can I tell RF to:
> - for tree 1 in the forest, use only Sites A and B to build the tree,
> while using the WHOLE of Site C for the oob error rate,
> - for tree 2, use only Sites A and C to build the tree, while using the whole
> of Site B for oob,
> - for tree 3, use Sites B and C, with A as oob...?
> 
> My command does not work, as it uses some samples from all of the sites:
> rforest.obj <- randomForest(Presence.f ~ ., data = dataset.subset, strata = site.factor)
> 
> while setting the corresponding "sampsize" argument, it seems, would only
> screen out a Site from the building of all trees...
> 
> Site    Presence      Length      Sulphur
> A            Yes           3.50            19.42
> A            No            3.90            51.09
> A            No            3.60            26.75
> B            Yes           2.60            9.71
> B            No            2.20            9.77
> B            No            2.60            8.60
> B            No            3.00            35.59
> C            Yes           3.50            16.07
> C            No            3.40            49.96
> C            No            3.10            35.35
> 
> Any ideas / comments are welcome.
> 
> Thanks in advance.
> 
> Coll


