[R] sampsize in Random Forests

Naiara Pinto naiara at mail.utexas.edu
Sun Mar 9 22:18:48 CET 2008


Hi all,

I have a dataset where each point is assigned to one of four classes:
A, B, C, or D. Each point is also assigned to a study site, and each
study site is coded with a number from 1 to 100. This information is
stored in the vector studySites.

I want to run randomForest using stratified sampling, so I chose the option
strata = factor(studySites)

But I am not sure how to control the number of samples taken from each
study site. I tried to use 10 points from each study site:
mySampSize = rep(10, 100)

So my function call looks like:
RF = randomForest(myClass ~ ., data = myData, mtry = 5, importance = TRUE,
                  strata = factor(studySites), sampsize = mySampSize)

But randomForest gives me the following error:
Error in randomForest.default(m, y, ...) :
sampsize can not be larger than class frequency
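
One way to investigate this, sketched here under the assumption that the error is triggered when some stratum holds fewer points than the sample size requested for it, is to tabulate the number of points per study site and compare it with sampsize. The studySites vector below is a small hypothetical stand-in (three sites with 25, 8 and 12 points), not the real data, and the names siteCounts, tooSmall and cappedSampSize are introduced only for illustration. Only base R is used.

```r
# Hypothetical stand-in for the studySites vector: three sites with
# 25, 8 and 12 points (the real data has site codes from 1 to 100).
studySites <- rep(c(1, 2, 3), times = c(25, 8, 12))

strata <- factor(studySites)
siteCounts <- as.vector(table(strata))   # points available in each stratum
names(siteCounts) <- levels(strata)

# One requested sample size per stratum level, in the order of levels(strata).
# Using nlevels(strata) instead of a hard-coded 100 avoids a length mismatch
# if some site codes never occur in the data.
mySampSize <- rep(10, nlevels(strata))

# Strata for which more points are requested than are available.
tooSmall <- names(siteCounts)[mySampSize > siteCounts]
print(tooSmall)

# One possible workaround (an assumption, not a confirmed fix):
# cap each request at the stratum's own count.
cappedSampSize <- pmin(mySampSize, siteCounts)
print(cappedSampSize)
```

If tooSmall is empty on the real data, this assumption does not explain the error, and the check may instead be applied against the frequencies of the response classes, as the wording of the message suggests.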

Does anybody have any idea why this happens?

Thank you very much,

Naiara.


