[R] sampsize in Random Forests
Naiara Pinto
naiara at mail.utexas.edu
Sun Mar 9 22:18:48 CET 2008
Hi all,
I have a dataset where each point is assigned to a class A, B, C, or
D. Each point is also assigned to a study site. Each study site is
coded with a number from 1 to 100. This information is stored
in the vector studySites.
I want to run randomForest with stratified sampling, so I chose the option
strata = factor(studySites)
But I am not sure how to control the number of samples taken from each
study site. I tried to use 10 points from each study site:
mySampSize = rep(10, 100)
So my function call looks like:
RF = randomForest(myClass ~ ., data = myData, mtry = 5, importance = TRUE,
                  strata = factor(studySites), sampsize = mySampSize)
But randomForest gives me the following error:
Error in randomForest.default(m, y, ...) :
sampsize can not be larger than class frequency
Does anybody have any idea why this happens?
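One possible explanation, not confirmed here, is that some study sites
contain fewer than 10 points, so the requested sample is larger than the
stratum it is drawn from. The sketch below uses base R only and made-up
site IDs (the simulated studySites, and the names siteStrata, siteCounts,
requested and tooSmall, are assumptions for illustration, not part of my
real data). It tabulates the points per site, lists the sites that fall
short, and builds a sampsize vector capped at each site's frequency:

```r
# Hypothetical stand-in data: site IDs drawn unevenly, so some sites are rare.
set.seed(1)
studySites <- sample(1:100, size = 800, replace = TRUE,
                     prob = seq(0.2, 2, length.out = 100))

siteStrata <- factor(studySites)
siteCounts <- table(siteStrata)   # number of points in each study site

# Sites that hold fewer points than the number requested per stratum
requested <- 10
tooSmall <- siteCounts[siteCounts < requested]
print(tooSmall)

# One possible workaround (an assumption, not a confirmed fix):
# cap each stratum's sample size at its observed frequency.
mySampSize <- pmin(requested, as.vector(siteCounts))
names(mySampSize) <- levels(siteStrata)

# The vector has one entry per observed site, which may be fewer than 100.
length(mySampSize) == nlevels(siteStrata)
```

If tooSmall is empty for the real data, the cause is probably something
else, for example the length of sampsize not matching the number of levels
in factor(studySites).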
Thank you very much,
Naiara.
More information about the R-help mailing list