[R] Using sample to create Training and Test sets

Max Kuhn mxkuhn at gmail.com
Fri May 15 16:26:24 CEST 2009


>> Forgive the newbie question, I want to select random rows from my
>> data.frame to create a test set (which I can do) but then I want to
>> create a training set using what's left over.
>>
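
For the split exactly as described, a minimal base R sketch is to sample
the row numbers for the test set and then use negative indexing to keep
the remaining rows. The data frame `dat`, the 20% test fraction and the
object names are illustrative assumptions, not taken from the original
question.

```r
# Illustrative data; substitute your own data.frame for `dat`
set.seed(1)
dat <- data.frame(x = rnorm(100), y = rnorm(100))

# Draw the row numbers for the test set (without replacement)
test_idx <- sample(nrow(dat), size = round(0.2 * nrow(dat)))

test_set  <- dat[test_idx, ]    # the sampled rows
train_set <- dat[-test_idx, ]   # negative indices drop the sampled rows
```

Every row ends up in exactly one of the two sets, but nothing here
controls how the outcome is distributed between them.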

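The caret package has a function, createDataPartition, that makes the
split while taking the distribution of the outcome into account, so the
training and test sets have similar outcome distributions. This is
useful in classification problems where one or more classes make up a
small percentage of the data set, since a purely random split could
leave such a class underrepresented in one of the sets.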
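
As a rough sketch of how that might look (this assumes the caret package
is installed; the built-in iris data, the 80% training fraction and the
object names are only for illustration):

```r
library(caret)

set.seed(1)

# Row numbers for the training set, sampled within each level of Species.
# list = FALSE returns the indices as a one-column matrix, not a list.
in_train <- createDataPartition(iris$Species, p = 0.8, list = FALSE)

training <- iris[in_train, ]
testing  <- iris[-in_train, ]   # everything not selected for training

# Compare the class frequencies in the two sets
table(training$Species)
table(testing$Species)
```

The two tables should show the classes in roughly the same proportions
in both sets. See ?createDataPartition for the other arguments.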

There is more detail in this pdf:

  http://cran.r-project.org/web/packages/caret/vignettes/caretMisc.pdf

and examples in this pdf:

  http://cran.r-project.org/web/packages/caret/vignettes/caretTrain.pdf

Max



