[R] how to create stratified (cross-validation) partitions according to numerical features
Martin Guetlein
martin.guetlein at googlemail.com
Fri Jan 13 09:49:35 CET 2012
Hi all,
I want to split a dataset into k cross-validation partitions
(folds). The folds should be stratified, not according to a single
(categorical) feature but according to several features (numeric, or
ideally both numeric and categorical). Does anybody know a way to do
this?
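One way to get k folds stratified on several features at once, sketched below in base R, is to turn the features into strata first and then spread every stratum evenly over the folds. This is an assumption, not something from the post: the strata come from k-means clustering on the scaled (and, for factors, dummy-coded) features. The helper name cluster_stratified_folds and the choice of n_strata are hypothetical. Constant columns must be dropped beforehand, because scale() turns them into NaN.

```r
# Sketch (assumption): cluster-based stratification for k-fold CV.
# cluster_stratified_folds is a hypothetical helper, not a package function.
cluster_stratified_folds <- function(df, k = 10, n_strata = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Dummy-code factors and scale all columns so that no single feature
  # dominates the Euclidean distances used by k-means.
  x <- scale(model.matrix(~ . + 0, data = df))
  strata <- kmeans(x, centers = n_strata, iter.max = 50, nstart = 5)$cluster
  folds <- integer(nrow(x))
  for (s in unique(strata)) {
    idx <- which(strata == s)
    # Deal fold labels 1..k round-robin (in a random starting order),
    # then shuffle them across the members of this stratum, so every
    # fold gets floor or ceiling of length(idx)/k units of the stratum.
    labels <- rep_len(sample.int(k), length(idx))
    folds[idx] <- labels[sample.int(length(labels))]
  }
  folds
}

folds <- cluster_stratified_folds(iris, k = 5, n_strata = 10, seed = 42)
print(table(folds))
# Per-fold feature means should stay close to the overall means
print(aggregate(iris[1:4], by = list(fold = folds), FUN = mean))
```

Because every stratum is split as evenly as possible, the fold sizes differ by at most n_strata units; more strata give finer stratification but smaller groups per stratum.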
I have only found a way to do this for a single split (training/test
split), using the package sampling. I paste the example code for the
training/test split below to make clear what I am looking for.
With best regards,
Martin
example code:
library("sampling")
# skipping the iris class column, as this method only works for
# numerical features, but that's OK
data <- as.matrix( iris[1:4] )
prob <- 0.3 # probability of being selected into the test set
samplecube(data, pik=rep(prob, times=nrow(data)), order=2)
>>>
[...]
QUALITY OF BALANCING
             TOTALS HorvitzThompson_estimators Relative_deviation
Sepal.Length  876.5                   874.6667        -0.20916524
Sepal.Width   458.6                   458.3333        -0.05814799
Petal.Length  563.7                   563.3333        -0.06504642
Petal.Width   179.9                   178.6667        -0.68556606
[1] 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1
[38] 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0
[75] 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
[112] 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1
[149] 0 0
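A possible extension of this single split to k folds, sketched below, draws the folds one after another with the same cube method: fold 1 is drawn from all units with inclusion probability 1/k, fold 2 from the remaining units with 1/(k-1), and so on, and whatever is left forms the last fold (the complement of a balanced sample is balanced as well). This is an assumption, not a documented feature of the sampling package. The helper name cube_folds is hypothetical, the inclusion probabilities are added as a balancing column to keep the fold sizes (approximately) fixed, and it assumes that samplecube's comment argument switches off the balancing report and that the returned vector is 0/1 as in the output above.

```r
library("sampling")

# Sketch (assumption): k balanced folds by sequential cube sampling.
# cube_folds is a hypothetical helper, not part of the sampling package.
cube_folds <- function(X, k = 5) {
  X <- as.matrix(X)
  folds <- integer(nrow(X))
  remaining <- seq_len(nrow(X))
  for (f in seq_len(k - 1)) {
    # Each remaining unit enters fold f with probability 1 / (folds left)
    pik <- rep(1 / (k - f + 1), length(remaining))
    # Balancing on pik itself makes the fold size (close to) fixed
    s <- samplecube(cbind(pik, X[remaining, , drop = FALSE]), pik,
                    comment = FALSE)
    chosen <- remaining[s > 0.5]
    folds[chosen] <- f
    remaining <- setdiff(remaining, chosen)
  }
  folds[remaining] <- k  # the units left over form the last fold
  folds
}

set.seed(1)
folds <- cube_folds(iris[1:4], k = 5)
print(table(folds))
print(aggregate(iris[1:4], by = list(fold = folds), FUN = mean))
```

Like the original example, this only handles numeric features; categorical ones would have to be dummy-coded (e.g. with model.matrix) before being passed in as balancing columns.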
--
Dipl.-Inf. Martin Gütlein
Phone:
+49 (0)761 203 7633 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de