[R] how to create stratified (cross-validation) partitions according to numerical features

Martin Guetlein martin.guetlein at googlemail.com
Fri Jan 13 09:49:35 CET 2012


Hi all,

I want to split a dataset into k cross-validation partitions
(folds). The folds should be stratified, not according to a single
(categorical) feature, but according to several features (numeric,
or if possible both numeric and categorical). Does anybody know a
way to do this?

I only found a way to do this for a single split (training-test split)
with the sampling package. I will paste the example code for the
training-test split below to make clear what I am looking for.

With best regards,
Martin

example code:

library("sampling")
data <- as.matrix( iris[1:4] ) # skip the iris class column, as this
# method only works for numerical features, but that's ok
prob <- 0.3 # probability of being selected into the test set
samplecube(data, pik=rep(prob, times=nrow(data)), order=2)
>>>
[...]
QUALITY OF BALANCING
             TOTALS HorvitzThompson_estimators Relative_deviation
Sepal.Length  876.5                   874.6667        -0.20916524
Sepal.Width   458.6                   458.3333        -0.05814799
Petal.Length  563.7                   563.3333        -0.06504642
Petal.Width   179.9                   178.6667        -0.68556606
   [1] 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 1
 [38] 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0
 [75] 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 1
[112] 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 1
[149] 0 0
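
For comparison, here is a rough base-R sketch of the kind of k-fold
assignment meant above. It is not the cube method from the sampling
package. It rests on an assumption: ordering the rows along the first
principal component and spreading each run of k neighbouring rows
across the k folds keeps the numeric features roughly balanced. The
names x, k, folds and rel_dev are made up for this illustration.

```r
# Sketch only: k folds roughly balanced on several numeric features.
# Assumption: neighbours along the first principal component are "similar".
set.seed(1)
x <- as.matrix(iris[1:4])
k <- 5
n <- nrow(x)

pc1 <- prcomp(x, scale. = TRUE)$x[, 1]  # scores on the first principal component
ord <- order(pc1)                       # row indices sorted along PC1
block <- ceiling(seq_len(n) / k)        # consecutive blocks of k sorted positions

# Within each block, hand out the fold labels 1..k in random order.
fold_sorted <- unlist(lapply(split(seq_len(n), block),
                             function(i) sample(k)[seq_along(i)]),
                      use.names = FALSE)

folds <- integer(n)
folds[ord] <- fold_sorted               # map labels back to the original row order

# Balance check in the spirit of the table above: scale each fold's
# column totals by k and compare with the full totals, in percent.
totals <- colSums(x)
rel_dev <- sapply(seq_len(k), function(f)
  100 * (k * colSums(x[folds == f, , drop = FALSE]) - totals) / totals)
colnames(rel_dev) <- paste0("fold", seq_len(k))

table(folds)
round(rel_dev, 2)
```

A categorical feature could be handled in the same spirit by running
this assignment separately within each of its levels, but that is
untested here.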

-- 
Dipl-Inf. Martin Gütlein
Phone:
+49 (0)761 203 7633 (office)
+49 (0)177 623 9499 (mobile)
Email:
guetlein at informatik.uni-freiburg.de


