[R] Leave One Group Out with caret
Marcus Hanisch
marcus at deltalima.org
Thu Mar 2 12:09:40 CET 2017
In social psychology, we are working on a project in which we try to
predict relationship quality (outcome) from personality (features). The
main goal is to help match people who have a higher chance of a happy,
long-lasting romantic relationship. I would be very grateful if you
could help me with this by answering the following question:
At the moment, k-fold cross-validation in R assigns the rows of the data
(people) to the folds at random. A couple is represented by two rows in
the dataset (partner 1 and partner 2), who are of course not always
equally happy in their relationship with each other. Nevertheless, the
relationship quality of partner 1 and partner 2 is correlated, which
means the cases are dependent. How can I assign both partners of a
couple to the same fold (but still as two cases), so that the test
sample is always completely independent of the training sample? How can
I write a Leave One Group Out CV command in R, like the one that exists
in Python (which I unfortunately cannot work with)?
Couples are identified by the same number in the column paarID.
Here is the relevant part of my R code:
library(caret)
# Build the model formula from the outcome and the selected covariates
outcome <- "RQ_continuaryScale"
variables <- colnames(dat)[use_covar_i]
model <- paste(variables, collapse = " + ")
model <- paste(outcome, "~", model)
# repeats only takes effect with method = "repeatedcv", not with "cv"
training_config <- trainControl(method = "repeatedcv", number = 5, repeats = 100)
fit <- train(as.formula(model), data = dat_nomiss, method = "glmnet",
             trControl = training_config)
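One direction that might work for keeping both partners in the same fold is caret's groupKFold(), which builds folds from a grouping variable. The following is only a sketch under assumptions: the toy data frame and its columns trait and RQ are made up for illustration (only paarID comes from the real data), and 96 couples with two rows each is a guess at the data layout. Groups are assigned to folds at random, so fold sizes can be uneven.

```r
library(caret)

set.seed(42)
# Toy stand-in for the real data: 96 couples, two rows (partners) each.
# Column names other than paarID are hypothetical.
toy <- data.frame(
  paarID = rep(1:96, each = 2),
  trait  = rnorm(192),
  RQ     = rnorm(192)
)

# groupKFold() returns one vector of training-row indices per fold and
# keeps all rows that share a group value on the same side of the split.
folds <- groupKFold(toy$paarID, k = 5)

# Check: no couple appears in both the training and the held-out rows.
split_ok <- sapply(folds, function(train_rows) {
  held_out <- setdiff(seq_len(nrow(toy)), train_rows)
  length(intersect(toy$paarID[train_rows], toy$paarID[held_out])) == 0
})
print(split_ok)

# Hand the grouped folds to caret through the index argument.
grouped_config <- trainControl(method = "cv", index = folds)
# With the real data, the call would then presumably look like:
# fit <- train(as.formula(model), data = dat_nomiss, method = "glmnet",
#              trControl = grouped_config)
```

The index argument of trainControl() takes a list of training-row indices, one element per resample, so any grouping scheme that can be written as such a list could be used in the same way.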
Here is some sample data:
https://github.com/topepo/caret/files/796416/Testdata_couples_1.csv.2.zip
I'm quite new to R and not an expert in statistics. :(
I already tried caret's LGOCV method, but the results are not what I
expected.
When I try the following:
training_config <- trainControl(method="LGOCV", number=96, p=0.97)
then I only get a sample size of 188, but I need 190.
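As far as I understand it, method="LGOCV" in caret draws repeated random training/test splits by the proportion p and does not group rows by an ID variable, which would explain why the sample size does not come out at exactly 190. A leave-one-couple-out scheme can instead be written by hand as a list of training-row indices. This is a minimal sketch in base R, assuming 96 couples with two rows each (inferred from number=96 and the target of 190, not confirmed); the names loco_index and train_sizes are made up for illustration.

```r
# Toy ID column: 96 couples with two rows each (an assumption about the data).
paarID <- rep(1:96, each = 2)

# One resample per couple: train on every row except that couple's two rows.
loco_index <- lapply(unique(paarID), function(id) which(paarID != id))
names(loco_index) <- paste0("Couple", unique(paarID))

# Every training set has 192 - 2 = 190 rows.
train_sizes <- lengths(loco_index)
print(range(train_sizes))
```

Such a list could then be passed to caret as trainControl(method = "cv", index = loco_index). With only two held-out rows per resample, per-resample R-squared values are likely not very meaningful, so RMSE might be the safer metric to look at.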
I hope I have described my problem clearly. I am very thankful for
any help and support.
Best regards!