[R-sig-ME] cross validation on discrete time survival analysis
shahla ebrahimi
shebrahimi_3622 at yahoo.com
Mon Aug 1 03:55:27 CEST 2016
Dear Dr.
Greetings
I would be very grateful if you could let me know how to do cross validation when estimating a discrete time survival analysis in R.
ID TIME EVENT x1 x2 x3 x4 x5
1 1 0 1.281 0.023 0.875 1.216 0.061
1 2 0 1.270 0.006 0.821 1.005 -0.014
1 3 0 1.053 -0.059 0.922 0.729 0.020
1 4 0 1.113 -0.015 0.859 0.810 0.076
1 5 1 1.220 -0.059 0.887 0.484 0.010
2 1 0 1.062 0.107 0.815 0.836 0.200
2 2 0 1.056 0.082 0.879 0.687 0.143
2 3 0 0.971 0.076 0.907 0.810 0.166
2 4 0 1.059 0.130 0.818 0.876 0.234
2 5 0 1.125 0.148 0.759 1.080 0.276
2 6 0 1.600 0.262 0.546 1.313 0.369
2 7 0 1.576 0.262 0.564 1.156 0.349
2 8 0 1.544 0.241 0.591 1.077 0.326
2 9 0 1.722 0.215 0.552 0.841 0.293
2 10 0 1.723 0.209 0.534 0.787 0.293
2 11 0 1.631 0.186 0.548 0.728 0.274
2 12 0 2.172 0.319 0.441 0.947 0.427
3 1 0 0.874 -0.035 0.794 0.610 -0.003
3 2 1 0.825 -0.142 0.952 0.573 -0.019
require(lme4) model <- glmer(EVENT ~ TIME + (1+TIME|ID)+x1+x2+x3+x4+x5, data=df, family=binomial) p <- as.numeric(predict(model, type="response")>0.5) acc=mean(p==df$EVENT)
Is it right to do this:
accuracy = function(pred, test_data){
num.correct.negatives = sum((pred == test_data$EVENT) & (test_data$EVENT==0))
num.correct.positives = sum((p == test_data$EVENT) & (test_data$EVENT==1))
num.false.negatives = sum((pred != test_data$EVENT) & (test_data$EVENT==1))
num.false.positives = sum((pred != test_data$EVENT) & (test_data$EVENT==0))
ppv = num.correct.positives / (num.false.positives + num.correct.positives)
npv = num.correct.negatives / (num.false.negatives + num.correct.negatives)
return(list(ppv = ppv, npv = npv))
}
df = read.table(file='clipboard', header=T)
idx_train = sapply(unique(df$ID), function(x){
sample(which(df$ID %in% x), 1)
}
)
#if you wish to sample based only on time. Note that k=5 here.
#idx_train = which(df$TIME %in% sample(unique(df$TIME), 5))
df_train = df[idx_train, ]
df_test = df[-idx_train,]
#perform training on training dataset
require(lme4)
model <- glmer(EVENT ~ TIME + (1+TIME|ID)+x1+x2+x3+x4+x5, data=df_train, family=binomial)
p <- as.numeric(predict(model, newdata = df_test, type="response")>0.5)
#cross validate
accuracy(p, df_test)
Thanks in advance.Best regards,
[[alternative HTML version deleted]]
More information about the R-sig-mixed-models
mailing list