[R-sig-ME] cross validation on discrete time survival analysis

Mon Aug 1 03:55:27 CEST 2016

Dear Dr.

Greetings
I would be very grateful if you could tell me how to perform cross-validation when fitting a discrete-time survival model in R. My data are in person-period format (one row per ID per time period):
ID  TIME  EVENT     x1      x2     x3     x4      x5
 1     1      0  1.281   0.023  0.875  1.216   0.061
 1     2      0  1.270   0.006  0.821  1.005  -0.014
 1     3      0  1.053  -0.059  0.922  0.729   0.020
 1     4      0  1.113  -0.015  0.859  0.810   0.076
 1     5      1  1.220  -0.059  0.887  0.484   0.010
 2     1      0  1.062   0.107  0.815  0.836   0.200
 2     2      0  1.056   0.082  0.879  0.687   0.143
 2     3      0  0.971   0.076  0.907  0.810   0.166
 2     4      0  1.059   0.130  0.818  0.876   0.234
 2     5      0  1.125   0.148  0.759  1.080   0.276
 2     6      0  1.600   0.262  0.546  1.313   0.369
 2     7      0  1.576   0.262  0.564  1.156   0.349
 2     8      0  1.544   0.241  0.591  1.077   0.326
 2     9      0  1.722   0.215  0.552  0.841   0.293
 2    10      0  1.723   0.209  0.534  0.787   0.293
 2    11      0  1.631   0.186  0.548  0.728   0.274
 2    12      0  2.172   0.319  0.441  0.947   0.427
 3     1      0  0.874  -0.035  0.794  0.610  -0.003
 3     2      1  0.825  -0.142  0.952  0.573  -0.019
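To rerun the code without the clipboard connection, the posted rows can be read from a string with base R's read.table(text = ...). The values below are copied from the table above; nothing else is assumed.

```r
# Load the posted person-period rows directly from a string
# (blank lines are skipped, the first non-blank line is the header).
df <- read.table(header = TRUE, text = "
ID  TIME  EVENT     x1      x2     x3     x4      x5
 1     1      0  1.281   0.023  0.875  1.216   0.061
 1     2      0  1.270   0.006  0.821  1.005  -0.014
 1     3      0  1.053  -0.059  0.922  0.729   0.020
 1     4      0  1.113  -0.015  0.859  0.810   0.076
 1     5      1  1.220  -0.059  0.887  0.484   0.010
 2     1      0  1.062   0.107  0.815  0.836   0.200
 2     2      0  1.056   0.082  0.879  0.687   0.143
 2     3      0  0.971   0.076  0.907  0.810   0.166
 2     4      0  1.059   0.130  0.818  0.876   0.234
 2     5      0  1.125   0.148  0.759  1.080   0.276
 2     6      0  1.600   0.262  0.546  1.313   0.369
 2     7      0  1.576   0.262  0.564  1.156   0.349
 2     8      0  1.544   0.241  0.591  1.077   0.326
 2     9      0  1.722   0.215  0.552  0.841   0.293
 2    10      0  1.723   0.209  0.534  0.787   0.293
 2    11      0  1.631   0.186  0.548  0.728   0.274
 2    12      0  2.172   0.319  0.441  0.947   0.427
 3     1      0  0.874  -0.035  0.794  0.610  -0.003
 3     2      1  0.825  -0.142  0.952  0.573  -0.019
")
str(df)
```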
require(lme4)
model <- glmer(EVENT ~ TIME + (1+TIME|ID) + x1 + x2 + x3 + x4 + x5, data=df, family=binomial)
p <- as.numeric(predict(model, type="response") > 0.5)
acc = mean(p == df$EVENT)
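With only two events among the 19 posted rows, raw accuracy at a 0.5 cutoff is hard to interpret: a trivial rule that never predicts an event is already right on 17 of 19 rows. The arithmetic, using the EVENT column exactly as posted:

```r
# EVENT column as posted: ID 1 (5 rows), ID 2 (12 rows), ID 3 (2 rows)
event <- c(0, 0, 0, 0, 1,
           0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 1)
# Accuracy of a rule that always predicts "no event"
baseline_acc <- mean(event == 0)
baseline_acc  # 17/19, about 0.895
```

Any model's accuracy is best compared against this baseline, or supplemented with class-specific measures such as PPV and NPV.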
Is the following approach correct?

# Positive and negative predictive values of 0/1 predictions against test_data$EVENT
accuracy = function(pred, test_data){
  obs = test_data$EVENT
  num.correct.negatives = sum((pred == obs) & (obs == 0))
  num.correct.positives = sum((pred == obs) & (obs == 1))
  num.false.negatives = sum((pred != obs) & (obs == 1))
  num.false.positives = sum((pred != obs) & (obs == 0))

  ppv = num.correct.positives / (num.false.positives + num.correct.positives)
  npv = num.correct.negatives / (num.false.negatives + num.correct.negatives)
  return(list(ppv = ppv, npv = npv))
}
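The same PPV/NPV calculation can be read off a 2x2 confusion table. A minimal sketch in base R; ppv_npv() is a hypothetical helper name. Forcing both factor levels keeps the table 2x2 even when a fold contains no events or no predicted events, and an empty denominator returns NA instead of NaN.

```r
# Confusion-table version of the PPV/NPV calculation (hypothetical helper)
ppv_npv <- function(pred, obs) {
  # Force levels 0 and 1 so every cell exists even if a class is absent
  tab <- table(pred = factor(pred, levels = c(0, 1)),
               obs  = factor(obs,  levels = c(0, 1)))
  tp <- tab["1", "1"]; fp <- tab["1", "0"]
  tn <- tab["0", "0"]; fn <- tab["0", "1"]
  list(ppv   = if (tp + fp > 0) tp / (tp + fp) else NA_real_,
       npv   = if (tn + fn > 0) tn / (tn + fn) else NA_real_,
       table = tab)
}

# Usage on made-up vectors: 1 TP, 1 FP, 2 TN, 1 FN
r <- ppv_npv(pred = c(1, 0, 1, 0, 0), obs = c(1, 0, 0, 1, 0))
r$ppv  # 0.5
r$npv  # 2/3
```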
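df = read.table(file='clipboard', header=TRUE)
# training rows: one randomly chosen row per ID
idx_train = sapply(unique(df$ID), function(x){
  rows = which(df$ID == x)
  rows[sample.int(length(rows), 1)]  # plain sample(rows, 1) misbehaves when an ID has a single row
})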
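A known R pitfall affects the sample(which(...), 1) idiom: when its first argument is a single positive number n, sample() draws from 1:n rather than returning n. An ID with only one row would therefore get a random row from elsewhere in the data. Indexing a position drawn with sample.int() avoids this; one_row_per_id() below is a hypothetical helper, shown as a sketch.

```r
set.seed(1)
# Suppose an ID has a single row, so which() returns one index, e.g. 19
ix <- 19L
# sample(ix, 1) draws from 1:19, not from {19}
draws <- replicate(100, sample(ix, 1))
# Sampling a position instead always returns 19
safe <- replicate(100, ix[sample.int(length(ix), 1L)])

# Hypothetical helper: one random row index per ID
one_row_per_id <- function(id) {
  rows_by_id <- split(seq_along(id), id)
  vapply(rows_by_id, function(r) r[sample.int(length(r), 1L)], integer(1))
}
idx <- one_row_per_id(c(1, 1, 2, 3, 3, 3))  # ID 2 always maps to row 3
```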

# Alternatively, sample on time only: train on the rows at 5 randomly chosen TIME values.
#idx_train = which(df$TIME %in% sample(unique(df$TIME), 5))
df_train = df[idx_train, ]
df_test = df[-idx_train,]
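A common alternative to splitting rows within subjects is subject-level (grouped) K-fold cross-validation: all rows of a given ID go to the same fold, so each fold is tested on subjects the model has not seen. A minimal sketch of the fold assignment, assuming the ID column identifies subjects; assign_folds_by_id() is a hypothetical helper, not an existing function.

```r
# Assign each subject (not each row) to one of k folds (hypothetical helper)
assign_folds_by_id <- function(id, k = 5L) {
  ids <- unique(id)
  if (k > length(ids)) stop("k cannot exceed the number of distinct IDs")
  # Balanced fold labels 1..k, shuffled across subjects
  fold_of_id <- sample(rep_len(seq_len(k), length(ids)))
  # Every row inherits its subject's fold
  fold_of_id[match(id, ids)]
}

set.seed(42)
id <- rep(1:10, times = 1:10)       # 10 made-up subjects, 1 to 10 rows each
fold <- assign_folds_by_id(id, k = 3)
table(id, fold)                      # each ID appears in exactly one fold
```

Fold f is then used as df[fold == f, ] for testing and df[fold != f, ] for training.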

# fit the model on the training rows
require(lme4)
model <- glmer(EVENT ~ TIME + (1+TIME|ID)+x1+x2+x3+x4+x5, data=df_train, family=binomial)
p <- as.numeric(predict(model, newdata = df_test, type="response")>0.5)

# evaluate on the held-out test rows (a single train/test split)
accuracy(p, df_test)
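Repeating the evaluation over K subject-level folds gives a K-fold estimate. The sketch below keeps it runnable without lme4 by using simulated data and an ordinary glm() logistic hazard model as a stand-in for the glmer() fit; the data, cv_by_id(), and the coefficients are all hypothetical. For a glmer() fit evaluated on IDs absent from the training data, lme4's predict() method accepts re.form = NA (population-level prediction with random effects set to zero). The Brier score (mean squared difference between predicted probability and outcome) is included as a threshold-free complement to accuracy at the 0.5 cutoff.

```r
set.seed(2016)
# Simulated person-period data (hypothetical): 60 subjects, up to 8 periods
sim <- do.call(rbind, lapply(seq_len(60), function(i) {
  x1 <- rnorm(8)
  h <- plogis(-2 + 0.8 * x1 + 0.1 * seq_len(8))   # discrete-time hazard
  ev <- rbinom(8, 1, h)
  last <- if (any(ev == 1)) which(ev == 1)[1] else 8L  # stop at first event
  data.frame(ID = i, TIME = seq_len(last),
             EVENT = c(rep(0, last - 1), ev[last]), x1 = x1[seq_len(last)])
}))

# Subject-level K-fold cross-validation (hypothetical helper)
cv_by_id <- function(data, k = 5L, cutoff = 0.5) {
  ids <- unique(data$ID)
  fold <- sample(rep_len(seq_len(k), length(ids)))[match(data$ID, ids)]
  out <- data.frame(fold = seq_len(k), accuracy = NA_real_, brier = NA_real_)
  for (f in seq_len(k)) {
    train <- data[fold != f, ]
    test  <- data[fold == f, ]
    # Stand-in model: population-level logistic hazard (no random effects).
    # With lme4 one would fit glmer() and predict with re.form = NA.
    fit <- glm(EVENT ~ TIME + x1, data = train, family = binomial)
    p <- predict(fit, newdata = test, type = "response")
    out$accuracy[f] <- mean(as.numeric(p > cutoff) == test$EVENT)
    out$brier[f]    <- mean((p - test$EVENT)^2)
  }
  out
}

res <- cv_by_id(sim, k = 5)
colMeans(res[, c("accuracy", "brier")])
```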

Thanks in advance.
Best regards,
