[R] Cox regression model for matched data with replacement

Therneau, Terry M., Ph.D. therneau at mayo.edu
Wed Aug 13 16:19:52 CEST 2014



On 08/13/2014 08:38 AM, John Pura wrote:
> Thank you for the reply. However, I think I may not have clarified what my cases are. I'm studying the effect of radiation treatment (vs. none) on survival. My cases are patients who received radiation and controls are those who did not. I used a propensity score model to match cases to controls in a 1:2 fashion. However, because the matching was done with replacement, some controls were matched to more than one case. How can I go about analyzing this - would frequency weighting work?
>
> Thanks,
> John

We went down the wrong path.  When people use the word "case" it almost always refers to 
"subjects who had the outcome".  If I read the above correctly you have the simpler 
situation of subset selection.  Subjects were chosen to be in the model without reference 
to their outcome status, with the goal of balancing treatment wrt other predictive 
factors.  Correct?   If so, here are my preferred modeling strategies, in order.

1. coxph(Surv(time, status) ~ treatment, data=one)
   Where data set "one" has one copy of each subject selected to be in the study.  If they 
were nominated twice they still appear once.  Optional: give each control a case weight 
equal to the number of times they were selected.  This will better balance the data set 
wrt the factors.

2. Same model, with covariates.  The argument about whether covariates on which you have 
balanced should be included in the model is as old as the hills --- "belt AND suspenders?" 
--- with proponents on both sides.  Meh.  Unless there are too many covariates, of course. 
I still like the 10-20 events per covariate rule to choose the maximum number of predictors.

3. coxph(Surv(time, status) ~ treatment + strata(group), data=two)
  I view this as model 2 with paranoia.  "The covariate effects are so odd that we'll 
never model them correctly, so treat each combination as unique."   The data set two needs 
to have each treated subject + their controls in a separate stratum.  Even though some 
controls are in the data set twice, they don't need case weights since they are in any 
given stratum only once.
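
A rough sketch of how data sets "one" and "two" might be assembled and the three models
fit.  Everything here is simulated and every name (subjects, matches, age, wt) is made up
for illustration; the random draw of controls merely stands in for a real propensity
match, so treat this as a template under those assumptions rather than a recipe.

```r
library(survival)

set.seed(1)
n <- 60
# Hypothetical cohort: ids 1-20 treated, ids 21-60 untreated. 'age' is a
# stand-in for whatever covariates went into the propensity model.
subjects <- data.frame(
  id        = 1:n,
  treatment = rep(c(1, 0), times = c(20, 40)),
  age       = rnorm(n, mean = 60, sd = 10),
  time      = rexp(n, rate = 0.1),
  status    = rbinom(n, 1, 0.7)
)
treated_ids <- subjects$id[subjects$treatment == 1]
control_ids <- subjects$id[subjects$treatment == 0]

# Stand-in for the 1:2 match with replacement: a control is used at most once
# per treated subject but may be reused across treated subjects.
matches <- do.call(rbind, lapply(treated_ids, function(tid) {
  data.frame(group = tid, control = sample(control_ids, 2))
}))

# Data set "one": every selected subject appears exactly once.
n_sel <- table(matches$control)               # times each control was selected
one <- subjects[subjects$id %in% c(treated_ids, matches$control), ]
one$wt <- 1                                   # treated subjects keep weight 1
one$wt[match(as.integer(names(n_sel)), one$id)] <- as.numeric(n_sel)

# Model 1, unweighted and with the optional case weights (robust variance on).
fit1  <- coxph(Surv(time, status) ~ treatment, data = one)
fit1w <- coxph(Surv(time, status) ~ treatment, data = one,
               weights = wt, robust = TRUE)

# Model 2: add covariates, keeping an eye on the 10-20 events per covariate rule.
fit2 <- coxph(Surv(time, status) ~ treatment + age, data = one)
max_covariates <- floor(sum(one$status) / c(20, 10))   # lower and upper bound

# Data set "two": one stratum per treated subject plus that subject's controls,
# so a reused control contributes one row to each stratum it belongs to.
two <- rbind(
  data.frame(group = treated_ids,
             subjects[match(treated_ids, subjects$id), ]),
  data.frame(group = matches$group,
             subjects[match(matches$control, subjects$id), ])
)

# Model 3: stratify on the matched set.
fit3 <- coxph(Surv(time, status) ~ treatment + strata(group), data = two)
```

The weights in "one" sum to the number of matched control slots, which is a quick check
that no selection was lost when collapsing repeated controls to a single row.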

For any of the above you can add a robust variance; it is required if case weights are used.
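
One way the robust variance could be requested for the stratified fit is a cluster(id)
term, so that the repeated rows of a reused control are treated as one cluster.  The data
below are simulated and the names (pool, trt, picks) are hypothetical; this is only a
sketch under those assumptions.

```r
library(survival)

set.seed(2)
# Hypothetical pool of 20 untreated subjects and 15 treated subjects.
pool <- data.frame(id = 101:120, treatment = 0,
                   time = rexp(20, 0.1), status = rbinom(20, 1, 0.7))
trt  <- data.frame(id = 1:15, treatment = 1,
                   time = rexp(15, 0.1), status = rbinom(15, 1, 0.7))

# Two distinct controls per treated subject; controls can repeat across groups.
picks <- unlist(lapply(1:15, function(g) sample(nrow(pool), 2)))

# Stacked data: each treated subject and its two controls share a 'group' value.
two <- rbind(cbind(group = 1:15, trt),
             cbind(group = rep(1:15, each = 2), pool[picks, ]))

# cluster(id) requests the robust (sandwich) variance, grouping rows by subject.
fit <- coxph(Surv(time, status) ~ treatment + strata(group) + cluster(id),
             data = two)

# Model-based and robust standard errors side by side.
se <- c(naive = sqrt(diag(fit$naive.var)), robust = sqrt(diag(fit$var)))
```

For the weighted version of model 1, passing robust = TRUE to coxph along with the
weights argument has the same effect, with each row acting as its own cluster.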

Terry T


