[R] weighted Cox proportional hazards regression

Scott Bartell sbartell at uci.edu
Tue Dec 4 14:00:54 CET 2007


I'm getting unexpected results from the coxph function when using
weights from counter-matching.  For example, the following code
produces a parameter estimate of -1.59 where I expect 0.63:

d2 = structure(list(x = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
1, 0, 0, 1, 0, 1, 0, 1, 0, 1), wt = c(5, 42, 40, 4, 43, 4, 42,
4, 44, 5, 38, 4, 39, 4, 4, 37, 40, 4, 44, 5, 45, 5, 44, 5), riskset = c(1L,
1L, 4L, 4L, 6L, 6L, 12L, 12L, 13L, 13L, 19L, 19L, 23L, 23L, 31L,
31L, 42L, 42L, 45L, 45L, 70L, 70L, 93L, 93L), cc = c(1, 0, 1,
0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
), pseudotime = rep(1,24)), .Names = c("x", "wt", "riskset",
"cc", "pseudotime"), class = "data.frame", row.names=1:24)

coxph( Surv(pseudotime, cc) ~ x + strata(riskset), weights=wt,
robust=T, method="breslow",data=d2)

I'm expecting a value of about 0.63 to 0.64 based on the data source
(simulated) and the following hand-coded MLE:

negloglik = function(beta,dat) {
  dat$wexb = dat$wt * exp(dat$x * beta)
  agged = aggregate(dat$wexb,list(riskset=dat$riskset),sum)
  names(agged)[2] = "denom"
  dat = merge(dat[dat$cc==1,],agged,by="riskset")
  -sum(log(dat$wexb)-log(dat$denom))
  }
nlm(negloglik,0,hessian=T,dat=d2)

Am I misunderstanding the meaning of case weights in the coxph
function?  The help file is pretty terse.

Scott Bartell, PhD
Assistant Professor
Department of Epidemiology
University of California, Irvine



More information about the R-help mailing list