[R] weighted Cox proportional hazards regression
Thomas Lumley
tlumley at u.washington.edu
Tue Dec 4 23:28:30 CET 2007
On Tue, 4 Dec 2007, Scott Bartell wrote:
> I'm getting unexpected results from the coxph function when using
> weights from counter-matching. For example, the following code
> produces a parameter estimate of -1.59 where I expect 0.63:
You can get the answer you want with
coxph(Surv(pseudotime, cc)~x+strata(riskset)+offset(log(wt)), data=d2, robust=TRUE)
which is how countermatching usually seems to be done, and is what the
original paper by Langholz & Borgan recommends.
I think it's right that weight=wt doesn't do the same thing. The weights
are not simple inverse-probability sampling weights, because the sampling
units in this design are pairs, not individuals. If we assume the
distribution of x is the same across risk sets (which looks approximately
true in your data) then the sampling weight for a pair is proportional to
the number of eligible controls: ie, just your control weight. Using
these as weights for the pairs in the weight= argument I get 0.585 as the
hazard ratio, reasonably close to your 0.63 given that this is a different
estimator and given the assumptions.
-thomas
> d2 = structure(list(x = c(1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1,
> 1, 0, 0, 1, 0, 1, 0, 1, 0, 1), wt = c(5, 42, 40, 4, 43, 4, 42,
> 4, 44, 5, 38, 4, 39, 4, 4, 37, 40, 4, 44, 5, 45, 5, 44, 5), riskset = c(1L,
> 1L, 4L, 4L, 6L, 6L, 12L, 12L, 13L, 13L, 19L, 19L, 23L, 23L, 31L,
> 31L, 42L, 42L, 45L, 45L, 70L, 70L, 93L, 93L), cc = c(1, 0, 1,
> 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0
> ), pseudotime = rep(1,24)), .Names = c("x", "wt", "riskset",
> "cc", "pseudotime"), class = "data.frame", row.names=1:24)
>
> coxph( Surv(pseudotime, cc) ~ x + strata(riskset), weights=wt,
> robust=T, method="breslow",data=d2)
>
> I'm expecting a value of about 0.63 to 0.64 based on the data source
> (simulated) and the following hand-coded MLE:
>
> negloglik = function(beta,dat) {
> dat$wexb = dat$wt * exp(dat$x * beta)
> agged = aggregate(dat$wexb,list(riskset=dat$riskset),sum)
> names(agged)[2] = "denom"
> dat = merge(dat[dat$cc==1,],agged,by="riskset")
> -sum(log(dat$wexb)-log(dat$denom))
> }
> nlm(negloglik,0,hessian=T,dat=d2)
>
> Am I misunderstanding the meaning of case weights in the coxph
> function? The help file is pretty terse.
>
> Scott Bartell, PhD
> Assistant Professor
> Department of Epidemiology
> University of California, Irvine
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
Thomas Lumley Assoc. Professor, Biostatistics
tlumley at u.washington.edu University of Washington, Seattle
More information about the R-help
mailing list