[R] Observational data questions <not S-language question>

Rob Balshaw Rob.Balshaw at syreon.com
Sat Mar 29 00:58:17 CET 2003

< This is not an S-language question, but I hoped it would be of at least
passing interest to some members of the group. >

I've encountered a situation which I'm sure is familiar to many.

We're looking at an observational dataset with data from many thousands of
patients.  (So many patients, I won't bother to discuss the observed
significance levels of our results.  Everything is significant.)

One of our predictive factors of interest is Treatment (Trt A vs Trt B).
There are several covariates measured on the patients at the time of entry
in the study (say, X1 and X2).  The outcome of interest is time to death.
Some patients will develop a disease prior to death, and it is thought that
Disease is an important risk factor for death.

The development of Disease has been linked to the use of Treatment B.
Covariate X1 is also thought to predict Disease.  Covariates X1 and X2 are
thought to influence the risk of death but may also influence the choice of
Treatment.

All in all, a pretty standard observational study scenario.

Now we conduct a proportional hazards regression analysis for time to death,
with Trt, X1 and X2 as covariates.  Using this model, we find that Trt A has
a hazard ratio considerably less than 1.  Treatment A appears to reduce the
risk of death after adjusting for differences in the observed covariates X1
and X2.
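For reference, the first model can be written as below.  The notation is introduced here only for illustration: TrtA_i is 1 for a patient on Trt A and 0 for Trt B, and h_0(t) is the unspecified baseline hazard.

```latex
h_i(t) = h_0(t)\,\exp\{\beta_T\,\mathrm{TrtA}_i + \beta_1 X_{1i} + \beta_2 X_{2i}\},
\qquad \mathrm{HR}_{A\ \mathrm{vs}\ B} = e^{\beta_T} < 1
```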

Next we include Disease as a time dependent covariate.  Under this model,
Trt A has a hazard ratio considerably greater than 1, as does Disease.
Thus, Trt A now appears to *increase* the risk of death (after adjusting for
the observed covariates X1 and X2 *and* the development of Disease).
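Written out (notation introduced here for illustration), the second model adds an indicator D_i(t) that is 0 before patient i develops Disease and 1 from onset onward, with TrtA_i equal to 1 for Trt A and 0 for Trt B:

```latex
h_i(t) = h_0(t)\,\exp\{\gamma_T\,\mathrm{TrtA}_i + \gamma_1 X_{1i} + \gamma_2 X_{2i} + \gamma_D\,D_i(t)\},
\qquad e^{\gamma_T} > 1, \quad e^{\gamma_D} > 1
```

Here e^{gamma_T} compares Trt A with Trt B among patients with the same X1, X2 *and* the same current Disease status, which is not the same quantity as the Trt hazard ratio from a model that omits Disease.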

My difficulty arises when I try to explain to clinicians that I do not find
these results contradictory.

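Once we 'adjust for' the development of Disease, the hazard ratio for Trt A
relative to Trt B could easily be 1.2.  Because Trt B raises the risk of
Disease, and Disease raises the risk of death, part of Trt A's apparent
benefit operates through avoiding Disease.  Conditioning on Disease removes
that pathway, so this analysis addresses a completely different question
than the analysis where Disease is ignored, and it is quite possible for the
answer to be so different.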
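A rough simulation of this scenario may make the point concrete.  The sketch below rests entirely on hypothetical assumptions: the hazard values, constant (exponential) hazards, random assignment of treatment (so X1 and X2 play no role), and the helper names `simulate_patient` and `rate_ratios` are all illustrative.  Rather than fitting a Cox model, it compares death rates (deaths per person-year) between arms, first ignoring Disease and then separately within Disease-free and post-Disease person-time, which is a crude analogue of adding Disease as a time-dependent covariate.

```python
import random

# Hypothetical per-year hazards, chosen only to illustrate the effect.
DISEASE_HAZARD = {"A": 0.1, "B": 0.5}  # Trt B raises the risk of Disease
BASE_DEATH_HAZARD = 0.05               # death hazard on Trt B while Disease-free
DIRECT_EFFECT_A = 1.2                  # Trt A's hazard ratio NOT acting via Disease
DISEASE_EFFECT = 5.0                   # Disease multiplies the death hazard
FOLLOW_UP = 5.0                        # administrative censoring time (years)


def simulate_patient(trt, rng):
    """Return (time_free, time_diseased, died_free, died_diseased)."""
    h_free = BASE_DEATH_HAZARD * (DIRECT_EFFECT_A if trt == "A" else 1.0)
    h_diseased = h_free * DISEASE_EFFECT
    t_disease = rng.expovariate(DISEASE_HAZARD[trt])
    t_death = rng.expovariate(h_free)
    if t_death <= min(t_disease, FOLLOW_UP):
        return t_death, 0.0, 1, 0           # died before developing Disease
    if t_disease >= FOLLOW_UP:
        return FOLLOW_UP, 0.0, 0, 0         # censored while Disease-free
    # Disease develops first; exponential hazards are memoryless, so the
    # remaining time to death can be drawn afresh at the higher hazard.
    extra = rng.expovariate(h_diseased)
    if t_disease + extra <= FOLLOW_UP:
        return t_disease, extra, 0, 1       # died after developing Disease
    return t_disease, FOLLOW_UP - t_disease, 0, 0  # censored with Disease


def rate_ratios(n_per_arm=20000, seed=2003):
    """Death-rate ratios (Trt A / Trt B): crude and by current Disease status."""
    rng = random.Random(seed)
    totals = {}
    for trt in ("A", "B"):
        sums = [0.0, 0.0, 0, 0]  # person-time free, person-time diseased, deaths, deaths
        for _ in range(n_per_arm):
            for k, value in enumerate(simulate_patient(trt, rng)):
                sums[k] += value
        totals[trt] = sums
    a, b = totals["A"], totals["B"]
    crude = ((a[2] + a[3]) / (a[0] + a[1])) / ((b[2] + b[3]) / (b[0] + b[1]))
    free = (a[2] / a[0]) / (b[2] / b[0])
    diseased = (a[3] / a[1]) / (b[3] / b[1])
    return crude, free, diseased


if __name__ == "__main__":
    crude, free, diseased = rate_ratios()
    print(f"ignoring Disease:        {crude:.2f}")
    print(f"Disease-free time only:  {free:.2f}")
    print(f"time after Disease only: {diseased:.2f}")
```

With these made-up settings the crude ratio should come out well below 1, while both stratified ratios should sit near the assumed direct effect of 1.2: Trt A is harmful "directly" yet protective overall, because it spares patients from Disease.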

My questions:

(1) Does my interpretation sound reasonable?  I've had so many clinicians
question me, I'm starting to lose confidence...

(2) Has this phenomenon been explained nicely anywhere?  I'd love to be able
to argue by appeal to authority...

Thanks for any comments or suggestions.  (I'm tempted to build a simulation
of this effect, but I'm not certain the clinicians would be too impressed.)



-- Robert Balshaw, Ph.D.
-- Senior Biostatistician, Syreon Corp.
-- Phone: 604.676.5900x220; Fax: 604.676.5911
