[R] Power calculation for survival analysis

Marc Schwartz marc_schwartz at me.com
Wed Sep 21 19:20:12 CEST 2011

On Sep 21, 2011, at 8:54 AM, Duke wrote:

> useR's,
> I am trying to do a power calculation for a survival analysis using a
> logrank test and I need some help properly doing this in R.  Here is the
> information that I know:
> - I have 2 groups, namely HG and LG
> - Retrospective analysis with subjects gathered from archival data over 20
> years. No new recruitment of subjects and no estimated time to target
> accrual and accrual rate.
> - Survival measured in both groups at 1 year, 3 years, 5 years.
> - Assume 50% survival for LG and 30% survival for HG at 5 years.
> - Assume a 6 month difference in overall survival to be statistically
> significant.
> - Total sample size is ~ N=500 with 15% of subjects comprising the LG group;
> 85% make up the HG group.
> The main hypothesis is that HG group has shorter overall survival than LG
> group.
> Can someone please help me out with how to properly calculate the power for
> such a situation using R? This is new to me.
> Thanks,
> D  

Short answer: look at the cpower() function in Frank Harrell's Hmisc package on CRAN.
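
To give a feel for the arithmetic behind a logrank power calculation, here is a minimal base R sketch using the Schoenfeld approximation. It is not what cpower() does internally and is no substitute for it. It assumes exponential survival in both groups, full 5-year follow-up on every subject, and no earlier censoring; none of that comes from the original question. The variable names are illustrative only, and the "6 month difference" from the question is not used. This is a planning-stage calculation, so the caveat about post hoc power still applies.

```r
# Rough power sketch for a two-sided logrank test (Schoenfeld approximation).
# Assumptions (NOT from the original post): exponential survival in both
# groups, every subject followed for the full 5 years, no earlier censoring.

n_total <- 500
p_lg    <- 0.15   # proportion of subjects in the LG group
s_lg    <- 0.50   # assumed 5-year survival, LG
s_hg    <- 0.30   # assumed 5-year survival, HG
alpha   <- 0.05   # two-sided type I error rate

# Under exponential survival, the hazard ratio (HG vs LG) equals the ratio
# of the log survival probabilities at any common time point.
hr <- log(s_hg) / log(s_lg)

# Expected number of deaths by 5 years across both groups.
n_lg   <- n_total * p_lg
n_hg   <- n_total - n_lg
events <- n_lg * (1 - s_lg) + n_hg * (1 - s_hg)

# Schoenfeld: the log hazard ratio estimate has variance of roughly
# 1 / (events * p1 * p2), where p1 and p2 are the group proportions.
z_effect <- sqrt(events * p_lg * (1 - p_lg)) * abs(log(hr))
power    <- pnorm(z_effect - qnorm(1 - alpha / 2))

print(c(hazard_ratio = hr, expected_events = events, power = power))
```

With real retrospective data, censoring and loss to follow-up reduce the number of observed events, so a figure from this sketch would be optimistic.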

Longer answer:

Have you already performed the data collection and analysis? If so, then performing a post hoc power calculation is highly problematic. Do a Google search on "post hoc power" and you will find a myriad of resources/citations.
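
One way to see the problem is that "observed" power, computed by plugging the observed effect back in as if it were the true effect, is just a transformation of the p-value and so adds no new information. The sketch below shows this for a generic two-sided z-type test (the logrank statistic is asymptotically of this form). The function name observed_power() is made up for illustration and is not part of any package.

```r
# "Observed" (post hoc) power for a two-sided z-type test, computed by
# treating the observed test statistic as if it were the true effect.
# observed_power() is a hypothetical helper, not a library function.
observed_power <- function(p_value, alpha = 0.05) {
  z_obs  <- qnorm(1 - p_value / 2)   # |z| implied by the two-sided p-value
  z_crit <- qnorm(1 - alpha / 2)     # critical value at level alpha
  pnorm(z_obs - z_crit) + pnorm(-z_obs - z_crit)
}

# Observed power falls as the p-value rises; it is a one-to-one function
# of the p-value, so it cannot tell you anything the p-value did not.
p_values <- c(0.01, 0.05, 0.20, 0.50)
print(data.frame(p_value = p_values,
                 observed_power = observed_power(p_values)))
```

In particular, a result sitting exactly at p = alpha always has observed power of about one half, whatever the sample size.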

Given the sizable difference in size between the two groups and that this is a retrospective analysis, you are almost certainly going to have selection bias issues to deal with in comparing them, since presumably subjects were not prospectively randomized to group, even at the ratio indicated.

Is the "HG" group High Grade Lymphoma and the LG group Low Grade Lymphoma? That would help to explain some of the issues here, since you have two groups with differing diagnoses, differing baseline characteristics and known material differences in prognosis.

With a retrospective analysis over this time frame, loss to follow-up (LTFU) is likely to be another issue, reducing your available data over time, especially if LTFU differs between the groups. LTFU is hard enough to manage in a prospective study.

Using your numbers, you also have the potential for temporal issues impacting your comparison. If you are looking out to 5 years and the data were collected over a 20-year time frame, that suggests a possible 15-year difference between your first patient's Time 0 and your last patient's Time 0. What changes in patient and/or treatment profiles occurred over time that might impact your findings? Were the two groups treated concurrently, or are they staggered by some time window? Are the patients a consecutive series in each group, or is there other selection bias involved as to why one patient is in the study and another is not?

If you are not comfortable with these issues, you have a lot of resources at Duke (e.g., DCRI), with some very experienced folks there.


Marc Schwartz
