[R] Incorrect 'n' returned by survfit()

Sat Oct 28 19:28:35 CEST 2006

On Wed, 25 Oct 2006, yongchuan wrote:

> I have a data set with 60000 rows representing 6000+ distinct loans. I ran a coxph() regression on it (see call below), but the result of a subsequent survfit() call on the coxph object is almost certainly wrong. It gives n=6 when it should be more like 6000+ (I think).
>
>> survfit(resultag)
> Call: survfit.coxph(object = resultag)
>
>      n  events  median 0.95LCL 0.95UCL
>      6     489     Inf       2     Inf
>
> When I reduced the dataset to just 1000 rows, the survfit()
> call on the coxph object looks more correct.
>
>> survfit(resulting)
> Call: survfit.coxph(object = resulting)
>
>      n  events  median 0.95LCL 0.95UCL
>    115      15     Inf     Inf     Inf
>
> Is there a limit to the size of the data set that I read in?
> Or am I just doing something silly above?
>
> This is the coxph regression:
> resultag <- coxph(Surv(Start, Stop, PrepayDate) ~ modBalance + closingCoupon + lienPosition + originalFICO, table)
>

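You may be misunderstanding the `n` column in the output. Your data are in
counting-process (Start, Stop) form, so the 6 is presumably the number at risk
at the first time point rather than the number of loans. If you read the help
for print.survfit you will find: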
      The "number of observations" is not well-defined for counting
      process data. Previous versions of this code used the number at
      risk at the first time point. This is misleading if many
      individuals enter late or change strata. The original S code for
      the current version uses the number of records, which is
      misleading when the counting process data actually represent a
      fixed cohort with time-dependent covariates.

      Four possibilities are provided, controlled by 'print.n' or by
      'options(survfit.print.n)': '"none"' prints 'NA', '"records"'
      prints the number of records, '"start"' prints the number at the
      first time point and '"max"' prints the maximum number at risk.
      The initial default is '"start"'.
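
To make the difference between these definitions concrete, here is a minimal base-R sketch on a made-up counting-process data set. The data frame `loans`, its column names, and the helper `at_risk` are hypothetical and not from the original post. The sketch assumes a record is at risk at time t when start < t <= stop, and takes the "first time point" to be the smallest stop time. It does not call the survival package.

```r
# Toy counting-process data (hypothetical): one row per (start, stop] interval.
# Loan 1 is observed from time 0; loans 2 and 3 enter late, at time 4.
loans <- data.frame(
  id    = c(1, 1, 2, 3, 3),
  start = c(0, 2, 4, 4, 6),
  stop  = c(2, 10, 8, 6, 12),
  event = c(0, 1, 0, 0, 1)
)

# Number at risk at time t: records whose interval satisfies start < t <= stop.
at_risk <- function(t) sum(loans$start < t & loans$stop >= t)

# Evaluate the risk set at every distinct stop time, in increasing order.
times  <- sort(unique(loans$stop))
n_risk <- vapply(times, at_risk, integer(1))

n_records  <- nrow(loans)              # analogue of "records"
n_start    <- n_risk[1]                # analogue of "start"
n_max      <- max(n_risk)              # analogue of "max"
n_subjects <- length(unique(loans$id)) # distinct loans, for comparison

print(c(records = n_records, start = n_start, max = n_max,
        subjects = n_subjects))
```

In this toy example there are 5 records and 3 distinct loans, but only 1 record is at risk at the first time point, while the maximum number at risk is 3. With many late entries, the "start" count can be far smaller than the number of subjects, which is the pattern in the output above. According to the help text quoted here, 'print.n' or 'options(survfit.print.n)' selects which of these counts is printed.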

 	-thomas

Thomas Lumley			Assoc. Professor, Biostatistics
tlumley at u.washington.edu	University of Washington, Seattle