[R] Outlier Problem in Survreg Function

Vipul Agarwal iitkvipul at gmail.com
Sun Jul 25 08:24:40 CEST 2010


Hi Everyone,
I have recently started using R and am working on survival analysis using the
function survreg.
I am facing a strange problem. One of the covariates in my analysis has
outliers, because of which survreg is giving incorrect results. However, when
I remove the outliers or scale down the values of the covariate by a
factor of 2, it gives correct results. Below is the distribution of the
variable, followed by the results.

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0   30000   54500   95450  123000 1650000 

Survreg Results

survreg(formula = Surv(TIME_TO_FAILURE, CENSOR_DEFAULT) ~ ADVANCE, 
    data = data)

Coefficients:
(Intercept)     ADVANCE 
   0.000000   -6.385336 

Scale= 0.9785933 

Loglik(model)= -40227366   Loglik(intercept only)= -914141
        Chisq= -78626451 on 1 degrees of freedom, p= 1 
n=198099 (885 observations deleted due to missingness)

Survreg Results after scaling down the variable by 10 

survreg(formula = Surv(TIME_TO_FAILURE, CENSOR_DEFAULT) ~ ADVANCE_SCALED, 
    data = data)

Coefficients:
   (Intercept) ADVANCE_SCALED 
  4.132962e+00  -2.181577e-05 

Scale= 0.9428758 

Loglik(model)= -909139.4   Loglik(intercept only)= -914141
        Chisq= 10003.19 on 1 degrees of freedom, p= 0 
n=198099 (885 observations deleted due to missingness)

Survreg Results after removing the outliers (5% of the obs)

> data <- subset(data, ADVANCE <= 200000)
> survreg(Surv(TIME_TO_FAILURE, CENSOR_DEFAULT) ~ ADVANCE, data = data)
Call:
survreg(formula = Surv(TIME_TO_FAILURE, CENSOR_DEFAULT) ~ ADVANCE, 
    data = data)

Coefficients:
  (Intercept)       ADVANCE 
 4.224298e+00 -3.727171e-06 

Scale= 0.9601186 

Loglik(model)= -822521.9   Loglik(intercept only)= -825137.1
        Chisq= 5230.49 on 1 degrees of freedom, p= 0 
n=177332 (444 observations deleted due to missingness)


Please let me know if anyone else has faced the same problem and how
to deal with it. Should I scale down the variable or remove
the outliers?
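
To make the rescaling option concrete, here is a minimal sketch on simulated data. Everything in it is an assumption for illustration: the data frame `sim`, the divisor 1e5, the simulated effect size, and the reading of CENSOR_DEFAULT as 1 = failure observed. It fits the model on the rescaled covariate, checks that Loglik(model) is not below Loglik(intercept only) (a fitted model should not do worse than the intercept-only one, so a negative Chisq like the one in the first fit suggests the optimiser did not converge), and converts the coefficient back to the original units.

```r
library(survival)

set.seed(1)
n <- 2000

# Simulated stand-in for the real data (all values here are made up).
# ADVANCE is right-skewed with a long upper tail, like the summary above.
advance     <- rlnorm(n, meanlog = 11, sdlog = 1)
event_time  <- rweibull(n, shape = 1, scale = exp(4 - 3e-6 * advance))
censor_time <- rexp(n, rate = 1 / 100)

sim <- data.frame(
  TIME_TO_FAILURE = pmin(event_time, censor_time),
  CENSOR_DEFAULT  = as.integer(event_time <= censor_time),  # assumed: 1 = event
  ADVANCE         = advance
)

# Rescale the covariate to units of 100,000 so its values are of order 1.
# The divisor is an arbitrary illustrative choice.
sim$ADVANCE_SCALED <- sim$ADVANCE / 1e5

fit <- survreg(Surv(TIME_TO_FAILURE, CENSOR_DEFAULT) ~ ADVANCE_SCALED,
               data = sim)

# fit$loglik holds c(intercept-only, full model); the full model
# should not have the lower log-likelihood of the two.
loglik_ok <- fit$loglik[2] >= fit$loglik[1]

# Express the coefficient per one unit of the original ADVANCE.
coef_per_unit <- unname(coef(fit)["ADVANCE_SCALED"]) / 1e5

print(fit)
print(loglik_ok)
print(coef_per_unit)
```

Rescaling keeps every observation and only changes the units of the coefficient, whereas dropping the rows above 200000 changes which data the model describes.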


