[R] Variance estimates for survreg vs. lm
Therneau, Terry M., Ph.D.
therneau at mayo.edu
Mon Jul 6 14:30:54 CEST 2015
The difference is that survreg is using a maximum likelihood estimate (MLE) of the
variance and that lm is using the unbiased (MVUE) estimate of variance. For simple linear
regression, the former divides by "n" and the latter by "n-p". The difference in your
variances is exactly n/(n-p) = 10/8.
With censored data the MLE estimate is still clear, but what exactly "n" is is not so
obvious (does a censored datum count as a whole observation?), so a simple (n-p)/n
variance correction is also not obvious. I would not be surprised if someone, somewhere
has hammered out a correction for "unbiased" variance; I've never looked for it. I'm not
convinced that it is worthwhile though.
Terry Therneau
On 07/04/2015 05:00 AM, r-help-request at r-project.org wrote:
> I would like help understanding why a survival regression with no censored
> data-points does not give the same variance estimates as a linear model
> (see code below).
>
> I think it must be something to do with the fact that the variance is an
> actual parameter in the survival version via the log(scale), and possibly
> that different assumptions are made about the distribution of the variance.
> But I really don't know, I'm just guessing.
>
> The reason I ask is because I am moving a process, that has always been
> modelled using a linear model, to a survival model (because there are
> sometimes a few censored data points). In the past, the censored data
> points have been treated as missing which imparts bias. The variance of the
> estimates in this process is key, so I need to know why they are changing
> in this systematic way?!
>
>
>
> library(survival)
>
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> ctl.surv <- Surv(ctl)
>
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
>
> lmod <- lm (ctl ~ trt )
> smod <- survreg(ctl.surv ~ trt,dist="gaussian")
>
> coef(lmod)
> coef(smod) # same
>
> vcov(lmod)
> vcov(smod) # smod is smaller
>
> diag(vcov(lmod)) /
> diag(vcov(smod))[1:2] # 1.25 == 0.5*(n/(n-1))
>
> ( summary(lmod)$coef [ ,"Std. Error"] /
> summary(smod)$table[1:2,"Std. Error"] )^2 # 1.25 = 0.5*(n/(n-1))
More information about the R-help
mailing list