[R] comparing lm(), survreg( ... , dist="gaussian") and survreg(... , dist="lognormal")

Tue May 3 15:20:40 CEST 2005

Thank you, Professor Ripley.

cbind(log(pr1$fit) - 1.96*pr1$se.fit/pr1$fit, log(pr1$fit) +
1.96*pr1$se.fit/pr1$fit)

... is precisely what had eluded me, self-evident though it appears after
you have illuminated the way.

Again, thank you. 

Charles Annis, P.E.

Charles.Annis at StatisticalEngineering.com
phone: 561-352-9699
eFax:  614-455-3265
http://www.StatisticalEngineering.com

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Prof Brian Ripley
Sent: Tuesday, May 03, 2005 2:55 AM
To: Charles Annis, P.E.
Cc: R-help at stat.math.ethz.ch
Subject: Re: [R] comparing lm(), survreg( ... , dist="gaussian") and
survreg(... , dist="lognormal")

On Mon, 2 May 2005, Charles Annis, P.E. wrote:

> I have tried everything I can think of and hope not to appear too foolish
> when my error is pointed out to me.
>
> I have some real data (18 points) that look linear on a log-log plot so I
> used them for a comparison of lm() and survreg.  There are no suspensions.
>
> survreg.df <- data.frame(Cycles=c(2009000, 577000, 145000, 376000, 37000,
> 979000, 17420000, 71065000, 46397000, 70168000, 69120000, 68798000,
> 72615000, 133051000, 38384000, 15204000, 1558000, 14181000), stress=c(90,
> 100, 110, 90, 100, 80, 70, 60, 56, 62, 62, 59, 56, 53, 59, 70, 90, 70),
> event=rep(1, 18))
>
>
> sN.lm<- lm(log(Cycles) ~ log10(stress), data=survreg.df)
>
> and
>                                             vvvvvvvvvvv
> gaussian.survreg<- survreg(formula=Surv(time=log(Cycles), event) ~
> log10(stress), dist="gaussian", data=survreg.df)
>
> produce identical parameter estimates and differ slightly in the residual
> standard error and scale, which is accounted for by scale being the MLE
and
> thus biased.  Correcting by sqrt(18/16) produces agreement.  Using
predict()
> for the lm, and predict.survreg() for the survreg model and correcting for
> the differences in stdev, produces identical plots of the fit and the
upper
> and lower confidence intervals.  All of this is as it should be.

I trust you called predict() on both and let R choose the method.

> And,
>                                               vvvvvv
> lognormal.survreg<- survreg(formula=Surv(time=(Cycles), event) ~
> log10(stress), dist="lognormal", data=survreg.df)
>
> produces summary() results that are identical to the earlier call to
> survreg(), except for the call, of course.  The parameter estimates and SE
> are identical.  Again this is as I would expect it.
>
> But since the call uses Cycles, rather than log(Cycles) predict.survreg()
> returns $fit in Cycles units, rather than logs, and of course the fits are
> identical when plotted on a log-log grid and also agree with lm()
>
> Here is the fly in the ointment:  The upper and lower confidence
intervals,
> based on the $se.fit for the dist="lognormal" are quite obviously
different
> from the other two methods, and although I have tried everything I could
> imagine I cannot reconcile the differences.

How did you do this?  (BTW, I assume you mean upper and lower confidence 
>limits< for the predicted means.)  For the predictions and standard 
errors are (or should be) on the response scale, a non-linear function of 
the parameters.  In that case it is normal to form confidence limits on 
the linear predictor scale and transform.

> I believe that the confidence bounds for both models should agree.  After
> all, both calls to survreg() produce identical parameter estimates.

They will, if computed on the same basis.  On log-scale (to avoid large 
numbers)

pr1 <- predict(lognormal.survreg, se.fit=T)
log(cbind(pr1$fit - 1.96*pr1$se.fit, pr1$fit + 1.96*pr1$se.fit))
pr2 <- predict(gaussian.survreg, se.fit=T)
cbind(pr2$fit - 1.96*pr2$se.fit, pr2$fit + 1.96*pr2$se.fit)

are really pretty close.  The main difference is a slight shift, which 
comes about because the mean of a log(X) is not log(mean(X)).  Note that 
the second set at the preferred ones.  Transforming to log scale before 
making the confidence limits:

cbind(log(pr1$fit) - 1.96*pr1$se.fit/pr1$fit, log(pr1$fit) +
1.96*pr1$se.fit/pr1$fit)

does give identical answers.

Consider care is needed in interpreting what predict() is actually 
predicting in non-linear models.  For both glm() and survreg() it is 
closer to the median of the uncertainty in the predictions than to the 
mean.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html