[R] survreg with measurement uncertainties

Thu Jun 13 14:46:26 CEST 2013

It seems a line through the origin doesn't fit the data very well.  That may be throwing off the fitting routine.

data = c(144.53, 1687.68, 5397.91)
err = c(8.32, 471.22, 796.67)
model = c(71.60, 859.23, 1699.19)
id = c(1, 2, 3)

# display
plot(data ~ model)
library("Hmisc")
errbar(model, add=TRUE, y = data, yplus= data+err, yminus=data-err, errbar.col=2)
abline(lm(data ~ model - 1))
abline(lm(data ~ model - 1, weights = 1/err^2), col=2)

Making err[1] larger, allows convergence:

err[1] <- 80.32

survreg(Surv(time = data)~model-1+cluster(id), weights=1/(err^2), dist='gaussian', init=c(2.1))

Call:
survreg(formula = Surv(time = data) ~ model - 1 + cluster(id), 
    weights = 1/(err^2), dist = "gaussian", init = c(2.1))

Coefficients:
 model 
2.6055 

Scale= 139.299 

Loglik(model)= 0   Loglik(intercept only)= 0
n= 3

And this matches the standard linear model call:

lm(data ~ model - 1, weights = 1/err^2)

Call:
lm(formula = data ~ model - 1, weights = 1/err^2)

Coefficients:
model  
2.606  

-----Original Message-----
From: Kyle Penner [mailto:kpenner at as.arizona.edu] 
Sent: Wednesday, June 12, 2013 3:49 PM
To: Terry Therneau
Cc: r-help at r-project.org
Subject: Re: [R] survreg with measurement uncertainties

Hi Terry,

Thanks for your quick reply.  I am talking about uncertainty in the response.  I have 2 follow up questions:

1) my understanding from the documentation is that 'id' in cluster(id) should be the same when the predictors are not independent.  Is this correct?  (To be more concrete: my data are brightnesses at different wavelengths.  Each brightness is an independent measurement, so the elements of id should all be different?)

2) I tested survreg with uncertainties on an example where I already know the answer (and where I am not using limits), and it does not converge.  Below is the code I used, does anything jump out as incorrect?

data = c(144.53, 1687.68, 5397.91)
err = c(8.32, 471.22, 796.67)
model = c(71.60, 859.23, 1699.19)
id = c(1, 2, 3)

This works (2.9 is the answer from simple chi_sq fitting):

survreg(Surv(time = data, event = c(1,1,1))~model-1, dist='gaussian',
init=c(2.9))

This does not converge (2.1 is the answer from chi_sq fitting):

survreg(Surv(time = data, event = c(1,1,1))~model-1+cluster(id), weights=1/(err^2), dist='gaussian', init=c(2.1))

And this does, but the answer it returns is wonky:

data[2] = 3*err[2] # data[2] is very close to 3*err[2] already survreg(Surv(time = data, event = c(1,2,1))~model-1+cluster(id), weights=1/(err^2), dist='gaussian', init=c(2.1))

Thanks,

Kyle

On Wed, Jun 12, 2013 at 6:51 AM, Terry Therneau <therneau at mayo.edu> wrote:
> I will assume that you are talking about uncertainty in the response.  
> Then one simple way to fit the model is to use case weights that are 
> proprional to 1/variance, along with +cluster(id) in the model 
> statement to get a correct variance for this case.  In linear models 
> this would be called the "White" or "Horvitz-Thompsen" or "GEE working 
> independence" variance estimate, depending on which literature you 
> happen to be reading (economics, survey sampling, or biostat).
>
> Now if you are talking about errors in the predictor variables, that 
> is a much harder problem.
>
> Terry Therneau
>
>
>
> On 06/12/2013 05:00 AM, Kyle Penner wrote:
>>
>> Hello,
>>
>> I have some measurements that I am trying to fit a model to.  I also 
>> have uncertainties for these measurements.  Some of the measurements 
>> are not well detected, so I'd like to use a limit instead of the 
>> actual measurement.  (I am always dealing with upper limits, i.e. 
>> left censored data.)
>>
>> I have successfully run survreg using the combination of well 
>> detected measurements and limits, but I would like to include the 
>> measurement uncertainty (for the well detected measurements) in the 
>> fitting.  As far as I can tell, survreg doesn't support this.  Does 
>> anyone have a suggestion for how to accomplish this?
>>
>> Thanks,
>>
>> Kyle

**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues