[R] Interval censored Data in survreg() with zero values!

Tue Jan 6 17:37:38 CET 2009

Terry Therneau wrote:
> Apologies -- you are being more subtle than I thought.  Nevertheless, I think 
> that the censoring language isn't quite right.
>
>   You are thinking of a hierarchical model:
>   
>     z ~ N(Xb, sigma), where Xb is the linear predictor built from whatever
> covariates you think belong in the model.  Whether the distribution should be
> Gaussian or something else depends not on the overall distribution of z, but on
> the distribution of (z | Xb).  A skewed predictor could produce a skewed z even
> if the distribution about any given expectation is symmetric.
>     
>     y = F(z) is what you observe.  The classic Tobit model is y = max(0, z),
> which does lead to censored data.
>     
>     In your case y_i ~ Binomial(n_i, p_i), with p_i = H(z_i).  Note that a
> binomial is k heads out of n tries with a coin of probability p; a "Bernoulli"
> is a binomial restricted to a single coin flip.  From the way you wrote the
> problem, I assumed that there is some number n of "looks" at each subject and
> that you then count them up.  Note that var(y) = n p (1-p).
>     
>     H describes how the probability changes with z.  In biology we very rarely
> use H(z) = max(min(z, 1), 0), because it imposes a hard threshold, whereas the
> probability of nearly anything doesn't go all the way to zero or one.
>     
>     If H were as above and
>       var(y) = constant,
>       n is large enough that the binomial distribution is approximately Gaussian, and
>       var(y | p) << var(z | Xb),
>
> then your y would fit a censored Gaussian.  Since at least the second condition
> is false, it doesn't.
>
>    A censored model may still be an OK first cut at fitting the data, but I
> would be suspicious of the variance estimates and particularly of any p-values.
> The bootstrap could help with that.
>    
>    	Terry T.
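
The following is a minimal Python sketch of the two-level model described in the quoted reply, for readers who want to see it concretely. It assumes a single covariate x with linear predictor b0 + b1*x, the same number of looks n for every subject, and a logistic H. The reply only says that a hard threshold is rarely appropriate, so the logistic choice is an assumption. The names simulate, hard_threshold, logistic and binomial_var are hypothetical helpers and do not come from the survival package or any other library.

```python
import math
import random


def hard_threshold(z):
    """H(z) = max(min(z, 1), 0): probability is exactly 0 or 1 outside [0, 1]."""
    return max(min(z, 1.0), 0.0)


def logistic(z):
    """Smooth alternative for H (an assumption; the post does not name one).
    Written in two branches so math.exp never overflows for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binomial_var(n, p):
    """Conditional variance of the count y given p: var(y | p) = n p (1 - p)."""
    return n * p * (1.0 - p)


def simulate(xs, b0, b1, sigma, n_looks, H, rng):
    """Draw z_i ~ N(b0 + b1*x_i, sigma), then y_i ~ Binomial(n_looks, H(z_i))."""
    ys = []
    for x in xs:
        z = rng.gauss(b0 + b1 * x, sigma)
        p = H(z)
        # A binomial count is the number of successes in n_looks Bernoulli trials.
        y = sum(1 for _ in range(n_looks) if rng.random() < p)
        ys.append(y)
    return ys


if __name__ == "__main__":
    n = 20
    # The binomial variance depends on p, so it cannot be constant across subjects.
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(f"p = {p:.2f}  var(y | p) = {binomial_var(n, p):.2f}")
    rng = random.Random(1)
    xs = [i / 10 for i in range(-30, 31)]
    ys = simulate(xs, b0=0.0, b1=1.0, sigma=1.0, n_looks=n, H=logistic, rng=rng)
    print("subjects with y = 0:", ys.count(0), "of", len(ys))
```

The printed variances peak at p = 0.5 (n/4, i.e. 5 for n = 20) and shrink toward 0 as p approaches 0 or 1. This is why binomial counts fail the constant-variance condition for a censored Gaussian fit.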

@Terry: thank you very much for the detailed explanation. I will try
out your suggestion.

Geraldine
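
Below is a minimal Python sketch of the case-resampling bootstrap that the quoted reply suggests for checking variance estimates. The estimator ols_slope is a hypothetical stand-in. In practice, each resample would be refitted with the censored survreg() model in R, and the coefficient of interest would be recorded. Whole subjects are resampled with replacement so that each subject's covariate and response stay together. The function names and the skipping of degenerate resamples are assumptions made for this illustration.

```python
import random
import statistics


def ols_slope(pairs):
    """Hypothetical stand-in estimator: least-squares slope of y on x.
    In the R setting this would be a refit of the censored survreg() model."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / sxx  # raises ZeroDivisionError if all x are identical


def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Nonparametric (case-resampling) bootstrap standard error of estimator."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample whole subjects with replacement, keeping each (x, y) pair intact.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        try:
            estimates.append(estimator(sample))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g. every x identical); skip it
    return statistics.stdev(estimates)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in range(30)]
    print("slope:", ols_slope(data))
    print("bootstrap SE:", bootstrap_se(data, ols_slope, n_boot=500, seed=1))
```

The spread of the refitted estimates across resamples gives a standard error that does not rely on the censored-Gaussian variance formula, which the quoted reply warns may be unreliable here.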