[R] Interval censored Data in survreg() with zero values!
Geraldine Henningsen
ghenningsen at email.uni-kiel.de
Mon Jan 12 17:29:24 CET 2009
Hello again,
I studied your suggestion, but I still disagree. You wrote:
"From the way you wrote the problem I assumed
that there is some number of n "looks" at the subject and then you count them
up."
But this is not the case. My data are clearly continuous quantities, not discrete choices. I know nothing about the underlying choice process; the only thing I know is the final share of one of three regimes. Sorry for the poor description of the problem.
So I will stick with my censored-data model. Still, the hint about the p-values is very helpful, because I actually ran into this problem. Thank you for that.
Best, Geraldine
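
For readers following the thread, here is a minimal sketch of the kind of censored Gaussian (Tobit-type) fit under discussion, using survreg() from the survival package. The data are simulated and every name and parameter value (x, z, y, the coefficients 0.2 and 0.5, sigma = 0.4) is an assumption made for illustration only; this is not the share data or the model from this thread.

```r
library(survival)

set.seed(1)
n <- 500
x <- rnorm(n)                              # hypothetical covariate
z <- 0.2 + 0.5 * x + rnorm(n, sd = 0.4)    # latent z ~ N(Xb, sigma)
y <- pmax(z, 0)                            # observed y = max(0, z)

# Status is TRUE for exactly observed values and FALSE for values
# left-censored at zero.
fit <- survreg(Surv(y, y > 0, type = "left") ~ x, dist = "gaussian")

summary(fit)
coef(fit)    # estimates of the intercept and slope of the latent model
fit$scale    # estimate of sigma
```

If the shares were also censored at one, Surv(lower, upper, type = "interval2") with NA for an open end would be one way to code both limits in the same fit.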
Terry Therneau wrote:
> Apologies -- you are being more subtle than I thought. Nevertheless, I think
> that the censoring language isn't quite right.
>
> You are thinking of a hierarchical model:
>
> z ~ N(Xb, sigma), where Xb is the linear predictor, whatever covariates you
> think belong in the model. Whether the distribution should be Gaussian or
> something else depends not on the overall distribution of z, but on the distribution
> of (z | Xb). We could have a skewed predictor leading to skewed z, even if the
> distribution about any given expectation is symmetric.
>
>   y = F(z) is what you observe.  The classic Tobit model is y = max(0,z), which
> does lead to censored data.
>
>   In your case y_i ~ Binomial(n_i, p_i = H(z_i)).  Note a binomial is k heads
> out of n tries with a coin of probability p, a "Bernoulli" is a binomial
> restricted to a single coin flip. From the way you wrote the problem I assumed
> that there is some number of n "looks" at the subject and then you count them
> up. Note that var(y) = n p (1-p)
>
> H describes how the probability changes with z. In biology we very rarely
> use H(z)= max(min(z,1),0) because it gives a hard threshold, and the probability
> of nearly anything doesn't go all the way to zero or one.
>
> If H were as above and
> var(y) = constant and
> n is sufficiently large so that Binomial dist is approx Gaussian and
> var(y |p) << var(z| Xb)
>
> then your y will fit a censored Gaussian. Since at least the second is false,
> it doesn't.
>
> A censored model may still be an ok first cut at fitting the data, but I
> would be suspicious of variance estimates and particularly of any p-values. The
> bootstrap could help that.
>
> Terry T.
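
The bootstrap suggested at the end of the quoted message could look roughly like the sketch below: a nonparametric (case-resampling) bootstrap of the coefficients of a left-censored Gaussian survreg() fit. The simulated data frame d, the helper fit_tobit(), and the choice of B = 200 resamples are all hypothetical and chosen only for illustration; with real data the resampling unit and the number of resamples would need more thought.

```r
library(survival)

set.seed(2)
n <- 300
d <- data.frame(x = rnorm(n))                          # hypothetical covariate
d$y <- pmax(0.2 + 0.5 * d$x + rnorm(n, sd = 0.4), 0)   # response censored at 0

# Hypothetical helper: fit the censored Gaussian model, return its coefficients.
fit_tobit <- function(dat) {
  coef(survreg(Surv(y, y > 0, type = "left") ~ x, data = dat,
               dist = "gaussian"))
}

est <- fit_tobit(d)

# Resample rows with replacement and refit; one row of boot_coef per resample.
B <- 200
boot_coef <- t(replicate(B, fit_tobit(d[sample(nrow(d), replace = TRUE), ])))

boot_se <- apply(boot_coef, 2, sd)                     # bootstrap standard errors
boot_ci <- apply(boot_coef, 2, quantile,
                 probs = c(0.025, 0.975))              # percentile intervals

est
boot_se
boot_ci
```

Comparing boot_se with the standard errors reported by summary() of the original fit gives a rough indication of how far the model-based variance estimates can be trusted.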