[R] Modelling an "incomplete Poisson" distribution ?

Sat Apr 18 21:12:26 CEST 2009

I forgot to add that yes, I've done my homework, and that it seems to me
that answers pointing to zero-inflated Poisson (and negative binomial)
are irrelevant ; I do not have a mixture of distributions but only part
of one distribution, or, if you'll have it, a "zero-deflated Poisson".

An answer by Brian Ripley
(http://finzi.psych.upenn.edu/R/Rhelp02/archive/11029.html) to a similar
question leaves me a bit dismayed : if it is easy to compute the
probability function of this zero-deflated RV (off the top of my head,
Pr(X=x)=e^-lambda.lambda^x/(x!.(1-e^-lambda))), and if I think that I'm
(still) able to use optim to ML-estimate lambda, using this to
(correctly) model my problem set and test it amounts to re-writing some
(large) part of glm. Furthermore, I'd be a bit embarrassed to test it
cleanly (i. e. justifiably) : out of the top of my head, only the
likelihood ration test seems readily applicable to my problem. Testing
contrasts in my covariates ... hum !

So if someone has written something to that effect, I'd be awfully glad
to use it. A not-so-cursory look at the existing packages did not ring
any bells to my (admittedly untrained) ears...

Of course, I could also bootstrap the damn thing and study the
distribution of my contrasts. I'd still been hard pressed to formally
test hypotheses on factors...

Any ideas ?

					Emmanuel Charpentier

Le samedi 18 avril 2009 à 19:28 +0200, Emmanuel Charpentier a écrit :
> Dear list,
> 
> I have the following problem : I want to model a series of observations
> of a given hospital activity on various days under various conditions.
> among my "outcomes" (dependent variables) is the number of patients for
> which a certain procedure is done. The problem is that, when no relevant
> patient is hospitalized on said day, there is no observation (for which
> the "number of patients" item would be 0). 
> 
> My goal is to model this number of patients as a function of the
> "various conditions" described by my independant variables, mosty of
> them observed but uncontrolled, some of them unobservable (random
> effects). I am tempted to model them along the lines of :
> 
> glm(NoP~X+Y+..., data=MyData, family=poisson(link=log))
> 
> or (accounting for some random effects) :
> 
> lmer(NoP~X+Y....+(X|Center)), data=Mydata, family=poisson(link=log))
> 
> While the preliminary analysis suggest that (the right part of) a
> Poisson distribution might be reasonable for all real observations, the
> lack of observations with count==0 bothers me.
> 
> Is there a way to cajole glm (and lmer, by the way) into modelling these
> data to an "incomplete Poisson" model, i. e. with unobserved "0"
> values ?
> 
> Sincerely,
> 
> 						Emmanuel Charpentier
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>