[R-sig-ME] glmmADMB fails to fit poisson data

Sat Jun 11 04:24:37 CEST 2016

I’d missed the ZINB bit.  There’s an issue whether the amount of
zero inflation is of interest in itself, or whether it is an annoyance
that the model needs to circumvent.  The mechanism that is
generating extra zeros (or, for that matter, deflating the zero counts)
may in any case require separate modeling,

John Maindonald             email: john.maindonald at anu.edu.au

> On 11/06/2016, at 14:07, Ben Bolker <bbolker at gmail.com> wrote:
> 
> 
>  I think that's very sensible, but note that if the OP thinks they need
> zero-inflation (they said that's why they came to glmmADMB), then they
> will probably need to go slightly beyond standard linear models ...
> perhaps a two-stage (zero vs non-zero logistic model + linear model on
> log(x) of positive cases)
> 
>  cheers
>   Ben Bolker
> 
> 
> On 16-06-10 09:56 PM, John Maindonald wrote:
>> With count data where the mean is large, there’s a strong case to
>> be made for finding a suitable power transform of (count+1) and
>> using a normal theory model.  If differences in residual variance
>> are evident, these can be accommodated by assigning weights.
>> Functions for getting diagnostics are better developed for the
>> normal theory models, and easier to interpret.
>> 
>> For the analysis of RNA-Seq data, there is a paper that compares
>> the two styles of model:
>> 
>> Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.http://genomebiology.com/2014/15/2/R29
>> 
>> NB also the recent paper:
>> 
>> Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA http://www.rnajournal.org/cgi/doi/10.1261/rna.053959.115
>> 
>> The log(count+1) approach is the best, or close to the best, in the field.
>> 
>> John Maindonald             email: john.maindonald at anu.edu.au
>> 
>>> On 11/06/2016, at 12:29, Ben Bolker <bbolker at gmail.com> wrote:
>>> 
>>> 
>>> * I can appreciate your frustration that glmmADMB doesn't work
>>> out of the box on what seems initially to be a simple problem,
>>> and I appreciate that you are being reasonably diplomatic about
>>> it (not everyone is), but ... it is surprisingly hard to make
>>> a general-purpose solve that works for every data set. glm()
>>> works so well because it uses special-purpose algorithms for
>>> a particular class of models that's narrower than what glmmADMB
>>> can do.
>>> 
>>> * The particular problem is that for Poisson responses with
>>> very large means, the coefficient of variation of the data
>>> also becomes very small.  That means that values that are a little
>>> bit off the mean become very unlikely, so unlikely that they
>>> underflow (the probabilities become numerically zero).  I can
>>> get your example to work if I set the starting value close enough:
>>> 
>>> set.seed(101)
>>> d <- data.frame(x=rpois(100, lambda=8000))
>>> 
>>> library(glmmADMB)
>>> glmmadmb(x~1, family="poisson", data=d)
>>> glmmadmb(x~1, family="poisson", data=d,
>>>        start=list(fixed=log(7500)))  ## works
>>> 
>>> ... it does, however, fail with fixed=log(7000).
>>> 
>>> * the "mysterious" poiss_prob_bound (which *is* documented in
>>> ?admbControl, so it may be obscure but it's not completely mysterious ...)
>>> was added to fix a problem that another user was having with
>>> another data set ... if you
>>> 
>>> file.show(system.file("tpl","glmmadmb.tpl",package="glmmADMB"))
>>> 
>>> and dig around, you'll find that glmmADMB bumps up the probability of
>>> a Poisson by 1e-6 or 1e-10 (in different optimization phases) to prevent
>>> a value of exactly zero. There *is* probably a better way to do this -
>>> there are approximations of the Poisson that work very well in such
>>> small ranges, and they could be used instead - but there is also
>>> a reasonably unending number of such edge cases, and not that many
>>> person-hours to deal with them.
>>> 
>>> * You could try the glmmTMB package, which is a more modern
>>> eventually-replacement for glmmADMB, and which seems
>>> to work on this particular problem -- but be warned that it's
>>> in development, so you may find holes!
>>> 
>>> devtools::install_github("glmmTMB/glmmTMB",sub="glmmTMB")
>>> library(glmmTMB)
>>> glmmTMB(x~1, family="poisson", data=d)
>>> 
>>> * if you have a reproducible example of 'dramatic stop that
>>> does not reach the top level and so cannot be caught with
>>> tryCatch' that would be useful.  Maybe we can help resolve
>>> that (more general) problem.
>>> 
>>> * Andrew Gelman has stated a principle called the "folk theorem of
>>> statistical computing": When you have computational problems, often
>>> there’s a problem with your model.
>>> http://andrewgelman.com/2008/05/13/the_folk_theore/
>>> In this case, I think the issue is that in real life (unless you're
>>> in a field like particle physics), Poisson models are generally
>>> not great models for processes with large means -- the coefficient
>>> of variation is way too small to be realistic. This works with
>>> a negative binomial ...
>>> 
>>> set.seed(101)
>>> d2 <- data.frame(x=rnbinom(100, mu=8000, size=2))
>>> glmmadmb(x~1,data=d2,family="nbinom")
>>> 
>>> (I can appreciate that you may have started with a neg binom
>>> and come to the Poisson in part as a way of trying to simplify
>>> your problem so you could diagnose it ...)
>>> 
>>> cheers
>>>   Ben Bolker
>>> 
>>>> While this is a “solution”, I would expect this kind of fit to work
>>>> “off-the-box” (it certainly does with the standard glm function). Can
>>>> someone comment on this.  I am not an expert on GLMs and needed
>>>> something that would fit ZINB + random effects (admittedly the deep
>>>> end of GLMs) and was excited to find glmmADMB, but using it has proven
>>>> rather frustrating because it often fails (with a dramatic stop that
>>>> does not reach the top level and so cannot be caught with tryCatch,
>>>> nonetheless) and after I found the above example, I am not so sure it
>>>> is always for a good reason.
>>> 
>>>> Thanks,
>>>> Vladimir
>>>> 
>>>> 
>>>> 
>>>> Session:
>>>> 
>>>>> sessionInfo()
>>> 
>>> 
>>> On 16-06-09 07:48 PM, Vladimir Trifonov wrote:
>>>> Hi,
>>>> 
>>>> Why is the following  “trivial” model fit failing? After all, we are
>>>> fitting a Poisson GLM on a Poisson data.
>>>> 
>>>> --------
>>>>> glmmadmb(rpois(100, lambda=8000)~1, family="poisson")
>>>> Hessian is 0 in row 1
>>>> This means that the derivative if probably identically 0  for this
>>> parameter
>>>> Error in matrix inverse -- matrix singular in inv(dmatrix)
>>>> Estimated covariance matrix may not be positive definite
>>>> 1e+20
>>>> Estimated covariance matrix may not be positive definite
>>>> 1e+20
>>>> 
>>>> GLMM's in R powered by AD Model Builder:
>>>> 
>>>> Family: poisson
>>>> link = log
>>>> 
>>>> I tried varying the number of samples (100 above), the mean (8000
>>> above) and this fails. Here is another attempt (with a different
>>> kind of fail)
>>> 
>>>>> glmmadmb(rpois(100, lambda=50)~1, family="poisson")
>>>> Parameters were estimated, but standard errors were not: the most
>>> likely problem is that the curvature at MLE was zero or negative
>>>> Error in glmmadmb(rpois(100, lambda = 50) ~ 1, family = "poisson") :
>>>> The function maximizer failed (couldn't find parameter file)
>>> Troubleshooting steps include (1) run with 'save.dir' set and inspect
>>> output files; (2) change run parameters: see '?admbControl';(3) re-run
>>> with debug=TRUE for more information on failure mode
>>>> In addition: Warning message:
>>>> running command './glmmadmb -maxfn 500 -maxph 5 -noinit -shess' had
>>> status 1
>>>> --------
>>> 
>>>> After some trial and error, the only thing I found that “works” (with
>>> a warning about the covariance matrix) consistently is to set a
>>> “mysterious” poiss_prob_bound = FALSE
>>>> 
>>>> --------
>>>>> glmmadmb(rpois(100, lambda=50)~1, family="poisson",
>>> admb.opts=admbControl(poiss_prob_bound=FALSE))
>>>> Estimated covariance matrix may not be positive definite
>>>> 0.020141
>>>> Estimated covariance matrix may not be positive definite
>>>> 0.020141
>>>> 
>>>> GLMM's in R powered by AD Model Builder:
>>>> 
>>>> Family: poisson
>>>> link = log
>>>> 
>>>> Fixed effects:
>>>> Log-likelihood: -327.749
>>>> AIC: 657.498
>>>> Formula: rpois(100, lambda = 50) ~ 1
>>>> (Intercept)
>>>>  3.904998
>>>> 
>>>> Number of observations: total=100
>>>> --------
>>>> 
>>> 
>>> _______________________________________________
>>> R-sig-mixed-models at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>>