[R-sig-ME] glmmADMB fails to fit poisson data

Sat Jun 11 04:07:04 CEST 2016

  I think that's very sensible, but note that if the OP thinks they need
zero-inflation (they said that's why they came to glmmADMB), then they
will probably need to go slightly beyond standard linear models ...
perhaps a two-stage (zero vs non-zero logistic model + linear model on
log(x) of positive cases)

  cheers
   Ben Bolker

On 16-06-10 09:56 PM, John Maindonald wrote:
> With count data where the mean is large, there’s a strong case to
> be made for finding a suitable power transform of (count+1) and
> using a normal theory model.  If differences in residual variance
> are evident, these can be accommodated by assigning weights.
> Functions for getting diagnostics are better developed for the
> normal theory models, and easier to interpret.
> 
> For the analysis of RNA-Seq data, there is a paper that compares
> the two styles of model:
> 
> Law, CW, Chen, Y, Shi, W, Smyth, GK (2014). Voom: precision weights unlock linear model analysis tools for RNA-seq read counts. Genome Biology 15, R29.http://genomebiology.com/2014/15/2/R29
> 
> NB also the recent paper:
> 
> Schurch NJ, Schofield P, Gierliński M, Cole C, Sherstnev A, Singh V, Wrobel N, Gharbi K, Simpson GG, Owen-Hughes T, Blaxter M, Barton GJ (2016). How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use? RNA http://www.rnajournal.org/cgi/doi/10.1261/rna.053959.115
> 
> The log(count+1) approach is the best, or close to the best, in the field.
> 
> John Maindonald             email: john.maindonald at anu.edu.au
> 
>> On 11/06/2016, at 12:29, Ben Bolker <bbolker at gmail.com> wrote:
>>
>>
>> * I can appreciate your frustration that glmmADMB doesn't work
>> out of the box on what seems initially to be a simple problem,
>> and I appreciate that you are being reasonably diplomatic about
>> it (not everyone is), but ... it is surprisingly hard to make
>> a general-purpose solve that works for every data set. glm()
>> works so well because it uses special-purpose algorithms for
>> a particular class of models that's narrower than what glmmADMB
>> can do.
>>
>> * The particular problem is that for Poisson responses with
>> very large means, the coefficient of variation of the data
>> also becomes very small.  That means that values that are a little
>> bit off the mean become very unlikely, so unlikely that they
>> underflow (the probabilities become numerically zero).  I can
>> get your example to work if I set the starting value close enough:
>>
>> set.seed(101)
>> d <- data.frame(x=rpois(100, lambda=8000))
>>
>> library(glmmADMB)
>> glmmadmb(x~1, family="poisson", data=d)
>> glmmadmb(x~1, family="poisson", data=d,
>>         start=list(fixed=log(7500)))  ## works
>>
>> ... it does, however, fail with fixed=log(7000).
>>
>> * the "mysterious" poiss_prob_bound (which *is* documented in
>> ?admbControl, so it may be obscure but it's not completely mysterious ...)
>> was added to fix a problem that another user was having with
>> another data set ... if you
>>
>> file.show(system.file("tpl","glmmadmb.tpl",package="glmmADMB"))
>>
>> and dig around, you'll find that glmmADMB bumps up the probability of
>> a Poisson by 1e-6 or 1e-10 (in different optimization phases) to prevent
>> a value of exactly zero. There *is* probably a better way to do this -
>> there are approximations of the Poisson that work very well in such
>> small ranges, and they could be used instead - but there is also
>> a reasonably unending number of such edge cases, and not that many
>> person-hours to deal with them.
>>
>> * You could try the glmmTMB package, which is a more modern
>> eventually-replacement for glmmADMB, and which seems
>> to work on this particular problem -- but be warned that it's
>> in development, so you may find holes!
>>
>> devtools::install_github("glmmTMB/glmmTMB",sub="glmmTMB")
>> library(glmmTMB)
>> glmmTMB(x~1, family="poisson", data=d)
>>
>> * if you have a reproducible example of 'dramatic stop that
>> does not reach the top level and so cannot be caught with
>> tryCatch' that would be useful.  Maybe we can help resolve
>> that (more general) problem.
>>
>> * Andrew Gelman has stated a principle called the "folk theorem of
>> statistical computing": When you have computational problems, often
>> there’s a problem with your model.
>> http://andrewgelman.com/2008/05/13/the_folk_theore/
>> In this case, I think the issue is that in real life (unless you're
>> in a field like particle physics), Poisson models are generally
>> not great models for processes with large means -- the coefficient
>> of variation is way too small to be realistic. This works with
>> a negative binomial ...
>>
>> set.seed(101)
>> d2 <- data.frame(x=rnbinom(100, mu=8000, size=2))
>> glmmadmb(x~1,data=d2,family="nbinom")
>>
>> (I can appreciate that you may have started with a neg binom
>> and come to the Poisson in part as a way of trying to simplify
>> your problem so you could diagnose it ...)
>>
>>  cheers
>>    Ben Bolker
>>
>>> While this is a “solution”, I would expect this kind of fit to work
>>> “off-the-box” (it certainly does with the standard glm function). Can
>>> someone comment on this.  I am not an expert on GLMs and needed
>>> something that would fit ZINB + random effects (admittedly the deep
>>> end of GLMs) and was excited to find glmmADMB, but using it has proven
>>> rather frustrating because it often fails (with a dramatic stop that
>>> does not reach the top level and so cannot be caught with tryCatch,
>>> nonetheless) and after I found the above example, I am not so sure it
>>> is always for a good reason.
>>
>>> Thanks,
>>> Vladimir
>>>
>>>
>>>
>>> Session:
>>>
>>>> sessionInfo()
>>
>>
>> On 16-06-09 07:48 PM, Vladimir Trifonov wrote:
>>> Hi,
>>>
>>> Why is the following  “trivial” model fit failing? After all, we are
>>> fitting a Poisson GLM on a Poisson data.
>>>
>>> --------
>>>> glmmadmb(rpois(100, lambda=8000)~1, family="poisson")
>>> Hessian is 0 in row 1
>>> This means that the derivative if probably identically 0  for this
>> parameter
>>> Error in matrix inverse -- matrix singular in inv(dmatrix)
>>> Estimated covariance matrix may not be positive definite
>>> 1e+20
>>> Estimated covariance matrix may not be positive definite
>>> 1e+20
>>>
>>> GLMM's in R powered by AD Model Builder:
>>>
>>>  Family: poisson
>>>  link = log
>>>
>>> I tried varying the number of samples (100 above), the mean (8000
>>  above) and this fails. Here is another attempt (with a different
>>  kind of fail)
>>
>>>> glmmadmb(rpois(100, lambda=50)~1, family="poisson")
>>> Parameters were estimated, but standard errors were not: the most
>> likely problem is that the curvature at MLE was zero or negative
>>> Error in glmmadmb(rpois(100, lambda = 50) ~ 1, family = "poisson") :
>>>  The function maximizer failed (couldn't find parameter file)
>> Troubleshooting steps include (1) run with 'save.dir' set and inspect
>> output files; (2) change run parameters: see '?admbControl';(3) re-run
>> with debug=TRUE for more information on failure mode
>>> In addition: Warning message:
>>> running command './glmmadmb -maxfn 500 -maxph 5 -noinit -shess' had
>> status 1
>>> --------
>>
>>> After some trial and error, the only thing I found that “works” (with
>> a warning about the covariance matrix) consistently is to set a
>> “mysterious” poiss_prob_bound = FALSE
>>>
>>> --------
>>>> glmmadmb(rpois(100, lambda=50)~1, family="poisson",
>> admb.opts=admbControl(poiss_prob_bound=FALSE))
>>> Estimated covariance matrix may not be positive definite
>>> 0.020141
>>> Estimated covariance matrix may not be positive definite
>>> 0.020141
>>>
>>> GLMM's in R powered by AD Model Builder:
>>>
>>>  Family: poisson
>>>  link = log
>>>
>>> Fixed effects:
>>>  Log-likelihood: -327.749
>>>  AIC: 657.498
>>>  Formula: rpois(100, lambda = 50) ~ 1
>>> (Intercept)
>>>   3.904998
>>>
>>> Number of observations: total=100
>>> --------
>>>
>>
>> _______________________________________________
>> R-sig-mixed-models at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>