[R-sig-ME] zero-truncated negative binomial distribution

Fri Nov 3 23:04:36 CET 2017

   I assume you have multiple observations per individual (an
observation-level random effect wouldn't make sense with a model like
the [truncated] negative binomial, which includes an estimated
dispersion parameter)?

  How big is your data set overall? What is summary(count) for your data
(e.g. min/max, 10% and 90% quantiles, mean, std dev) ?  (The marginal
distribution is less important than the conditional distribution, but
getting information about the conditional distribution is more difficult.)
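
  A minimal sketch of the summaries asked about above, using base R only.
The simulated data frame is a hypothetical stand-in (its parameter values
are made up); only the names mydata, count and individual come from your
model call.

```r
# Hypothetical stand-in for the real data; replace with your own `mydata`.
set.seed(1)
mydata <- data.frame(
  individual = factor(rep(1:10, each = 20)),
  count = rnbinom(200, mu = 20, size = 0.5) + 1  # shifted by 1 so there are no zeros
)

# Overall size of the data set
nrow(mydata)
nlevels(mydata$individual)

# Marginal summary of the response: min/max, quartiles, mean
summary(mydata$count)

# 10% and 90% quantiles, plus mean and standard deviation
q <- quantile(mydata$count, probs = c(0.1, 0.9))
q
c(mean = mean(mydata$count), sd = sd(mydata$count))
```

These describe the marginal distribution only; the conditional distribution
has to be judged from the residuals of a fitted model.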

   Transforming data and fitting with a linear model is always a
reasonable alternative if you can find a transformation that makes the
(conditional) distributions approximately normal (and homoscedastic).
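
  One way the transformation route could look, as a sketch rather than a
recommendation: log-transform the response, fit a Gaussian mixed model with
the same fixed and random effects, and inspect the residuals. The simulated
data and all parameter values below are assumptions for illustration; the
variable names are taken from the original model call.

```r
library(glmmTMB)

# Hypothetical data with the same column names as the original model.
set.seed(101)
n_ind <- 20
n_obs <- 15
mydata <- data.frame(
  individual  = factor(rep(seq_len(n_ind), each = n_obs)),
  waterdepth  = runif(n_ind * n_obs, 1, 50),
  temperature = rnorm(n_ind * n_obs, mean = 15, sd = 3),
  chl.conc    = rlnorm(n_ind * n_obs, meanlog = 0, sdlog = 0.5)
)
b_ind <- rnorm(n_ind, sd = 0.5)  # individual-level random intercepts
mu <- exp(2 + 0.02 * mydata$waterdepth + b_ind[as.integer(mydata$individual)])
mydata$count <- rnbinom(nrow(mydata), mu = mu, size = 0.7) + 1  # no zeros

# Linear mixed model on the log scale (log is safe here: all counts >= 1)
m_log <- glmmTMB(log(count) ~ waterdepth + temperature + chl.conc +
                   (1 | individual),
                 family = gaussian(), data = mydata)

# Check approximate normality and homoscedasticity of the residuals
r <- residuals(m_log)
f <- fitted(m_log)
qqnorm(r); qqline(r)
plot(f, r, xlab = "fitted", ylab = "residual"); abline(h = 0, lty = 2)
```

If the Q-Q plot is close to the line and the residual spread does not change
much with the fitted values, the transformed model is probably adequate; a
square-root transformation can be checked in the same way.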

  What is your evidence of "a hard time"?  Warning/error messages?

  How important is the zero-truncation?  Do you have a lot of small
counts (1,2,3) in addition to your extremely large values?
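
  A quick way to check this in base R. The simulated vector is a hypothetical
stand-in for mydata$count:

```r
# Hypothetical response vector; replace with mydata$count.
set.seed(2)
count <- rnbinom(200, mu = 20, size = 0.5) + 1  # shifted so there are no zeros

# Frequency of the smallest counts (1, 2, 3)
small <- table(factor(count[count <= 3], levels = 1:3))
small

# Proportion of observations that are 3 or less
prop_small <- mean(count <= 3)
prop_small
```

If small counts are rare, an untruncated distribution would put little
probability on zero anyway, so the truncation matters less.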

  Other, more heavy-tailed distributions do exist (e.g.
https://en.wikipedia.org/wiki/Beta_negative_binomial_distribution ) but
are not yet implemented in glmmTMB (and we'd have to implement both the BNB
and its zero-truncated version).  I think they'd likely be overkill.

On 17-11-03 05:07 PM, Alice Domalik wrote:
> Hi all, 
> 
> I am fitting mixed effects models using the package glmmTMB to investigate habitat use. 
> My data does not contain any zeros, so I have considered the zero-truncated Poisson and the zero-truncated negative binomial. 
> Of these two distributions, the zt negative binomial was better, so I tried fitting my model: 
> 
> m1<-glmmTMB(count~waterdepth + temperature + chl.conc + (1|individual), family=list(family="truncated_nbinom1", link="log"), data=mydata) 
> 
> However, it is clear that the model is having a hard time fitting my very high response values (the distribution of my response variable has a very long tail). 
> The QQplot also shows the high 'count' values being above the QQline. 
> 
> What are my options for improving model fit? Are there any distributions that might be better? Is it permissible to transform my response variable (e.g. sqrt or log)? 
> 
> Any suggestions are greatly appreciated. 
> 
> -Ally 
> 
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>