[R-sig-ME] zero-truncated negative binomial distribution

Highland Statistics Ltd highstat at highstat.com
Sat Nov 4 10:09:14 CET 2017


Ally,

Did you not record the zeros, or is it impossible to observe zeros in 
your data? If zeros are theoretically possible but by chance you didn't 
observe any, then you had better stick to an ordinary (non-truncated) 
distribution.

If zeros are theoretically impossible (e.g. the number of eggs in a bird's 
nest: it is always > 0), then a zero-truncated distribution is the better 
option. But if your data are relatively far away from 0, an ordinary 
distribution (e.g. NB) puts almost no probability on zero anyway, so you 
could decide to stick with it.
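To see why the distance from zero matters, here is a small base-R sketch. The zero-truncated NB probability is the ordinary NB probability divided by 1 - P(Y = 0). The helper name dztnb and the value theta = 2 are hypothetical, chosen only for illustration. dnbinom() with size = theta and mu = mu uses the quadratic (NB2) parameterisation.

```r
# Hypothetical helper (not from any package): zero-truncated NB2 probability mass.
# dnbinom(size = theta, mu = mu) has var = mu + mu^2 / theta.
dztnb <- function(y, mu, theta) {
  p0 <- dnbinom(0, size = theta, mu = mu)   # P(Y = 0) under the ordinary NB
  ifelse(y > 0, dnbinom(y, size = theta, mu = mu) / (1 - p0), 0)
}

# Probability mass the ordinary NB puts on zero, for increasing means
theta <- 2
p0 <- sapply(c(1, 5, 50), function(mu) dnbinom(0, size = theta, mu = mu))
round(p0, 4)   # 0.4444 0.0816 0.0015
```

With a mean around 50 the ordinary NB already puts almost no mass on zero, so truncation changes little. With small means the difference is large. How far is "far enough" depends on theta as well as on mu.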

If you have very high values for your response variable, and a covariate 
cannot explain them, then you could also consider NB-p models. In such a 
model you use:

E(Y) = mu

var(Y) = mu + mu^p / theta

and p is estimated (in an ordinary NB, p = 2).
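Setting p to 1 or 2 in this variance function recovers the two common NB parameterisations. The mapping to glmmTMB's dispersion parameter (called phi in its family documentation) is based on that documentation and applies to the untruncated distribution; please double-check it against your installed version.

```latex
\begin{aligned}
p = 1:&\quad \operatorname{var}(Y) = \mu + \frac{\mu}{\theta} = \mu\left(1 + \frac{1}{\theta}\right) && \text{(NB1, linear in } \mu\text{)}\\
p = 2:&\quad \operatorname{var}(Y) = \mu + \frac{\mu^2}{\theta} = \mu\left(1 + \frac{\mu}{\theta}\right) && \text{(NB2, quadratic in } \mu\text{)}
\end{aligned}
```

In glmmTMB, nbinom1 has var = mu(1 + phi) and nbinom2 has var = mu(1 + mu/phi). The truncated_nbinom1 family used in Ally's model therefore corresponds, before truncation, to p = 1 with phi = 1/theta. An NB-p model lets the data choose p instead of fixing it at 1 or 2.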

Apologies for the self-citation, but we apply these models in Chapter 5 
of our Beginner's Guide to GAMM with R (2014). Unfortunately, this does 
mean that you have to use MCMC.

Besides looking at QQ-plots, I suggest that you also simulate data from 
your model and check whether it produces values similar to your observed 
data (especially the large values).
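A minimal base-R sketch of this check follows. Several parts are assumptions. The fitted means mu_hat and the dispersion theta_hat are made-up placeholders; in practice they would come from your fitted model. y_obs is simulated stand-in data, not real observations. The zero-truncated NB2 draws are generated by simple rejection sampling (redrawing zeros). glmmTMB also provides a simulate() method for fitted models, but check that your installed version supports the truncated families before relying on it.

```r
# Hypothetical helper: draw zero-truncated NB2 values by rejection (redraw zeros).
# mu is the mean of the untruncated NB; var = mu + mu^2 / theta before truncation.
rztnb <- function(n, mu, theta) {
  mu <- rep_len(mu, n)
  y <- rnbinom(n, size = theta, mu = mu)
  zero <- y == 0
  while (any(zero)) {
    y[zero] <- rnbinom(sum(zero), size = theta, mu = mu[zero])
    zero <- y == 0
  }
  y
}

set.seed(1)
n <- 200
mu_hat <- rep(5, n)   # placeholder fitted means (one per observation)
theta_hat <- 2        # placeholder dispersion estimate
y_obs <- rztnb(n, mu = 5, theta = 0.3)   # stand-in "observed" data with a longer tail

# Simulate many data sets from the assumed model and record their largest values
nsim <- 1000
sim_max <- replicate(nsim, max(rztnb(n, mu_hat, theta_hat)))

# Share of simulated maxima at least as large as the observed maximum;
# a value near 0 suggests the model cannot reproduce the large counts
p_max <- mean(sim_max >= max(y_obs))
p_max
```

The same idea works for other summaries, such as upper quantiles or the number of counts above a threshold. Pick the summary that matches the feature of the data that worries you, here the long right tail.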

Kind regards,

Alain



Hi all,

I am fitting mixed effects models using the package glmmTMB to 
investigate habitat use.
My data do not contain any zeros, so I have considered the 
zero-truncated Poisson and the zero-truncated negative binomial 
distributions. Of these two, the zero-truncated negative binomial fitted 
better, so I fitted the model:

library(glmmTMB)
m1 <- glmmTMB(count ~ waterdepth + temperature + chl.conc + (1 | individual),
              family = truncated_nbinom1(link = "log"), data = mydata)

However, the model clearly has a hard time fitting my very high response 
values (the distribution of my response variable has a very long tail). 
The QQ-plot also shows the high 'count' values lying above the QQ-line.

What are my options for improving model fit? Are there any distributions 
that might be better? Is it permissible to transform my response 
variable (e.g. sqrt or log)?

Any suggestions are greatly appreciated.

-Ally


-- 

Dr. Alain F. Zuur
Highland Statistics Ltd.
9 St Clair Wynd
AB41 6DZ Newburgh, UK
Email: highstat at highstat.com
URL:   www.highstat.com

And:
NIOZ Royal Netherlands Institute for Sea Research,
Department of Coastal Systems, and Utrecht University,
P.O. Box 59, 1790 AB Den Burg,
Texel, The Netherlands



Author of:
1. Beginner's Guide to Spatial, Temporal and Spatial-Temporal Ecological Data Analysis with R-INLA. (2017).
2. Beginner's Guide to Zero-Inflated Models with R (2016).
3. Beginner's Guide to Data Exploration and Visualisation with R (2015).
4. Beginner's Guide to GAMM with R (2014).
5. Beginner's Guide to GLM and GLMM with R (2013).
6. Beginner's Guide to GAM with R (2012).
7. Zero Inflated Models and GLMM with R (2012).
8. A Beginner's Guide to R (2009).
9. Mixed effects models and extensions in ecology with R (2009).
10. Analysing Ecological Data (2007).


