[R-sig-ME] zero-truncated negative binomial distribution

Alice Domalik adomalik at sfu.ca
Fri Nov 3 23:21:21 CET 2017


Thank you very much for your reply! 

I have ~1200 observations (and 24 individuals), and summary(count) looks like the following: 

min: 1 
max: 351 
mean: 20 
10% quantile: 2 
90% quantile: 44 
std dev: 37.5 

Most of my values are very small (1s and 2s), and the distribution of my response variable looks roughly like an exponential decay. 

The models run without errors, but the QQ plot suggests that the model fits the heavy tail poorly: the high values fall well above the QQ line. 


Thanks again, Alice 

From: "Ben Bolker" <bbolker at gmail.com> 
To: r-sig-mixed-models at r-project.org 
Sent: Friday, November 3, 2017 3:04:36 PM 
Subject: Re: [R-sig-ME] zero-truncated negative binomial distribution 

I assume you have multiple observations per individual (an 
observation-level random effect wouldn't make sense with a model like 
the [truncated] negative binomial, which includes an estimated 
dispersion parameter)? 

How big is your data set overall? What is summary(count) for your data 
(e.g. min/max, 10% and 90% quantiles, mean, std dev)? (The marginal 
distribution is less important than the conditional distribution, but 
getting information about the conditional distribution is more difficult.) 
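
A minimal sketch of how the requested summary could be computed in base R. The simulated `count` and `individual` vectors are hypothetical stand-ins, not the poster's data; with real data they would be `mydata$count` and `mydata$individual`.

```r
# Hypothetical stand-in data: 24 individuals, 50 observations each.
set.seed(1)
individual <- rep(1:24, each = 50)
count <- rgeom(1200, prob = 0.05) + 1L   # strictly positive, right-skewed

# Marginal summary of the response.
count_summary <- c(
  n    = length(count),
  min  = min(count),
  q10  = unname(quantile(count, 0.10)),
  q90  = unname(quantile(count, 0.90)),
  max  = max(count),
  mean = mean(count),
  sd   = sd(count)
)
print(round(count_summary, 1))

# Share of very small counts (1, 2, 3).
small_share <- mean(count <= 3)

# Rough look at between-individual variation: median count per individual.
median_by_id <- tapply(count, individual, median)
```

The per-individual medians are only a crude first look at structure beyond the marginal distribution; they do not condition on the covariates.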

Transforming data and fitting with a linear model is always a 
reasonable alternative if you can find a transformation that makes the 
(conditional) distributions approximately normal (and homoscedastic). 
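
As a rough illustration of the transformation route, the sketch below fits a linear mixed model to log-transformed counts and checks the residuals. It uses `nlme::lme` rather than glmmTMB purely for convenience, and the data are simulated; the covariate effect, the random-effect standard deviation, and the noise level are all assumptions made for the example.

```r
library(nlme)  # recommended package, normally installed with R

# Simulated stand-in data: 24 individuals, 50 observations each.
set.seed(42)
n_id <- 24
n_obs <- 50
d <- data.frame(
  individual = factor(rep(seq_len(n_id), each = n_obs)),
  waterdepth = runif(n_id * n_obs, 0, 10)
)
id_effect <- rnorm(n_id, sd = 0.5)
log_mean <- 1 + 0.1 * d$waterdepth + id_effect[as.integer(d$individual)]
# Overdispersed counts shifted by 1 so that no zeros occur.
d$count <- rpois(nrow(d), exp(log_mean + rnorm(nrow(d), sd = 1))) + 1L

# Linear mixed model on the log scale with a random intercept per individual.
fit_log <- lme(log(count) ~ waterdepth, random = ~ 1 | individual, data = d)

# Check approximate normality and homoscedasticity of the residuals.
res <- resid(fit_log, type = "normalized")
qqnorm(res)
qqline(res)
plot(fitted(fit_log), res, xlab = "Fitted (log scale)", ylab = "Residual")
```

With many counts equal to 1, `log(count)` piles up at zero, so the residual plots are worth inspecting for that lump before trusting the normal approximation.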

What is your evidence of "a hard time"? Warning/error messages? 

How important is the zero-truncation? Do you have a lot of small 
counts (1,2,3) in addition to your extremely large values? 
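
One way to gauge how much the truncation matters is to look at the probability mass an untruncated negative binomial would put on zero. The sketch below uses base R's `dnbinom` with the `(mu, size)` parameterization, which matches the nbinom2 form rather than nbinom1; the helper `dztnbinom` and the parameter values are hypothetical choices for illustration.

```r
# Zero-truncated negative binomial pmf in base R's (mu, size) form.
# mu is the mean of the untruncated distribution, whose variance is
# mu + mu^2 / size.
dztnbinom <- function(y, mu, size) {
  p0 <- dnbinom(0, mu = mu, size = size)
  ifelse(y >= 1, dnbinom(y, mu = mu, size = size) / (1 - p0), 0)
}

mu <- 20     # illustrative, close to the reported marginal mean
size <- 0.5  # illustrative dispersion, an assumption

# Mass that truncation removes: if this is tiny, truncation hardly matters.
p_zero <- dnbinom(0, mu = mu, size = size)

# Sanity check: the truncated pmf sums to (numerically) one over y >= 1.
total <- sum(dztnbinom(1:100000, mu = mu, size = size))
```

When `p_zero` is far from zero, as tends to happen with strong overdispersion and many small counts, ignoring the truncation would noticeably distort the fit.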

Other, more heavy-tailed distributions do exist (e.g. 
https://en.wikipedia.org/wiki/Beta_negative_binomial_distribution ) but 
are not yet implemented in glmmTMB (and we'd have to implement both the BNB 
and its zero-truncated version). I think they'd likely be overkill. 

On 17-11-03 05:07 PM, Alice Domalik wrote: 
> Hi all, 
> 
> I am fitting mixed effects models using the package glmmTMB to investigate habitat use. 
> My data does not contain any zeros, so I have considered the zero-truncated Poisson and the zero-truncated negative binomial. 
> Of these two distributions, the zt negative binomial was better, so I tried fitting my model: 
> 
> m1<-glmmTMB(count~waterdepth + temperature + chl.conc + (1|individual), family=list(family="truncated_nbinom1", link="log"), data=mydata) 
> 
> However, it is clear that the model is having a hard time fitting my very high response values (the distribution of my response variable has a very long tail). 
> The QQ plot also shows the high 'count' values lying above the QQ line. 
> 
> What are my options for improving model fit? Are there any distributions that might be better? Is it permissible to transform my response variable (e.g. sqrt or log)? 
> 
> Any suggestions are greatly appreciated. 
> 
> -Ally 
> 
> 
> [[alternative HTML version deleted]] 
> 
> _______________________________________________ 
> R-sig-mixed-models at r-project.org mailing list 
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models 
> 
