[R-sig-ME] Modelling count data in glmer with an apriori model selection

Tue Apr 18 03:03:31 CEST 2017

On 17-04-17 08:51 PM, Lorraine Scotson wrote:
> Hi All,
> 
> I am modeling bear distribution in Lao PDR, with sign count data collected
> on transects, in glmer, using a degrees-of-freedom spending, a priori
> modeling approach. I have calculated the number of degrees of freedom my
> model can afford based on my effective sample size (i.e. number of line
> transects), with degrees of freedom calculated as the number of
> non-intercept model-generated coefficients to be estimated. I have study
> site as a random effect (n=7).

  Out of curiosity, how many df *can* you afford (how many line transects)?

> My objectives are to model bear occurrence as a function of covariates, to
> rank those covariates in order of importance, and predict the distribution
> of bears throughout the whole country (i.e. extrapolate outside study
> sites). This is my first experience with an a priori modelling strategy, and
> I have a number of questions for which I have not found answers in the
> published literature. I would be grateful for any advice anyone may have:
> 
> - how many degrees of freedom will including a 7-level random effect incur?

   If you don't allow for variation in covariate effects across sites
(a random intercept only), 1: the among-site variance.
   If you allow for (correlated) variation in n covariate effects across
sites, n*(n+1)/2: n variances plus n*(n-1)/2 covariances.  (The number
of levels of the random effect does not affect this conclusion, although
7 sites is few for a random effect; you might end up with a singular
model, and have to decide what to do about it.)
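
   The count above can be sketched in a few lines of Python. This is only
an illustration: the function name random_effect_df is made up here (it is
not part of lme4), and it assumes n_terms counts every term allowed to
vary across sites, including the intercept if it varies.

```python
def random_effect_df(n_terms, correlated=True):
    """Variance-covariance parameters for one grouping factor.

    n_terms: number of terms allowed to vary across groups
    (assumption: the intercept is counted if it varies).
    """
    if n_terms < 1:
        raise ValueError("need at least one varying term")
    if correlated:
        # n variances plus n*(n-1)/2 covariances
        return n_terms * (n_terms + 1) // 2
    # uncorrelated effects: variances only
    return n_terms


# The count depends on the number of varying terms,
# not on the number of sites (7 here).
for n in (1, 2, 3):
    print(n, random_effect_df(n))
```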

> - My understanding is that I must pick my probability distribution (i.e.
> Poisson, Neg Bin) a priori, and so I cannot use the usual post-model checks
> to determine if my chosen distribution was appropriate. Is this correct?

  You should choose your probability distribution a priori, but you
*can* (and should) use post-fitting checks (scale-location, Q-Q,
overdispersion analysis, etc.) to see if there are any big problems with
your choice.
> 
> - My understanding is that I'll be penalized an extra degree of freedom by
> using a Negative Binomial distribution. Is this correct?

  Yes.  But this is a case where "saving" a degree of freedom wouldn't
be wise.

> 
> - How do I decide between using a Poisson or a Negative Binomial
> distribution?  Are there some post hoc checks I can do, without exploring
> the relationship between the response and the predictors, to inform my
> decision?

  Yes.  Check for overdispersion.
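
  One common way to do that check is the ratio of the Pearson chi-square
statistic to the residual degrees of freedom; values well above 1 suggest
overdispersion relative to Poisson. A minimal Python sketch follows. The
function name and the toy numbers are made up for illustration, and in
practice the fitted means would come from the fitted Poisson model.

```python
def overdispersion_ratio(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    observed: observed counts
    fitted:   fitted Poisson means (all must be > 0)
    n_params: number of estimated parameters
              (fixed effects plus variance parameters)
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("no residual degrees of freedom left")
    # For a Poisson model the variance equals the mean, so each
    # squared Pearson residual is (y - mu)^2 / mu.
    pearson_chisq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return pearson_chisq / resid_df


# Toy numbers, invented purely to show the call.
counts = [0, 3, 1, 12, 0, 7, 2, 0]
means = [1.5, 2.5, 2.0, 6.0, 1.0, 4.0, 3.0, 2.0]
print(overdispersion_ratio(counts, means, n_params=3))
```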

> 
> (The literature tells me that count data are rarely Poisson distributed,
> and that Negative Binomial is the most common distribution that accounts
> for overdispersion. I have ruled out zero-inflation; my response has
> plenty of zeros, but I feel that they will be accounted for by the model
> covariates).
> 
> - In the context of my study objectives, what are the consequences of using
> a Poisson distribution when my data are really Negative Binomial (i.e. does
> the distribution of the residuals of the response really matter?)?

  If your data are overdispersed (variance greater than expected from
Poisson), you will be in big trouble -- your standard errors will be too
small, so all of your conclusions (p-values, confidence intervals) will
be overconfident.

  I would recommend http://bbolker.github.io/mixedmodels-misc/ ,
especially "GLMM FAQ" and "supplementary materials for Bolker (2015)",
both of which have sections on overdispersion.

  It would also be possible to use a "quasi-likelihood approach", i.e. to
correct your estimated confidence intervals and p-values (as well as AICs
etc.) for overdispersion, without explicitly using an overdispersed
distribution.
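
  As a rough illustration of what such a correction usually involves:
standard errors are inflated by the square root of the estimated
dispersion phi (e.g. Pearson chi-square over residual df), and the
log-likelihood is divided by phi when computing a quasi-AIC. The Python
sketch below uses made-up function names and assumes phi has already
been estimated; it is not the output of any particular package.

```python
import math


def quasi_adjust_se(std_errors, phi):
    """Inflate model standard errors by sqrt(phi)."""
    if phi <= 0:
        raise ValueError("phi must be positive")
    scale = math.sqrt(phi)
    return [se * scale for se in std_errors]


def qaic(log_lik, n_params, phi):
    """Quasi-AIC: -2*logLik/phi + 2*k.

    A common convention counts the dispersion estimate as one extra
    parameter; that choice is left to the caller via n_params.
    """
    if phi <= 0:
        raise ValueError("phi must be positive")
    return -2.0 * log_lik / phi + 2.0 * n_params


# Hypothetical values, only to show the calls.
print(quasi_adjust_se([0.10, 0.25], phi=4.0))
print(qaic(log_lik=-100.0, n_params=3, phi=2.0))
```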

> 
> Many thanks in advance for any insights you can offer.
> 
> Best wishes
> Lorraine
>