[R-sig-ME] Modelling count data in glmer with an apriori model selection
Ben Bolker
bbolker at gmail.com
Tue Apr 18 03:03:31 CEST 2017
On 17-04-17 08:51 PM, Lorraine Scotson wrote:
> Hi All,
>
> I am modeling bear distribution in Lao PDR in glmer, using sign count data
> collected on transects and an a priori modeling approach based on spending
> degrees of freedom. I have calculated the number of degrees of freedom my
> model can afford from my effective sample size (i.e. number of line
> transects), counting degrees of freedom as the number of non-intercept
> model coefficients to be estimated. I have study site as a random effect
> (n=7).
Out of curiosity, how many df *can* you afford (how many line transects)?
> My objectives are to model bear occurrence as a function of covariates, to
> rank those covariates in order of importance, and to predict the
> distribution of bears throughout the whole country (i.e. extrapolate outside
> the study sites). This is my first experience with an a priori modelling
> strategy, and I have a number of questions for which I have not found
> answers in the published literature. I would be grateful for any advice
> anyone may have:
>
> - how many degrees of freedom will including a 7-level random effect incur?
If you don't allow covariate effects to vary across sites (a random
intercept only), 1: the among-site variance.
If you allow (correlated) variation in n terms (intercept and/or
covariate effects) across sites, n*(n+1)/2: n variances plus n*(n-1)/2
covariances. (The number of levels of the random effect does not
affect this conclusion, although 7 sites is few for estimating a random
effect; you might end up with a singular fit, and have to decide what
to do about it.)
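
To make the counting rule above concrete, here is a minimal sketch of the
arithmetic (in Python rather than R, since nothing here depends on lme4;
`n_varcov_params` is a made-up helper name, and n is taken to be the number
of terms that vary across sites):

```python
def n_varcov_params(n_terms: int) -> int:
    """Variance-covariance parameters for n_terms correlated random effects.

    That is n_terms variances plus n_terms*(n_terms-1)/2 covariances,
    which adds up to n_terms*(n_terms+1)/2.
    """
    if n_terms < 1:
        raise ValueError("need at least one random-effect term")
    return n_terms * (n_terms + 1) // 2


# The count depends on how many terms vary across sites,
# not on how many sites (levels) there are.
for n in (1, 2, 3):
    print(n, n_varcov_params(n))
```

So a random intercept alone costs 1, and an intercept plus one correlated
slope costs 3, whether there are 7 sites or 70.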
> - My understanding is that I must pick my probability distribution (i.e.
> Poisson, Neg Bin) a priori, and so I cannot use the usual post-model checks
> to determine whether my chosen distribution was appropriate. Is this correct?
You should choose your probability distribution a priori, but you
*can* (and should) use post-fitting checks (scale-location, Q-Q,
overdispersion analysis, etc.) to see if there are any big problems with
your choice.
>
> - My understanding is that i'll be penalized an extra degree of freedom by
> using a Negative Binomial distribution. Is this correct?
Yes. But this is a case where "saving" a degree of freedom wouldn't
be wise.
>
> - How do I decide between using a Poisson or a Negative Binomial
> distribution? Are there some post hoc checks I can do, without exploring
> the relationship between the response and the predictors, to inform my
> decision?
Yes. Check for overdispersion.
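
One common version of this check compares the sum of squared Pearson
residuals with the residual degrees of freedom; a ratio well above 1 points
to overdispersion. A minimal sketch of the arithmetic, assuming you already
have the observed counts and the fitted means from a Poisson fit (the
function name and the numbers are made up for illustration, and in a mixed
model the residual df is only approximate):

```python
def dispersion_ratio(observed, fitted, n_params):
    """Sum of squared Pearson residuals divided by residual df.

    For a Poisson model the variance equals the mean, so each squared
    Pearson residual is (y - mu)^2 / mu. n_params should count every
    estimated parameter (fixed effects plus variance parameters).
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("no residual degrees of freedom left")
    pearson_chisq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return pearson_chisq / resid_df


# Made-up transect counts and fitted means, for illustration only.
counts = [0, 0, 1, 7, 0, 12, 2, 0]
fitted = [1.5, 2.0, 2.5, 3.0, 1.0, 4.0, 3.5, 2.5]
print(dispersion_ratio(counts, fitted, n_params=2))
```

This uses only the fitted model, so it does not require any exploration of
the response-predictor relationships beyond the model you specified.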
>
> (The literature tells me that count data are rarely Poisson distributed,
> and that Negative Binomial is the most common distribution that accounts
> for overdispersion. I have ruled out zero-inflation; my response has
> plenty of zeros, but I feel that they will be accounted for by the model
> covariates).
>
> - In the context of my study objectives, what are the consequences of using
> a Poisson distribution when my data are really Negative Binomial (i.e. does
> the distribution of the residuals of the response really matter?)?
If your data are overdispersed (variance greater than expected from
Poisson) and you fit a Poisson model anyway, you will be in big trouble:
your standard errors will be too small, so all of your conclusions
(p-values, confidence intervals) will be overconfident.
I would recommend http://bbolker.github.io/mixedmodels-misc/ ,
especially "GLMM FAQ" and "supplementary materials for Bolker (2015)",
both of which have sections on overdispersion.
It would be possible to use a "quasi-likelihood approach" -- correct
your estimated confidence intervals and p-values (as well as AICs etc.)
for overdispersion, without explicitly using an overdispersed distribution.
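
As a rough sketch of what such corrections usually look like (not a
recipe for any particular package): standard errors are inflated by the
square root of the estimated dispersion ratio phi, and the log-likelihood
is divided by phi when computing a quasi-AIC. The helper names below are
made up, and counting one extra parameter for phi in the QAIC is one common
convention, not the only one:

```python
import math


def quasi_se(se, phi):
    """Inflate a standard error by sqrt(phi), the estimated dispersion ratio."""
    if phi <= 0:
        raise ValueError("phi must be positive")
    return se * math.sqrt(phi)


def qaic(loglik, n_params, phi):
    """Quasi-AIC: -2*logLik/phi + 2*(n_params + 1).

    The +1 counts phi itself as an estimated parameter (an assumption
    of this sketch; conventions differ).
    """
    if phi <= 0:
        raise ValueError("phi must be positive")
    return -2.0 * loglik / phi + 2.0 * (n_params + 1)


# Made-up numbers: a Poisson fit with logLik = -100, 3 parameters, phi = 2.
print(quasi_se(0.5, 2.0))
print(qaic(-100.0, 3, 2.0))
```

When comparing models this way, the same phi (typically from the most
complex model) should be used for every model in the set.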
>
> Many thanks in advance for any insights you can offer.
>
> Best wishes
> Lorraine
>