[R] Question on overdispersion

Fri Nov 19 14:11:25 CET 2010

cct663 <cct663 <at> gmail.com> writes:

> I have a few questions relating to overdispersion in a sex ratio data set
> that I am working with (note that I already have an analysis with GLMMs for
> fixed effects, this is just to estimate dispersion). The response variable
> is binomial because nestlings can only be male or female. I have samples of
> 1-5 nestlings from each nest (individuals within a nest are not independent,
> so the response variable is the ratio of sons to daughters) and some females
> have multiple nests in the data set (so I need to include female identity as
> a random effect). 
> 
> Here is an example of what the three vectors used in the model look like
> (the real data set is much bigger, just to illustrate what I’m talking
> about):
> 
> male_chick_no=c(2,4,1,0,3,5,2)
> female_chick_no=c(1,0,3,3,1,0,2)
> FemaleID=c(A,A,B,B,C,D,E)
> 
> My first question relates to coding the test in R. I received this suggested
> R syntax from a reviewer:
> 
> SexRatio = cbind(male_chick_no, female_chick_no)
> 
> Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial)
> 
> But when I try to use this in R I get the error: “Error in glmer(formula =
> ratio ~ 1 + (1 | femid), family = quasibinomial) : "quasi" families cannot
> be used in glmer”.

   After many discussions of the unreliability of quasi-likelihood
estimation in lme4, Doug Bates added this error/warning in a fairly
recent version.  The recommended approach now is to use individual-level
random effects: to continue your example above,

 indiv <- 1:length(FemaleID)
 Model <- lmer(SexRatio ~ 1 + (1|indiv) + (1|FemaleID), family=binomial)

  By the way, it is generally better practice to put all of your data
into a dataframe and use the data= argument to lmer.

> My second question is more general: I understand that with binomial data
> overdispersion suggests that the observed data have a greater variance than
> expected given binomial errors (in my case this means that more nests would
> be all male/all female than expected if sex is random). So with binomial
> errors the expected estimate of dispersion is 1, if I find that dispersion
> is > 1 it suggests that my data are overdispersed. My question is, how much
> greater than 1 should that number be to conclude that the data are
> overdispersed? Is there a rule of thumb or does it just depend on the
> dataset? 

   Very generally/crudely, the (squared) Pearson residuals are expected to be
chi-square distributed.  Specifically, if you know the residual
degrees of freedom (which can be a bit tricky/poorly defined for mixed
models, but is approximately (# data points)-(# estimated parameters),
then sum(residuals^2) should be chi-squared distributed with df equal
to the residual degrees of freedom.  Venables and Ripley have a useful
discussion.

> I was thinking of doing a randomization test with the same structure (nest
> size and female id) as my real data set but with sex ratio of each nest
> randomized with a 50:50 chance of individuals being sons or daughters and
> comparing my observed dispersion to the distribution of dispersions from the
> randomization test. Would this be a valid way to ask whether my data are
> overdispersed? Is it even necessary?

  It seems reasonable.  You could go even farther and simulate data
from your estimated model with binomial errors (i.e. use the estimated
sex ratios rather than assuming a 50/50 sex ratio).

  Followups should probably be sent to the r-sig-mixed-models list.