[R] Question on overdispersion
Ben Bolker
bbolker at gmail.com
Fri Nov 19 14:11:25 CET 2010
cct663 <cct663 <at> gmail.com> writes:
> I have a few questions relating to overdispersion in a sex ratio data set
> that I am working with (note that I already have an analysis with GLMMs for
> fixed effects, this is just to estimate dispersion). The response variable
> is binomial because nestlings can only be male or female. I have samples of
> 1-5 nestlings from each nest (individuals within a nest are not independent,
> so the response variable is the ratio of sons to daughters) and some females
> have multiple nests in the data set (so I need to include female identity as
> a random effect).
>
> Here is an example of what the three vectors used in the model look like
> (the real data set is much bigger, just to illustrate what I’m talking
> about):
>
> male_chick_no=c(2,4,1,0,3,5,2)
> female_chick_no=c(1,0,3,3,1,0,2)
> FemaleID=c(A,A,B,B,C,D,E)
>
> My first question relates to coding the test in R. I received this suggested
> R syntax from a reviewer:
>
> SexRatio = cbind(male_chick_no, female_chick_no)
>
> Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial)
>
> But when I try to use this in R I get the error: “Error in glmer(formula =
> ratio ~ 1 + (1 | femid), family = quasibinomial) : "quasi" families cannot
> be used in glmer”.
After many discussions of the unreliability of quasi-likelihood
estimation in lme4, Doug Bates added this error/warning in a fairly
recent version. The recommended approach now is to use individual-level
random effects: to continue your example above,
indiv <- 1:length(FemaleID)
Model <- lmer(SexRatio ~ 1 + (1|indiv) + (1|FemaleID), family=binomial)
By the way, it is generally better practice to put all of your data
into a dataframe and use the data= argument to lmer.
> My second question is more general: I understand that with binomial data
> overdispersion suggests that the observed data have a greater variance than
> expected given binomial errors (in my case this means that more nests would
> be all male/all female than expected if sex is random). So with binomial
> errors the expected estimate of dispersion is 1, if I find that dispersion
> is > 1 it suggests that my data are overdispersed. My question is, how much
> greater than 1 should that number be to conclude that the data are
> overdispersed? Is there a rule of thumb or does it just depend on the
> dataset?
Very generally/crudely, the (squared) Pearson residuals are expected to be
chi-square distributed. Specifically, if you know the residual
degrees of freedom (which can be a bit tricky/poorly defined for mixed
models, but is approximately (# data points)-(# estimated parameters),
then sum(residuals^2) should be chi-squared distributed with df equal
to the residual degrees of freedom. Venables and Ripley have a useful
discussion.
> I was thinking of doing a randomization test with the same structure (nest
> size and female id) as my real data set but with sex ratio of each nest
> randomized with a 50:50 chance of individuals being sons or daughters and
> comparing my observed dispersion to the distribution of dispersions from the
> randomization test. Would this be a valid way to ask whether my data are
> overdispersed? Is it even necessary?
It seems reasonable. You could go even farther and simulate data
from your estimated model with binomial errors (i.e. use the estimated
sex ratios rather than assuming a 50/50 sex ratio).
Followups should probably be sent to the r-sig-mixed-models list.
More information about the R-help
mailing list