[R-sig-ME] [R] Question on overdispersion

Fri Nov 19 10:39:45 CET 2010

Dear Nameless,

The quasi distribution can no longer be used in lme4 because a) the
results were not very reliable b) there is an alternative to model
overdispersion.

The alternative is to expand your dataset to bernoulli trials. Then add
a random effect with one level per observation. This random effect will
model additive overdisperion. The quasi distributions model
overdisperion multiplicative.

In the example below, the random effect of RowID has 0 variance. Hence
no overdispersion.

dataset <- data.frame(
	male_chick_no = c(2,4,1,0,3,5,2), 
	female_chick_no=c(1,0,3,3,1,0,2), 
	FemaleID=c("A","A","B","B","C","D","E"))

longFormat <- do.call(rbind, lapply(seq_len(nrow(dataset)), function(i){
	with(dataset, data.frame(Sex = c(rep("M", male_chick_no[i]),
rep("F", female_chick_no[i])), FemaleID = FemaleID[i]))
}))
longFormat$FemaleID <- factor(longFormat$FemaleID)
longFormat$RowID <- factor(seq_len(nrow(longFormat)))
longFormat$Male <- longFormat$Sex == "M"

library(lme4)
fit1 <- glmer(Male ~ (1|FemaleID), data = longFormat, family = binomial)
fit2 <- glmer(Male ~ (1|FemaleID) + (1|RowID), data = longFormat, family
= binomial)
anova(fit1, fit2)

Best regards,

Thierry

PS sig-mixed-models is a better mailinglist for this kind of questions.

------------------------------------------------------------------------
----
ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek
team Biometrie & Kwaliteitszorg
Gaverstraat 4
9500 Geraardsbergen
Belgium

Research Institute for Nature and Forest
team Biometrics & Quality Assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium

tel. + 32 54/436 185
Thierry.Onkelinx at inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to
say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of
data.
~ John Tukey

> -----Oorspronkelijk bericht-----
> Van: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] Namens cct663
> Verzonden: vrijdag 19 november 2010 5:39
> Aan: r-help at r-project.org
> Onderwerp: [R] Question on overdispersion
> 
> 
> I have a few questions relating to overdispersion in a sex 
> ratio data set that I am working with (note that I already 
> have an analysis with GLMMs for fixed effects, this is just 
> to estimate dispersion). The response variable is binomial 
> because nestlings can only be male or female. I have samples of
> 1-5 nestlings from each nest (individuals within a nest are 
> not independent, so the response variable is the ratio of 
> sons to daughters) and some females have multiple nests in 
> the data set (so I need to include female identity as a 
> random effect). 
> 
> Here is an example of what the three vectors used in the 
> model look like (the real data set is much bigger, just to 
> illustrate what I'm talking
> about):
> 
> male_chick_no=c(2,4,1,0,3,5,2)
> female_chick_no=c(1,0,3,3,1,0,2)
> FemaleID=c(A,A,B,B,C,D,E)
> 
> My first question relates to coding the test in R. I received 
> this suggested R syntax from a reviewer:
> 
> SexRatio = cbind(male_chick_no, female_chick_no)
> 
> Model <- lmer(SexRatio ~ 1 +(1|FemaleID), family = quasibinomial)
> 
> But when I try to use this in R I get the error: "Error in 
> glmer(formula = ratio ~ 1 + (1 | femid), family = 
> quasibinomial) : "quasi" families cannot be used in glmer".
> 
> I've tried playing around with some other mixed model 
> functions but can't seem to find one that will provide an 
> estimate of dispersion and allow me to include my random effect. 
> 
> Is there some other function I should be using? Or is there a 
> different syntax that I should use for lmer?
> 
> My second question is more general: I understand that with 
> binomial data overdispersion suggests that the observed data 
> have a greater variance than expected given binomial errors 
> (in my case this means that more nests would be all male/all 
> female than expected if sex is random). So with binomial 
> errors the expected estimate of dispersion is 1, if I find 
> that dispersion is > 1 it suggests that my data are 
> overdispersed. My question is, how much greater than 1 should 
> that number be to conclude that the data are overdispersed? 
> Is there a rule of thumb or does it just depend on the dataset? 
> 
> I was thinking of doing a randomization test with the same 
> structure (nest size and female id) as my real data set but 
> with sex ratio of each nest randomized with a 50:50 chance of 
> individuals being sons or daughters and comparing my observed 
> dispersion to the distribution of dispersions from the 
> randomization test. Would this be a valid way to ask whether 
> my data are overdispersed? Is it even necessary?
> 
> Any help/advice that you can provide would be greatly 
> appreciated. I am relatively new to R so explicit 
> instructions (i.e. easy to follow) would be wonderful.
> 
> Thanks.
> 
> --
> View this message in context: 
> http://r.789695.n4.nabble.com/Question-on-overdispersion-tp304
> 9898p3049898.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>