[R-sig-ME] very different model estimates in GLM vs. GLMM

Sat Dec 20 16:25:44 CET 2014

Dear Thierry,

Thank you so much for the example and explanation. I have just a few
follow-up questions for you or anyone else in the group if you have time.
Or if there is a good reference that I could be directed to, that would
also be great, as I clearly have some reading to do.

1. On what are the conditional distributions conditional? I noticed that in
the predict function in your code for the glmer model you set re.form equal
to ~0, which the help file for predict.merMod leads me to believe means
that the random effect in the model was not accounted for. So what is going
on there, i.e., what is the conditional distribution based on there?  And
are the fixed effects estimates from the glmer model conditional on the
random effect of the model?

2. If you had let the predict function for the glmer model use the random
effect from that model (e.g., re.form = NULL), how would it have worked?
Would it have compared observations that are in different levels of the
random effect somehow?

3. This last one has been asked before on this listserv (
https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/022258.html), and
that is how to make glmer give marginal probabilities instead of
conditional probabilities. The suggested answers apparently all ignore
random effects in the model. If you ignore the random effects in the glmer
model would the marginal probabilities be equivalent to those from a glm
model with no random effects?

Thanks again for all your help.

Vincenzo

On Fri, Dec 19, 2014 at 8:33 PM, ONKELINX, Thierry <Thierry.ONKELINX at inbo.be
> wrote:

> Dear Vincenzo,
>
> You are getting confused by the difference between a marginal and
> conditinal distribution. The glm returns the marginal distribution, while
> the glmm returns the conditional distribution. These distributions coindice
> with the gaussian distribution, but they don't with the binomial
> distribution.
>
> I've put a example together to illustrate the difference between the
> marginal and conditional distribution in the binomial case.
> https://gist.github.com/anonymous/b1188a5cd3213132267a
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
> Forest
> team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
> Kliniekstraat 25
> 1070 Anderlecht
> Belgium
> + 32 2 525 02 51
> + 32 54 43 61 85
> Thierry.Onkelinx at inbo.be
> www.inbo.be
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
>
> ________________________________________
> Van: R-sig-mixed-models [r-sig-mixed-models-bounces at r-project.org] namens
> Vincenzo Ellis [vincenzoaellis at gmail.com]
> Verzonden: vrijdag 19 december 2014 18:45
> Aan: r-sig-mixed-models at r-project.org
> Onderwerp: [R-sig-ME] very different model estimates in GLM vs. GLMM
>
> Dear all,
>
> I'm trying to model parasite prevalence in bird species as a function of
> the nest height (factor) of those species. When I run a binomial GLM I get
> estimates for each nest height level that are rather similar to the mean
> observed prevalence at each level. However when I run a binomial GLMM with
> taxonomic family as a random effect (intercept) one of the level estimates
> differs greatly from the GLM estimates. I suspect the problem is that the
> random effect has only one or two replicates per level (i.e., species), but
> I want to confirm that I haven't made a mistake in coding. Data and code
> follow. Thanks for any insights and for taking the time to help.
>
> Vincenzo
>
> *# the data*
> data <- structure(list(Species_Code =
>         structure(1:20, .Label = c("AMGO", "AMRO", "BGGN", "BHCO", "BRTH",
> "CARW", "COYE", "EABL", "EAPH",
>         "FISP", "HOFI", "HOSP", "INBU", "NOBO", "NOCA", "NOMO", "SOSP",
> "TRES", "WEVI", "YBCH"), class =
>           "factor"), n = c(19L, 5L, 5L, 6L, 8L, 16L, 32L, 4L, 4L, 27L, 4L,
> 12L, 50L, 4L, 36L, 9L, 5L,
>         8L, 8L, 29L), taxonomic.family = structure(c(3L, 12L, 10L, 5L, 6L,
> 11L, 8L, 12L, 13L, 2L, 3L, 9L,
>         1L, 7L, 1L, 6L, 2L, 4L, 14L, 8L), .Label = c("Cardinalidae",
> "Emberizidae", "Fringillidae",
>         "Hirundinidae", "Icteridae", "Mimidae", "Odontophoridae",
> "Parulidae", "Passeridae", "Polioptilidae",
>         "Troglodytidae", "Turdidae", "Tyrannidae", "Vireonidae"), class =
> "factor"), Total.pos = c(1L, 3L, 2L, 2L,
>         4L, 1L, 7L, 3L, 0L, 13L, 1L, 1L, 29L, 0L, 32L, 6L, 5L, 0L, 1L,
> 18L), Total.neg = c(18L, 2L, 3L, 4L, 4L, 15L,
>         25L, 1L, 4L, 14L, 3L, 11L, 21L, 4L, 4L, 3L, 0L, 8L, 7L, 11L),
> nest_ht2 = structure(c(3L, 2L, 3L, 3L, 2L, 2L,
>         1L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L), .Label =
> c("1", "2", "3"), class = "factor")),
>         .Names = c("Species_Code", "n", "taxonomic.family", "Total.pos",
> "Total.neg", "nest_ht2"),
>         class = "data.frame", row.names = c(3L, 5L, 7L, 8L, 11L, 13L, 17L,
> 19L, 20L, 23L, 25L, 26L, 27L, 28L, 29L,
>         30L, 34L, 37L, 40L, 43L))
>
>
>
> *# when we look at the total prevalence across nest heights (i.e.,
> nest_ht2) we see:*
>
> require(plyr)
> (observed <- ddply(data, "nest_ht2",
>                   summarize, mean.prevalence =
>                     mean(Total.pos/n)))
>
>
> *# if we run a normal binomial GLM the estimates for each of the nest
> heights are*
>
>     # quick function to convert logit scale values to probabilities
> probs <- function(x){
>   exp(x)/(exp(x)+1)
> }
>
> model.basic <- glm(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2,
> family=binomial, data=data)
> probs(coef(model.basic))
>
>
> *# if we run a binomial GLMM with taxonomic.family as a random effect
> (intercept), the estimates*
> *# for each of the nest heights are*
>
> require(lme4)
> model.mixed <- glmer(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2 +
> (1|taxonomic.family),
>                      family=binomial, data=data)
> probs(fixef(model.mixed))
>
>
> *# if we put all of these together we can see that the model.basic is
> closer to the observed mean prevalence across*
> *# nest heights than the model.mixed is. In particular, in the model.mixed
> the estimate for the first level of nest *
> *# height is much lower (0.14) than in the model.basic (0.45).*
>
> data.frame(observed, model.basic = probs(coef(model.basic)), model.mixed =
> probs(fixef(model.mixed)))
>
>
> *# Could this difference between the GLM and the GLMM have to do with the
> low number of replicates in each of the*
> *# levels of the random effect (taxonomic.family)? Or is there a problem
> with coding?*
>
> arrange(ddply(data, "taxonomic.family", summarize,
> taxonomic.family_sample_size = length(taxonomic.family)),
>         desc(taxonomic.family_sample_size))
>
> *# You can see that only six of the families have more than one species
> sampled and none of those have more than two. If the*
> *# problem really is the mixed effect model not having enough data, how
> should one respond to reviewers who insist that*
> *# taxonomy must be controlled for in such an analysis?*
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> Disclaimer Bezoek onze website / Visit our website<
> https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>
>

	[[alternative HTML version deleted]]