[R-sig-ME] very different model estimates in GLM vs. GLMM

Vincenzo Ellis vincenzoaellis at gmail.com
Fri Dec 19 18:45:43 CET 2014

Dear all,

I'm trying to model parasite prevalence in bird species as a function of
the nest height (factor) of those species. When I run a binomial GLM I get
estimates for each nest height level that are rather similar to the mean
observed prevalence at each level. However when I run a binomial GLMM with
taxonomic family as a random effect (intercept) one of the level estimates
differs greatly from the GLM estimates. I suspect the problem is that the
random effect has only one or two replicates per level (i.e., species), but
I want to confirm that I haven't made a mistake in coding. Data and code
follow. Thanks for any insights and for taking the time to help.


*# the data*
data <- structure(list(Species_Code =
        structure(1:20, .Label = c("AMGO", "AMRO", "BGGN", "BHCO", "BRTH",
        "FISP", "HOFI", "HOSP", "INBU", "NOBO", "NOCA", "NOMO", "SOSP",
"TRES", "WEVI", "YBCH"), class =
          "factor"), n = c(19L, 5L, 5L, 6L, 8L, 16L, 32L, 4L, 4L, 27L, 4L,
12L, 50L, 4L, 36L, 9L, 5L,
        8L, 8L, 29L), taxonomic.family = structure(c(3L, 12L, 10L, 5L, 6L,
11L, 8L, 12L, 13L, 2L, 3L, 9L,
        1L, 7L, 1L, 6L, 2L, 4L, 14L, 8L), .Label = c("Cardinalidae",
"Emberizidae", "Fringillidae",
        "Hirundinidae", "Icteridae", "Mimidae", "Odontophoridae",
"Parulidae", "Passeridae", "Polioptilidae",
        "Troglodytidae", "Turdidae", "Tyrannidae", "Vireonidae"), class =
"factor"), Total.pos = c(1L, 3L, 2L, 2L,
        4L, 1L, 7L, 3L, 0L, 13L, 1L, 1L, 29L, 0L, 32L, 6L, 5L, 0L, 1L,
18L), Total.neg = c(18L, 2L, 3L, 4L, 4L, 15L,
        25L, 1L, 4L, 14L, 3L, 11L, 21L, 4L, 4L, 3L, 0L, 8L, 7L, 11L),
nest_ht2 = structure(c(3L, 2L, 3L, 3L, 2L, 2L,
        1L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L), .Label =
c("1", "2", "3"), class = "factor")),
        .Names = c("Species_Code", "n", "taxonomic.family", "Total.pos",
"Total.neg", "nest_ht2"),
        class = "data.frame", row.names = c(3L, 5L, 7L, 8L, 11L, 13L, 17L,
19L, 20L, 23L, 25L, 26L, 27L, 28L, 29L,
        30L, 34L, 37L, 40L, 43L))

*# when we look at the total prevalence across nest heights (i.e.,
nest_ht2) we see:*

(observed <- ddply(data, "nest_ht2",
                  summarize, mean.prevalence =

*# if we run a normal binomial GLM the estimates for each of the nest
heights are*

    # quick function to convert logit scale values to probabilities
probs <- function(x){

model.basic <- glm(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2,
family=binomial, data=data)

*# if we run a binomial GLMM with taxonomic.family as a random effect
(intercept), the estimates*
*# for each of the nest heights are*

model.mixed <- glmer(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2 +
                     family=binomial, data=data)

*# if we put all of these together we can see that the model.basic is
closer to the observed mean prevalence across*
*# nest heights than the model.mixed is. In particular, in the model.mixed
the estimate for the first level of nest *
*# height is much lower (0.14) than in the model.basic (0.45).*

data.frame(observed, model.basic = probs(coef(model.basic)), model.mixed =

*# Could this difference between the GLM and the GLMM have to do with the
low number of replicates in each of the*
*# levels of the random effect (taxonomic.family)? Or is there a problem
with coding?*

arrange(ddply(data, "taxonomic.family", summarize,
taxonomic.family_sample_size = length(taxonomic.family)),

*# You can see that only six of the families have more than one species
sampled and none of those have more than two. If the*
*# problem really is the mixed effect model not having enough data, how
should one respond to reviewers who insist that*
*# taxonomy must be controlled for in such an analysis?*

	[[alternative HTML version deleted]]

More information about the R-sig-mixed-models mailing list