[R-sig-ME] very different model estimates in GLM vs. GLMM

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Fri Dec 19 23:33:54 CET 2014


Dear Vincenzo,

You are getting confused by the difference between a marginal and conditinal distribution. The glm returns the marginal distribution, while the glmm returns the conditional distribution. These distributions coindice with the gaussian distribution, but they don't with the binomial distribution.

I've put a example together to illustrate the difference between the marginal and conditional distribution in the binomial case. https://gist.github.com/anonymous/b1188a5cd3213132267a

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

________________________________________
Van: R-sig-mixed-models [r-sig-mixed-models-bounces op r-project.org] namens Vincenzo Ellis [vincenzoaellis op gmail.com]
Verzonden: vrijdag 19 december 2014 18:45
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] very different model estimates in GLM vs. GLMM

Dear all,

I'm trying to model parasite prevalence in bird species as a function of
the nest height (factor) of those species. When I run a binomial GLM I get
estimates for each nest height level that are rather similar to the mean
observed prevalence at each level. However when I run a binomial GLMM with
taxonomic family as a random effect (intercept) one of the level estimates
differs greatly from the GLM estimates. I suspect the problem is that the
random effect has only one or two replicates per level (i.e., species), but
I want to confirm that I haven't made a mistake in coding. Data and code
follow. Thanks for any insights and for taking the time to help.

Vincenzo

*# the data*
data <- structure(list(Species_Code =
        structure(1:20, .Label = c("AMGO", "AMRO", "BGGN", "BHCO", "BRTH",
"CARW", "COYE", "EABL", "EAPH",
        "FISP", "HOFI", "HOSP", "INBU", "NOBO", "NOCA", "NOMO", "SOSP",
"TRES", "WEVI", "YBCH"), class =
          "factor"), n = c(19L, 5L, 5L, 6L, 8L, 16L, 32L, 4L, 4L, 27L, 4L,
12L, 50L, 4L, 36L, 9L, 5L,
        8L, 8L, 29L), taxonomic.family = structure(c(3L, 12L, 10L, 5L, 6L,
11L, 8L, 12L, 13L, 2L, 3L, 9L,
        1L, 7L, 1L, 6L, 2L, 4L, 14L, 8L), .Label = c("Cardinalidae",
"Emberizidae", "Fringillidae",
        "Hirundinidae", "Icteridae", "Mimidae", "Odontophoridae",
"Parulidae", "Passeridae", "Polioptilidae",
        "Troglodytidae", "Turdidae", "Tyrannidae", "Vireonidae"), class =
"factor"), Total.pos = c(1L, 3L, 2L, 2L,
        4L, 1L, 7L, 3L, 0L, 13L, 1L, 1L, 29L, 0L, 32L, 6L, 5L, 0L, 1L,
18L), Total.neg = c(18L, 2L, 3L, 4L, 4L, 15L,
        25L, 1L, 4L, 14L, 3L, 11L, 21L, 4L, 4L, 3L, 0L, 8L, 7L, 11L),
nest_ht2 = structure(c(3L, 2L, 3L, 3L, 2L, 2L,
        1L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L), .Label =
c("1", "2", "3"), class = "factor")),
        .Names = c("Species_Code", "n", "taxonomic.family", "Total.pos",
"Total.neg", "nest_ht2"),
        class = "data.frame", row.names = c(3L, 5L, 7L, 8L, 11L, 13L, 17L,
19L, 20L, 23L, 25L, 26L, 27L, 28L, 29L,
        30L, 34L, 37L, 40L, 43L))



*# when we look at the total prevalence across nest heights (i.e.,
nest_ht2) we see:*

require(plyr)
(observed <- ddply(data, "nest_ht2",
                  summarize, mean.prevalence =
                    mean(Total.pos/n)))


*# if we run a normal binomial GLM the estimates for each of the nest
heights are*

    # quick function to convert logit scale values to probabilities
probs <- function(x){
  exp(x)/(exp(x)+1)
}

model.basic <- glm(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2,
family=binomial, data=data)
probs(coef(model.basic))


*# if we run a binomial GLMM with taxonomic.family as a random effect
(intercept), the estimates*
*# for each of the nest heights are*

require(lme4)
model.mixed <- glmer(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2 +
(1|taxonomic.family),
                     family=binomial, data=data)
probs(fixef(model.mixed))


*# if we put all of these together we can see that the model.basic is
closer to the observed mean prevalence across*
*# nest heights than the model.mixed is. In particular, in the model.mixed
the estimate for the first level of nest *
*# height is much lower (0.14) than in the model.basic (0.45).*

data.frame(observed, model.basic = probs(coef(model.basic)), model.mixed =
probs(fixef(model.mixed)))


*# Could this difference between the GLM and the GLMM have to do with the
low number of replicates in each of the*
*# levels of the random effect (taxonomic.family)? Or is there a problem
with coding?*

arrange(ddply(data, "taxonomic.family", summarize,
taxonomic.family_sample_size = length(taxonomic.family)),
        desc(taxonomic.family_sample_size))

*# You can see that only six of the families have more than one species
sampled and none of those have more than two. If the*
*# problem really is the mixed effect model not having enough data, how
should one respond to reviewers who insist that*
*# taxonomy must be controlled for in such an analysis?*

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models op r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>



More information about the R-sig-mixed-models mailing list