[R-sig-ME] very different model estimates in GLM vs. GLMM

ONKELINX, Thierry Thierry.ONKELINX at inbo.be
Mon Dec 22 09:22:58 CET 2014


Dear Vincenzo,

1. They are conditional on the random effects. Using re.form = ~ 0 gives predictions for the ‘average’ item, that is the item with random intercept = 0. It is NOT the average over all items. It only takes the fixed effects into account to make the **predictions**. Note that the fixed effects were fitted conditional on the random effects.
2. Re.form = NULL in my example would give a bunch of parallel curves. It would give predictions for the **observed** items.
3. No, that would still be conditional. If you need the marginal distribution, you need to integrate over the random effects. As Henrik pointed out in https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/022258.html.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

Van: Vincenzo Ellis [mailto:vincenzoaellis op gmail.com]
Verzonden: zaterdag 20 december 2014 16:26
Aan: ONKELINX, Thierry
CC: r-sig-mixed-models op r-project.org
Onderwerp: Re: [R-sig-ME] very different model estimates in GLM vs. GLMM

Dear Thierry,

Thank you so much for the example and explanation. I have just a few follow-up questions for you or anyone else in the group if you have time. Or if there is a good reference that I could be directed to, that would also be great, as I clearly have some reading to do.

1. On what are the conditional distributions conditional? I noticed that in the predict function in your code for the glmer model you set re.form equal to ~0, which the help file for predict.merMod leads me to believe means that the random effect in the model was not accounted for. So what is going on there, i.e., what is the conditional distribution based on there?  And are the fixed effects estimates from the glmer model conditional on the random effect of the model?

2. If you had let the predict function for the glmer model use the random effect from that model (e.g., re.form = NULL), how would it have worked? Would it have compared observations that are in different levels of the random effect somehow?

3. This last one has been asked before on this listserv (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q2/022258.html), and that is how to make glmer give marginal probabilities instead of conditional probabilities. The suggested answers apparently all ignore random effects in the model. If you ignore the random effects in the glmer model would the marginal probabilities be equivalent to those from a glm model with no random effects?


Thanks again for all your help.

Vincenzo

On Fri, Dec 19, 2014 at 8:33 PM, ONKELINX, Thierry <Thierry.ONKELINX op inbo.be> wrote:
Dear Vincenzo,

You are getting confused by the difference between a marginal and conditinal distribution. The glm returns the marginal distribution, while the glmm returns the conditional distribution. These distributions coindice with the gaussian distribution, but they don't with the binomial distribution.

I've put a example together to illustrate the difference between the marginal and conditional distribution in the binomial case. https://gist.github.com/anonymous/b1188a5cd3213132267a

Best regards,

Thierry

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium
+ 32 2 525 02 51
+ 32 54 43 61 85
Thierry.Onkelinx op inbo.be
www.inbo.be
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

________________________________________
Van: R-sig-mixed-models [r-sig-mixed-models-bounces op r-project.org] namens Vincenzo Ellis [vincenzoaellis op gmail.com]
Verzonden: vrijdag 19 december 2014 18:45
Aan: r-sig-mixed-models op r-project.org
Onderwerp: [R-sig-ME] very different model estimates in GLM vs. GLMM

Dear all,

I'm trying to model parasite prevalence in bird species as a function of
the nest height (factor) of those species. When I run a binomial GLM I get
estimates for each nest height level that are rather similar to the mean
observed prevalence at each level. However when I run a binomial GLMM with
taxonomic family as a random effect (intercept) one of the level estimates
differs greatly from the GLM estimates. I suspect the problem is that the
random effect has only one or two replicates per level (i.e., species), but
I want to confirm that I haven't made a mistake in coding. Data and code
follow. Thanks for any insights and for taking the time to help.

Vincenzo

*# the data*
data <- structure(list(Species_Code =
        structure(1:20, .Label = c("AMGO", "AMRO", "BGGN", "BHCO", "BRTH",
"CARW", "COYE", "EABL", "EAPH",
        "FISP", "HOFI", "HOSP", "INBU", "NOBO", "NOCA", "NOMO", "SOSP",
"TRES", "WEVI", "YBCH"), class =
          "factor"), n = c(19L, 5L, 5L, 6L, 8L, 16L, 32L, 4L, 4L, 27L, 4L,
12L, 50L, 4L, 36L, 9L, 5L,
        8L, 8L, 29L), taxonomic.family = structure(c(3L, 12L, 10L, 5L, 6L,
11L, 8L, 12L, 13L, 2L, 3L, 9L,
        1L, 7L, 1L, 6L, 2L, 4L, 14L, 8L), .Label = c("Cardinalidae",
"Emberizidae", "Fringillidae",
        "Hirundinidae", "Icteridae", "Mimidae", "Odontophoridae",
"Parulidae", "Passeridae", "Polioptilidae",
        "Troglodytidae", "Turdidae", "Tyrannidae", "Vireonidae"), class =
"factor"), Total.pos = c(1L, 3L, 2L, 2L,
        4L, 1L, 7L, 3L, 0L, 13L, 1L, 1L, 29L, 0L, 32L, 6L, 5L, 0L, 1L,
18L), Total.neg = c(18L, 2L, 3L, 4L, 4L, 15L,
        25L, 1L, 4L, 14L, 3L, 11L, 21L, 4L, 4L, 3L, 0L, 8L, 7L, 11L),
nest_ht2 = structure(c(3L, 2L, 3L, 3L, 2L, 2L,
        1L, 2L, 2L, 1L, 3L, 2L, 1L, 1L, 2L, 2L, 2L, 3L, 1L, 1L), .Label =
c("1", "2", "3"), class = "factor")),
        .Names = c("Species_Code", "n", "taxonomic.family", "Total.pos",
"Total.neg", "nest_ht2"),
        class = "data.frame", row.names = c(3L, 5L, 7L, 8L, 11L, 13L, 17L,
19L, 20L, 23L, 25L, 26L, 27L, 28L, 29L,
        30L, 34L, 37L, 40L, 43L))


*# when we look at the total prevalence across nest heights (i.e.,
nest_ht2) we see:*

require(plyr)
(observed <- ddply(data, "nest_ht2",
                  summarize, mean.prevalence =
                    mean(Total.pos/n)))


*# if we run a normal binomial GLM the estimates for each of the nest
heights are*

    # quick function to convert logit scale values to probabilities
probs <- function(x){
  exp(x)/(exp(x)+1)
}

model.basic <- glm(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2,
family=binomial, data=data)
probs(coef(model.basic))


*# if we run a binomial GLMM with taxonomic.family as a random effect
(intercept), the estimates*
*# for each of the nest heights are*

require(lme4)
model.mixed <- glmer(cbind(Total.pos, Total.neg) ~ 0 + nest_ht2 +
(1|taxonomic.family),
                     family=binomial, data=data)
probs(fixef(model.mixed))


*# if we put all of these together we can see that the model.basic is
closer to the observed mean prevalence across*
*# nest heights than the model.mixed is. In particular, in the model.mixed
the estimate for the first level of nest *
*# height is much lower (0.14) than in the model.basic (0.45).*

data.frame(observed, model.basic = probs(coef(model.basic)), model.mixed =
probs(fixef(model.mixed)))


*# Could this difference between the GLM and the GLMM have to do with the
low number of replicates in each of the*
*# levels of the random effect (taxonomic.family)? Or is there a problem
with coding?*

arrange(ddply(data, "taxonomic.family", summarize,
taxonomic.family_sample_size = length(taxonomic.family)),
        desc(taxonomic.family_sample_size))

*# You can see that only six of the families have more than one species
sampled and none of those have more than two. If the*
*# problem really is the mixed effect model not having enough data, how
should one respond to reviewers who insist that*
*# taxonomy must be controlled for in such an analysis?*

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models op r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>

Disclaimer Bezoek onze website / Visit our website<https://drupal.inbo.be/nl/disclaimer-mailberichten-van-het-inbo>


More information about the R-sig-mixed-models mailing list