[R-sig-ME] Binomial GLMM vs GLM question (Ken Beath)
John Maindonald
john.maindonald at anu.edu.au
Sat May 17 02:51:54 CEST 2008
Philosophically, I am sure that Andrew Gelman is right, and
that applies also when estimates of one or other of the effects
are based on very limited information. One can indeed banish
the random effect and its associated uncertainty, but all that
does is to transfer the uncertainty to another place. The
conclusions from the study must be hedged around with ifs and
buts, relative to the generalizations that the researcher might
have wanted to draw.
Other evidence that places the results in some kind of context
becomes crucial. "X did a study that is roughly comparable to
ours, where variation between the 5 years of the study was of
no consequence relative to variation between the 10 ponds ...".
Or, "Four of the 5 years were dry, with variation between years
that was of minor consequence. In the one wet year, results
were very different ...". Or "X's results suggest that we might
have to multiply the SEs by as much as 3 to account for variation
between years. In other words, the researcher, while gaining
all the leverage possible from other sources of evidence, has
to be brutally honest about limitations, even to the extent of
dealing it a mortal blow! RA Fisher's dictum is no doubt too
extreme as a general comment, but it does have a certain
relevance (one can see where it is pointing):
"To call in the statistician after the experiment is done may be
no more than asking him to perform a post-mortem examination:
he may be able to say what the experiment died of."
Even with data from 4 or 5 years, there's not much that can be
done about sequential correlation between years. One has
little choice but to treat the years as random, or at least to assume
an AR1 or similarly simple error structure.
Non-normality of the rainfall data does not invalidate the analysis.
There is however an issue of leverage. If the distribution is highly
positively skew, then the heavy tail may unduly wag (lever) the
analysis. Maybe, try the cube root transformation that has some
currency in the literature.
John Maindonald email: john.maindonald at anu.edu.au
phone : +61 2 (6125)3473 fax : +61 2(6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 17 May 2008, at 6:53 AM, Ben Bolker wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jessica Stapley wrote:
> |> Message: 2
> |> Date: Fri, 16 May 2008 02:36:41 -0000 (UTC)
> |> From: "Ken Beath" <kjbeath at kagi.com>
> |> Subject: Re: [R-sig-ME] Binomial GLMM vs GLM question
> |> To: "Justin Touchon" <jtouchon at bu.edu>
> |> Cc: r-sig-mixed-models at r-project.org
> |> Message-ID:
> |> <37086.129.78.64.102.1210905401.squirrel at mail.kjbeath.com.au>
> |> Content-Type: text/plain;charset=iso-8859-1
> |>
> |>> Dear Dr. Bates and other LMER experts,
> |>> I am admittedly entry level in my R and mixed-model knowledge,
> |>> but
> |>> I'm hoping that someone can help me and also forgive my lack of
> |>> insight. Over 3 years, I monitored survival of 350 egg masses
> at two
> |>> ponds. I thus have one continuous variable (rainfall) and two
> |>> discrete
> |>> variables (year and pond). My response variable, mortality, is
> |>> coded as
> |>> a two column matrix featuring eggs survived and eggs dead. I'm
> |>> primarily
> |>> interested in the effect of rain on survival, but also if rain has
> |>> different impacts at the different ponds and how much survival
> varied
> |>> over the three years. Originally, I though I could tackle this
> |>> with a
> |>> binomial GLM, but do I need a binomial GLMM instead, as rainfall
> and
> |>> year would be random and pond fixed? The problem with this is
> |>> trying to
> |>> make biological sense out of the results. I've spent the last
> week
> |>> reading all the past posts about why p-values can't be
> calculated and
> |>> all that, which I'm fine with. But what can I say about the
> |>> effects of
> |>> rainfall or year on egg survival from the variance estimates?
> Also,
> |>> doesn't LMER require that random factors be normally distributed,
> |>> because my rainfall measurements are far from it. Is that a
> problem?
> |>> Thank you in advance for any advice you can give.
> |>> -Justin Touchon
> |>>
> |> I think your misunderstanding the idea of a random effect. This is
> |> something that is unobserved, causing correlation within a group.
> |> In your
> |> data this might be year or pond but definitely not rainfall which
> is
> |> simply a covariate. You have more than one measurement on a pond
> |> and more
> |> than one for each year, so it is likely that there will be
> correlation
> |> between them and one way of dealing with this is a random effect.
> The
> |> alternative is to use a fixed effects model. In your case, there
> |> are only
> |> 2 and 3 groups, so a fixed effects model is the best approach, so a
> |> GLM is
> |> appropriate. If there were say 20 ponds, a random effects model
> |> would be
> |> much more suitable.
> |>
> |> Ken
> |
>
> ~ So -- I may be wrong (others please jump in) -- but I disagree with
> several points in the most recent comment.
>
> | I think this is a good example of what seems a common problem for
> | people using mixed models - How to decide what are the random
> factors
> | and what are the fixed factors.
>
> ~ I agree that this is difficult, and contentious (see e.g.
> Andrew Gelman, "Analysis of variance--why it is more important than
> ever," Annals of Statistics 33, no. 1 (2005): 1-53,
> doi:http://dx.doi.org/doi:10.1214/009053604000001048)
>
>
> | In your study you have taken repeated measurements on the same
> ponds,
> | as such, observations from each pond across years are not
> independent
> | (observations in pond 1 at t+1 will be correlated with observations
> | at t). If you were only interested in the effect of rainfall on egg
> | survival I would suggest you use a mixed model with year and pond as
> | random effects (based on a similar eg in Crawley's R Book p605). In
> | this case I assume you have measurements on every pond in every year
> | and as such these random effects are crossed (not nested).
>
> ~ (a) I would suggest that a mixed model is NOT going to work well
> in this case, even if year and pond are "philosophically" random
> effects
> (i.e., you don't really care about what happens in those specific
> ponds, and you may even have chosen them with a random-number
> generator
> out of a list of all possible ponds -- although this is much less
> likely with years ...). The technical problem is that estimating
> variances from 2 or 3 points is nasty. This translates into
> inference/philosophical terms because these few points really
> don't give you the data to generalize about the population, even if
> you want to. (I think one of the confusions is that in the classical
> method-of-moments world there's nothing that says you can't have 2
> denominator degrees of freedom -- your power will be terrible, but
> the expressions won't blow up on you [unless you get negative variance
> estimates ...])
> |
> | m1 <- lmer(mort~rainfall + (year|pond), family = binomial, data=
> | FieldData0305)
>
> ~ (1|year)+(1|pond) might work OK, and be slightly more
> parsimonious (OR ponds nested within years, (1|year)+(1|pond:year), or
> vice versa -- and if you're not dead set on treating the random
> effects
> as "effects of year" and "effects of pond", but were willing to
> treat it
> as "effects of pond within year" that would buy you the flexibility
> to use some other package that wasn't so good at crossed
> effects.)
>
>
> | summary(m1)
> | From the summary output you can assess the influence of the fixed
> | effect to see if the estimate for rainfall (slope estimate) is
> | different to zero.
> |
> | Even if the number of groups (years and ponds) is small it is still
> | better to take account of this group variation than ignore it (see
> | Gelman and Hill 200&.
>
> ~ Yes, but ... what if all you can get out of the model is that
> the estimate is nearly zero? If you go Bayesian instead you can
> deal with some problems by setting an informative prior [in theory
> setting a proper prior, no matter how weak, would generally solve
> the problem, but in reality I suspect that if the model is
> overparameterized you're going to have nasty technical difficulties
> even if the model is theoretically OK])
> ~ With respect to Andrew Gelman (I can't believe I'm saying this --
> sacrilege) but I think he's used to really big social-science data
> sets,
> where "leave stuff in when you're not sure" is more generally
> a good idea than it is in typical field ecology data sets, where
> the bias-variance tradeoff bites harder. (At least there are 350
> data points here,
>
> | From this model you can also get a feeling for the between year and
> | between pond variation by looking at the random effects estimates
> and
> | variances. Obviously you will have some idea if you just plot the
> | means for each year and each pond.
>
> ~ Only if the variance estimates don't suck.
>
> | If you are also interested in understanding how egg survival differs
> | between ponds or between years and if this interacts with rainfall
> | then it become less straight forward.
> |
> | I would suggest that you try
> |
> | m2 <- lmer(mort~rainfall * year * pond + (year|pond), family =
> | binomial, data= FieldData0305)
> | summary(m2)
> |
> | In the summary you will have estimates for all three factors and
> | their interactions and you can ascertain if these are good
> | explanatory variables for egg survival.
>
> ~ Isn't this overparameterized? We have a fixed effect for
> each year:pond combination (and variation in the slopes of
> the effect with respect to rainfall), as well as a random-effect
> level for years and ponds?
>
> |
> | You may be able to argue that the correlation between years or
> | between ponds is negligible and therefore you can leave out the
> | random effect and only use a GLM, but I would be inclined to avoid
> | model simplification - but other may suggest otherwise.
> |
> | Jess
> |
>
>
> |
> |>> My model and output are as follows:
> |>>
> |>>> LMER.1<-lmer(mort~Pond + (Pond|total_rainfall) + (1|Year),
> |>> family=binomial, data= FieldData0305)
> |>>
> |>>> summary(LMER.1)
> |>> Generalized linear mixed model fit using Laplace
> |>> Formula: mort ~ Pond + (Pond | total_rainfall) + (1 | Year)
> |>> Data: FieldData0305
> |>> Family: binomial(logit link)
> |>> AIC BIC logLik deviance
> |>> 7657 7680 -3822 7645
> |>> Random effects:
> |>> Groups Name Variance Std.Dev. Corr
> |>> total_rainfall (Intercept) 21.66535 4.65461
> |>> Pond[T.Ocelot] 6.44297 2.53830 -0.627
> |>> Year (Intercept) 0.74082 0.86071
> |>> number of obs: 350, groups: total_rainfall, 48; Year, 3
> |>>
> |>> Estimated scale (compare to 1 ) 4.603433
> |>>
> |>> Fixed effects:
> |>> Estimate Std. Error z value Pr(>|z|)
> |>> (Intercept) -0.4678 1.0173 -0.4598 0.646
> |>> Pond[T.Ocelot] -0.9831 0.9330 -1.0538 0.292
> |>>
> |>> Correlation of Fixed Effects:
> |>> (Intr)
> |>> Pnd[T.Oclt] -0.648
> |>>
> |>> _______________________________________________
> |>> R-sig-mixed-models at r-project.org mailing list
> |>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> |>>
> |
> | Jessica Stapley
> | Department of Animal and Plant Sciences,
> | Alfred Denny Building
> | Sheffield, S10 2TN, UK
> | ph +44 (0)114 222 0112; fx +44 (0)114 222 0002
> | [[alternative HTML version deleted]]
> |
> | _______________________________________________
> | R-sig-mixed-models at r-project.org mailing list
> | https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.6 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFILfQzc5UpGjwzenMRArz/AJ9D/bNPvQwicMp1n9KTxo4ZvXZ7iACfdznJ
> F3NWHTkIVWVG5N2zxdgNnDk=
> =rT/4
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> R-sig-mixed-models at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
More information about the R-sig-mixed-models
mailing list