[R-sig-ME] Binomial GLMM vs GLM question (Ken Beath)

Fri May 16 22:53:07 CEST 2008

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jessica Stapley wrote:
|> Message: 2
|> Date: Fri, 16 May 2008 02:36:41 -0000 (UTC)
|> From: "Ken Beath" <kjbeath at kagi.com>
|> Subject: Re: [R-sig-ME] Binomial GLMM vs GLM question
|> To: "Justin Touchon" <jtouchon at bu.edu>
|> Cc: r-sig-mixed-models at r-project.org
|> Message-ID:
|> 	<37086.129.78.64.102.1210905401.squirrel at mail.kjbeath.com.au>
|> Content-Type: text/plain;charset=iso-8859-1
|>
|>> Dear Dr. Bates and other LMER experts,
|>>     I am admittedly entry level in my R and mixed-model knowledge,
|>> but
|>> I'm hoping that someone can help me and also forgive my lack of
|>> insight.  Over 3 years, I monitored survival of 350 egg masses at two
|>> ponds.  I thus have one continuous variable (rainfall) and two
|>> discrete
|>> variables (year and pond).  My response variable, mortality, is
|>> coded as
|>> a two column matrix featuring eggs survived and eggs dead. I'm
|>> primarily
|>> interested in the effect of rain on survival, but also if rain has
|>> different impacts at the different ponds and how much survival varied
|>> over the three years.  Originally, I though I could tackle this
|>> with a
|>> binomial GLM, but do I need a binomial GLMM instead, as rainfall and
|>> year would be random and pond fixed?  The problem with this is
|>> trying to
|>> make biological sense out of the results.  I've spent the last week
|>> reading all the past posts about why p-values can't be calculated and
|>> all that, which I'm fine with.  But what can I say about the
|>> effects of
|>> rainfall or year on egg survival from the variance estimates?  Also,
|>> doesn't LMER require that random factors be normally distributed,
|>> because my rainfall measurements are far from it.  Is that a problem?
|>> Thank you in advance for any advice you can give.
|>> -Justin Touchon
|>>
|> I think your misunderstanding the idea of a random effect. This is
|> something that is unobserved, causing correlation within a group.
|> In your
|> data this might be year or pond but definitely not rainfall which is
|> simply a covariate. You have more than one measurement on a pond
|> and more
|> than one for each year, so it is likely that there will be correlation
|> between them and one way of dealing with this is a random effect. The
|> alternative is to use a fixed effects model. In your case, there
|> are only
|> 2 and 3 groups, so a fixed effects model is the best approach, so a
|> GLM is
|> appropriate. If there were say 20 ponds, a random effects model
|> would be
|> much more suitable.
|>
|> Ken
|

~  So -- I may be wrong (others please jump in) -- but I disagree with
several points in the most recent comment.

| I think this is a good example of what seems a common problem for
| people using mixed models - How to decide what are the random factors
| and what are the fixed factors.

~  I agree that this is difficult, and contentious (see e.g.
Andrew Gelman, "Analysis of variance--why it is more important than
ever," Annals of Statistics 33, no. 1 (2005): 1-53,
doi:http://dx.doi.org/doi:10.1214/009053604000001048)

| In your study you have taken repeated measurements on the same ponds,
| as such, observations from each pond across years are not independent
| (observations in pond 1 at t+1 will be correlated with observations
| at t).  If you were only interested in the effect of rainfall on egg
| survival I would suggest you use a mixed model with year and pond as
| random effects (based on a similar eg in Crawley's R Book p605). In
| this case I assume you have measurements on every pond in every year
| and as such these random effects are crossed (not nested).

~  (a) I would suggest that a mixed model is NOT going to work well
in this case, even if year and pond are "philosophically" random effects
(i.e., you don't really care about what happens in those specific
ponds, and you may even have chosen them with a random-number generator
out of a list of all possible ponds -- although this is much less
likely with years ...).  The technical problem is that estimating
variances from 2 or 3 points is nasty.  This translates into
inference/philosophical terms because these few points really
don't give you the data to generalize about the population, even if
you want to.  (I think one of the confusions is that in the classical
method-of-moments world there's nothing that says you can't have 2
denominator degrees of freedom -- your power will be terrible, but
the expressions won't blow up on you [unless you get negative variance
estimates ...])
|
| m1 <- lmer(mort~rainfall + (year|pond), family = binomial, data=
| FieldData0305)

~   (1|year)+(1|pond) might work OK, and be slightly more
parsimonious (OR ponds nested within years, (1|year)+(1|pond:year), or
vice versa -- and if you're not dead set on treating the random effects
as "effects of year" and "effects of pond", but were willing to treat it
as "effects of pond within year" that would buy you the flexibility
to use some other package that wasn't so good at crossed
effects.)

| summary(m1)
|  From the summary output you can assess the influence of the fixed
| effect to see if the estimate for rainfall (slope estimate) is
| different to zero.
|
| Even if the number of groups (years and ponds) is small it is still
| better to take account of this group variation than ignore it (see
| Gelman and Hill 200&.

~  Yes, but ... what if all you can get out of the model is that
the estimate is nearly zero?  If you go Bayesian instead you can
deal with some problems by setting an informative prior [in theory
setting a proper prior, no matter how weak, would generally solve
the problem, but in reality I suspect that if the model is
overparameterized you're going to have nasty technical difficulties
even if the model is theoretically OK])
~  With respect to Andrew Gelman (I can't believe I'm saying this --
sacrilege) but I think he's used to really big social-science data sets,
where "leave stuff in when you're not sure" is more generally
a good idea than it is in typical field ecology data sets, where
the bias-variance tradeoff bites harder.  (At least there are 350
data points here,

|  From this model you can also get a feeling for the between year and
| between pond variation by looking at the random effects estimates and
| variances. Obviously you will have some idea if you just plot the
| means for each year and each pond.

~  Only if the variance estimates don't suck.

| If you are also interested in understanding how egg survival differs
| between ponds or between years and if this interacts with rainfall
| then it become less straight forward.
|
| I would suggest that you try
|
| m2 <- lmer(mort~rainfall * year * pond + (year|pond), family =
| binomial, data= FieldData0305)
| summary(m2)
|
| In the summary you will have estimates for all three factors and
| their interactions and you can ascertain if these are good
| explanatory variables for egg survival.

~   Isn't this overparameterized?  We have a fixed effect for
each year:pond combination (and variation in the slopes of
the effect with respect to rainfall), as well as a random-effect
level for years and ponds?

|
| You may be able to argue that the correlation between years or
| between ponds is negligible and therefore you can leave out the
| random effect and only use a GLM, but I would be inclined to avoid
| model simplification - but other may suggest otherwise.
|
| Jess
|

|
|>> My model and output are as follows:
|>>
|>>> LMER.1<-lmer(mort~Pond + (Pond|total_rainfall) + (1|Year),
|>> family=binomial, data= FieldData0305)
|>>
|>>> summary(LMER.1)
|>> Generalized linear mixed model fit using Laplace
|>> Formula: mort ~ Pond + (Pond | total_rainfall) + (1 | Year)
|>>    Data: FieldData0305
|>>  Family: binomial(logit link)
|>>   AIC  BIC logLik deviance
|>>  7657 7680  -3822     7645
|>> Random effects:
|>>  Groups         Name           Variance Std.Dev. Corr
|>>  total_rainfall (Intercept)    21.66535 4.65461
|>>                 Pond[T.Ocelot]  6.44297 2.53830  -0.627
|>>  Year           (Intercept)     0.74082 0.86071
|>> number of obs: 350, groups: total_rainfall, 48; Year, 3
|>>
|>> Estimated scale (compare to  1 )  4.603433
|>>
|>> Fixed effects:
|>>                Estimate Std. Error z value Pr(>|z|)
|>> (Intercept)     -0.4678     1.0173 -0.4598    0.646
|>> Pond[T.Ocelot]  -0.9831     0.9330 -1.0538    0.292
|>>
|>> Correlation of Fixed Effects:
|>>             (Intr)
|>> Pnd[T.Oclt] -0.648
|>>
|>> _______________________________________________
|>> R-sig-mixed-models at r-project.org mailing list
|>> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
|>>
|
| Jessica Stapley
| Department of Animal and Plant Sciences,
| Alfred Denny Building
| Sheffield, S10 2TN, UK
| ph +44 (0)114 222 0112; fx +44 (0)114 222 0002
| 	[[alternative HTML version deleted]]
|
| _______________________________________________
| R-sig-mixed-models at r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFILfQzc5UpGjwzenMRArz/AJ9D/bNPvQwicMp1n9KTxo4ZvXZ7iACfdznJ
F3NWHTkIVWVG5N2zxdgNnDk=
=rT/4
-----END PGP SIGNATURE-----