[R-sig-ME] Binomial GLMM vs GLM question

Sat May 17 09:39:46 CEST 2008

On 17/05/2008, at 6:53 AM, Ben Bolker wrote:

<snip>
>
> | I think this is a good example of what seems a common problem for
> | people using mixed models: how to decide which factors are random
> | and which are fixed.
>
> ~  I agree that this is difficult, and contentious (see e.g.
> Andrew Gelman, "Analysis of variance--why it is more important than
> ever," Annals of Statistics 33, no. 1 (2005): 1-53,
> doi:10.1214/009053604000001048)
>
Available at
http://www.stat.columbia.edu/~gelman/research/published/AOS259.pdf

>
> | In your study you have taken repeated measurements on the same ponds,
> | so observations from each pond across years are not independent
> | (observations in pond 1 at t+1 will be correlated with observations
> | at t). If you were only interested in the effect of rainfall on egg
> | survival, I would suggest a mixed model with year and pond as
> | random effects (based on a similar example in Crawley's The R Book,
> | p. 605). In this case I assume you have measurements on every pond in
> | every year, so these random effects are crossed (not nested).
>
> ~  (a) I would suggest that a mixed model is NOT going to work well
> in this case, even if year and pond are "philosophically" random effects
> (i.e., you don't really care about what happens in those specific
> ponds, and you may even have chosen them with a random-number generator
> out of a list of all possible ponds -- although this is much less
> likely with years ...).  The technical problem is that estimating
> variances from 2 or 3 points is nasty.  This translates into
> inference/philosophical terms because these few points really
> don't give you the data to generalize about the population, even if
> you want to.  (I think one of the confusions is that in the classical
> method-of-moments world there's nothing that says you can't have 2
> denominator degrees of freedom -- your power will be terrible, but
> the expressions won't blow up on you [unless you get negative variance
> estimates ...])
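A quick way to see why estimating a variance from 2 or 3 groups is nasty is a small base-R simulation. This is a sketch under stated assumptions, not anything from the thread: the group effects (say, 3 years) are drawn from a normal distribution with a true SD of 1, and we look at how widely the estimated SD scatters.

```r
# Hypothetical simulation: how well can 3 groups (e.g. 3 years) estimate a
# between-group SD? The true SD of 1 and normal effects are assumptions.
set.seed(1)
n_groups <- 3
true_sd  <- 1
n_sims   <- 10000

# Each replicate draws n_groups group effects and takes their sample SD
est_sd <- replicate(n_sims, sd(rnorm(n_groups, mean = 0, sd = true_sd)))

# Spread of the estimates: 2.5%, 50% and 97.5% quantiles
print(quantile(est_sd, c(0.025, 0.5, 0.975)))

# Proportion of replicates where the estimate is below half the true SD
print(mean(est_sd < true_sd / 2))
```

With 3 groups, 2*s^2/sigma^2 follows a chi-squared distribution with 2 df, so the chance that the estimated SD falls below half the true SD is 1 - exp(-0.25), about 0.22; the simulated proportion should land close to that.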
> |
> | m1 <- glmer(mort ~ rainfall + (year|pond), family = binomial,
> |             data = FieldData0305)
>
> ~   (1|year)+(1|pond) might work OK, and be slightly more
> parsimonious (OR ponds nested within years, (1|year)+(1|pond:year), or
> vice versa -- and if you're not dead set on treating the random effects
> as "effects of year" and "effects of pond", but are willing to treat them
> as "effects of pond within year", that would buy you the flexibility
> to use some other package that isn't so good at crossed
> effects.)
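To make the crossed versus nested distinction concrete, here is a small base-R sketch on a hypothetical balanced design (3 years by 5 ponds; the level names are made up). It counts how many levels each random-effect grouping would have; the formulas are only parsed, since fitting them needs lme4's glmer() and the real data.

```r
# Hypothetical balanced design: 3 years x 5 ponds, every pond sampled every
# year. Level names and sizes are illustrative assumptions.
design <- expand.grid(year = factor(2003:2005),
                      pond = factor(paste0("P", 1:5)))

# Crossed, (1|year) + (1|pond): one effect per year and one per pond,
# and each pond effect is shared across all years.
n_year <- nlevels(design$year)
n_pond <- nlevels(design$pond)

# Nested, (1|year) + (1|pond:year): each pond-in-a-given-year is its own
# level, so pond P1 in 2003 and pond P1 in 2004 get unrelated effects.
pond_in_year <- interaction(design$pond, design$year, drop = TRUE)
n_pond_year  <- nlevels(pond_in_year)

cat("crossed:", n_year, "year levels +", n_pond, "pond levels\n")
cat("nested: ", n_year, "year levels +", n_pond_year, "pond:year levels\n")

# The corresponding lme4-style formulas (parsed here, not fitted)
f_crossed <- mort ~ rainfall + (1 | year) + (1 | pond)
f_nested  <- mort ~ rainfall + (1 | year) + (1 | pond:year)
print(f_crossed)
print(f_nested)
```

In lme4 the nested version can also be written (1|year/pond), which expands to (1|year) + (1|year:pond).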
>
>
> | summary(m1)
> | From the summary output you can check whether the estimate for the
> | fixed effect of rainfall (the slope) is different from zero.
> |
> | Even if the number of groups (years and ponds) is small, it is still
> | better to account for this group variation than to ignore it (see
> | Gelman and Hill 2007).
>
> ~  Yes, but ... what if all you can get out of the model is that
> the variance estimate is nearly zero?  If you go Bayesian instead you can
> deal with some problems by setting an informative prior [in theory,
> setting a proper prior, no matter how weak, would generally solve
> the problem, but in practice I suspect that if the model is
> overparameterized you're going to have nasty technical difficulties
> even if the model is theoretically OK].
> ~  With all respect to Andrew Gelman (I can't believe I'm saying this --
> sacrilege), I think he's used to really big social-science data sets,
> where "leave stuff in when you're not sure" is more generally
> a good idea than it is in typical field ecology data sets, where
> the bias-variance tradeoff bites harder.  (At least there are 350
> data points here.)
>
> | From this model you can also get a feeling for the between-year and
> | between-pond variation by looking at the random-effect estimates and
> | variances. You will of course get some idea just by plotting the
> | means for each year and each pond.
>
> ~  Only if the variance estimates don't suck.
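Plotting the raw means per year and per pond is cheap and gives a useful check against whatever variance estimates the model produces. A base-R sketch, using simulated data as a stand-in for FieldData0305 (the column layout, a 0/1 mort response, and all numbers are assumptions):

```r
# Simulated stand-in for FieldData0305 (assumption): mort is 0/1,
# 3 years x 6 ponds x 20 observations per cell.
set.seed(42)
dat <- expand.grid(rep = 1:20, pond = factor(paste0("P", 1:6)),
                   year = factor(2003:2005))
dat$rainfall <- runif(nrow(dat), 0, 100)
dat$mort <- rbinom(nrow(dat), 1, plogis(-1 + 0.02 * dat$rainfall))

# Observed mortality proportion by year, by pond, and by year-pond cell
year_means <- tapply(dat$mort, dat$year, mean)
pond_means <- tapply(dat$mort, dat$pond, mean)
cell_means <- with(dat, tapply(mort, list(pond, year), mean))  # ponds x years

print(round(year_means, 2))
print(round(pond_means, 2))
print(round(cell_means, 2))

# A dot chart of the cell means (ponds within years), interactive sessions only
if (interactive()) dotchart(cell_means, xlab = "observed mortality proportion")
```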
>
> | If you are also interested in understanding how egg survival differs
> | between ponds or between years, and whether this interacts with
> | rainfall, then it becomes less straightforward.
> |
> | I would suggest that you try
> |
> | m2 <- glmer(mort ~ rainfall * year * pond + (year|pond),
> |             family = binomial, data = FieldData0305)
> | summary(m2)
> |
> | In the summary you will have estimates for all three factors and
> | their interactions and you can ascertain if these are good
> | explanatory variables for egg survival.
>
> ~   Isn't this overparameterized?  We have a fixed effect for
> each year:pond combination (and variation in the slopes of
> the effect with respect to rainfall), as well as a random-effect
> level for years and ponds?
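One way to check the overparameterization is to count the fixed-effect columns that rainfall * year * pond generates. A base-R sketch on a hypothetical balanced design (3 years, 6 ponds, 20 observations per cell, year and pond coded as factors; all of these are assumptions):

```r
# Hypothetical balanced design (assumption): year and pond as factors,
# 3 years x 6 ponds x 20 observations per cell, rainfall continuous.
set.seed(1)
dat <- expand.grid(rep = 1:20, pond = factor(paste0("P", 1:6)),
                   year = factor(2003:2005))
dat$rainfall <- runif(nrow(dat), 0, 100)

# Fixed-effect design matrix implied by mort ~ rainfall * year * pond
X <- model.matrix(~ rainfall * year * pond, data = dat)
n_cells <- nlevels(dat$year) * nlevels(dat$pond)

cat("fixed-effect columns:", ncol(X), "(rank", qr(X)$rank, ")\n")
cat("year:pond cells:     ", n_cells, "\n")
```

Here that is 36 full-rank columns for 18 cells, exactly two per cell: the fixed effects alone already give every year:pond cell its own intercept and rainfall slope, which leaves the (year|pond) random term nothing to explain.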
>

This does raise the important question of whether the effect of
rainfall varies by year and site. One advantage of the mixed-effects
approach to varying slopes is that it gives a nice population-level
estimate. Unfortunately, that means a lot of random effects for not
many groups, and it is looking very messy.

I think with this sort of data (i.e. very few groups) the best
option is to start with the basics (probably always the best option).
Fit a separate model for each group and determine an appropriate
transform for rainfall. Do the models fit reasonably? Look at the
parameter estimates with 95% CIs (plotting them is a good idea), see how
much variation there is, and check that nothing strange is happening,
such as one unusual year-pond combination. Then decide whether the more
complex models will tell you anything more.
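The "start with the basics" step can be sketched in base R: one logistic regression per year-pond cell, then the rainfall slopes with Wald 95% CIs from confint.default(). The data are simulated stand-ins, rainfall is left untransformed, and Wald intervals (rather than profile intervals) are an assumption chosen for simplicity.

```r
# Simulated stand-in data (assumption): binary mortality, rainfall,
# 3 years x 6 ponds x 20 observations per cell.
set.seed(7)
dat <- expand.grid(rep = 1:20, pond = factor(paste0("P", 1:6)),
                   year = factor(2003:2005))
dat$rainfall <- runif(nrow(dat), 0, 100)
dat$mort <- rbinom(nrow(dat), 1, plogis(-1 + 0.02 * dat$rainfall))

# Fit one logistic regression per year-pond cell
cells <- split(dat, interaction(dat$year, dat$pond, drop = TRUE))
fits  <- lapply(cells, function(d)
  glm(mort ~ rainfall, family = binomial, data = d))

# Rainfall slope plus Wald 95% CI for each cell (one row per cell)
slopes <- t(sapply(fits, function(m) {
  c(estimate = unname(coef(m)["rainfall"]), confint.default(m)["rainfall", ])
}))
print(round(slopes, 3))

# Plot slopes with CIs to spot a cell far from the rest (interactive only)
if (interactive()) {
  k <- seq_len(nrow(slopes))
  plot(slopes[, "estimate"], k, xlim = range(slopes[, 2:3]), yaxt = "n",
       xlab = "rainfall slope (logit scale)", ylab = "", pch = 16)
  segments(slopes[, 2], k, slopes[, 3], k)
  axis(2, at = k, labels = rownames(slopes), las = 1, cex.axis = 0.7)
  abline(v = 0, lty = 2)
}
```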

Ken