[R-sig-ME] Binomial GLMM vs GLM question

Ken Beath kjbeath at kagi.com
Sun May 18 10:24:28 CEST 2008


On 18/05/2008, at 5:45 AM, Justin Touchon wrote:

> Hi all,
> Thank you for all for you informative and helpful comments about my  
> dilemma. I've pasted below a summary table of egg survival at the  
> two ponds during the three years and the GLM output for a model  
> looking for interactions between pond, rainfall and year. Below that  
> is the output of a LMER model Ben suggested. Also, as Ben wondered  
> earlier, the data are highly overdispersed. To give a little  
> background, this wasn't an experiment but an observational study of  
> what was happening in nature. The two ponds were selected because  
> they are near my field site, no other reason. They differ in the  
> amount of canopy shade overhead, which in turn is clearly going to  
> affect egg desiccation (the primary source of mortality). Rainfall  
> did vary during the three years (it was much heavier in 2005 than in  
> 2003 or 2004). As a result, mortality was higher at Bridge Pond  
> (little shade) than at Ocelot Pond, and it varied between years.  
> These points are evident from the GLM anova table. All three  
> factors, significantly affect egg mortality, pond being the most  
> important. Rainfall varied across the three years, and there is a 3- 
> way interaction which indicates to me simply that the effects of  
> habitat (pond) and rainfall are not consistent across time. The  
> model estimates make sense (e.g. rainfall has a negative slope, the  
> estimate for Ocelot Pond is highly negative, etc.) In other words,  
> the GLM output does make biological sense given what we know about  
> the data and lends itself to fairly easy interpretation. Regarding  
> the LMER output, it seems the variance of both rain and pond is  
> obscenely huge. Is that a result of only have 2 ponds and 3 years?  
> The estimate for the rainfall effect is negative, like in the GLM,  
> which makes sense as well (as rain increase, mortality decreases).  
> Any opinions on where to go from here? Again, thank you all for you  
> helpful advice. I am very appreciative.
> -Justin
>

The model fit has got into problems, as Ben predicted. A really bad  
sign is the standard errors of the fixed effects are extreme. Also the  
lmer model doesn't allow for the slopes to vary, which from the glm  
they obviously do, which may be causing problems.

It may be possible to coax lmer into fitting the model by choosing  
starting values in roughly the right locations but it is probably  
optimistic.

I would strongly recommend checking whether a transform for rainfall  
is needed. If this isn't done correctly then individual observations  
may have excessive influence which may be causing the variation.

Ken

>
> Egg Survival
> Year Bridge Ocelot Total
> 2003 0.069 0.453 0.251
> 2004 0.115 0.429 0.272
> 2005 0.270 0.505 0.376
> Total 0.141 0.460 0.292
>
>
> > GLM.2 <- glm(tbl_mort ~ Pond *total_rainfall *Year ,  
> family=quasibinomial(logit), data=FieldData0305)
>
> > summary(GLM.2)
>
> Call:
> glm(formula = tbl_mort ~ Pond * total_rainfall * Year, family =  
> quasibinomial(logit),
> data = FieldData0305)
>
> Deviance Residuals:
> Min 1Q Median 3Q Max
> -17.965 -4.838 1.062 5.197 15.225
>
> Coefficients:
> Estimate Std. Error t value Pr(>|t|)
> (Intercept) 2.68318 0.47947 5.596 4.53e-08 ***
> Pond[T.Ocelot] -2.30872 0.53614 -4.306 2.18e-05 ***
> total_rainfall -0.11465 0.02333 -4.915 1.39e-06 ***
> Year[T.2004] -0.97218 0.63583 -1.529 0.127198
> Year[T.2005] -3.92557 0.76808 -5.111 5.38e-07 ***
> Pond[T.Ocelot]:total_rainfall 0.09799 0.02609 3.756 0.000204 ***
> Pond[T.Ocelot]:Year[T.2004] 0.48230 0.77189 0.625 0.532506
> Pond[T.Ocelot]:Year[T.2005] 2.15914 1.22853 1.757 0.079739 .
> total_rainfall:Year[T.2004] 0.08475 0.02717 3.120 0.001967 **
> total_rainfall:Year[T.2005] 0.12082 0.02440 4.952 1.16e-06 ***
> Pond[T.Ocelot]:total_rainfall:Year[T.2004] -0.05511 0.03585 -1.537  
> 0.125162
> Pond[T.Ocelot]:total_rainfall:Year[T.2005] -0.11371 0.02948 -3.857  
> 0.000137 ***
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for quasibinomial family taken to be 38.23861)
>
> Null deviance: 20762 on 349 degrees of freedom
> Residual deviance: 15252 on 338 degrees of freedom
> AIC: NA
>
> Number of Fisher Scoring iterations: 5
>
> > Anova(GLM.2, test="F")
> Anova Table (Type II tests)
>
> Response: tbl_mort
> SS Df F Pr(>F)
> Pond 1002.7 1 26.2229 5.119e-07 ***
> total_rainfall 238.2 1 6.2306 0.0130318 *
> Year 496.3 2 6.4894 0.0017157 **
> Pond:total_rainfall 101.7 1 2.6599 0.1038378
> Pond:Year 172.7 2 2.2587 0.1060579
> total_rainfall:Year 650.3 2 8.5037 0.0002493 ***
> Pond:total_rainfall:Year 704.8 2 9.2159 0.0001267 ***
> Residuals 12924.5 338
> ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
>
> > m1 <- lmer(tbl_mort~total_rainfall + (1|Year)+(1|Pond), family =  
> quasibinomial, data=FieldData0305)
>
> > summary(m1)
> Generalized linear mixed model fit using Laplace
> Formula: tbl_mort ~ total_rainfall + (1 | Year) + (1 | Pond)
> Data: FieldData0305
> Family: quasibinomial(logit link)
> AIC BIC logLik deviance
> 14657 14673 -7325 14649
> Random effects:
> Groups Name Variance Std.Dev.
> Year (Intercept) 2.0141e+15 44878595
> Pond (Intercept) 1.3427e+15 36643219
> Residual 8.8116e+16 296844003
> number of obs: 350, groups: Year, 3; Pond, 2
>
> Fixed effects:
> Estimate Std. Error t value
> (Intercept) 7.486e-01 5.736e+13 1.305e-14
> total_rainfall -2.316e-02 4.406e+11 -5.257e-14
>
> Correlation of Fixed Effects:
> (Intr)
> total_rnfll -0.956
>
>
>
>
>
>
>> On 17/05/2008, at 6:53 AM, Ben Bolker wrote:
>>
>> <snip>
>> >/
>> />/ | I think this is a good example of what seems a common problem  
>> for
>> />/ | people using mixed models - How to decide what are the  
>> random  />/ factors
>> />/ | and what are the fixed factors.
>> />/
>> />/ ~  I agree that this is difficult, and contentious (see e.g.
>> />/ Andrew Gelman, "Analysis of variance--why it is more important  
>> than
>> />/ ever," Annals of Statistics 33, no. 1 (2005): 1-53,
>> />/ doi:http://dx.doi.org/doi:10.1214/009053604000001048)
>> />/
>> /Available at
>> http://www.stat.columbia.edu/~gelman/research/published/AOS259.pdf <http://www.stat.columbia.edu/%7Egelman/research/published/AOS259.pdf 
>> >
>>
>> >/
>> />/ | In your study you have taken repeated measurements on the  
>> same  />/ ponds,
>> />/ | as such, observations from each pond across years are not  / 
>> >/ independent
>> />/ | (observations in pond 1 at t+1 will be correlated with  
>> observations
>> />/ | at t).  If you were only interested in the effect of rainfall  
>> on egg
>> />/ | survival I would suggest you use a mixed model with year and  
>> pond as
>> />/ | random effects (based on a similar eg in Crawley's R Book  
>> p605). In
>> />/ | this case I assume you have measurements on every pond in  
>> every year
>> />/ | and as such these random effects are crossed (not nested).
>> />/
>> />/ ~  (a) I would suggest that a mixed model is NOT going to work  
>> well
>> />/ in this case, even if year and pond are "philosophically"  
>> random  />/ effects
>> />/ (i.e., you don't really care about what happens in those specific
>> />/ ponds, and you may even have chosen them with a random-number  / 
>> >/ generator
>> />/ out of a list of all possible ponds -- although this is much less
>> />/ likely with years ...).  The technical problem is that estimating
>> />/ variances from 2 or 3 points is nasty.  This translates into
>> />/ inference/philosophical terms because these few points really
>> />/ don't give you the data to generalize about the population,  
>> even if
>> />/ you want to.  (I think one of the confusions is that in the  
>> classical
>> />/ method-of-moments world there's nothing that says you can't  
>> have 2
>> />/ denominator degrees of freedom -- your power will be terrible,  
>> but
>> />/ the expressions won't blow up on you [unless you get negative  
>> variance
>> />/ estimates ...])
>> />/ |
>> />/ | m1 <- lmer(mort~rainfall + (year|pond), family = binomial,  
>> data=
>> />/ | FieldData0305)
>> />/
>> />/ ~   (1|year)+(1|pond) might work OK, and be slightly more
>> />/ parsimonious (OR ponds nested within years, (1|year)+(1| 
>> pond:year), or
>> />/ vice versa -- and if you're not dead set on treating the  
>> random  />/ effects
>> />/ as "effects of year" and "effects of pond", but were willing  
>> to  />/ treat it
>> />/ as "effects of pond within year" that would buy you the  
>> flexibility
>> />/ to use some other package that wasn't so good at crossed
>> />/ effects.)
>> />/
>> />/
>> />/ | summary(m1)
>> />/ |  From the summary output you can assess the influence of the  
>> fixed
>> />/ | effect to see if the estimate for rainfall (slope estimate) is
>> />/ | different to zero.
>> />/ |
>> />/ | Even if the number of groups (years and ponds) is small it is  
>> still
>> />/ | better to take account of this group variation than ignore it  
>> (see
>> />/ | Gelman and Hill 200&.
>> />/
>> />/ ~  Yes, but ... what if all you can get out of the model is that
>> />/ the estimate is nearly zero?  If you go Bayesian instead you can
>> />/ deal with some problems by setting an informative prior [in  
>> theory
>> />/ setting a proper prior, no matter how weak, would generally solve
>> />/ the problem, but in reality I suspect that if the model is
>> />/ overparameterized you're going to have nasty technical  
>> difficulties
>> />/ even if the model is theoretically OK])
>> />/ ~  With respect to Andrew Gelman (I can't believe I'm saying  
>> this --
>> />/ sacrilege) but I think he's used to really big social-science  
>> data  />/ sets,
>> />/ where "leave stuff in when you're not sure" is more generally
>> />/ a good idea than it is in typical field ecology data sets, where
>> />/ the bias-variance tradeoff bites harder.  (At least there are 350
>> />/ data points here,
>> />/
>> />/ |  From this model you can also get a feeling for the between  
>> year and
>> />/ | between pond variation by looking at the random effects  
>> estimates  />/ and
>> />/ | variances. Obviously you will have some idea if you just plot  
>> the
>> />/ | means for each year and each pond.
>> />/
>> />/ ~  Only if the variance estimates don't suck.
>> />/
>> />/ | If you are also interested in understanding how egg survival  
>> differs
>> />/ | between ponds or between years and if this interacts with  
>> rainfall
>> />/ | then it become less straight forward.
>> />/ |
>> />/ | I would suggest that you try
>> />/ |
>> />/ | m2 <- lmer(mort~rainfall * year * pond + (year|pond), family =
>> />/ | binomial, data= FieldData0305)
>> />/ | summary(m2)
>> />/ |
>> />/ | In the summary you will have estimates for all three factors  
>> and
>> />/ | their interactions and you can ascertain if these are good
>> />/ | explanatory variables for egg survival.
>> />/
>> />/ ~   Isn't this overparameterized?  We have a fixed effect for
>> />/ each year:pond combination (and variation in the slopes of
>> />/ the effect with respect to rainfall), as well as a random-effect
>> />/ level for years and ponds?
>> />/
>> /
>> This does bring up the important point as to whether the effect of   
>> rainfall does vary within year and site. One advantage of the  
>> mixed  effects method for dealing with varying slopes is it gives a  
>> nice  population estimate. Unfortunately this is a lot of random  
>> effects for  not many groups.  It is looking very messy.
>>
>> I think with this sort of data, as in it has very few groups) the  
>> best  option is to start with the basics (probably always the best  
>> option).  Fit a separate model for each group and determine an  
>> appropriate  transform for rainfall. Do the models fit reasonably?  
>> Look at the  parameter estimates with 95% CI (plotting is a good  
>> idea) and see how  much variation there is, and there is nothing  
>> strange happening like  one unusual year-pond. Then decide whether  
>> the more complex models  will tell you anything more.
>>
>> Ken
>> >
>
> -- 
> Justin Touchon
> Doctoral Student
> Boston University
> Ecology, Behavior and Evolution
> http://people.bu.edu/jtouchon
>
>




More information about the R-sig-mixed-models mailing list