[R-sig-ME] Fwd: random effects heteroscedasticity glmm

Wed May 13 10:23:49 CEST 2015

Dear Lauren,

Note that you have only 8 observations per site. This might give the
impression of heteroscedasticity. See the examples below. Your model first
model seems reasonable. The second model is too complex for the data.

# 8 observations per category. Some replicates display "heteroscedasticity".
set.seed(123456)
dataset <- data.frame(
  Lambda = rep(c(10, 100), 8)
)
layout(matrix(1:16, nrow=4))
replicate(16, {
  dataset$Y <- rpois(nrow(dataset), lambda = dataset$Lambda)
  model <- glm(Y ~ 0 + factor(Lambda), data = dataset, family = "poisson")
  boxplot(residuals(model) ~ dataset$Lambda)
})

# 200 observations per category. Homoscedasticity.
set.seed(123456)
dataset <- data.frame(
  Lambda = rep(c(10, 100), 200)
)
layout(matrix(1:16, nrow=4))
replicate(16, {
  dataset$Y <- rpois(nrow(dataset), lambda = dataset$Lambda)
  model <- glm(Y ~ 0 + factor(Lambda), data = dataset, family = "poisson")
  boxplot(residuals(model) ~ dataset$Lambda)
})

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-05-13 8:04 GMT+02:00 Lauren Catherine Ponisio <lponisio op gmail.com>:

> Dear R-sig-mixed-models list members,
>
> I have a glmm with some pretty severe heteroscedasticity in the random
> effects and was hoping for some advice on how to model it, or alternative
> approaches.
>
> I have floral richness data from 18 sites over two years, four observations
> within each year at each site. I am interested in the effect of landscape
> diversity and fire severity. I have model with an interaction between
> landscape diversity and fire severity, and a fixed effect of year, day of
> the year, and a quadratic day of the year.  Also a random effect of site.
> The continuous variables are scaled. I assumed a poisson error distribution
> (I checked for overdispersion).
>
> This is my model:
>
> FloralRichness ~ s.simpson.div * SiteStatus + Year + s.doy +
>     I(s.doy^2) + (1 | Site)
>
> The data is available here:
> https://www.dropbox.com/s/efdcxz20l3dpwvl/alldata.csv?dl=0
>
> ## my code
> formula.flower <- as.formula(FloralRichness ~ s.simpson.div * SiteStatus +
> Year + s.doy +
>     I(s.doy^2) + (1 | Site))
>
> flower.mod <- glmer(formula.flower, family="poisson", data=dat.mods)
>
> ## Generalized linear mixed model fit by maximum likelihood (Laplace
> Approximation) [
> ## glmerMod]
> ##  Family: poisson  ( log )
> ## Formula: FloralRichness ~ s.simpson.div * SiteStatus + Year + s.doy +
> ##     I(s.doy^2) + (1 | Site)
> ##    Data: dat.mods
>
> ##      AIC      BIC   logLik deviance df.resid
> ##    688.7    718.4   -334.4    668.7      134
>
> ## Scaled residuals:
> ##      Min       1Q   Median       3Q      Max
> ## -2.77658 -0.39078 -0.02645  0.35389  1.83799
>
> ## Random effects:
> ##  Groups Name        Variance Std.Dev.
> ##  Site   (Intercept) 0.04093  0.2023
> ## Number of obs: 144, groups:  Site, 18
>
> ## Fixed effects:
> ##                              Estimate Std. Error z value Pr(>|z|)
> ## (Intercept)                   2.39930    0.11243  21.340  < 2e-16 ***
> ## s.simpson.div                 0.32838    0.11193   2.934 0.003347 **
> ## SiteStatusLOW                -0.36443    0.13975  -2.608 0.009115 **
> ## SiteStatusHIGH               -0.03427    0.14189  -0.242 0.809143
> ## Year2014                     -0.09582    0.06714  -1.427 0.153517
> ## s.doy                         0.05077    0.03658   1.388 0.165188
> ## I(s.doy^2)                   -0.09105    0.02764  -3.294 0.000986 ***
> ## s.simpson.div:SiteStatusLOW  -0.01073    0.15090  -0.071 0.943336
> ## s.simpson.div:SiteStatusHIGH -0.40605    0.14121  -2.876 0.004034 **
> ## ---
> ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> ## residuals don't look great and also the site random effects are all
> ## over the place!
>
> layout(matrix(1:4, nrow=2))
> hist(residuals(flower.mod))
> boxplot(residuals(flower.mod) ~ Site,
>         data = dat.mods, main = "Site", ylab = "Residuals")
>
> plot(fitted(flower.mod), residuals(flower.mod),
>      xlab = "Fitted Values", ylab = "Residuals")
> abline(h=0, lty=2)
> lines(smooth.spline(fitted(flower.mod), residuals(flower.mod)))
>
> ## one thought was to create site-specific variances, so basically
> ## create a variable called observation for each sample at each site,
> ## and nest observation within site. I found a explanation of how to
> ## do this for a fixed effect here:
> ## http://rpubs.com/bbolker/6298
>
> ## and created this model that I think does the trick
>
> as.formula(FloralRichness ~ s.simpson.div * SiteStatus + SiteStatus * Year
> +
>     s.doy + I(s.doy^2) + (Site | obs))
>
> ## the model slowly runs, but with 18 sites and not that much data, perhaps
> ##  this is not the best approach.
>
> ## is modeling the heteroscedasticity of the random effects a fruitful
> ## path to follow? Or have I made a more fundamental misstep?
>
> Thanks,
> Lauren
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-mixed-models op r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>

	[[alternative HTML version deleted]]