[R-sig-ME] Mixed-effects modelling on data from a highly unbalanced design
Andrews, Chris
chr|@@@ @end|ng |rom med@um|ch@edu
Wed Mar 27 13:36:30 CET 2019
Given the observed data,
hist(Birds$Arter)
with(ArterYears, interaction.plot(ordered(YearS), Site, Arter)))
I would just use a normal linear mixed model. Although the data are counts, they are not counts that would generate a Poisson distribution. And you'll not likely have convergence issues.
-----Original Message-----
From: Adriaan De Jong [mailto:Adriaan.de.Jong using slu.se]
Sent: Wednesday, March 27, 2019 5:50 AM
To: R-sig-mixed-models using r-project.org
Subject: [R-sig-ME] Mixed-effects modelling on data from a highly unbalanced design
Dear forum,
I'm struggling with a data set from 12 years of breeding bird monitoring on 13 sites that were affected by the construction of a new railway (there are control sites, too, but these fall outside the scope of this question). I have data from (a) before, (b) during construction, (c) "ready" (= railway ready but no traffic) and (d) traffic. Obviously, these four steps came in a fixed sequence. Because the railway was built in separate sections, the numbers of years of Before, Construction and Ready (coded in variable Status) vary among sites in terms of timing and duration, but all sites were monitored for the same three years while traffic was on. In short, the "design" is highly unbalanced.
At this stage, I'm looking at the possible effect of railway construction on avian biodiversity expressed as the total number of species observed per year per site (data below). I've used the following script to prepare the data:
ArterYears$ArterI<-as.integer(ArterYears$Arter) ## Arter = number of observed species
ArterYears$Site<-as.factor(ArterYears$Site)
ArterYears$NArea<-ArterYears$Areal/mean(ArterYears$Areal) ## Areal = area in hectares
ArterYears$NArea2<-ArterYears$NArea^2
ArterYears$YearS<-as.integer(ArterYears$Year - 2001) ## the study was performed 2002-2015
ArterYears$Status<-as.factor(ArterYears$Status)
ArterYears$response<-ArterYears$ArterI
(Some of these steps are not really necessary, but I've included them just to be sure. The same goes for the assignment of NA's to the model names below.)
The data for species number (range 28 - 83) are overdispersed for all sites combined, but not for the individual sites (rather underdispersed). To be on the safe side, I chose to use a negative binomial family anyway: nbinom2 in glmmTMB (I've tested glmm.nb as well with the same kind of problematic results).
>From biological reasoning, the following model is the one I'd prefer. It uses Status, adjusted Year and log("normalized" area) as fixed effects and Site as random effect. For the avian biodiversity data, this model works technically fine. Biologically speaking, the later Status classes did not differ significantly from Before (the intercept). So far so good.
glmmTMB31B<-NA
glmmTMB31B<-glmmTMB(response ~ Status + YearS + log(NArea) + (1|Site), data=ArterYears, family = nbinom2)
summary(glmmTMB31B)
AICc(glmmTMB31B)
Now I would like to evaluate this model, either against other models or with an R2 of some sort. Now problems pile up.
The "null" model (Site needs to be in the model because numbers really vary among sites, see: plot(ArterYears$ArterI ~ ArterYears$Site) )
glmmTMB0<-NA
glmmTMB0<-glmmTMB(response ~ 1 + (1|Site), data=ArterYears, family = nbinom2)
summary(glmmTMB0)
AICc(glmmTMB0)
generates the following warning messages:
1: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
Most other models including combinations of YearS, NArea and Status generate these warning messages (in particular the second one). This puzzles me, but unfortunately, I'm not capable of interpreting the troubleshooting messages. If the model without fixed effects had worked fine and the complex model had failed, I would suspect overparameterization (N = 156), but now I have no clue.
When I apply r.squaredGLMM (MuMIn) to the complex model, warning messages come up even here, despite the fact that they were not generated for the model as such. How should I interpret this?
> r.squaredGLMM(glmmTMB31B)
R2m R2c
delta 1.112643e-18 2.247703e-18
lognormal 4.805332e-04 9.707482e-04
trigamma 6.896496e-35 1.393194e-34
Warning messages:
1: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
2: The null model is correct only if all variables used by the original model remain unchanged.
3: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
4: In fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
I assume that I either make a fundamental mistake somewhere, or the data are "problematic". Could I solve all this by abandoning family = nbinom2 and go back to family = poisson (which worked fine technically)? Or test yet another package?
Thanks in advance for any suggestions.
Cheers,
Adriaan "Adjan" de Jong
Senior researcher
Swedish University of Agricultural Sciences
Data:
Site Year Status Areal Arter
Nyland 2002 Before 33.6 54
Nyland 2003 Construction 33.6 51
Nyland 2004 Construction 33.6 56
Nyland 2005 Construction 33.6 52
Nyland 2006 Ready 33.6 48
Nyland 2007 Ready 33.6 43
Nyland 2008 Ready 33.6 64
Nyland 2009 Ready 33.6 58
Nyland 2010 Ready 33.6 57
Nyland 2013 Traffic 33.6 50
Nyland 2014 Traffic 33.6 53
Nyland 2015 Traffic 33.6 61
Kornsjo 2002 Before 82.8 59
Kornsjo 2003 Before 82.8 66
Kornsjo 2004 Construction 82.8 62
Kornsjo 2005 Construction 82.8 58
Kornsjo 2006 Ready 82.8 60
Kornsjo 2007 Ready 82.8 68
Kornsjo 2008 Ready 82.8 70
Kornsjo 2009 Ready 82.8 59
Kornsjo 2010 Ready 82.8 62
Kornsjo 2013 Traffic 82.8 57
Kornsjo 2014 Traffic 82.8 58
Kornsjo 2015 Traffic 82.8 63
Stranne 2002 Construction 37.0 53
Stranne 2003 Construction 37.0 64
Stranne 2004 Ready 37.0 60
Stranne 2005 Ready 37.0 60
Stranne 2006 Ready 37.0 59
Stranne 2007 Ready 37.0 60
Stranne 2008 Ready 37.0 55
Stranne 2009 Ready 37.0 53
Stranne 2010 Ready 37.0 46
Stranne 2013 Traffic 37.0 58
Stranne 2014 Traffic 37.0 53
Stranne 2015 Traffic 37.0 56
Strandnyland 2002 Construction 69.2 43
Strandnyland 2003 Construction 69.2 44
Strandnyland 2004 Ready 69.2 52
Strandnyland 2005 Ready 69.2 56
Strandnyland 2006 Ready 69.2 46
Strandnyland 2007 Ready 69.2 52
Strandnyland 2008 Ready 69.2 48
Strandnyland 2009 Ready 69.2 43
Strandnyland 2010 Ready 69.2 55
Strandnyland 2013 Traffic 69.2 38
Strandnyland 2014 Traffic 69.2 41
Strandnyland 2015 Traffic 69.2 46
Hjalta 2002 Ready 91.1 61
Hjalta 2003 Ready 91.1 61
Hjalta 2004 Ready 91.1 62
Hjalta 2005 Ready 91.1 61
Hjalta 2006 Ready 91.1 66
Hjalta 2007 Ready 91.1 71
Hjalta 2008 Ready 91.1 73
Hjalta 2009 Ready 91.1 58
Hjalta 2010 Ready 91.1 64
Hjalta 2013 Traffic 91.1 67
Hjalta 2014 Traffic 91.1 71
Hjalta 2015 Traffic 91.1 67
Kasa 2002 Construction 98.3 65
Kasa 2003 Ready 98.3 59
Kasa 2004 Ready 98.3 59
Kasa 2005 Ready 98.3 66
Kasa 2006 Ready 98.3 52
Kasa 2007 Ready 98.3 53
Kasa 2008 Ready 98.3 56
Kasa 2009 Ready 98.3 53
Kasa 2010 Ready 98.3 51
Kasa 2013 Traffic 98.3 51
Kasa 2014 Traffic 98.3 54
Kasa 2015 Traffic 98.3 64
Ava 2002 Before 109.5 67
Ava 2003 Before 109.5 79
Ava 2004 Construction 109.5 70
Ava 2005 Construction 109.5 77
Ava 2006 Construction 109.5 76
Ava 2007 Ready 109.5 80
Ava 2008 Ready 109.5 77
Ava 2009 Ready 109.5 81
Ava 2010 Ready 109.5 83
Ava 2013 Traffic 109.5 76
Ava 2014 Traffic 109.5 81
Ava 2015 Traffic 109.5 78
Logdea 2002 Before 67.4 54
Logdea 2003 Before 67.4 53
Logdea 2004 Before 67.4 57
Logdea 2005 Construction 67.4 55
Logdea 2006 Construction 67.4 49
Logdea 2007 Ready 67.4 61
Logdea 2008 Ready 67.4 56
Logdea 2009 Ready 67.4 56
Logdea 2010 Ready 67.4 53
Logdea 2013 Traffic 67.4 55
Logdea 2014 Traffic 67.4 53
Logdea 2015 Traffic 67.4 51
Langed 2002 Before 34.2 49
Langed 2003 Before 34.2 56
Langed 2004 Before 34.2 51
Langed 2005 Before 34.2 52
Langed 2006 Construction 34.2 47
Langed 2007 Ready 34.2 56
Langed 2008 Ready 34.2 54
Langed 2009 Ready 34.2 51
Langed 2010 Ready 34.2 54
Langed 2013 Traffic 34.2 57
Langed 2014 Traffic 34.2 51
Langed 2015 Traffic 34.2 50
Hornea 2002 Before 48.9 48
Hornea 2003 Before 48.9 49
Hornea 2004 Before 48.9 41
Hornea 2005 Before 48.9 37
Hornea 2006 Construction 48.9 40
Hornea 2007 Construction 48.9 48
Hornea 2008 Ready 48.9 48
Hornea 2009 Ready 48.9 39
Hornea 2010 Ready 48.9 47
Hornea 2013 Traffic 48.9 47
Hornea 2014 Traffic 48.9 43
Hornea 2015 Traffic 48.9 47
Stocke 2002 Before 208.4 58
Stocke 2003 Before 208.4 69
Stocke 2004 Before 208.4 73
Stocke 2005 Before 208.4 67
Stocke 2006 Before 208.4 66
Stocke 2007 Construction 208.4 71
Stocke 2008 Construction 208.4 75
Stocke 2009 Construction 208.4 60
Stocke 2010 Ready 208.4 68
Stocke 2013 Traffic 208.4 73
Stocke 2014 Traffic 208.4 67
Stocke 2015 Traffic 208.4 75
StockeNE 2002 Before 33.9 34
StockeNE 2003 Before 33.9 36
StockeNE 2004 Before 33.9 32
StockeNE 2005 Before 33.9 28
StockeNE 2006 Before 33.9 35
StockeNE 2007 Construction 33.9 43
StockeNE 2008 Construction 33.9 41
StockeNE 2009 Construction 33.9 34
StockeNE 2010 Ready 33.9 34
StockeNE 2013 Traffic 33.9 33
StockeNE 2014 Traffic 33.9 32
StockeNE 2015 Traffic 33.9 37
Degernas 2002 Before 68.8 56
Degernas 2003 Before 68.8 56
Degernas 2004 Before 68.8 66
Degernas 2005 Before 68.8 64
Degernas 2006 Before 68.8 58
Degernas 2007 Before 68.8 64
Degernas 2008 Construction 68.8 62
Degernas 2009 Construction 68.8 61
Degernas 2010 Construction 68.8 68
Degernas 2013 Traffic 68.8 69
Degernas 2014 Traffic 68.8 57
Degernas 2015 Traffic 68.8 70
---
N?r du skickar e-post till SLU s? inneb?r detta att SLU behandlar dina personuppgifter. F?r att l?sa mer om hur detta g?r till, klicka h?r <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
[[alternative HTML version deleted]]
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
More information about the R-sig-mixed-models
mailing list