[R-sig-ME] Mixed-effects modelling on data from a highly unbalanced design
Adriaan De Jong
Adr|@@n@de@Jong @end|ng |rom @|u@@e
Wed Mar 27 13:50:58 CET 2019
Dear Chris,
Thanks for your suggestion. That would be a lovely easy-way-out-of-my-misery :)
But for a non-professional statistician, how far could I go here? Would the data set below still qualify for "normal" lme's? In case of the Key's?, The Support's? The Vv's? The Mf's? The Sr's?
Is there a "rule" that I can use or a paper I can refer to?
Cheers,
Adjan
Data:
SiteSitenrYearStatusArealVvMfSrKeySupport
Nyl2012002Before33.620020
Nyl2012003Construction33.600000
Nyl2012004Construction33.600000
Nyl2012005Construction33.600000
Nyl2012006Ready33.600101
Nyl2012007Ready33.600000
Nyl2012008Ready33.620020
Nyl2012009Ready33.620121
Nyl2012010Ready33.630030
Nyl2012013Traffic33.610313
Nyl2012014Traffic33.610313
Nyl2012015Traffic33.600505
Korn2022002Before82.81511152
Korn2022003Before82.81400140
Korn2022004Construction82.81901191
Korn2022005Construction82.81311132
Korn2022006Ready82.81610161
Korn2022007Ready82.81413144
Korn2022008Ready82.81710171
Korn2022009Ready82.81001101
Korn2022010Ready82.81301131
Korn2022013Traffic82.81201121
Korn2022014Traffic82.81201121
Korn2022015Traffic82.81001101
Str2032002Construction37.000202
Str2032003Construction37.000202
Str2032004Ready37.000404
Str2032005Ready37.000202
Str2032006Ready37.000101
Str2032007Ready37.000101
Str2032008Ready37.000202
Str2032009Ready37.000202
Str2032010Ready37.000000
Str2032013Traffic37.000202
Str2032014Traffic37.000101
Str2032015Traffic37.010010
StrN2042002Construction69.231233
StrN2042003Construction69.242244
StrN2042004Ready69.273376
StrN2042005Ready69.234337
StrN2042006Ready69.274276
StrN2042007Ready69.2664610
StrN2042008Ready69.234438
StrN2042009Ready69.242345
StrN2042010Ready69.223225
StrN2042013Traffic69.213619
StrN2042014Traffic69.232436
StrN2042015Traffic69.230333
Hjalta2052002Ready91.122123
Hjalta2052003Ready91.112315
Hjalta2052004Ready91.113518
Hjalta2052005Ready91.111314
Hjalta2052006Ready91.121425
Hjalta2052007Ready91.140545
Hjalta2052008Ready91.120323
Hjalta2052009Ready91.130131
Hjalta2052010Ready91.110212
Hjalta2052013Traffic91.110111
Hjalta2052014Traffic91.130333
Hjalta2052015Traffic91.130232
Kasa2062002Construction98.357158
Kasa2062003Ready98.39131914
Kasa2062004Ready98.357057
Kasa2062005Ready98.310851013
Kasa2062006Ready98.376177
Kasa2062007Ready98.396298
Kasa2062008Ready98.31352137
Kasa2062009Ready98.393295
Kasa2062010Ready98.31011102
Kasa2062013Traffic98.31102112
Kasa2062014Traffic98.380383
Kasa2062015Traffic98.31102112
Ava2072002Before109.574276
Ava2072003Before109.553457
Ava2072004Construction109.564569
Ava2072005Construction109.5466412
Ava2072006Construction109.5264210
Ava2072007Ready109.5285213
Ava2072008Ready109.5156111
Ava2072009Ready109.5347311
Ava2072010Ready109.5565511
Ava2072013Traffic109.5537510
Ava2072014Traffic109.554559
Ava2072015Traffic109.552658
Logd2082002Before67.421324
Logd2082003Before67.401203
Logd2082004Before67.402406
Logd2082005Construction67.404408
Logd2082006Construction67.403407
Logd2082007Ready67.412416
Logd2082008Ready67.402406
Logd2082009Ready67.403609
Logd2082010Ready67.4146110
Logd2082013Traffic67.402507
Logd2082014Traffic67.400606
Logd2082015Traffic67.400404
Lang2092002Before34.200000
Lang2092003Before34.200000
Lang2092004Before34.200101
Lang2092005Before34.200101
Lang2092006Construction34.210010
Lang2092007Ready34.200000
Lang2092008Ready34.200000
Lang2092009Ready34.200000
Lang2092010Ready34.200000
Lang2092013Traffic34.200000
Lang2092014Traffic34.200000
Lang2092015Traffic34.210010
Hornea2102002Before48.920222
Hornea2102003Before48.930333
Hornea2102004Before48.910515
Hornea2102005Before48.900404
Hornea2102006Construction48.900303
Hornea2102007Construction48.900404
Hornea2102008Ready48.900303
Hornea2102009Ready48.900202
Hornea2102010Ready48.900101
Hornea2102013Traffic48.900101
Hornea2102014Traffic48.900202
Hornea2102015Traffic48.900000
Stocke2112002Before208.460262
Stocke2112003Before208.480484
Stocke2112004Before208.450353
Stocke2112005Before208.490494
Stocke2112006Before208.41404144
Stocke2112007Construction208.41605165
Stocke2112008Construction208.41001101
Stocke2112009Construction208.470373
Stocke2112010Ready208.41502152
Stocke2112013Traffic208.41302132
Stocke2112014Traffic208.460060
Stocke2112015Traffic208.450353
StNE2122002Before33.900000
StNE2122003Before33.900000
StNE2122004Before33.900000
StNE2122005Before33.900000
StNE2122006Before33.900101
StNE2122007Construction33.910010
StNE2122008Construction33.900000
StNE2122009Construction33.900000
StNE2122010Ready33.900000
StNE2122013Traffic33.900000
StNE2122014Traffic33.900000
StNE2122015Traffic33.900000
Deg2132002Before68.860161
Deg2132003Before68.850050
Deg2132004Before68.81201121
Deg2132005Before68.890292
Deg2132006Before68.81102112
Deg2132007Before68.81301131
Deg2132008Construction68.81022104
Deg2132009Construction68.81520152
Deg2132010Construction68.81022104
Deg2132013Traffic68.81152117
Deg2132014Traffic68.81145119
Deg2132015Traffic68.885388
-----Original Message-----
From: Andrews, Chris <chrisaa using med.umich.edu>
Sent: den 27 mars 2019 13:37
To: Adriaan De Jong <Adriaan.de.Jong using slu.se>; R-sig-mixed-models using r-project.org
Subject: RE: [R-sig-ME] Mixed-effects modelling on data from a highly unbalanced design
Given the observed data,
hist(Birds$Arter)
with(ArterYears, interaction.plot(ordered(YearS), Site, Arter)))
I would just use a normal linear mixed model. Although the data are counts, they are not counts that would generate a Poisson distribution. And you'll not likely have convergence issues.
-----Original Message-----
From: Adriaan De Jong [mailto:Adriaan.de.Jong using slu.se]
Sent: Wednesday, March 27, 2019 5:50 AM
To: R-sig-mixed-models using r-project.org
Subject: [R-sig-ME] Mixed-effects modelling on data from a highly unbalanced design
Dear forum,
I'm struggling with a data set from 12 years of breeding bird monitoring on 13 sites that were affected by the construction of a new railway (there are control sites, too, but these fall outside the scope of this question). I have data from (a) before, (b) during construction, (c) "ready" (= railway ready but no traffic) and (d) traffic. Obviously, these four steps came in a fixed sequence. Because the railway was built in separate sections, the numbers of years of Before, Construction and Ready (coded in variable Status) vary among sites in terms of timing and duration, but all sites were monitored for the same three years while traffic was on. In short, the "design" is highly unbalanced.
At this stage, I'm looking at the possible effect of railway construction on avian biodiversity expressed as the total number of species observed per year per site (data below). I've used the following script to prepare the data:
ArterYears$ArterI<-as.integer(ArterYears$Arter) ## Arter = number of observed species
ArterYears$Site<-as.factor(ArterYears$Site)
ArterYears$NArea<-ArterYears$Areal/mean(ArterYears$Areal) ## Areal = area in hectares
ArterYears$NArea2<-ArterYears$NArea^2
ArterYears$YearS<-as.integer(ArterYears$Year - 2001) ## the study was performed 2002-2015
ArterYears$Status<-as.factor(ArterYears$Status)
ArterYears$response<-ArterYears$ArterI
(Some of these steps are not really necessary, but I've included them just to be sure. The same goes for the assignment of NA's to the model names below.)
The data for species number (range 28 - 83) are overdispersed for all sites combined, but not for the individual sites (rather underdispersed). To be on the safe side, I chose to use a negative binomial family anyway: nbinom2 in glmmTMB (I've tested glmm.nb as well with the same kind of problematic results).
>From biological reasoning, the following model is the one I'd prefer. It uses Status, adjusted Year and log("normalized" area) as fixed effects and Site as random effect. For the avian biodiversity data, this model works technically fine. Biologically speaking, the later Status classes did not differ significantly from Before (the intercept). So far so good.
glmmTMB31B<-NA
glmmTMB31B<-glmmTMB(response ~ Status + YearS + log(NArea) + (1|Site), data=ArterYears, family = nbinom2)
summary(glmmTMB31B)
AICc(glmmTMB31B)
Now I would like to evaluate this model, either against other models or with an R2 of some sort. Now problems pile up.
The "null" model (Site needs to be in the model because numbers really vary among sites, see: plot(ArterYears$ArterI ~ ArterYears$Site) )
glmmTMB0<-NA
glmmTMB0<-glmmTMB(response ~ 1 + (1|Site), data=ArterYears, family = nbinom2)
summary(glmmTMB0)
AICc(glmmTMB0)
generates the following warning messages:
1: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
2: In fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
Most other models including combinations of YearS, NArea and Status generate these warning messages (in particular the second one). This puzzles me, but unfortunately, I'm not capable of interpreting the troubleshooting messages. If the model without fixed effects had worked fine and the complex model had failed, I would suspect overparameterization (N = 156), but now I have no clue.
When I apply r.squaredGLMM (MuMIn) to the complex model, warning messages come up even here, despite the fact that they were not generated for the model as such. How should I interpret this?
> r.squaredGLMM(glmmTMB31B)
R2m R2c
delta 1.112643e-18 2.247703e-18
lognormal 4.805332e-04 9.707482e-04
trigamma 6.896496e-35 1.393194e-34
Warning messages:
1: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
2: The null model is correct only if all variables used by the original model remain unchanged.
3: In fitTMB(TMBStruc) :
Model convergence problem; non-positive-definite Hessian matrix. See vignette('troubleshooting')
4: In fitTMB(TMBStruc) :
Model convergence problem; false convergence (8). See vignette('troubleshooting')
I assume that I either make a fundamental mistake somewhere, or the data are "problematic". Could I solve all this by abandoning family = nbinom2 and go back to family = poisson (which worked fine technically)? Or test yet another package?
Thanks in advance for any suggestions.
Cheers,
Adriaan "Adjan" de Jong
Senior researcher
Swedish University of Agricultural Sciences
Data:
Site Year Status Areal Arter
Nyland 2002 Before 33.6 54
Nyland 2003 Construction 33.6 51
Nyland 2004 Construction 33.6 56
Nyland 2005 Construction 33.6 52
Nyland 2006 Ready 33.6 48
Nyland 2007 Ready 33.6 43
Nyland 2008 Ready 33.6 64
Nyland 2009 Ready 33.6 58
Nyland 2010 Ready 33.6 57
Nyland 2013 Traffic 33.6 50
Nyland 2014 Traffic 33.6 53
Nyland 2015 Traffic 33.6 61
Kornsjo 2002 Before 82.8 59
Kornsjo 2003 Before 82.8 66
Kornsjo 2004 Construction 82.8 62
Kornsjo 2005 Construction 82.8 58
Kornsjo 2006 Ready 82.8 60
Kornsjo 2007 Ready 82.8 68
Kornsjo 2008 Ready 82.8 70
Kornsjo 2009 Ready 82.8 59
Kornsjo 2010 Ready 82.8 62
Kornsjo 2013 Traffic 82.8 57
Kornsjo 2014 Traffic 82.8 58
Kornsjo 2015 Traffic 82.8 63
Stranne 2002 Construction 37.0 53
Stranne 2003 Construction 37.0 64
Stranne 2004 Ready 37.0 60
Stranne 2005 Ready 37.0 60
Stranne 2006 Ready 37.0 59
Stranne 2007 Ready 37.0 60
Stranne 2008 Ready 37.0 55
Stranne 2009 Ready 37.0 53
Stranne 2010 Ready 37.0 46
Stranne 2013 Traffic 37.0 58
Stranne 2014 Traffic 37.0 53
Stranne 2015 Traffic 37.0 56
Strandnyland 2002 Construction 69.2 43
Strandnyland 2003 Construction 69.2 44
Strandnyland 2004 Ready 69.2 52
Strandnyland 2005 Ready 69.2 56
Strandnyland 2006 Ready 69.2 46
Strandnyland 2007 Ready 69.2 52
Strandnyland 2008 Ready 69.2 48
Strandnyland 2009 Ready 69.2 43
Strandnyland 2010 Ready 69.2 55
Strandnyland 2013 Traffic 69.2 38
Strandnyland 2014 Traffic 69.2 41
Strandnyland 2015 Traffic 69.2 46
Hjalta 2002 Ready 91.1 61
Hjalta 2003 Ready 91.1 61
Hjalta 2004 Ready 91.1 62
Hjalta 2005 Ready 91.1 61
Hjalta 2006 Ready 91.1 66
Hjalta 2007 Ready 91.1 71
Hjalta 2008 Ready 91.1 73
Hjalta 2009 Ready 91.1 58
Hjalta 2010 Ready 91.1 64
Hjalta 2013 Traffic 91.1 67
Hjalta 2014 Traffic 91.1 71
Hjalta 2015 Traffic 91.1 67
Kasa 2002 Construction 98.3 65
Kasa 2003 Ready 98.3 59
Kasa 2004 Ready 98.3 59
Kasa 2005 Ready 98.3 66
Kasa 2006 Ready 98.3 52
Kasa 2007 Ready 98.3 53
Kasa 2008 Ready 98.3 56
Kasa 2009 Ready 98.3 53
Kasa 2010 Ready 98.3 51
Kasa 2013 Traffic 98.3 51
Kasa 2014 Traffic 98.3 54
Kasa 2015 Traffic 98.3 64
Ava 2002 Before 109.5 67
Ava 2003 Before 109.5 79
Ava 2004 Construction 109.5 70
Ava 2005 Construction 109.5 77
Ava 2006 Construction 109.5 76
Ava 2007 Ready 109.5 80
Ava 2008 Ready 109.5 77
Ava 2009 Ready 109.5 81
Ava 2010 Ready 109.5 83
Ava 2013 Traffic 109.5 76
Ava 2014 Traffic 109.5 81
Ava 2015 Traffic 109.5 78
Logdea 2002 Before 67.4 54
Logdea 2003 Before 67.4 53
Logdea 2004 Before 67.4 57
Logdea 2005 Construction 67.4 55
Logdea 2006 Construction 67.4 49
Logdea 2007 Ready 67.4 61
Logdea 2008 Ready 67.4 56
Logdea 2009 Ready 67.4 56
Logdea 2010 Ready 67.4 53
Logdea 2013 Traffic 67.4 55
Logdea 2014 Traffic 67.4 53
Logdea 2015 Traffic 67.4 51
Langed 2002 Before 34.2 49
Langed 2003 Before 34.2 56
Langed 2004 Before 34.2 51
Langed 2005 Before 34.2 52
Langed 2006 Construction 34.2 47
Langed 2007 Ready 34.2 56
Langed 2008 Ready 34.2 54
Langed 2009 Ready 34.2 51
Langed 2010 Ready 34.2 54
Langed 2013 Traffic 34.2 57
Langed 2014 Traffic 34.2 51
Langed 2015 Traffic 34.2 50
Hornea 2002 Before 48.9 48
Hornea 2003 Before 48.9 49
Hornea 2004 Before 48.9 41
Hornea 2005 Before 48.9 37
Hornea 2006 Construction 48.9 40
Hornea 2007 Construction 48.9 48
Hornea 2008 Ready 48.9 48
Hornea 2009 Ready 48.9 39
Hornea 2010 Ready 48.9 47
Hornea 2013 Traffic 48.9 47
Hornea 2014 Traffic 48.9 43
Hornea 2015 Traffic 48.9 47
Stocke 2002 Before 208.4 58
Stocke 2003 Before 208.4 69
Stocke 2004 Before 208.4 73
Stocke 2005 Before 208.4 67
Stocke 2006 Before 208.4 66
Stocke 2007 Construction 208.4 71
Stocke 2008 Construction 208.4 75
Stocke 2009 Construction 208.4 60
Stocke 2010 Ready 208.4 68
Stocke 2013 Traffic 208.4 73
Stocke 2014 Traffic 208.4 67
Stocke 2015 Traffic 208.4 75
StockeNE 2002 Before 33.9 34
StockeNE 2003 Before 33.9 36
StockeNE 2004 Before 33.9 32
StockeNE 2005 Before 33.9 28
StockeNE 2006 Before 33.9 35
StockeNE 2007 Construction 33.9 43
StockeNE 2008 Construction 33.9 41
StockeNE 2009 Construction 33.9 34
StockeNE 2010 Ready 33.9 34
StockeNE 2013 Traffic 33.9 33
StockeNE 2014 Traffic 33.9 32
StockeNE 2015 Traffic 33.9 37
Degernas 2002 Before 68.8 56
Degernas 2003 Before 68.8 56
Degernas 2004 Before 68.8 66
Degernas 2005 Before 68.8 64
Degernas 2006 Before 68.8 58
Degernas 2007 Before 68.8 64
Degernas 2008 Construction 68.8 62
Degernas 2009 Construction 68.8 61
Degernas 2010 Construction 68.8 68
Degernas 2013 Traffic 68.8 69
Degernas 2014 Traffic 68.8 57
Degernas 2015 Traffic 68.8 70
---
N?r du skickar e-post till SLU s? inneb?r detta att SLU behandlar dina personuppgifter. F?r att l?sa mer om hur detta g?r till, klicka h?r <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
[[alternative HTML version deleted]]
**********************************************************
Electronic Mail is not secure, may not be read every day, and should not be used for urgent or sensitive issues
---
När du skickar e-post till SLU så innebär detta att SLU behandlar dina personuppgifter. För att läsa mer om hur detta går till, klicka här <https://www.slu.se/om-slu/kontakta-slu/personuppgifter/>
E-mailing SLU will result in SLU processing your personal data. For more information on how this is done, click here <https://www.slu.se/en/about-slu/contact-slu/personal-data/>
More information about the R-sig-mixed-models
mailing list