[R-sig-ME] Data structure issue for GLMM models

Anne Blach Overgaard anne.overgaard at bios.au.dk
Fri Nov 20 09:24:03 CET 2015


Dear Thierry,

Thank you very much for your reply.
Below is the syntax of the base model with altitude included as a fixed effect:

GLMM.b.nan <- glmer(occ_Betula_nana ~ isotherm + Trange + Tsumm + Psumm + ddeg +
                      sri + slope + mosses + bet.nan.bio + alt +
                      (1|fsite) + (1|fplotgr) + (1|fplot),
                    data = env.sd.ran, family = "poisson")

The alternative is to include altitude both as a fixed effect and as a random effect, because of the nested structure of the data:

GLMM.b.nan <- glmer(occ_Betula_nana ~ alt + isotherm + Trange + Tsumm + Psumm + ddeg +
                      sri + slope + mosses + bet.nan.bio +
                      (1|fsite) + (1|falt) + (1|fplotgr) + (1|fplot),
                    data = env.sd.ran, family = "poisson")

The response, occ_Betula_nana, is count data, i.e., the number of individuals registered in a pin-point frame in each plot (range 0-25).
The fixed effects are the climate predictors isotherm + Trange + Tsumm + Psumm + ddeg (isothermality, annual temperature range, summer temperature, summer precipitation and growing degree days), together with sri (solar radiation index), slope, mosses (occurrence of mosses in the plots), bet.nan.bio (a species-specific biotic-interaction variable we have computed), and finally alt (altitude, m a.s.l.).

The random effects are site (fsite), plot group (fplotgr) and plot (fplot), and perhaps altitude (falt). This is where we would appreciate some help: should altitude be included as a fixed effect only, as a random effect only, or as both a fixed and a random effect? All levels of the random effects are uniquely labelled, which is why I haven’t used the code (1|fsite/fplotgr/falt), but they are nested.
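
One way to see what the two specifications do is to fit both to simulated data and compare them. The following is a minimal sketch, not a recommendation for the real analysis: the design sizes, effect sizes, and the names d, alt_s, m_fixed and m_both are all made up for illustration, and it assumes the lme4 package is installed.

```r
# Sketch only: simulated counts, not the Betula nana data.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical balanced design: 5 sites x 4 altitudes x 3 plot groups x 6 plots.
  d <- expand.grid(plot = 1:6, group = 1:3,
                   alt = c(20, 100, 200, 300), site = 1:5)
  d$fsite   <- factor(paste0("S", d$site))
  d$falt    <- factor(paste(d$site, d$alt, sep = "_"))
  d$fplotgr <- factor(paste(d$site, d$group, d$alt, sep = "_"))
  d$fplot   <- factor(paste0("P", seq_len(nrow(d))))
  d$alt_s   <- as.numeric(scale(d$alt))  # centred and scaled altitude

  # Simulate Poisson counts with a linear altitude trend plus site-level,
  # site-by-altitude-level and plot-level noise (all values are made up).
  eta <- 1 - 0.3 * d$alt_s +
    rnorm(nlevels(d$fsite), sd = 0.3)[as.integer(d$fsite)] +
    rnorm(nlevels(d$falt),  sd = 0.4)[as.integer(d$falt)] +
    rnorm(nrow(d), sd = 0.3)
  d$count <- rpois(nrow(d), exp(eta))

  # Altitude as a fixed effect only.
  m_fixed <- lme4::glmer(count ~ alt_s + (1 | fsite) + (1 | fplotgr) + (1 | fplot),
                         data = d, family = poisson)
  # Altitude as a fixed effect plus a site-by-altitude random intercept.
  m_both <- lme4::glmer(count ~ alt_s + (1 | fsite) + (1 | falt) + (1 | fplotgr) + (1 | fplot),
                        data = d, family = poisson)

  # Likelihood-ratio comparison of the two fits.
  print(anova(m_fixed, m_both))
}
```

In this sketch the fixed term alt_s carries the overall altitude trend, while (1 | falt) absorbs how far each site-specific altitude level departs from that trend. The likelihood-ratio p-value for a variance component tends to be conservative because the null value lies on the boundary of the parameter space, so it is best read as a rough guide.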

Sites are named S1-S5.
Altitudes are named according to site and altitude, e.g., 1_20 (for site 1 at altitude 20). Note that not all altitudes are present at each site!
Plot groups are named according to site, altitude and which of the three repetitions they represent, e.g., 1_1_20 (for site 1, plot group no. 1 at altitude 20).
Plots are named P1-P414.
In total we have 5 sites x a varying number of altitudes per site x 3 plot groups per altitude x 6 plots = 414 plots in the entire data set.
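
The labelling scheme above can be checked programmatically. This is a minimal base-R sketch on a made-up toy design (2 sites, 2 altitudes); the data frame design and the helper is_nested are hypothetical names introduced here, not part of the real data set.

```r
# Toy design: 2 sites x 2 altitudes x 3 plot groups x 6 plots = 72 plots.
# (Hypothetical sizes; the real data have 5 sites and 414 plots.)
design <- expand.grid(plot_in_group = 1:6, group = 1:3,
                      alt = c(20, 100), site = 1:2)

# Labels that are unique across the whole data set, as described above.
design$fsite   <- factor(paste0("S", design$site))
design$falt    <- factor(paste(design$site, design$alt, sep = "_"))
design$fplotgr <- factor(paste(design$site, design$group, design$alt, sep = "_"))
design$fplot   <- factor(paste0("P", seq_len(nrow(design))))

# A factor is nested in a parent factor if each of its levels occurs
# within exactly one level of the parent.
is_nested <- function(child, parent) {
  all(tapply(as.character(parent), child, function(p) length(unique(p))) == 1)
}

nesting_ok <- c(
  falt_in_fsite    = is_nested(design$falt, design$fsite),
  fplotgr_in_falt  = is_nested(design$fplotgr, design$falt),
  fplot_in_fplotgr = is_nested(design$fplot, design$fplotgr)
)
print(nesting_ok)
```

When every level is uniquely labelled like this, separate terms such as (1|fsite) + (1|falt) + (1|fplotgr) in lme4 describe the same grouping as the nested shorthand (1|fsite/falt/fplotgr).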

I hope this makes the issue a bit clearer.

Best regards,

Anne

From: Thierry Onkelinx [mailto:thierry.onkelinx at inbo.be]
Sent: 19 November 2015 15:01
To: Anne Blach Overgaard
Subject: Re: [R-sig-ME] Data structure issue for GLMM models

Dear Anne,

Can you provide the syntax of your base model and/or those of the models you are thinking about? Please also add a clear description of the variable names. That would make your question much clearer and easier to answer.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey

2015-11-19 12:42 GMT+01:00 Anne Blach Overgaard <anne.overgaard at bios.au.dk>:

Dear List,
I hope that some of you may be able to help us with a data structure issue.
We have collected plant cover data (count data) for selected species along a climatic gradient in stratified random sampling plots. The hierarchical structure of the data is as follows:
We have sampled at five sites placed along a large-scale climatic gradient. Within each of the five sites we placed three plot groups 500 meters apart at each of the altitudes 20, 100, 200, 300, 400 and 500 m above sea level, whenever possible, as not all altitudes were present at each site. Each plot group consisted of six plots that were placed 10 meters apart.
In total we have 5 sites x a varying number of altitudes per site x 3 plot groups per altitude x 6 plots = 414 plots in the entire data set.
Overall we would like to assess the relative importance of different predictor groups (altitude, climate, and biotic interactions) on the variation in cover per species. We are including the predictor groups as fixed effects in our models using lme4::glmer (family = poisson). We include site and plot group as nested random effects and plots as an observation-level random factor due to overdispersion in the data.
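
The overdispersion mentioned above can be illustrated with a rough check on simulated counts. This is a base-R sketch under stated assumptions: the data are generated from a negative binomial distribution purely for illustration, and the names x, y, fit, dispersion and obs are hypothetical.

```r
set.seed(42)
n  <- 414                       # same number of plots as in the data set
x  <- rnorm(n)                  # a made-up standardised predictor
mu <- exp(1 + 0.3 * x)
y  <- rnbinom(n, mu = mu, size = 1)   # counts with extra-Poisson variation

# Plain Poisson GLM, ignoring the extra variation.
fit <- glm(y ~ x, family = poisson)

# Pearson dispersion statistic: values well above 1 suggest overdispersion.
dispersion <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
print(dispersion)

# An observation-level factor has one level per row; in a glmer formula it
# would enter as (1 | obs), which is the role fplot plays in this thread.
obs <- factor(seq_len(n))
```

The observation-level term lets each plot have its own random deviation on the log scale, which is one common way to absorb extra-Poisson variation.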
Our question is whether altitude should be entered as a random factor, as a fixed effect, or possibly as both a fixed and a random effect. Altitude is a part of the nested structure of the data, but we also have an interest in including it as a fixed effect to assess how much of the variation in the data is due to altitude.
We hope that some of you can guide us on how to deal with altitude in this data analysis.
Thanking you in advance.
Best regards,
Anne & co-workers





        [[alternative HTML version deleted]]

_______________________________________________
R-sig-mixed-models at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models

