[R-sig-ME] Statistical consultation GLMM

Mon Aug 2 16:03:19 CEST 2021

On 7/22/21 7:45 PM, Estefania Isabel Muñoz Salas wrote:
> Hi,
> 
>   My name is Estefanía; I am doing a master's degree in Marine Ecology. As
> part of the project, we are working with shorebird count data collected
> along the coast of California and northwestern Mexico. The surveys follow
> a standardized monitoring protocol. Sampling units (polygons whose sizes
> vary from site to site) have been established at each site. The birds
> present in each unit were counted once each winter from 2011 to 2019.
> Because these birds tend to congregate, many units have zero counts while
> some have abundances of 1000 birds or more, so the data are far from
> normally distributed. We therefore use Generalized Linear Mixed Models
> (GLMMs) to account for the variability in bird abundance among sites and
> among sampling units.
> The objective of my work is to estimate the population trend of three
> species of shorebirds (analyzed separately), to test whether abundance is
> related to environmental variables (mean, minimum, and maximum
> temperature, and precipitation), and to test whether regions differ. The
> sites were grouped into three regions: California, the Baja California
> peninsula, and another region of northwestern Mexico that we called
> Continental.
> Initially, I tested which distribution family fit the data best by
> comparing Poisson, zero-inflated Poisson, negative binomial, and
> zero-inflated negative binomial distributions, which are the most common
> for count data. The zero-inflated negative binomial had the lowest AIC.
> Knowing that the predictor variables could be correlated, I calculated
> their correlations. Since the correlation between year and the
> environmental variables was low (< 0.30), we decided to fit a single model
> that includes year. We also decided to include the size of each sampling
> unit (logarithm of the hectares), since it differs among units and we want
> to take that into account. Region would also be included as a factor with
> 3 levels. However, the temperature variables are highly correlated with
> one another, and they are the variables we are interested in. This is
> where I have several doubts, because my background is not in statistics:
> 1.- Should I avoid including the environmental variables in a single
> model because they are correlated, although they are of interest?
> 2.- Is what I am doing right?
> 3.- How do I know whether the model fits the data well? How do I test it?
> 4.- How do I select the best model?
> 5.- What assumptions should I test?
> 7.- Am I missing something obvious?

   These aren't really GLMM-specific questions.

   Opinions differ about correlations; my personal opinion is that it is 
rarely a good idea to exclude highly correlated predictors from a 
regression (see refs below).
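
   For what it's worth, here is a minimal sketch of how one might 
quantify the collinearity before deciding anything. The data are 
simulated; only the variable names (tmp, tmn, tmx, pre) come from the 
model in the question, and using variance inflation factors as the 
summary is an assumption of this sketch, not something the references 
prescribe.

```r
## Simulated stand-in for the environmental predictors (NOT the real data)
set.seed(101)
n   <- 200
tmp <- rnorm(n, mean = 18, sd = 3)       # mean temperature
tmn <- tmp - 5 + rnorm(n, sd = 1)        # minimum tracks the mean closely
tmx <- tmp + 5 + rnorm(n, sd = 1)        # maximum tracks the mean closely
pre <- rnorm(n, mean = 50, sd = 20)      # precipitation, independent here
env <- data.frame(tmp, tmn, tmx, pre)

## Pairwise correlations among the predictors
env_cor <- cor(env)
print(round(env_cor, 2))

## Variance inflation factor by hand: 1 / (1 - R^2), where R^2 comes from
## regressing each predictor on all of the others
vif <- sapply(names(env), function(v) {
  others <- setdiff(names(env), v)
  r2 <- summary(lm(reformulate(others, response = v), data = env))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))
```

   Large values for the temperature variables would indicate that their 
individual coefficients are estimated with wide confidence intervals; 
that is a statement about precision, not by itself a reason to drop them.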

    I would recommend the DHARMa package (and its extensive, 
high-quality vignettes) for assessing issues with the fits.
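
   A minimal sketch of a DHARMa workflow, assuming glmmTMB and DHARMa 
are both installed. It uses the Salamanders data set shipped with 
glmmTMB as a stand-in for the shorebird data, so the formula is 
illustrative only.

```r
library(glmmTMB)
library(DHARMa)

## Stand-in data: salamander counts by site (NOT the shorebird data)
data("Salamanders", package = "glmmTMB")

## Zero-inflated negative binomial with a random intercept for site
fit <- glmmTMB(count ~ mined + (1 | site),
               ziformula = ~1,
               family = nbinom2,
               data = Salamanders)

## Simulation-based scaled residuals; these should look uniform on (0, 1)
## if the model is adequate
res <- simulateResiduals(fit, n = 250)

plot(res)                 # QQ plot and residuals vs. predicted values
testDispersion(res)       # over- or underdispersion relative to the model
testZeroInflation(res)    # more or fewer zeros than the model expects
```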

   I would not recommend selecting a best model with a reduced set of 
predictors - I would use the full model - but AIC is fine.
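
   One way to read this is to keep the full set of predictors fixed and 
use AIC only to compare the conditional distributions. A minimal sketch 
under that assumption, again using the Salamanders data from glmmTMB as 
a stand-in (the formula and object names are hypothetical):

```r
library(glmmTMB)

## Stand-in data (NOT the shorebird data)
data("Salamanders", package = "glmmTMB")

## The same "full" fixed- and random-effect structure in every fit
full_form <- count ~ mined + cover + (1 | site)

fits <- list(
  poisson = glmmTMB(full_form, family = poisson, data = Salamanders),
  zip     = glmmTMB(full_form, ziformula = ~1, family = poisson,
                    data = Salamanders),
  nb2     = glmmTMB(full_form, family = nbinom2, data = Salamanders),
  zinb2   = glmmTMB(full_form, ziformula = ~1, family = nbinom2,
                    data = Salamanders)
)

## One AIC value per candidate distribution; lower is better
aic_tab <- sapply(fits, AIC)
print(sort(aic_tab))
```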

Dormann, Carsten F., Jane Elith, Sven Bacher, Carsten Buchmann, Gudrun 
Carl, Gabriel Carré, Jaime R. García Marquéz, et al. “Collinearity: A 
Review of Methods to Deal with It and a Simulation Study Evaluating 
Their Performance.” Ecography 36, no. 1 (2013): 27–46. 
https://doi.org/10.1111/j.1600-0587.2012.07348.x.

Graham, Michael H. “Confronting Multicollinearity in Ecological Multiple 
Regression.” Ecology 84, no. 11 (2003): 2809–15. 
https://doi.org/10.1890/02-3114.

Morrissey, Michael B., and Graeme D. Ruxton. “Multiple 
Regression Is Not Multiple Regressions: The Meaning of Multiple 
Regression and the Non-Problem of Collinearity.” Philosophy, Theory, and 
Practice in Biology 10 (2018). 
http://dx.doi.org/10.3998/ptpbio.16039257.0010.003.

Vanhove, Jan. “Collinearity Isn’t a Disease That Needs Curing.” 
PsyArXiv, May 12, 2020. https://doi.org/10.31234/osf.io/mv2wx.

> 
> I have done all of the above with the glmmTMB package in RStudio.
> Thank you very much and sorry in advance if these are very basic questions.
> 
> The fit I have tried so far is this:
> m2znb.all<-glmmTMB(total~ logha + YearCollected + Geopolitical + tmp + tmn
> + tmx + pre + (1|Site/Plot), ziformula = ~1, data = mc2, family="nbinom2")
> where:
> total is the abundance of a species of shorebird
> logha is the size of the unit (logarithm of the area in hectares)
> YearCollected is the survey year
> Geopolitical is the region
> tmp is the mean temperature
> tmn is the minimum temperature
> tmx is the maximum temperature
> pre is the precipitation
> 
> It would be possible to share the data.
> 
> Regards, Estefanía.

-- 
Dr. Benjamin Bolker
Professor, Mathematics & Statistics and Biology, McMaster University
Director, School of Computational Science and Engineering
Graduate chair, Mathematics & Statistics