[R-sig-ME] Logistic mixed model selection

Wed Sep 4 16:47:25 CEST 2024

Dear community,

I am currently evaluating the presence/absence of snails in southern Uganda. My study involves 56 water-contact sites, categorized into 4 types (lake, wetland, stream, spring). Each site has been sampled at least 20 times over the past two and a half years.

I built a logistic mixed-effect model with the following structure:

snail_pres ~ precipitation + temperature + Site.type * NDVI + elevation + flow_accumulation + slope_steepness + (1 | Watercontactsite)

For model selection, I generated all possible combinations of the 7 predictor variables (2^7 - 1 = 127 models) and ranked them by AIC. Based on the recommendation that models with an AIC difference of less than 2 are considered comparable, I selected all models within this interval. I then calculated the marginal and conditional R² using the r.squaredGLMM() function from the MuMIn package. From this subset of models, I chose the one with the highest R² values.
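
A minimal sketch of the selection procedure described above, in base R. It treats the seven predictors as main effects only (which is what a count of 127 implies; the Site.type * NDVI interaction would need separate handling). The helper name within_delta and the data frame name snails are hypothetical, chosen only for illustration, and the commented-out usage assumes the lme4 and MuMIn packages are installed.

```r
# Seven candidate predictors taken from the model formula above.
predictors <- c("precipitation", "temperature", "Site.type", "NDVI",
                "elevation", "flow_accumulation", "slope_steepness")

# All non-empty subsets of the predictors: 2^7 - 1 of them.
subsets <- unlist(
  lapply(seq_along(predictors),
         function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# One formula per subset, each keeping the random intercept for site.
formulas <- lapply(subsets, function(vars) {
  reformulate(c(vars, "(1 | Watercontactsite)"), response = "snail_pres")
})
length(formulas)  # 127

# Keep the models whose AIC lies within `threshold` of the lowest AIC.
# `aic` is a named numeric vector with one entry per fitted model.
# (Hypothetical helper, not part of any package.)
within_delta <- function(aic, threshold = 2) {
  delta <- aic - min(aic)
  sort(delta[delta < threshold])
}

# Hypothetical usage, assuming a data frame `snails` holding the variables:
# fits <- lapply(formulas, lme4::glmer, data = snails, family = binomial)
# names(fits) <- vapply(formulas,
#                       function(f) paste(deparse(f), collapse = " "),
#                       character(1))
# top <- within_delta(vapply(fits, AIC, numeric(1)))
# lapply(fits[names(top)], MuMIn::r.squaredGLMM)
```

Writing the enumeration out by hand makes it explicit which terms are in the candidate set and confirms the count of 127 before any model is fitted.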

I have two datasets with the same variables, one sampled by citizen scientists and the other by an expert. My goal is to compare the best models for each dataset to determine if the same variables are significant.

Given the mixed opinions on the use of R² with mixed models, I want to ensure that my approach is correct. This paper [https://besjournals.onlinelibrary.wiley.com/doi/10.1111/j.2041-210x.2012.00261.x] suggests that using R² with binomial responses is acceptable, but I would appreciate any feedback or suggestions to refine my methodology.

Thank you so much for your suggestions!
Best regards,

Noelia
