[R-sig-ME] Mixed model specification (control for location and repeated sampling of same location through time)

Tue Nov 8 16:23:43 CET 2022

Hi Norman,

The minimum number of blocks/groups required to support a random effect is discussed in Ben Bolker's GLMM FAQ wiki:

https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#should-i-treat-factor-xxx-as-fixed-or-random

"One point of particular relevance to ‘modern’ mixed model estimation (rather than ‘classical’ method-of-moments estimation) is that, for practical purposes, there must be a reasonable number of random-effects levels (e.g. blocks) – more than 5 or 6 at a minimum."
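A small simulation can make the point concrete. The sketch below is illustrative only and is not taken from the FAQ: the helper name `sim_sd_estimates`, the true SD of 1 and the group counts (3 vs 30) are all assumptions. It also idealises the problem by treating the group effects as directly observed, which a real mixed model cannot do, so the real situation is if anything harder.

```r
# Illustrative simulation: how well can an among-group SD be estimated
# from k group effects? (True SD and group counts are arbitrary assumptions.)
set.seed(1)

sim_sd_estimates <- function(k, true_sd = 1, n_sim = 2000) {
  # For each simulated data set, draw k group effects and estimate their SD
  replicate(n_sim, sd(rnorm(k, mean = 0, sd = true_sd)))
}

sd_k3  <- sim_sd_estimates(3)   # e.g. 3 sites
sd_k30 <- sim_sd_estimates(30)  # a more comfortable number of groups

# Central 95% range of the SD estimates for each scenario
range_k3  <- quantile(sd_k3,  c(0.025, 0.975))
range_k30 <- quantile(sd_k30, c(0.025, 0.975))
print(rbind(k3 = range_k3, k30 = range_k30))
```

With only 3 groups the estimated SD ranges from close to zero to roughly double the true value, which is why a variance component based on 3 sites is so poorly determined.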

Best wishes,
Paul

Paul Johnson
Senior Lecturer
School of Biodiversity, One Health and Veterinary Medicine
University of Glasgow
Room 362, Wolfson Link Building
Glasgow G12 8QQ
+44 (0)7814 668 613
paul.johnson using glasgow.ac.uk
https://www.gla.ac.uk/schools/bohvm/staff/pauljohnson/
https://orcid.org/0000-0001-6663-7520

On 08/11/2022, 15:15, "R-sig-mixed-models on behalf of Norman DAURELLE via R-sig-mixed-models" <r-sig-mixed-models-bounces using r-project.org on behalf of r-sig-mixed-models using r-project.org> wrote:

    Dear list members, Brian, Thierry, 

    I am not an expert, but I don't see why the number of sites would be a barrier to introducing it as a random effect. 

    Would you care to explain the reasoning behind that statement?

    To me, the Y ~ X1 + X2 + X3 + (1 | Site) part seems appropriate (I don't know how to handle the different dates, though).

    Sorry if this is not helpful, Brian. 

    Cheers, 

    Norman 

    From: "Thierry Onkelinx via R-sig-mixed-models" <r-sig-mixed-models using r-project.org>
    To: "Brian Gill" <briangillphd using gmail.com>
    Cc: r-sig-mixed-models using r-project.org
    Sent: Thursday 3 November 2022 14:45:01
    Subject: Re: [R-sig-ME] Mixed model specification (control for location and repeated sampling of same location through time)

    Dear Brian, 

    You have only 3 sites. That is too few to use as a random effect. 

    Look into glmmTMB and INLA. They provide correlated random effects, which
    are relevant for your Date variable.

    The glmmTMB formula might look like this:
    Y ~ Site + X1 + X2 + X3 + ar1(Date | Site)

    The INLA formula:
    Y ~ Site + X1 + X2 + X3 + f(Date, model = "rw1", replicate = as.integer(Site))
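As a rough, hedged sketch of how the glmmTMB version could be run: the code below simulates stand-in data (3 sites, 40 evenly spaced dates, AR(1) errors with an assumed rho of 0.6) because the real data are not available. The column names `TimeIdx`, `DateF` and `eps`, and all coefficient values, are invented for illustration. Note that the glmmTMB covariance-structure vignette writes the term as `ar1(times + 0 | group)` with the time variable coded as a factor, so that form is used here instead of `ar1(Date | Site)`. The fit only runs if glmmTMB is installed.

```r
# Sketch only: simulated stand-in for the data described in the thread.
set.seed(42)
n_times <- 40
d <- expand.grid(TimeIdx = seq_len(n_times), Site = factor(c("A", "B", "C")))
d$X1 <- rnorm(nrow(d))
d$X2 <- rnorm(nrow(d))
d$X3 <- rnorm(nrow(d))

# AR(1) errors within each site: eps[t] = rho * eps[t - 1] + noise[t]
rho <- 0.6  # assumed value, for illustration
d$eps <- ave(rnorm(nrow(d)), d$Site, FUN = function(e) {
  as.numeric(stats::filter(e, rho, method = "recursive"))
})

# Response with arbitrary (assumed) coefficients and a fixed site shift
d$Y <- 1 + 0.5 * d$X1 - 0.3 * d$X2 + 0.2 * d$X3 + as.numeric(d$Site) + d$eps

# ar1() in glmmTMB expects the time variable as a factor, written with "+ 0"
d$DateF <- factor(d$TimeIdx)

if (requireNamespace("glmmTMB", quietly = TRUE)) {
  library(glmmTMB)
  # Site as a fixed effect; a separate AR(1) series of deviations per site
  fit <- glmmTMB(Y ~ Site + X1 + X2 + X3 + ar1(DateF + 0 | Site), data = d)
  print(summary(fit))
}
```

The `ar1()` term assumes equally spaced time steps, so this sketch only matches the real data if the sampling dates are (approximately) evenly spaced.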

    Best regards, 

    ir. Thierry Onkelinx 
    Statisticus / Statistician 

    Vlaamse Overheid / Government of Flanders 
    INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND 
    FOREST 
    Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance 
    thierry.onkelinx using inbo.be 
    Havenlaan 88 bus 73, 1000 Brussel 
    www.inbo.be 

    /////////////////////////////////////////////////////////////////////////////////////////// 
    To call in the statistician after the experiment is done may be no more 
    than asking him to perform a post-mortem examination: he may be able to say
    what the experiment died of. ~ Sir Ronald Aylmer Fisher 
    The plural of anecdote is not data. ~ Roger Brinner 
    The combination of some data and an aching desire for an answer does not 
    ensure that a reasonable answer can be extracted from a given body of data.
    ~ John Tukey 
    /////////////////////////////////////////////////////////////////////////////////////////// 


    On Mon 31 Oct 2022 at 18:55, Brian Gill <briangillphd using gmail.com> wrote:

    > I have three locations (Sites) where I repeatedly measured a number of 
    > environmental variables (X1, X2, X3) and a response (Y; normally 
    > distributed) over time. That is, I have data on each environmental variable 
    > and the response at many time points for each of 3 sites. At each
    > time point, all three sites were sampled.
    > 
    > I want to model the response (Y) as a function of the environmental 
    > variables (X1, X2, X3) while controlling for effects of Sites and Time. I
    > expect responses from the same site to be similar because they come from
    > the same location and responses measured at closer timepoints to be more
    > similar than those separated by more time. 
    > 
    > Can people please advise on an appropriate model specification?
    > 
    > I've come up with the following so far: 
    > 
    > Y ~ Site + X1 + X2 + X3 + (1 | Date) 
    > 
    > Y ~ X1 + X2 + X3 + (1 | Site) + (1 | Date) 
    > 
    > My hangups are that I think these models treat Date categorically 
    > (controlling for variation from a particular date, but not how close or far 
    > dates are from each other). Also, a model allowing both random intercepts
    > and slopes might be better as responses could vary significantly in 
    > magnitude and direction among sites. 
    > 
    > Any advice would be appreciated. Thanks! 
    > 
    > [[alternative HTML version deleted]] 
    > 
    > _______________________________________________ 
    > R-sig-mixed-models using r-project.org mailing list 
    > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models 
    > 
