[R-sig-ME] lme4 and random effects modelling.

Thierry Onkelinx thierry.onkelinx at inbo.be
Thu Jun 25 13:49:41 CEST 2015


Dear Tom,

You talk about sampling locations and surveys. Does each survey reuse the
same locations, or does each survey use different locations?
If the locations are reused, then I would add a location random effect. The
random effects structure would be (1|Farm/Survey/Location) or (1|Farm) +
(1|Farm:Survey) + (1|Farm:Survey:Location). If the survey IDs are unique
(not reused among farms) and the location IDs are unique as well, then you
can simplify this to (1|Farm) + (1|Survey) + (1|Location). This is the case
without Distance as a random slope.
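
As a minimal sketch of why the simplification needs unique IDs, the base R
snippet below builds a hypothetical toy design (the farm, survey and location
labels are invented for illustration) in which the labels are reused across
farms, and then constructs the explicitly nested IDs that Farm/Survey/Location
expands to.

```r
# Hypothetical toy design: 2 farms, 2 surveys per farm, 2 locations per survey.
# Survey and Location labels are deliberately reused across farms.
d <- expand.grid(Location = c("a", "b"),
                 Survey   = c("s1", "s2"),
                 Farm     = c("F1", "F2"))

# Explicitly nested IDs: the grouping factors behind
# (1|Farm:Survey) and (1|Farm:Survey:Location).
d$FarmSurvey         <- interaction(d$Farm, d$Survey, drop = TRUE)
d$FarmSurveyLocation <- interaction(d$Farm, d$Survey, d$Location, drop = TRUE)

nlevels(d$Survey)              # 2: reused labels, (1|Survey) would pool across farms
nlevels(d$FarmSurvey)          # 4: one level per actual survey
nlevels(d$FarmSurveyLocation)  # 8: one level per actual location
```

With IDs such as FarmSurvey that are unique by construction, writing
(1|Farm) + (1|FarmSurvey) + (1|FarmSurveyLocation) describes the same grouping
as the nested notation.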

In case of a single categorical variable as a random slope I prefer (0 +
Distance|Farm) instead of (1 + Distance|Farm). The model fit is the same,
the difference is in the parametrisation. 0 + Distance gives the effect of
"near", "intermediate" and "reference" whereas 1 + Distance gives intercept
(= "near"), difference between "intermediate" and "near", difference
between "reference" and "near". 0 + Distance makes IMHO the random effects
and their variance-covariance parameters easier to interpret.
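
The two parametrisations can be compared with base R's model.matrix(). This is
only a sketch: the level order with "near" first is an assumption made to match
the description above, and the three-row factor is a stand-in for the real
data.

```r
# Distance as a factor with "near" as the first (reference) level.
Distance <- factor(c("near", "intermediate", "reference"),
                   levels = c("near", "intermediate", "reference"))

# 0 + Distance: one indicator column per level.
Z0 <- model.matrix(~ 0 + Distance)
# 1 + Distance: intercept plus treatment contrasts against "near".
Z1 <- model.matrix(~ 1 + Distance)

colnames(Z0)  # "Distancenear" "Distanceintermediate" "Distancereference"
colnames(Z1)  # "(Intercept)" "Distanceintermediate" "Distancereference"

# Both matrices span the same column space: Z1 = Z0 %*% A for an invertible A.
# That is why the fit is the same and only the parametrisation differs.
A <- solve(Z0, Z1)
same_space <- isTRUE(all.equal(Z0 %*% A, Z1, check.attributes = FALSE))
same_space  # TRUE
```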

Your model 3 adds random slopes at both the farm and the survey level. You
have to decide whether that makes sense. Given the number of surveys per
farm and the number of locations per survey, I would only add the random
slopes at the farm level. The structure would look like (0 + Distance|Farm)
+ (1|Survey) + (1|Location), assuming each Survey and Location is unique.
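
To make the trade-off concrete, here is a rough count of the
variance-covariance parameters each structure asks the model to estimate
(residual variance not included). The helper n_vcov() is a hypothetical name
introduced for this sketch, and the formula leaves out the "other covariables"
because their names are not given.

```r
# An unstructured random-effects term with q columns has q variances and
# q * (q - 1) / 2 covariances, i.e. q * (q + 1) / 2 parameters in total.
n_vcov <- function(q) q * (q + 1) / 2

n_levels_distance <- 3  # near, intermediate, reference

# Model 3, (1 + Distance | Farm/Survey): a 3 x 3 matrix at each of two levels.
model3 <- 2 * n_vcov(n_levels_distance)                   # 12
# Slopes at the farm level only, plus Survey and Location intercept variances.
suggested <- n_vcov(n_levels_distance) + 2 * n_vcov(1)    # 8

# The suggested structure written as a formula object. Building it needs only
# base R; lme4::lmer() would take it together with a data frame.
f <- Response ~ Treat + Distance +
  (0 + Distance | Farm) + (1 | Survey) + (1 | Location)
all.vars(f)
```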

Such a structure matches the design of the study. If there is an effect from
one of the levels (farm, survey or location), then the model can cope with
that. If there is no effect, the variance will be very small. So IMHO there
is no need to find the "optimal" random effects structure, since the design
dictates what a minimal structure should be.

Best regards,

ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
Forest
team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
Kliniekstraat 25
1070 Anderlecht
Belgium

To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

2015-06-25 13:17 GMT+02:00 Tom Wilding <Tom.Wilding at sams.ac.uk>:

> Dear List
>
> I have a question on random-effects model selection.  My data set is
> 'ecological', messy and is based on observational data.  I have made notes
> on the Models I have tried, and would welcome any comments on these.
>
> I have ~100 Farms (random effect) within which Surveys (random effect)
> have been conducted over several years (~3 on average, per farm).  Survey
> is nested in Farm.  Each Survey consists of samples taken from different
> Distances: 1 sample from near the farm, 3 samples from intermediate
> distances and 2 samples from reference locations.  Assume the samples are
> taken non-independently (i.e. they represent pseudoreplicates).  Farms are
> variously treated with a chemical ("Treat", fixed effect, continuous and
> the main interest of this study) and there are several other environmental
> covariables associated with each sample.  Factor Distance is a major driver
> of patterns in the response, and the response is highly variable (within
> Farms, within Surveys within Farms and between pseudoreplicates).
>
> A simple model (using library(lme4)) includes the hierarchical structure
> in the random effects:
>
> Model1: Response ~ Treat + Distance + other covariables +
>         (1|Farm/Survey)  # AIC = 3721
> # Model1 doesn't model the non-independence of the Distance samples.
> Model2: Response ~ Treat + Distance + other covariables +
>         (1|Farm/Survey/Distance)  # AIC = 3586
> # Model2 now nests Distance in Survey and accounts for the non-independence
> # (pseudoreplication of samples taken at Distances).  This model accounts
> # for the variance in each of the nested hierarchies.
>
> It seems reasonable to test a random slope model, one that allows the
> effect of Distance to vary between different Farms and Surveys nested in
> Farms. Distance is a categorical predictor, the interpretation of 'slopes'
> with >2 levels (as here) seems relatively intuitive (but I have seen very
> little on random slopes with categorical predictors - Gelman and Hill's
> (Ref 1) radon example (page 281) with 'Floor' (2 levels) is the only
> example I can find).  I have fitted the following to allow for random
> slopes:
>
> Model3: Response ~ Treat + Distance + other covariables +
>         (1 + Distance|Farm/Survey)  # AIC = 3549
> Model3 generates estimates of the standard deviation associated with each
> Distance within Surveys nested in Farms and within Farms and correlations
> between each Distance (Survey within Farm and within Farm).  This seems
> relatively intuitive.
>
> On the basis of AIC (as recommended by Zuur, Ref 2), and likelihood ratio
> tests Model3 is the superior model although it is estimating a greater
> number of random effects hence, probably, on the basis of BIC Model3 is not
> as good as Model2.  My concern with Model 3 is that I don't see if it is
> accounting for the non-independence of measurements taken within Distance.
> Comments on this aspect would be much appreciated.  I have noted this
> paper: http://arxiv.org/pdf/1506.04967v1.pdf but, for the moment, would
> rather use the logic of the observational design (not mine!), and graphical
> data exploration,  to derive the random-effects part of the model.
>
> Many thanks
>
> Tom.
>
>
> Ref 1 - Gelman, A. and J. Hill (2007). Data analysis using regression and
> multilevel/hierarchical models, Cambridge University Press.
> Ref 2 - Zuur, A. F., E. N. Ieno, N. J. Walker, A. A. Saveliev and G. M.
> Smith (2009). Mixed effects models and extensions in ecology with R,
> Springer, New York, USA.
