[R-sig-ME] lme4 and random effects modelling.

Tom Wilding Tom.Wilding at sams.ac.uk
Thu Jun 25 13:17:00 CEST 2015


Dear List

I have a question on random-effects model selection.  My data set is ecological, messy and observational.  I have made notes on the models I have tried and would welcome any comments on them.

I have ~100 Farms (random effect) within which Surveys (random effect) have been conducted over several years (~3 on average, per farm).  Survey is nested in Farm.  Each Survey consists of samples taken at different Distances: 1 sample from near the farm, 3 samples from intermediate distances and 2 samples from reference locations.  Assume that samples taken at the same Distance within a Survey are not independent (i.e. they are pseudoreplicates).  Farms are treated to varying degrees with a chemical ("Treat", a continuous fixed effect and the main interest of this study), and there are several other environmental covariables associated with each sample.  The factor Distance is a major driver of patterns in the response, and the response is highly variable (between Farms, between Surveys within Farms and between pseudoreplicates).

A simple model (using library(lme4)) includes the hierarchical structure in the random effects:

Model1: Response ~ Treat + Distance + other covariables + (1|Farm/Survey)  # AIC = 3721
#Model1 doesn't model the non-independence of the Distance samples.
Model2: Response ~ Treat + Distance + other covariables + (1|Farm/Survey/Distance)  # AIC = 3586
#Model2 now nests Distance in Survey and accounts for the non-independence (pseudoreplication) of samples taken at the same Distance.  This model estimates a variance at each level of the nested hierarchy.

It seems reasonable to test a random-slope model, one that allows the effect of Distance to vary between Farms and between Surveys nested in Farms.  Distance is a categorical predictor, and the interpretation of 'slopes' with >2 levels (as here) seems relatively intuitive, but I have seen very little on random slopes with categorical predictors: Gelman and Hill's (Ref 1) radon example (page 281) with 'Floor' (2 levels) is the only example I can find.  I have fitted the following to allow for random slopes:

Model3: Response ~ Treat + Distance + other covariables + (1+Distance|Farm/Survey)  # AIC = 3549
Model3 generates estimates of the standard deviation associated with each Distance level, and of the correlations between Distance levels, both within Farms and within Surveys nested in Farms.  This seems relatively intuitive.
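
For readers who want to see the three random-effects structures side by side, here is a minimal sketch in R.  It runs on simulated toy data, not the real data set: the design size (30 farms), the effect sizes, the three Distance level names and the absence of "other covariables" are all assumptions made purely for illustration.  Only the lmer formulas mirror Model1 to Model3 above.

```r
library(lme4)

set.seed(42)

# Hypothetical toy design (NOT the real data): 30 farms, 3 surveys per farm,
# 6 samples per survey (1 near, 3 intermediate, 2 reference).
dist_per_survey <- rep(c("near", "intermediate", "reference"), times = c(1, 3, 2))
dat <- expand.grid(Sample = seq_along(dist_per_survey), Survey = 1:3, Farm = 1:30)
dat$Distance <- factor(dist_per_survey[dat$Sample],
                       levels = c("reference", "intermediate", "near"))
dat$Farm   <- factor(dat$Farm)
dat$Survey <- factor(dat$Survey)  # labels 1-3 are reused across farms

# Explicit grouping factors, used only to simulate the response
fs  <- interaction(dat$Farm, dat$Survey, drop = TRUE)    # Survey within Farm
fsd <- interaction(fs, dat$Distance, drop = TRUE)        # Distance within Survey

# Simulated covariate (one value per survey) and response (assumed effect sizes)
treat_by_survey <- rnorm(nlevels(fs))
dat$Treat <- treat_by_survey[as.integer(fs)]
dist_effect <- c(reference = 0, intermediate = 1, near = 2)
dat$Response <- 1 + 0.5 * dat$Treat +
  unname(dist_effect[as.character(dat$Distance)]) +
  rnorm(nlevels(dat$Farm))[as.integer(dat$Farm)] +   # Farm effect
  rnorm(nlevels(fs), sd = 0.5)[as.integer(fs)] +     # Survey-in-Farm effect
  rnorm(nlevels(fsd), sd = 0.7)[as.integer(fsd)] +   # Distance-in-Survey effect
  rnorm(nrow(dat))                                   # residual noise

# Fit by maximum likelihood so that AIC/BIC and likelihood ratio tests are comparable
m1 <- lmer(Response ~ Treat + Distance + (1 | Farm/Survey),
           data = dat, REML = FALSE)
m2 <- lmer(Response ~ Treat + Distance + (1 | Farm/Survey/Distance),
           data = dat, REML = FALSE)
m3 <- lmer(Response ~ Treat + Distance + (1 + Distance | Farm/Survey),
           data = dat, REML = FALSE)

print(AIC(m1, m2, m3))
print(BIC(m1, m2, m3))
print(VarCorr(m3))  # SDs and correlations per Distance level at each grouping level
```

On toy data like this the random-slope fit may well be reported as singular; that is a property of the simulation, not a statement about the real data.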

On the basis of AIC (as recommended by Zuur, Ref 2) and likelihood ratio tests, Model3 is the superior model.  However, it estimates a greater number of random-effects parameters, so on the basis of BIC Model3 is probably not as good as Model2.  My concern with Model3 is that I cannot see whether it accounts for the non-independence of measurements taken within Distance.  Comments on this aspect would be much appreciated.  I have noted this paper: http://arxiv.org/pdf/1506.04967v1.pdf but, for the moment, would rather use the logic of the observational design (not mine!) and graphical data exploration to derive the random-effects part of the model.
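
The BIC remark can be checked roughly by hand, since BIC = AIC + k(log(n) - 2) for k estimated parameters and n observations.  The sketch below is base R and rests on assumptions not stated above: that Distance has 3 levels (near, intermediate, reference), that n is about 100 farms x 3 surveys x 6 samples, and that the two reported AIC values come from comparable fits.  The fixed effects and the residual variance are common to both models and cancel in the difference.

```r
# AIC values as reported above
aic_model2 <- 3586
aic_model3 <- 3549

# ASSUMPTION: Distance has 3 levels (near, intermediate, reference)
n_levels <- 3
# ASSUMPTION: roughly 100 farms x 3 surveys x 6 samples per survey
n_obs <- 100 * 3 * 6

# Free parameters in an unstructured q x q covariance matrix
n_cov_params <- function(q) q * (q + 1) / 2

# Model2: three random-intercept variances (Farm, Survey, Distance)
k_model2 <- 3
# Model3: one 3 x 3 covariance matrix for Farm and one for Survey within Farm
k_model3 <- 2 * n_cov_params(n_levels)

delta_k   <- k_model3 - k_model2        # extra random-effects parameters in Model3
delta_aic <- aic_model3 - aic_model2    # negative: AIC favours Model3

# BIC = AIC + k * (log(n) - 2), so the difference is:
delta_bic <- delta_aic + delta_k * (log(n_obs) - 2)

cat(sprintf("Extra parameters in Model3: %d\n", delta_k))
cat(sprintf("Delta AIC (Model3 - Model2): %d\n", delta_aic))
cat(sprintf("Delta BIC (Model3 - Model2): %.1f\n", delta_bic))
# Under these assumptions Delta BIC is about +12.5, i.e. BIC would favour Model2
```

The sign of the BIC difference depends on the assumed n and on the parameter count; with the actual fits, BIC(Model2, Model3) in lme4 gives the values directly.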

Many thanks

Tom.


Ref 1 - Gelman, A. and J. Hill (2007). Data analysis using regression and multilevel/hierarchical models, Cambridge University Press.
Ref 2 - Zuur, A. F., E. N. Ieno, N. J. Walker, A. A. Saveliev and G. M. Smith (2009). Mixed effects models and extensions in ecology with R, Springer, New York, USA.

