[R-sig-ME] df pseudoreplication in lme model

Ben Bolker bbolker at gmail.com
Mon Feb 9 05:47:35 CET 2015


Lauren Meyer <lauren.meyer90 at ...> writes:

> 
> Hello, I am trying to assess whether or not my df are pseudoreplicated
> in my lme model.
> 
> My study was undertaken on five fish (labeled PC), each tested in two
> replicates (REP), across each combination of three treatments HOM, C18 and
> CU, each of which had two levels: HOM (SON, BLD), C18 (SML, BIG),
> CU (YES, NO).
> The variable we are assessing is the amount of toxin extracted (TOX1).
> Also, some data is missing, and has already been removed. I am using an lme
> model, as the study design is similar to a split plot design, with a
> 2X2X2 full factorial design. There are a total of 65 observations.
> 
> Here is the model I am using:
> >model<- lme(TOX1~HOM*C18*CU, random=~1|PC/REP, data=Data4, method="ML")
> Linear mixed-effects model fit by maximum likelihood
> 
> which results in 48 DF for everything. Furthermore, I removed the
> three way interaction as well as all of the two way interactions as
> they were deemed non-significant, producing the final model :
> 
> > model5<- lme(TOX1~HOM+C18+CU, random=~1|PC/REP, data=Data4, method="ML")
> 
> which has 52 DF
> 
> However, I am unsure if these Df are pseudoreplicated and would like some
> help in how to determine if this is the case. I am happy to upload the
> full dataset and/or any of the outputs if that would help.

  Not entirely sure what you mean by "pseudoreplicated df".
I guess there are quite a few missing observations (15, since the
complete design would have 5 x 2 x 2 x 2 x 2 = 80 and you report 65).
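
A quick way to see where the gaps fall is to tabulate observations per fish-by-replicate cell. The sketch below uses a made-up stand-in for Data4 (the real data were not posted), so `full` and `Data4_sim` are hypothetical names and the rows dropped are arbitrary; only the design dimensions come from the post.

```r
# Hypothetical stand-in for Data4: the complete 5 x 2 x 2 x 2 x 2 design,
# with 15 rows dropped at random to mimic the 65 observations reported.
set.seed(1)
full <- expand.grid(PC  = factor(1:5),
                    REP = factor(1:2),
                    HOM = c("SON", "BLD"),
                    C18 = c("SML", "BIG"),
                    CU  = c("YES", "NO"))
Data4_sim <- full[-sample(nrow(full), 15), ]

n_full    <- nrow(full)        # rows in the complete design
n_obs     <- nrow(Data4_sim)   # rows actually available
n_missing <- n_full - n_obs    # rows lost

# Observed treatment combinations (out of 8) in each fish x replicate cell
cells <- with(Data4_sim, table(PC, REP))
print(cells)
```

With the real data, replacing `Data4_sim` by `Data4` in the `table()` call should show whether any fish or replicate is badly depleted.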
 
  In principle, since this is a randomized block design (you have
the treatments replicated within every fish*rep combination), the
df here should be correct (you can look up the formula for the df
of a randomized block design in a general stats book, e.g.
Gotelli and Ellison, _A Primer of Ecological Statistics_).
There is one potential issue here, though:
technically, since you measured all treatments in every fish,
you can estimate whether the treatment effects vary
across fish and across replicates (random = ~HOM+C18+CU|PC/REP).
However, 5 fish is not very many reps, especially not for estimating
a full variance-covariance matrix for the intercept and the three
treatment effects (4x4 at each grouping level).
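
As a rough check, the 48 and 52 df reported above are what you get from the usual within-group counting rule: number of observations, minus the number of innermost groups (fish x rep), minus the number of non-intercept fixed-effect coefficients. Here is a sketch on simulated data; the data frame `sim` and all effect sizes are invented for illustration, and it assumes all 10 fish-by-replicate groups are still present after the missing rows are removed.

```r
library(nlme)

# Hypothetical stand-in for Data4: 5 fish x 2 replicates x 2 x 2 x 2
# treatments, with 15 rows dropped at random (65 observations remain).
set.seed(42)
full <- expand.grid(PC = factor(1:5), REP = factor(1:2),
                    HOM = c("SON", "BLD"), C18 = c("SML", "BIG"),
                    CU = c("YES", "NO"))
fish_eff <- rnorm(5, sd = 2)[full$PC]                        # made-up fish effects
rep_eff  <- rnorm(10, sd = 1)[as.integer(interaction(full$PC, full$REP))]
full$TOX1 <- 10 + (full$HOM == "BLD") + fish_eff + rep_eff +
  rnorm(nrow(full), sd = 1)                                  # made-up response
sim <- full[-sample(nrow(full), 15), ]

fit_full <- lme(TOX1 ~ HOM * C18 * CU, random = ~ 1 | PC/REP,
                data = sim, method = "ML")
fit_add  <- lme(TOX1 ~ HOM + C18 + CU, random = ~ 1 | PC/REP,
                data = sim, method = "ML")

# Counting rule: observations - innermost groups - non-intercept coefficients
n_obs    <- nrow(sim)
n_groups <- nrow(unique(sim[c("PC", "REP")]))
df_full_rule <- n_obs - n_groups - 7   # 7 main-effect and interaction terms
df_add_rule  <- n_obs - n_groups - 3   # 3 main effects

# Denominator df as reported by lme for each term
df_full_lme <- unique(anova(fit_full)$denDF)
df_add_lme  <- unique(anova(fit_add)$denDF)
```

If the rule and the lme values agree on the real data as well, the df are just the within-replicate df of the blocked design rather than anything inflated.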

Schielzeth and Forstmeier Behav Ecol 20:416–420 (2009) talk about the
importance of accounting for among-individual variation in effects, but
caution:

> There are a few potential problems when using random slope
> models. First, if there are only few individuals, the
> between-individual variance components are difficult to estimate and
> tend to be underestimated. This leads to unstable and often slightly
> overconfident SEs. Second, random slope models might not converge,
> particularly if more than one random intercept and one random slope
> are included. The number of parameters to be estimated increases
> substantially because not only the random effect for the intercepts
> and slopes but also the correlations among them have to be
> estimated. In case of convergence problems, we suggest following
> Figure 1 to judge if including random slopes is likely to have a large
> influence and to run preliminary submodels to decide whether or not to
> include particular random slopes.
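
To put a number on "increases substantially" for this design: an unstructured covariance matrix for q correlated random effects has q(q+1)/2 free parameters. The sketch below counts them for the intercept-only and the random-slope specification, assuming the default unstructured matrix at each of the two grouping levels (PC, and REP within PC). The helper `n_vcov_par` is a made-up name, not an nlme function, and the residual variance is left out of both counts.

```r
# Free parameters in an unstructured q x q covariance matrix:
# q variances plus q * (q - 1) / 2 covariances.
n_vcov_par <- function(q) q * (q + 1) / 2

n_levels <- 2                    # PC, and REP within PC

q_intercept_only <- 1            # random = ~ 1 | PC/REP
q_with_slopes    <- 1 + 3        # random = ~ HOM + C18 + CU | PC/REP
                                 # (intercept + one contrast per two-level treatment)

par_simple <- n_levels * n_vcov_par(q_intercept_only)
par_slopes <- n_levels * n_vcov_par(q_with_slopes)

cat("intercept-only random effects:", par_simple, "parameters\n")
cat("random slopes for all treatments:", par_slopes, "parameters\n")
```

Going from 2 to 20 variance-covariance parameters with only 5 fish is the kind of jump the passage above warns about, which is why fitting one random slope at a time in preliminary submodels is the more realistic route.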

