[R-sig-ME] mixed effects models and pseudo replication

Wed Mar 10 15:07:58 CET 2010

Hi, 

I am analysing the effects of local population density on fish performance (e.g. weight). My dataset is based on fish sampled from different sites (17 stations), and in addition to measures of individual performance, I have information on age (0 and 1). At the site level, I have information on fish densities for both age groups. I am interested in estimating the effects of fish density on performance, and particularly in determining possible differences between age groups in the density response. 

Traditionally, this kind of data is analysed based on mean values (ANCOVAs). With a mixed effects model, however, the among-individual variance is included in the analysis and not just averaged out. I started by using lmer (lme4 package), but after realizing that the variance increases with density, I switched to lme (nlme package) and applied variance structures. 

My starting model is thus: 

m1 <- lme(weight ~ age*density0 + age*density1, random = ~1|station, weights=....) 

with station and age as factors.  

Now, my issue is pseudo-replication. The summary table shows that age and the age*density interactions have very high degrees of freedom (~700) and accordingly low p-values. It seems to me that age and the interactions between age and density are analysed as if the samples were independent, and if so, that means pseudo-replication, doesn't it? 

If I set up an alternative random structure allowing for random variance between age classes within station: 
m2 <- lme(weight ~ age*density0 + age*density1, random = ~1|station/age, weights=....) 

the summary table looks more like what I think it should be: 14 df for all fixed effects parameters and interactions, and the p-values seem more realistic.  

When comparing m1 and m2 (REML estimation), however, m2 does not provide a better fit, and based on the literature (e.g. Zuur et al. 2009), I should therefore use m1. 
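
To make the comparison concrete, here is a minimal sketch on simulated data, not my real dataset. The data frame fish, the effect sizes and the variance components are all made up for illustration, and the weights argument is left out because the variance structure is not the point here. It fits both random structures with REML, prints the denominator df that lme assigns to each fixed-effect coefficient, and compares the two fits.

```r
library(nlme)

set.seed(1)
# Made-up data: 17 stations x 2 age classes x 20 fish (all values hypothetical).
fish <- expand.grid(rep = 1:20, age = factor(0:1), station = factor(1:17))
st <- as.integer(fish$station)                         # station index, 1..17
sa <- as.integer(interaction(fish$station, fish$age))  # station-by-age index, 1..34
d0 <- runif(17, 5, 50)   # station-level density of age 0 fish
d1 <- runif(17, 2, 20)   # station-level density of age 1 fish
fish$density0 <- d0[st]
fish$density1 <- d1[st]
fish$weight <- 10 + 3 * (fish$age == "1") -
  0.05 * fish$density0 - 0.03 * fish$density1 +
  rnorm(17, sd = 1)[st] +      # station effect
  rnorm(34, sd = 0.5)[sa] +    # age-within-station effect
  rnorm(nrow(fish), sd = 1)    # individual variation

# Same fixed effects, two random structures, no variance function.
m1 <- lme(weight ~ age * density0 + age * density1,
          random = ~ 1 | station, data = fish, method = "REML")
m2 <- lme(weight ~ age * density0 + age * density1,
          random = ~ 1 | station / age, data = fish, method = "REML")

# Denominator df assigned to each fixed-effect coefficient under m1 and m2.
print(cbind(m1 = m1$fixDF$X, m2 = m2$fixDF$X))

# Compare the two random structures (identical fixed effects, both REML fits).
print(anova(m1, m2))
```

The df table is where the difference I describe shows up: the columns for m1 and m2 can be read side by side for age and the age-by-density interactions.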

When I test the significance of the interaction terms by model comparison (which is what I do to find the optimal model), the significance levels of the likelihood ratio tests for specific interaction terms are equivalent whether I use station or station/age as the random factor, which is sort of comforting. 
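
The model comparisons for an interaction term could be sketched like this, again on simulated data with made-up values and without the variance structure. I assume ML fits here because the models being compared differ in their fixed effects; the term dropped (age:density1) is just one example.

```r
library(nlme)

set.seed(2)
# Made-up data: 17 stations x 2 age classes x 20 fish (all values hypothetical).
fish <- expand.grid(rep = 1:20, age = factor(0:1), station = factor(1:17))
st <- as.integer(fish$station)                         # station index, 1..17
sa <- as.integer(interaction(fish$station, fish$age))  # station-by-age index, 1..34
fish$density0 <- runif(17, 5, 50)[st]   # station-level density of age 0 fish
fish$density1 <- runif(17, 2, 20)[st]   # station-level density of age 1 fish
fish$weight <- 10 + 3 * (fish$age == "1") -
  0.05 * fish$density0 - 0.03 * fish$density1 +
  rnorm(17, sd = 1)[st] + rnorm(34, sd = 0.5)[sa] + rnorm(nrow(fish), sd = 1)

# Full and reduced fixed effects with station as the only random factor.
full_station <- lme(weight ~ age * density0 + age * density1,
                    random = ~ 1 | station, data = fish, method = "ML")
red_station  <- lme(weight ~ age * density0 + density1,   # drops age:density1
                    random = ~ 1 | station, data = fish, method = "ML")

# The same pair of fixed-effect structures with age nested within station.
full_nested <- lme(weight ~ age * density0 + age * density1,
                   random = ~ 1 | station / age, data = fish, method = "ML")
red_nested  <- lme(weight ~ age * density0 + density1,    # drops age:density1
                   random = ~ 1 | station / age, data = fish, method = "ML")

# Likelihood ratio tests for age:density1 under each random structure.
lrt_station <- anova(full_station, red_station)
lrt_nested  <- anova(full_nested, red_nested)
print(lrt_station)
print(lrt_nested)
```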

So, my question is: do I really control for pseudo-replication in the estimation of all fixed effects and interactions when using m1? If so, why are the dfs in the summary table so high? 

I would really appreciate it if someone could enlighten me! 

Regards, 

Eli 

________________________________________________________________

Eli Kvingedal
PhD Student

Norwegian Institute for Nature Research - NINA
Postal address: NO-7485 Trondheim, NORWAY
Delivery/Visiting address: Tungasletta 2, NO-7047 Trondheim, NORWAY
Phone: +47 73 80 14 00 * Fax: +47 73 80 14 01 * www.nina.no