[R-sig-ME] Mixed effect models for a study with no (real) replication

Marco Plebani marcoplebani85 at gmail.com
Tue Jul 22 14:22:26 CEST 2014


Dear list members,

I am analyzing the effects of temperature on species richness across a natural temperature gradient. The study is based on 13 streams, each at a different temperature; for each stream I have measures of species richness obtained from three samples i.e. 39 data points in total.
I expect the observed variability to be due to four possible sources, three of which are discernible:
- differences AMONG streams due to temperature,
- differences AMONG streams not due to temperature (e.g. due to other environmental factors not accounted for in the study),
- differences WITHIN streams (due to either natural heterogeneity and/or the observation process, indiscernible without having multiple measures for each sample).

I should point out that “temperature” is coded as a continuous variable while “ Stream.ID” is a factor (I did this for clarity; I could have used “temperature” as both fixef and ranef).
A very similar issue has been discussed recently (https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022365.html, “Same variable as both fixed and random") so I fitted the following model:

glmer(species.richness ~ temperature + (1 | Stream.ID), data=dd, family=poisson)

I tested the above-mentioned model on simulated data obtained as follows (see script at the end of this message):

species.richness = a * exp(b* temperature * epsilon1 * epsilon2)

Where epsilon1 adds variability AMONG streams not explained by temperature, and epsilon2 adds variability WITHIN streams.
The estimates of parameters a and b are coherent with the parameters of the simulation, but the st.dev estimates due to Stream.ID are often far off (somewhere between the actual st.dev due to Stream ID and the residual st.dev)  and often touching zero (when the simulated st.dev due to Stream ID and the residual st.dev are similar).
Why does that happen? Here: https://stat.ethz.ch/pipermail/r-sig-mixed-models/2014q3/022386.html (“LMM: including ranef or not?”) Ben Bolker foresees such an issue when the ranef has only few levels and suggests to use blme to fix the problem. I tried; blme still confounds the st.dev due to Stream ID and the residual st.dev when the latter is much higher then the former, but it does succeed in reducing the estimate of false zeroes for the st.dev of Stream ID when st.dev(Stream.ID)~st.dev(residual) and both <1.
Why blme is better than lme4 in detecting the st.dev of the ranef “Stream.ID” when it is small?
My familiarity with Bayesian stats is limited to studying Bayes’s theorem as an undergrad, so to be honest I have no idea what’s going on “inside” blme.

Thank you very much in advance for any help provided.
Cheers,

Marco

-----
Marco Plebani
Institute of Evolutionary Biology and Environmental Studies
University of Zurich
http://www.ieu.uzh.ch/staff/phd/plebani.html





###################################
###################################
# Marco Plebani, July 17th 2014
# Given a study based on 13 streams at 13 different temperatures, sampled 3 times each for species richness and total biomass, is it possible to discern the temperature effect on species richness and total biomass from the effect of variability among streams not due to temperature?
# Note that the multiple samples are PSEUDO-replicates and that there is complete temperature=stream identity.
###################################
###################################

rm(list=ls())
library(lme4)
library(blme)
#set.seed(2)

# Simulate data based on the observed temperature effect.
# the model is:

# y = a * exp(b*x) + epsilon

#a, i.e. species richness at 0°C, is set at 13
# b is set at -0.07

# epsilon is the sum of:
# epsilon1 due to variability AMONG streams not explained by temperature,
# epsilon2 due to variability WITHIN streams (i.e. residual variuability, made up by both the streams natural heterogeneity and the sampling error)

# create empty vectors where to save estimates for the sd due to Stream ID, and the p-val of parameter "b" (see model above):

stream.ID.stdev<-rep(NA,50)
stream.ID.stdev.bayes<-rep(NA,50)
model.significance<-rep(NA,50)

# create the dataset:

reps <- 1:3
Stream.ID <- letters[1:13]
temperature <- seq(5,20, length.out=13)
rr <- expand.grid(Stream.ID = Stream.ID, reps=reps)
rr$temperature = rep(temperature,3)

## predicted species richness for the observed temperature:
rr$pred.rich <- 13 * exp(-0.07 * rr$temperature) # a and b values are realistic estimates

# continuous prediction: 
temp=seq(0,22,0.1)
pred.richness <- 13 * exp(-0.07 * temp)

par(mfrow=c(1,3))
plot(x= temp, y= pred.richness, xlim=c(0,23), ylim=c(0, 15), type="l")

# run the analysis on 50 datasets drawn from the same distribution:

for(i in 1:50){

## epsilon1: stream.ID effect
stream.eff <- data.frame(Stream.ID = rr$Stream.ID,
						effect=rnorm(n=length(Stream.ID), mean=0, sd=0.2))
# add epsilon1 to the dataset						
rr$epsilon1 <- stream.eff$effect[match(rr$Stream.ID, stream.eff$Stream.ID)]

## epsilon2: residual variability within streams
rr$epsilon2 <-  rnorm(length(rr$temperature), mean=0, sd=0.2)

# pred.richness accounting for epsilon1
rr$pred.rich_e1 <- 13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1)

# pred.richness accounting for epsilon1 and epsilon2				
rr$pred.rich_e1_e2 <- round(13 * exp(-0.07 * rr$temperature) * exp(rr$epsilon1) * exp(rr$epsilon2))
# rounded to the closest integer
							
points(x= rr$temperature, y= rr$pred.rich_e1, pch=8, cex=1)
#points(x= rr$temperature, y= rr$pred.rich_e1_e2, cex=1.5)
# solid line represents predicted mean richness
# *'s represent predicted mean richness + epsilon1
# circles represent predicted mean richness + epsilon1 + epsilon2, i.e. the observations from each sample

# glmm:

richness.glmm <- glmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson)
# "pred.rich_e1_e2" is the observed richness in each sample.
richness.glmm.bayes <- bglmer(pred.rich_e1_e2 ~ temperature + (1|Stream.ID), data=rr, family=poisson)

stream.ID.stdev[i] <- as.numeric(attributes(summary(richness.glmm)$varcor[[1]])$stddev)
stream.ID.stdev.bayes[i] <- as.numeric(attributes(summary(richness.glmm.bayes)$varcor[[1]])$stddev)
model.significance[i] <- ifelse(summary(richness.glmm)$coefficients[2,4]<0.05,1,0)
# library(MuMIn) # estimates pseudo-R^2 for MEM.
# r.squaredGLMM(richness.glmm)

}

hist(stream.ID.stdev)
abline(v=mean(stream.ID.stdev), col="red",lwd=2)
legend("topright",title=paste("mean=", round(mean(stream.ID.stdev),2), "; sd=", round(sd(stream.ID.stdev),2)), legend=c(NA))

hist(stream.ID.stdev.bayes)
abline(v=mean(stream.ID.stdev.bayes), col="red",lwd=2)
legend("topright",title=paste("mean=", round(mean(stream.ID.stdev.bayes),2), "; sd=", round(sd(stream.ID.stdev.bayes),2)), legend=c(NA))

# end


More information about the R-sig-mixed-models mailing list