[R-sig-ME] Mixed Effects Models to compare different testing sites

Matthew Brain matt.brain1 at gmail.com
Fri Apr 22 17:42:48 CEST 2016

I have an awkward correlation problem with an unbalanced nested repeated measures data set. Essentially we have 3 sampling sites on a flowing fluid stream: site A is at the source and sites B and C are sequentially downstream after additives and filters are applied. A significant process difference is additive A or B (which affects sites B and C). Hierarchical structure is group (source of fluid stream), batch of filter, processA or B. Sample of data can be found at http://www.irenabyss.com.au/testing/ddata.html. 

I am trying to answer four questions: 
1. What is the correlation and CI between measurements taken from Sample sites B and C 
2. Is the measurement variance significantly greater at sample sites B or C when compared to site A
3. Is there heteroskedasticity of measurements (from A,B or C) as values vary between low and high values

My main issue is with assessing correlation and model specification. This post: http://stackoverflow.com/questions/2336056/how-to-do-correlation-with-blocks-or-repeated-measures suggests using nlme’s correlation structure however I am uncertain whether this is truly a valid method. 

In regards to point 2: can I compare the fit parameters of identical lmer models that substitute Sample B and C with each other to comment on variation at either site.

Code so far is:
#Data at http://www.irenabyss.com.au/testing/ddata.html

#simple correlation - that is clearly incorrect due to the range of the measurements within different patients.
cor.test(ddata$SampleB,ddata$SampleC,use = "everything",na.rm=TRUE)

#method from stack_overflow: nlme correlation function to assess intra-class correlation
summary(mod <- gls(SampleC ~ SampleB+Batch:Group,
                   correlation = corLin(form=~1|Batch/Group), #Post_Filt_iCa_mmol.L
                   data=ddata,na.action = na.omit) ) 
intervals(mod)$corStruct  # confidence interval for the correlation

#plot of predicted vs fitted values from the gls model
bwtheme<-standard.theme("pdf",color = FALSE)
(plot5<- with(ddata,
              xyplot(SampleC ~ SampleB|(paste(ddata$Process,  formatC(ddata$Group,width=2,format="d",flag="0"),sep=".")), groups=as.factor(Batch), as.table=TRUE, panel = function(x, y, ...) {panel.xyplot(x, y, ...);panel.abline(coef = coef(mod), ...)  },par.settings = bwtheme, aspect = "iso",type = c('p'),xlab = "Sample B Conc.", ylab = "Sample A Conc",scales = "free",auto.key = list(title = "Batch", cex = 0.7,space = 'right',adj = 1,columns = 1,levels(as.factor(ddata$Batch))))))

#comparisons of intercept, slope and variance at sites B & C compared to Site A
summary(lme4_modC<-lmer(SampleC ~(1|SampleA) +(Batch|Group),data=ddata))
summary(lme4_modB<-lmer(SampleB ~(1|SampleA) +(Batch|Group),data=ddata))

Any suggestions or identifying errors appreciated.

Kind Regards

Matt Brain
PhD Candidate, Monash University

More information about the R-sig-mixed-models mailing list