[R-sig-ME] Cross-Classified MLM for reliability study

Stuart Luppescu slu at ccsr.uchicago.edu
Tue Aug 20 18:01:00 CEST 2013

On Tue, 2013-08-20 at 13:02 +0000, Jack Wilkinson wrote:
> Dear list members,
> I am hoping to use a multilevel model to analyse data from a rater reliability study. Interest lies in assessing interrater and intrarater reliability.
> I have 20 patients. Four measurements were taken in succession on each patient by each of two raters (i.e., each patient was measured 8 times).
> I believe this to be an example of a cross-classified MLM, with observations nested both within patients and within raters. I am interested in fitting a model that contains an intercept, a random patient term, a rater term and a random patient-by-rater interaction term. The interaction term represents the interrater error, and the level 1 residual represents intrarater error. I intend to use the variance estimates from the model to calculate variance partition coefficients, representing inter- and intrarater reliability. The question is whether or not I have specified the model correctly in lme4.
> As I only have two raters, I understand that I should treat rater as a fixed effect. My attempt to specify the model is as follows.
> fixed <- lmer(pwv ~ 1 + (1|id) + rater + (1|id:rater), data = reliability)
> In fact, in a subsequent experiment, I would be interested in looking at a larger number of raters, treating rater as a random effect. Then I would specify the model as:
> random <- lmer(pwv ~ 1 + (1|id) + (1|rater) + (1|id:rater), data = reliability)
> Would these examples specify my models of interest? As a new user of lme4 and multilevel models, I will apologise for any naivety on my part.
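
For reference, a minimal sketch of how the variance partition coefficients described above could be pulled out of the fixed-rater model. The data are simulated to mimic the 20 x 2 x 4 design, and every variance value and the object names (d, fit, vpc) are assumptions made up for illustration, not anything from the actual study. Note that the rater fixed effect does not enter the variance total here.

```r
library(lme4)

set.seed(1)
# Simulated stand-in for the design: 20 patients x 2 raters x 4 repeats.
# All effect sizes below are invented for illustration.
d <- expand.grid(rep = 1:4, rater = factor(1:2), id = factor(1:20))
patient <- rnorm(20, sd = 2)[as.integer(d$id)]                        # patient effect
pat.rat <- rnorm(40, sd = 1)[as.integer(interaction(d$id, d$rater))]  # patient-by-rater effect
d$pwv <- 8 + 0.3 * (d$rater == "2") + patient + pat.rat +
  rnorm(nrow(d), sd = 0.5)                                            # within-cell residual

# Rater as a fixed effect, patient and patient:rater as random intercepts
fit <- lmer(pwv ~ 1 + rater + (1 | id) + (1 | id:rater), data = d)

vc <- as.data.frame(VarCorr(fit))  # one row per variance component
v <- setNames(vc$vcov, vc$grp)     # named by "id:rater", "id", "Residual"

# Share of the total random variance attributed to each component
vpc <- v / sum(v)
print(round(vpc, 3))
```

How these shares are combined into inter- and intrarater reliability coefficients depends on the definition adopted, so the sketch stops at the raw shares.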

I have done something similar in an analysis of observations of teacher
performance. I'm by no means an expert in this area, but I have a couple
of comments:
1) If you only have two raters and both raters evaluated all the
patients, a mixed-effects model is probably overkill. Why not
just use a two-way ANOVA, or calculate a kappa statistic?
2) The ratings the observers are giving are probably not numbers on an
interval scale. Is the difference between 2 and 1 the same as the
difference between 4 and 3? I doubt it. So, if you insist on using this
type of analysis, you should treat the ratings as ordered categorical
outcomes. I have found that MCMCglmm works best for this. Here is the
model I use (tid is the teacher ID and obsid is the observer ID; comp.f
is the evaluation framework component being rated):

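library(MCMCglmm)

glme4 <- MCMCglmm(rating.o ~ comp.f,
                  # residual variance fixed at 1; one G prior per random term
                  prior = list(R = list(V = 1, fix = 1),
                               G = list(G1 = list(V = 1, nu = 0),
                                        G2 = list(V = 1, nu = 0))),
                  random = ~tid + obsid,  # crossed teacher and observer effects
                  family = "ordinal",
                  data = ratings)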

Then to calculate the reliability I get the ICC like this:

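g.var <- summary(glme4)$Gcovariances[, 1]  # column 1 holds the posterior means
tid.var <- g.var[1]                        # teacher (tid) variance
obsid.var <- g.var[2]                      # observer (obsid) variance
ICC <- tid.var/(tid.var + obsid.var + 1)   # teacher share of the total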
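
The calculation above plugs posterior means into the ratio. A possible alternative, sketched here under assumptions, is to apply the same formula to every posterior draw so that the ICC comes with an interval. In a fitted MCMCglmm object the draws are stored in the VCV component; to keep the sketch self-contained, the matrix vcv below is a simulated stand-in for glme4$VCV, and its numbers are invented rather than taken from any fitted model.

```r
set.seed(1)
# Stand-in for glme4$VCV: one row per posterior draw, one column per
# variance component. The values are simulated for illustration only.
vcv <- cbind(tid   = rgamma(1000, shape = 20, rate = 10),
             obsid = rgamma(1000, shape = 5, rate = 10),
             units = 1)  # residual variance, fixed at 1 by the prior

# Same ICC formula as above, evaluated draw by draw
icc.draws <- vcv[, "tid"] / (vcv[, "tid"] + vcv[, "obsid"] + 1)

icc.mean <- mean(icc.draws)                     # posterior mean of the ICC
icc.ci <- quantile(icc.draws, c(0.025, 0.975))  # equal-tailed 95% interval
print(round(c(mean = icc.mean, icc.ci), 3))
```

The mean of the per-draw ratios generally differs a little from the ratio of the posterior means, so the two versions will not agree exactly.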

Stuart Luppescu -=-=- slu <AT> ccsr <DOT> uchicago <DOT> edu
CCSR at U of C ,.;-*^*-;.,  ccsr.uchicago.edu
     (^_^)/    才文と智奈美の父
[Crash programs] fail because they are based on the theory that, 
with nine women pregnant, you can get a baby a month.
                -- Wernher von Braun

Stuart Luppescu <slu at ccsr.uchicago.edu>
University of Chicago
