[R-sig-ME] [FORGED] Re: Using variance components of lmer for ICC computation in reliability study

Sun Jun 17 17:42:19 CEST 2018

Dear Pierre,

Again, thanks for referring me to the paper (and thanks to Ben for the review article). It's becoming clearer now.

When I refer to residuals, as I understand from lmer modelling, it is the level 1 variance. Back to my design, I have many subjects, two fixed raters, sessions nested within a subject, and trials nested within sessions. Hence, level 1 variance would typically reflect inter-trial variance.

If I were to use lmer modelling, I do not have to specify the level 1 random effects (i.e. ~1| SUBJ:SESSION:TRIAL), as it is in the residuals. 

However, using mcmcglmm, there appears to be an even lower level of random effects which is fixed (as you mentioned). So my question is, should I specify also the level 1 random effects?

Sample formula using your suggestion of probit (I just like to know if the priors and model are appropriate):
mcglmm_mod = MCMCglmm(vas ~ RATER, 
                      random =  ~ SUBJ +
                        SUBJ:SESSION +
                        SUBJ:SESSION:TRIAL,
                      data = as.data.frame (df %>% filter (SIDE == "R")),
                      family = "ordinal",
                      prior=list(R=list(V=1, fix=1), G=list(G1=list(V=1, nu=0),
                                                            G2=list(V=1, nu=0),
                                                            G3=list(V=1, nu=0))))
Many thanks again for your (and everyone's) help thus far.

Kind regards,
Bernard
-----Original Message-----
From: pierre.de.villemereuil using mailoo.org <pierre.de.villemereuil using mailoo.org> 
Sent: Saturday, June 16, 2018 7:28 PM
To: Bernard Liew <B.Liew using bham.ac.uk>
Cc: r-sig-mixed-models using r-project.org
Subject: Re: [R-sig-ME] [FORGED] Re: Using variance components of lmer for ICC computation in reliability study

Dear Bernard,

> Thanks Pierre for the suggestion to use MCMCglmm. Very useful.

Glad this was helpful.

> 1) Can I ask why when taking the ratio of the variance components, the denominator is ICC = Vcomp / (sum(variance components of the model) + 1)? Why is the one added?

You can find an explanation for this in the Supplementary File on our article on quantitative genetics interpretation of GLMMs:
http://www.genetics.org/content/204/3/1281.supplemental

It is fairly technical. Trying to put it simply: there is an equivalence between a binomial/probit model and the threshold model. This is because a probit link is actually the CDF of a normal distribution (same goes for a logit and a logistic distribution): using a probit link and a binomial is equivalent to using a threshold model after adding a random noise normally distributed (this is Fig. S1 in our Suppl. File above). 

And the variance of this random noise is thus the variance of a standard normal distribution which is 1 (or the variance of a standard logistic distribution, which is (pi^2)/3, for a logit link).

Computing the ICC on the liability scale "as if" a threshold model was used thus only requires to add that extra-variance on the denominator.

> 2) I have tried mixed ordinal modelling using "ordinal" or MCMCglmm, 
> and noticed the variance of the residuals of the model is not 
> produced. I have also read that the residual is assumed to be as you 
> mentioned (pi^2)/3. Is there a reason (maybe a not so technical one?)

It all depends what you call "residual". In MCMCglmm, there is for example a "residual" variance called "units" which value should be fixed to e.g. 1 when running families such as ordinal. This variance is required to be added in the denominator of the ICC. If you have only a random effect, you would have:
Vrand / (Vrand + Vunits + 1)

Note that there is also the "threshold" family in MCMCglmm, which is special in the sense that (to say it quickly) the "units" variance is the variance of the probit link and should be fixed to 1. In that case, you would have:
Vrand / (Vrand + Vunits)

Sometimes this "extra-variance" (1 or (pi^2)/3) is indeed called "residual variance". This is because, as per my explanation above, probit and logit links can be seen as "adding a random noise then taking a threshold", this random noise is sometimes seen as a "residual error" in the model.

I hope this will clarify things for you. I'm afraid it's difficult to explain clearly the whys without going too much into the technicality of the models.

Cheers,
Pierre.

> 
> Kind regards,
> Bernard
> 
> -----Original Message-----
> From: pierre.de.villemereuil using mailoo.org 
> <pierre.de.villemereuil using mailoo.org>
> Sent: Friday, June 15, 2018 4:05 PM
> To: r-sig-mixed-models using r-project.org
> Cc: Ben Bolker <bbolker using gmail.com>; Doran, Harold <HDoran using air.org>; 
> Bernard Liew <B.Liew using bham.ac.uk>
> Subject: Re: [R-sig-ME] [FORGED] Re: Using variance components of lmer 
> for ICC computation in reliability study
> 
> Hi,
> 
> > However, I'm not sure how one would go about computing an ICC from 
> > ordinal data
> 
> I've never used the package "ordinal", but if it's anything like the "ordinal" family of MCMCglmm, then computing an ICC on the liability scale would be fairly easy, as one would just proceed as always and simply add the so-called "link variance" corresponding to the chosen link function (1 for probit, (pi^2)/3 for logit). E.g. for a given variance component Vcomp and a probit link:
> ICC = Vcomp / (sum(variance components of the model) + 1)
> 
> However, computing an ICC on the data scale would be much more difficult as it is actually multivariate...
> 
> I think in the case when such scores were used, having the estimate on the liability scale make sense though, as the binning is more due to our inability of finely measuring this scale rather than an actual property of the system.
> 
> Cheers,
> Pierre.
> 
> Le vendredi 15 juin 2018, 03:27:54 CEST Ben Bolker a écrit :
> > More generally, the best way to fit this kind of model is to use an
> > *ordinal* model, which assumes the responses are in increasing 
> > sequence but does not assume the distance between levels (e.g. 1 vs 
> > 2,
> > 2 vs 3 ...) is uniform.  However, I'm not sure how one would go 
> > about computing an ICC from ordinal data ... (the 'ordinal' package 
> > is the place to look for the model-fitting procedures). Googling it 
> > finds some stuff, but it seems that it doesn't necessarily apply to 
> > complex designs ...
> > 
> > https://stats.stackexchange.com/questions/3539/inter-rater-reliabili
> > ty
> > -for-ordinal-or-interval-data
> > https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3402032/
> > 
> > 
> > On Thu, Jun 14, 2018 at 6:58 PM, Doran, Harold <HDoran using air.org> wrote:
> > > That’s a helpful clarification, Rolf. However, with gaussian 
> > > normal errors in the linear model, we can’t *really* assume they 
> > > would asymptote at 1 or 10. My suspicion is that these are 
> > > likert-style ordered counts of some form, although the OP should 
> > > clarify. In which case, the 1 or 10 are limits with censoring, as 
> > > true values for some measured trait could exist outside those 
> > > boundaries (and I suspect the model is forming predicted values outside of 1 or 10).
> > >
> > >
> > >
> > > On 6/14/18, 6:33 PM, "Rolf Turner" <r.turner using auckland.ac.nz> wrote:
> > >
> > >>
> > >>On 15/06/18 05:35, Doran, Harold wrote:
> > >>
> > >>> Well no, you¹re specification is not right because your variable 
> > >>> is not continuous as you note. Continuous means it is a real 
> > >>> number between -Inf/Inf and you have boundaries between 1 and 10.
> > >>> So, you should not be using a linear model assuming the outcome is continuous.
> > >>
> > >>I think that the foregoing is a bit misleading.  For a variable to 
> > >>be continuous it is not necessary for it to have a range from 
> > >>-infinity to infinity.
> > >>
> > >>The OP says that dv  "is a continuous variable (scale 1-10)".  It 
> > >>is not clear to me what this means.  The "obvious"/usual meaning 
> > >>or interpretation would be that dv can take (only) the (positive
> > >>integer) values 1, 2, ..., 10.  If this is so, then a continuous 
> > >>model is not appropriate.  (It should be noted however that people 
> > >>in the social sciences do this sort of thing --- i.e. treat 
> > >>discrete variables as continuous --- all the time.)
> > >>
> > >>It is *possible* that dv can take values in the real interval 
> > >>[1,10], in which case it *is* continuous, and a "continuous model"
> > >>is indeed appropriate.
> > >>
> > >>The OP should clarify what the situation actually is.
> > >>
> > >>cheers,
> > >>
> > >>Rolf Turner
> > >>
> > >>--
> > >>Technical Editor ANZJS
> > >>Department of Statistics
> > >>University of Auckland
> > >>Phone: +64-9-373-7599 ext. 88276
> > >>
> > >>> On 6/14/18, 11:16 AM, "Bernard Liew" <B.Liew using bham.ac.uk> wrote:
> > >>>
> > >>>> Dear Community,
> > >>>>
> > >>>>
> > >>>> I am doing a reliability study, using the methods of 
> > >>>> https://www.ncbi.nlm.nih.gov/pubmed/28505546. I have a question 
> > >>>> on the lmer formulation and the use of the variance components.
> > >>>>
> > >>>> Background: I have 20 subjects, 2 fixed raters, 2 testing 
> > >>>> sessions, and
> > >>>> 10 trials per sessions. my dependent variable is a continuous 
> > >>>> variable (scale 1-10). Sessions are nested within each 
> > >>>> subject-assessor combination. I desire a ICC (3) formulation of 
> > >>>> inter-rater and inter-session reliability from the variance components.
> > >>>>
> > >>>> My lmer model is:
> > >>>>
> > >>>> lmer (dv ~ rater + (1|subj) + (1|subj:session), data = df)
> > >>>>
> > >>>> Question:
> > >>>>
> > >>>>   1.  is the model formulation right? and is my interpretation 
> > >>>>of the  variance components for ICC below right?
> > >>>>   2.  inter-rater ICC = var (subj) / (var(subj) + var 
> > >>>>(residual)) # I  read that the variation of raters will be lumped with the residual
> > >>>>   3.  inter-session ICC =( var (subj) + var (residual)) /( var
> > >>>>(subj) +  var (subj:session) + var (residual))  some simulated
> > >>>>data:
> > >>>> df = expand.grid(subj = c(1:20), rater = c(1:2), session = 
> > >>>>c(1:2), trial  = c(1:10))  df$vas = rnorm (nrow (df_sim), mean = 
> > >>>>3, sd = 1.5)
> > >>>>
> > >>>> I appreciate the kind response.
> > >>
> > >>
> > >
> > > _______________________________________________
> > > R-sig-mixed-models using r-project.org mailing list 
> > > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> > 
> > _______________________________________________
> > R-sig-mixed-models using r-project.org mailing list 
> > https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
> 
> 
>