Weighted Chance-corrected Agreement Coefficients

Kilem L. Gwet, Ph.D.

2019-09-16

library(irrCAC)

Abstract

irrCAC is an R package that provides several functions for computing various chance-corrected agreement coefficients. The package closely follows the general framework of inter-rater reliability assessment presented by Gwet (2014). The document overview.html gives a broad overview of the different ways you may compute agreement coefficients with the irrCAC package. However, researchers often need the weighted versions of these coefficients in order to account for the ordinal nature of some ratings.

This document shows you how to specify the weights to compute the weighted kappa coefficient, the weighted \(\mbox{AC}_2\) coefficient and many others.

The different weights

A set of weights to be applied to an agreement coefficient is generally defined in the form of a matrix (i.e. a square table) whose dimension is given by the number of categories into which the raters can classify each subject. Although the user can define her own custom weights, the inter-rater reliability literature offers predefined weights, which can be generated using functions in this package as follows:

  identity.weights(1:3)
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#> [2,]    0    1    0
#> [3,]    0    0    1
  radical.weights(1:3)
#>           [,1]      [,2]      [,3]
#> [1,] 1.0000000 0.2928932 0.0000000
#> [2,] 0.2928932 1.0000000 0.2928932
#> [3,] 0.0000000 0.2928932 1.0000000
  linear.weights(1:3)
#>      [,1] [,2] [,3]
#> [1,]  1.0  0.5  0.0
#> [2,]  0.5  1.0  0.5
#> [3,]  0.0  0.5  1.0
  ordinal.weights(1:3)
#>           [,1]      [,2]      [,3]
#> [1,] 1.0000000 0.6666667 0.0000000
#> [2,] 0.6666667 1.0000000 0.6666667
#> [3,] 0.0000000 0.6666667 1.0000000
  quadratic.weights(1:3)
#>      [,1] [,2] [,3]
#> [1,] 1.00 0.75 0.00
#> [2,] 0.75 1.00 0.75
#> [3,] 0.00 0.75 1.00
  circular.weights(1:3)
#>              [,1]         [,2]         [,3]
#> [1,] 1.000000e+00 3.330669e-16 0.000000e+00
#> [2,] 3.330669e-16 1.000000e+00 3.330669e-16
#> [3,] 0.000000e+00 3.330669e-16 1.000000e+00
  bipolar.weights(1:3)
#>           [,1]      [,2]      [,3]
#> [1,] 1.0000000 0.6666667 0.0000000
#> [2,] 0.6666667 1.0000000 0.6666667
#> [3,] 0.0000000 0.6666667 1.0000000

You may need to read chapter 3 of Gwet (2014) for a more detailed discussion of weighted agreement coefficients. Identity weights yield the unweighted coefficients, and the most commonly used weights are the quadratic and linear weights. Note that the tiny off-diagonal values (about \(3.3\times 10^{-16}\)) in the circular weights above are floating-point noise: for three categories, the circular weights are effectively zero off the diagonal.
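Although the examples below rely on these predefined weights, you may also supply your own custom matrix. As a minimal sketch, the linear weights can be rebuilt by hand from the formula \(w_{kl} = 1 - |k-l|/(q-1)\); any symmetric matrix of this form, with ones on the diagonal and off-diagonal values between 0 and 1, could be passed in its place:

  q <- 3
  # rebuild the linear weights by hand: w_kl = 1 - |k - l|/(q - 1)
  w <- 1 - abs(outer(1:q, 1:q, "-"))/(q - 1)
  all.equal(w, linear.weights(1:q))  # expected: TRUE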

Weighted Agreement Coefficients

I will show here how to compute weighted agreement coefficients for each of the three ways of organizing input ratings that this package supports:

  1. The contingency table: this dataset describes ratings from two raters only, in such a way that each row represents a rating used by rater 1 and each column a rating used by rater 2 (see cont3x3abstractors for an example of such a dataset).

  2. The dataset of raw ratings: this dataset is a listing of all ratings assigned to each individual subject by each rater. Each row represents one subject, each column one rater, and each value is the rating that the rater assigned to the subject (you may view the package dataset cac.raw4raters as an example).

  3. The distribution of raters: each row of this dataset represents a subject, each column a rating used by any of the raters, and each value is the number of raters who assigned the specified rating to the subject (you may see distrib.6raters as an example; a sketch for converting raw ratings to this format appears after this list).
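Before turning to the coefficients themselves, here is a minimal base-R sketch of how raw ratings could be converted into the distribution-of-raters format; the two raters and three categories below are hypothetical and used only for illustration:

  # hypothetical raw ratings from 2 raters on 4 subjects
  raw <- data.frame(rater1 = c("a", "b", "b", "c"),
                    rater2 = c("a", "b", "c", "c"))
  categ <- c("a", "b", "c")
  # for each subject, count how many raters used each category
  distrib <- t(apply(raw, 1, function(x) table(factor(x, levels = categ))))
  distrib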

Weighting ratings from a contingency table

Suppose you want to use quadratic weights to compute various weighted coefficients using the dataset cont3x3abstractors. Note that gwet.ac1.table reports Gwet's \(\mbox{AC}_2\), the weighted version of \(\mbox{AC}_1\), whenever weights other than the identity weights are supplied. Using the quadratic.weights function presented above, you would proceed as follows:

  cont3x3abstractors
#>         Ectopic AIU NIU
#> Ectopic      13   0   0
#> AIU           0  20   7
#> NIU           0   4  56
  q <- nrow(cont3x3abstractors)
  kappa2.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>      coeff.name coeff.val   coeff.se      coeff.ci coeff.pval
#> 1 Cohen's Kappa 0.8921569 0.03535151 (0.822,0.962)      0e+00
  scott2.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>   coeff.name coeff.val   coeff.se      coeff.ci coeff.pval
#> 1 Scott's Pi 0.8921093 0.03539506 (0.822,0.962)      0e+00
  gwet.ac1.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>   coeff.name coeff.val   coeff.se      coeff.ci coeff.pval
#> 1 Gwet's AC2 0.9402369 0.01792019 (0.905,0.976)      0e+00
  bp2.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>         coeff.name coeff.val   coeff.se      coeff.ci coeff.pval
#> 1 Brennan-Prediger    0.9175 0.02346673 (0.871,0.964)      0e+00
  krippen2.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>             coeff.name coeff.val   coeff.se      coeff.ci coeff.pval
#> 1 Krippendorff's Alpha 0.8926487 0.03539506 (0.822,0.963)      0e+00
  pa2.table(cont3x3abstractors,weights = quadratic.weights(1:q))
#>          coeff.name coeff.val    coeff.se      coeff.ci coeff.pval
#> 1 Percent Agreement    0.9725 0.007822244 (0.957,0.988)      0e+00
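To see what the weighting does, note that for two raters the weighted percent agreement is simply the sum of the weights multiplied by the corresponding cell proportions of the contingency table. The following sketch reproduces the pa2.table() value above by hand:

  tab <- as.matrix(cont3x3abstractors)
  w <- quadratic.weights(1:nrow(tab))
  # weighted percent agreement: sum of weight x cell proportion
  sum(w*tab/sum(tab))
#> [1] 0.9725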

Weighting for a dataset of raw ratings

Suppose you want to use quadratic weights to compute various weighted coefficients using the dataset cac.raw4raters. For the .raw series of functions, the weights are specified by name. You would proceed as follows:

  pa.coeff.raw(cac.raw4raters,weights = "quadratic")$est
#>          coeff.name        pa pe coeff.val coeff.se  conf.int      p.value
#> 1 Percent Agreement 0.9753788  0 0.9753788  0.09062 (0.776,1) 3.526377e-07
#>      w.name
#> 1 quadratic
  gwet.ac1.raw(cac.raw4raters,weights = "quadratic")$est
#>   coeff.name        pa        pe coeff.val coeff.se  conf.int      p.value
#> 1        AC2 0.9753788 0.7137044     0.914  0.10396 (0.685,1) 2.634438e-06
#>      w.name
#> 1 quadratic
  fleiss.kappa.raw(cac.raw4raters,weights = "quadratic")$est
#>      coeff.name        pa        pe coeff.val coeff.se  conf.int
#> 1 Fleiss' Kappa 0.9753788 0.8177083   0.86494  0.14603 (0.544,1)
#>        p.value    w.name
#> 1 9.976081e-05 quadratic
  krippen.alpha.raw(cac.raw4raters,weights = "quadratic")$est
#>             coeff.name        pa    pe coeff.val coeff.se  conf.int
#> 1 Krippendorff's Alpha 0.9735938 0.825   0.84911  0.12913 (0.561,1)
#>        p.value    w.name
#> 1 6.267448e-05 quadratic
  conger.kappa.raw(cac.raw4raters,weights = "quadratic")$est
#>       coeff.name        pa        pe coeff.val coeff.se  conf.int
#> 1 Conger's Kappa 0.9753788 0.8269638   0.85771  0.14367 (0.541,1)
#>        p.value    w.name
#> 1 9.319979e-05 quadratic
  bp.coeff.raw(cac.raw4raters,weights = "quadratic")$est
#>         coeff.name        pa   pe coeff.val coeff.se  conf.int     p.value
#> 1 Brennan-Prediger 0.9753788 0.75   0.90152  0.11089 (0.657,1) 5.60548e-06
#>      w.name
#> 1 quadratic
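The .raw functions can also take a custom weight matrix in place of a weight name (if in doubt, check each function's documentation). Assuming all possible categories appear in the data at least once, the number of categories can be read off the data itself, and the call below should reproduce the weights = "quadratic" result above:

  # number of distinct ratings observed in the data
  q <- length(unique(na.omit(unlist(cac.raw4raters))))
  gwet.ac1.raw(cac.raw4raters, weights = quadratic.weights(1:q))$est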

Weighting when the input data is the distribution of raters by subject and rating

The calculation of weighted agreement coefficients for this type of dataset is very similar to the calculation with contingency tables: you pass a weight matrix generated by a function such as quadratic.weights(), to which the vector of category indices is supplied as a parameter. As an example, suppose you want to compute various weighted agreement coefficients using quadratic weights and the distrib.6raters dataset. This would be accomplished as follows:

  q <- ncol(distrib.6raters)
  gwet.ac1.dist(distrib.6raters,weights = quadratic.weights(1:q))
#>   coeff.name     coeff    stderr      conf.int      p.value        pa
#> 1 Gwet's AC2 0.4837438 0.1058227 (0.257,0.711) 0.0004356618 0.8544444
#>          pe
#> 1 0.7180556
  fleiss.kappa.dist(distrib.6raters,weights = quadratic.weights(1:q))
#>      coeff.name     coeff    stderr       conf.int   p.value        pa
#> 1 Fleiss' Kappa 0.2511909 0.1457381 (-0.061,0.564) 0.1067865 0.8544444
#>          pe
#> 1 0.8056173
  krippen.alpha.dist(distrib.6raters,weights = quadratic.weights(1:q))
#>             coeff.name    coeff   stderr       conf.int    p.value
#> 1 Krippendorff's Alpha 0.259511 0.133958 (-0.028,0.547) 0.07315748
#>          pa        pe
#> 1 0.8560617 0.8056173
  bp.coeff.dist(distrib.6raters,weights = quadratic.weights(1:q))
#>         coeff.name     coeff    stderr      conf.int     p.value        pa
#> 1 Brennan-Prediger 0.4177778 0.1067493 (0.189,0.647) 0.001559352 0.8544444
#>     pe
#> 1 0.75
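Because the weight matrix is just a parameter, it is easy to compare a coefficient under several weighting schemes. Here is a short sketch for Gwet's coefficient with identity, linear and quadratic weights:

  q <- ncol(distrib.6raters)
  schemes <- list(identity  = identity.weights(1:q),
                  linear    = linear.weights(1:q),
                  quadratic = quadratic.weights(1:q))
  # Gwet's coefficient under each weighting scheme
  sapply(schemes, function(w) gwet.ac1.dist(distrib.6raters, weights = w)$coeff)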

References:

  1. Gwet, K.L. (2014). Handbook of Inter-Rater Reliability, 4th Edition. Advanced Analytics, LLC. ISBN: 978-0970806284.