[R] convenient way to calculate specificity, sensitivity and accuracy from raw data

Mon Sep 1 12:16:39 CEST 2008

Try something like this:

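# videos in rows; raters 1-20 and the gold standard (rater 21) in columns
# X1-X21 (read.table() turns the numeric column names into X1, X2, ...)
dat <- read.table(textConnection("video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21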
1      1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
2      2 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
3      3 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
4      4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
5      5 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  1  0
6      6 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0
7      7 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
8      8 0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0  0
9      9 0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0  0  0  1  0
10    10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
11    11 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
12    12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
13    13 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
14    14 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
15    15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
16    16 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
17    17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
18    18 0 0 0 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
19    19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
20    20 0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
21    21 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  1
22    22 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
23    23 0 1 0 0 1 0 1 0 0  1  0  0  1  1  0  0  1  0  0  0  0
24    24 0 0 0 0 0 0 0 0 0  0  0  0  1  1  1  1  0  1  0  0  1
25    25 0 0 0 0 0 0 0 0 0  0  0  1  0  0  1  1  0  0  0  0  0
26    26 0 0 0 0 0 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0  0
27    27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
28    28 0 1 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
29    29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
30    30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
31    31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
32    32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
33    33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
34    34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
35    35 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0
36    36 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
37    37 0 1 1 0 1 0 0 1 0  0  0  0  1  1  1  0  1  0  0  1  1
38    38 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
39    39 0 1 0 0 1 0 0 1 0  1  1  0  1  1  0  0  1  1  0  1  1
40    40 1 1 1 1 1 0 1 0 0  0  0  1  1  1  1  0  0  1  0  0  1
41    41 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  1
42    42 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0"),
header = TRUE)
closeAllConnections()  # close the textConnection left open by read.table()

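goldstand <- dat$X21          # rater 21 is the gold standard
prev <- sum(goldstand)        # videos the gold standard rated 1 (positive)
cprev <- sum(!goldstand)      # videos the gold standard rated 0 (negative)
n <- prev + cprev
# compare raters 1-20 (columns X1-X20) with the gold standard;
# dat[-1] would also compare X21 with itself
lapply(dat[2:21], function(x){
    # fixing the factor levels keeps the table 2 x 2 even for a rater
    # who never (or always) answered 1, instead of returning NULL
    tab <- table(factor(x, levels = 0:1), factor(goldstand, levels = 0:1))
    cS <- colSums(tab)
    # specificity = TN / negatives, sensitivity = TP / positives
    out <- c(sp = tab[1,1], sn = tab[2,2]) / cS
    # accuracy = (TN + TP) / n
    c(out, ac = (out[["sp"]] * cprev + out[["sn"]] * prev) / n)
})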
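The same three quantities can also be written out directly from the four cells of the 2 x 2 confusion matrix, which makes the definitions explicit. This is a minimal sketch: `diag_accuracy()` is a hypothetical helper (not from any package), it assumes 0/1 answers with 1 meaning "finding present" and no missing values, and the toy vectors are made up for illustration.

```r
# Hypothetical helper (not from any package): specificity, sensitivity and
# accuracy of one rater's 0/1 answers against a 0/1 gold standard, computed
# from the four cells of the 2 x 2 confusion matrix.
# Assumes 1 = positive finding, 0 = negative, and no missing values.
diag_accuracy <- function(x, gold) {
  tp <- sum(x == 1 & gold == 1)  # true positives
  tn <- sum(x == 0 & gold == 0)  # true negatives
  fp <- sum(x == 1 & gold == 0)  # false positives
  fn <- sum(x == 0 & gold == 1)  # false negatives
  c(sp = tn / (tn + fp),         # specificity
    sn = tp / (tp + fn),         # sensitivity
    ac = (tp + tn) / length(x))  # accuracy
}

# Toy data, made up for illustration: 6 videos, 2 raters, 1 gold standard
gold   <- c(0, 0, 1, 1, 0, 1)
rater1 <- c(0, 1, 1, 0, 0, 1)
rater2 <- c(0, 0, 0, 0, 0, 0)  # never answers 1

# one row per rater, columns sp, sn, ac
res <- t(sapply(list(rater1 = rater1, rater2 = rater2),
                diag_accuracy, gold = gold))
print(round(res, 3))
```

Here rater2, who never answers 1, gets specificity 1 and sensitivity 0. With the data frame read above, something like `t(sapply(dat[2:21], diag_accuracy, gold = dat$X21))` should give one row per rater.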

I hope it helps.
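The question mentions repeating the count for each answer option. For items with more than two options, one possible approach (an assumption, not something the code above does) is to treat each option in turn as the positive class and all other options as negative ("one-vs-rest"). `one_vs_rest()` is a hypothetical helper; it assumes raters in columns, videos in rows, numeric answer codes and no missing values, and the toy data are made up.

```r
# Hypothetical helper (not from any package): "one-vs-rest" specificity,
# sensitivity and accuracy for items with more than two answer options.
# Each option observed in the data is treated in turn as the positive class
# and all other options as negative.
# Assumes raters in columns, videos in rows, numeric answer codes and no
# missing values.
one_vs_rest <- function(ratings, gold) {
  levs <- sort(unique(c(unlist(ratings), gold)))
  rows <- list()
  for (k in levs) {
    g <- gold == k                     # gold standard says "option k"
    for (r in names(ratings)) {
      x <- ratings[[r]] == k           # rater r says "option k"
      rows[[length(rows) + 1]] <- data.frame(
        rater  = r,
        option = k,
        sp = sum(!x & !g) / sum(!g),   # specificity for option k
        sn = sum(x & g) / sum(g),      # sensitivity (NaN if gold never picks k)
        ac = mean(x == g)              # agreement on "k" vs "not k"
      )
    }
  }
  do.call(rbind, rows)
}

# Toy data, made up for illustration: 5 videos, 2 raters, options 1-3
gold    <- c(1, 2, 3, 1, 2)
ratings <- data.frame(r1 = c(1, 2, 2, 1, 3),
                      r2 = c(1, 1, 3, 1, 2))

res <- one_vs_rest(ratings, gold)
print(res)
```

For the posted 0/1 data, `one_vs_rest(dat[2:21], dat$X21)` would list option 1 with the usual sensitivity and specificity and option 0 with the two swapped, while accuracy is the same for both.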

Best,
Dimitris

Quoting drflxms <drflxms at googlemail.com>:

> Dear R-colleagues,
>
> this is a question from an R-newbie medical doctor:
>
> I am evaluating data on inter-observer reliability in endoscopy. Twenty
> medical doctors judged 42 videos, filling out a multiple-choice survey
> for each video. The overall data are organized in the classical way:
> observations (items from the multiple-choice survey) as columns, and
> each case (identified by the two columns "number of medical doctor" and
> "number of video") in a row. In addition, there is a medical doctor
> number 21 who is assumed to be the gold standard.
>
> As measures of inter-observer agreement I calculated Fleiss' kappa and
> simple percentage agreement, using the functions "kappam.fleiss" and
> "agree" from the irr package. Everything has worked fine so far.
>
> Now I'd like to calculate specificity, sensitivity and accuracy for each
> item (compared with the gold standard), as these are well-known and
> easy-to-understand quantities for medical doctors.
>
> Unfortunately, I haven't found a feasible way to do this in R so far. All
> the solutions I found describe calculating specificity, sensitivity and
> accuracy from a contingency table / confusion matrix only. For me it is
> very difficult to create such contingency tables / confusion matrices
> from the raw data I have.
>
> So I started doing it by hand in Excel, which is a lot of work! If I keep
> on doing this, I'll miss the deadline. So maybe someone can help me out:
>
> It would be very convenient if there were a way to calculate specificity,
> sensitivity and accuracy from the very same data.frames I created for
> the calculation of kappa and agreement. In these data.frames, which were
> generated from the overall data table described above using the
> "reshape" package, the judging medical doctors are in the columns and
> the videos in the rows. The cells contain the coded answer options
> from the multiple-choice survey. Please see a simple example with
> answer options 0/1 (copied from the R console) below:
>
>  video 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
> 1      1 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 2      2 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
> 3      3 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 4      4 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 5      5 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  1  0
> 6      6 0 0 0 0 0 0 0 0 0  0  0  0  0  1  0  0  0  0  0  0  0
> 7      7 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 8      8 0 0 0 0 0 0 0 0 0  0  0  0  0  0  1  0  0  0  0  0  0
> 9      9 0 0 0 0 0 0 0 0 0  1  0  1  1  0  1  1  0  0  0  1  0
> 10    10 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 11    11 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 12    12 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 13    13 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 14    14 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 15    15 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 16    16 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 17    17 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 18    18 0 0 0 0 1 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  1
> 19    19 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 20    20 0 1 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 21    21 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  1
> 22    22 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 23    23 0 1 0 0 1 0 1 0 0  1  0  0  1  1  0  0  1  0  0  0  0
> 24    24 0 0 0 0 0 0 0 0 0  0  0  0  1  1  1  1  0  1  0  0  1
> 25    25 0 0 0 0 0 0 0 0 0  0  0  1  0  0  1  1  0  0  0  0  0
> 26    26 0 0 0 0 0 0 0 0 0  0  0  1  0  0  0  0  0  0  0  0  0
> 27    27 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 28    28 0 1 0 1 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 29    29 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 30    30 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 31    31 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 32    32 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 33    33 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 34    34 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 35    35 0 0 0 0 0 0 1 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 36    36 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 37    37 0 1 1 0 1 0 0 1 0  0  0  0  1  1  1  0  1  0  0  1  1
> 38    38 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
> 39    39 0 1 0 0 1 0 0 1 0  1  1  0  1  1  0  0  1  1  0  1  1
> 40    40 1 1 1 1 1 0 1 0 0  0  0  1  1  1  1  0  0  1  0  0  1
> 41    41 0 0 0 0 0 0 0 0 0  1  0  0  0  0  0  0  0  0  0  0  1
> 42    42 0 0 0 0 0 0 0 0 0  0  0  0  0  0  0  0  0  0  0  0  0
>
> What I did in Excel: I created the very same tables using pivot charts,
> compared columns 1-20 with column 21 (the gold standard) and counted
> the values identical to those in column 21. I repeated this for each
> answer option. From the results, one can easily calculate specificity,
> sensitivity and accuracy.
>
> How can I do this, or something similar that leads to the same results, in R?
> I'd appreciate any kind of help very much!
>
> Greetings from Munich,
> Felix
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
Dimitris Rizopoulos
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/(0)16/336899
Fax: +32/(0)16/337015
Web: http://med.kuleuven.be/biostat/
      http://perswww.kuleuven.be/dimitris_rizopoulos/
