[R] selecting only corresponding categories from a confusion matrix

Mon Nov 29 14:49:56 CET 2010

On Nov 29, 2010, at 8:32 AM, drflxms wrote:

> Dear R colleagues,
>
> as a result of my calculations regarding the inter-observer variability
> in bronchoscopy, I get a confusion matrix like the following:
>
>       0   1 1001 1010  11
> 0    609  11   54   36   6
> 1      1   2    6    0   2
> 10    14   0    0    8   4
> 100    4   0    0    0   0
> 1000  23   7   12   10   5
> 1001   0   0    4    0   0
> 1010   4   0    0    3   0
> 1011   1   0    1    0   2
> 11     0   0    3    3   1
> 110    1   0    0    0   0
> 1100   2   0    0    0   0
> 1110   1   0    0    0   0
>
> The first column represents the categories found among observers; the
> top row represents the categories found by the reference ("gold standard").
> I am looking for a way (a general algorithm) to extract a data.frame with
> only the corresponding categories among observers and reference from the
> above confusion matrix. "Corresponding" means in this case that a
> category has been chosen by both observers and reference.
> In this example the corresponding categories would simply be all categories
> that have been chosen by the reference (0, 1, 1001, 1010, 11), but generally
> there might also be categories that are found by the reference only
> (and not among observers, i.e. not in the first column).
> So the solution data frame for the above example would look like:
>
>       0   1 1001 1010  11
> 0    609  11   54   36   6
> 1      1   2    6    0   2
> 1001   0   0    4    0   0
> 1010   4   0    0    3   0
> 11     0   0    3    3   1

I wasn't able to follow the confusing, er, confusion matrix
explanation, but it appears from a comparison of the input and output
that you just want the rows whose names match the column names:

 > mtx[colnames(mtx), ]
       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
1001   0  0    4    0  0
1010   4  0    0    3  0
11     0  0    3    3  1
 >
 > # and the omitted
 >
 > mtx[!rownames(mtx) %in% colnames(mtx), ]
      0 1 1001 1010 11
10   14 0    0    8  4
100   4 0    0    0  0
1000 23 7   12   10  5
1011  1 0    1    0  2
110   1 0    0    0  0
1100  2 0    0    0  0
1110  1 0    0    0  0
 >
 > # and their number:
 >
 > NROW(mtx[!rownames(mtx) %in% colnames(mtx), ])
[1] 7
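
The indexing above relies on every column name also being a row name. The original question mentions that a category might be found by the reference only, in which case mtx[colnames(mtx), ] would stop with a "subscript out of bounds" error. The following is a minimal sketch of a more general variant, not a definitive solution. It assumes that "corresponding" means keeping only the categories present in both dimnames, for rows and columns alike. The column "111" and the names mtx2, common, corresponding, omitted_rows and omitted_cols are hypothetical and introduced only for illustration.

```r
# Rebuild the confusion matrix from the post
# (observers in rows, reference in columns).
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")))

# Hypothetical: append a category "111" seen by the reference only,
# purely to illustrate the general case (it is not in the original data).
mtx2 <- cbind(mtx, "111" = 0)

# Categories chosen by both observers (rows) and reference (columns).
common <- intersect(colnames(mtx2), rownames(mtx2))

# Keep only those categories; drop = FALSE keeps a matrix even if
# a single category remains.
corresponding <- as.data.frame(mtx2[common, common, drop = FALSE])

# Categories left out on either side, and how many observer categories.
omitted_rows <- setdiff(rownames(mtx2), common)
omitted_cols <- setdiff(colnames(mtx2), common)
length(omitted_rows)
```

Counting with length() (or sum() over the logical index) also avoids a pitfall of NROW(mtx[idx, ]): when exactly one row is selected, the result drops to a plain vector and NROW() returns the number of columns instead of 1.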


>
> All the categories found among observers only were omitted.
>
> If the solution algorithm included a method to list the omitted
> categories and to count their number as well as the number of omitted
> cases, it would be just perfect for me.
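
The number of omitted cases is not covered by the transcript above. One possible way to get it, sketched here under the assumption that each cell of the matrix is a count of cases, is to sum the rows that were left out. The names omitted, cases_per_category and n_omitted_cases are hypothetical.

```r
# Same confusion matrix as in the post
# (observers in rows, reference in columns).
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")))

# Logical index of observer categories that the reference never chose.
omitted <- !rownames(mtx) %in% colnames(mtx)

# Cases per omitted category; drop = FALSE keeps the matrix shape
# even when only one row is omitted.
cases_per_category <- rowSums(mtx[omitted, , drop = FALSE])

# Total number of omitted cases (95 for this matrix).
n_omitted_cases <- sum(cases_per_category)
```

Here cases_per_category is a named vector, so it doubles as the list of omitted categories with their individual case counts.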

David Winsemius, MD
West Hartford, CT