[R] selecting only corresponding categories from a confusion matrix
David Winsemius
dwinsemius at comcast.net
Mon Nov 29 14:49:56 CET 2010
On Nov 29, 2010, at 8:32 AM, drflxms wrote:
> Dear R colleagues,
>
> as a result of my calculations regarding the inter-observer variability
> in bronchoscopy, I get a confusion matrix like the following:
>
>        0  1 1001 1010 11
> 0    609 11   54   36  6
> 1      1  2    6    0  2
> 10    14  0    0    8  4
> 100    4  0    0    0  0
> 1000  23  7   12   10  5
> 1001   0  0    4    0  0
> 1010   4  0    0    3  0
> 1011   1  0    1    0  2
> 11     0  0    3    3  1
> 110    1  0    0    0  0
> 1100   2  0    0    0  0
> 1110   1  0    0    0  0
>
> The first column represents the categories found among observers, the
> top row represents the categories found by the reference ("gold
> standard").
> I am looking for a way (general algorithm) to extract a data.frame with
> only the corresponding categories among observers and reference from the
> above confusion matrix. "Corresponding" means, in this case, that a
> category has been chosen by both observers and reference.
> In this example the corresponding categories would simply be all
> categories that have been chosen by the reference (0, 1, 1001, 1010, 11),
> but in general there might also be categories that are found by the
> reference only (and not among observers, i.e. not in the first column).
> So the solution data frame for the above example would look like:
>
>        0  1 1001 1010 11
> 0    609 11   54   36  6
> 1      1  2    6    0  2
> 1001   0  0    4    0  0
> 1010   4  0    0    3  0
> 11     0  0    3    3  1
I wasn't able to follow the confusing, er, confusion matrix
explanation but it appears from a comparison of the input and output
that you just want row indices that are the column names:
> mtx[colnames(mtx), ]
       0  1 1001 1010 11
0    609 11   54   36  6
1      1  2    6    0  2
1001   0  0    4    0  0
1010   4  0    0    3  0
11     0  0    3    3  1

> # and the omitted
> mtx[!rownames(mtx) %in% colnames(mtx), ]
      0 1 1001 1010 11
10   14 0    0    8  4
100   4 0    0    0  0
1000 23 7   12   10  5
1011  1 0    1    0  2
110   1 0    0    0  0
1100  2 0    0    0  0
1110  1 0    0    0  0

> # and their number:
> NROW(mtx[!rownames(mtx) %in% colnames(mtx), ])
[1] 7

> All the categories found among observers only, were omitted.
>
> If the solution algorithm would include a method to list the omitted
> categories and to count their number as well as the number of omitted
> cases, it would be just perfect for me.
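
For the more general case described in the question, where a reference category might not occur among the observers at all, indexing with colnames(mtx) alone would fail with a "subscript out of bounds" error. A minimal sketch of one way around that, assuming the matrix carries the categories as character row and column names, is to index by the intersection of the two name sets. The matrix below is retyped by hand from the posted numbers, and the names `common`, `corresponding`, `omitted_rows`, `omitted_cols`, `n_omitted_categories` and `n_omitted_cases` are made up for this illustration:

```r
# Rebuild the posted confusion matrix (rows: observers, columns: reference).
mtx <- matrix(
  c(609, 11, 54, 36, 6,
      1,  2,  6,  0, 2,
     14,  0,  0,  8, 4,
      4,  0,  0,  0, 0,
     23,  7, 12, 10, 5,
      0,  0,  4,  0, 0,
      4,  0,  0,  3, 0,
      1,  0,  1,  0, 2,
      0,  0,  3,  3, 1,
      1,  0,  0,  0, 0,
      2,  0,  0,  0, 0,
      1,  0,  0,  0, 0),
  ncol = 5, byrow = TRUE,
  dimnames = list(
    c("0", "1", "10", "100", "1000", "1001", "1010", "1011",
      "11", "110", "1100", "1110"),
    c("0", "1", "1001", "1010", "11")
  )
)

# Categories that occur both as a row name and as a column name.
common <- intersect(colnames(mtx), rownames(mtx))

# Square sub-matrix of corresponding categories; drop = FALSE keeps it a
# matrix even if only one category is shared.
corresponding <- mtx[common, common, drop = FALSE]

# Categories omitted on either side.
omitted_rows <- setdiff(rownames(mtx), common)  # observers only
omitted_cols <- setdiff(colnames(mtx), common)  # reference only

# Number of omitted categories and number of omitted cases (cell counts
# that fall outside the retained sub-matrix).
n_omitted_categories <- length(omitted_rows) + length(omitted_cols)
n_omitted_cases <- sum(mtx) - sum(corresponding)

# Convert to a data.frame if that is the form needed downstream.
corresponding_df <- as.data.frame(corresponding)
```

For the posted data this should give the same five rows as mtx[colnames(mtx), ], seven omitted observer-only categories, no omitted reference-only categories, and 95 omitted cases.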
David Winsemius, MD
West Hartford, CT