[R] Selecting rows of a matrix based on some condition on the columns

David Winsemius dwinsemius at comcast.net
Fri Mar 5 05:49:52 CET 2010


On Mar 4, 2010, at 10:59 PM, Juliet Ndukum wrote:

> The data set consists of two sets of matrices, as labelled by the  
> columns, T's and C's.
>
>> xy
>       x    T1    T2    T3    T4    T5    C1    C2    C3    C4    C5
> [1,] 50  0.00  0.00 33.75  0.00  0.00  0.00 36.76  0.00 35.26  0.00
> [2,] 13 34.41  0.00  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
> [3,] 14 35.85  0.00 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
> [4,] 33 34.56  0.00  0.00 36.00  0.00  0.00 36.56  0.00 34.83  0.00
> [5,] 66 36.38 37.42  0.00 32.47 34.05  0.00  0.00  0.00  0.00  0.00
> [6,] 22  0.00  0.00 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
> [7,] 25  0.00  0.00  0.00 36.11 34.24  0.00 34.07 32.72  0.00  0.00
> [8,]  9 33.63  0.00 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
> [9,] 87 35.22  0.00  0.00 35.31  0.00  0.00 34.55 35.14 38.12  0.00
> [10,] 99  0.00  0.00 34.94  0.00  0.00 33.54  0.00 34.39 34.54  0.00
>
> First, I wish to select all rows that have at least one nonzero T and  
> at least one nonzero C.  Based on the code below, I got exactly what I need.
>
>> t1all <- apply(xy,1,function(x) any((x[2]>0|x[3]>0|x[4]>0|x[5]>0| 
>> x[6]>0)&(x[7]>0 |x[8]>0 |x[9]>0|x[10]>0|x[11]>0)))
>> mat.t1all <- xy[t1all,]
>> mat.t1all
>       x    T1 T2    T3    T4    T5    C1    C2    C3    C4    C5
> [1,] 50  0.00  0 33.75  0.00  0.00  0.00 36.76  0.00 35.26  0.00
> [2,] 13 34.41  0  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
> [3,] 14 35.85  0 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
> [4,] 33 34.56  0  0.00 36.00  0.00  0.00 36.56  0.00 34.83  0.00
> [5,] 22  0.00  0 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
> [6,] 25  0.00  0  0.00 36.11 34.24  0.00 34.07 32.72  0.00  0.00
> [7,]  9 33.63  0 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
> [8,] 87 35.22  0  0.00 35.31  0.00  0.00 34.55 35.14 38.12  0.00
> [9,] 99  0.00  0 34.94  0.00  0.00 33.54  0.00 34.39 34.54  0.00
>
> Then, I need the rows for which there are at least two T's and two  
> C's. Using similar code, I get the following output:
>
>> t2all <- apply(xy,1,function(x) any(((x[2]>0&x[3]>0)|(x[2]>0&x[4]>0)|
> + (x[2]>0&x[5]>0)|(x[2]>0&x[6]>0)|(x[3]>0&x[4]>0)|(x[3]>0&x[5]>0)|
> + (x[3]>0&x[6]>0)|(x[4]>0&x[5]>0)|(x[4]>0&x[6]>0)|(x[5]>0&x[6]>0))
> +
> + &(( (x[7]>0&x[8]>0)|(x[7]>0&x[9]>0)|(x[7]>0&x[10]>0)|
> + (x[7]>0&x[11]>0)|
> + (x[8]>0&x[9]>0)|(x[8]>0&x[10]>0)|(x[8]>0&x[11]>0)|(x[9]>0&x[10]>0)|
> + (x[9]>0&x[11]>0)|(x[10]>0&x[11]>0) ))))
>>
>> mat.t2all <- xy[t2all,]
>> mat.t2all
>      x    T1 T2    T3    T4    T5    C1    C2    C3    C4    C5
> [1,] 13 34.41  0  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
> [2,] 14 35.85  0 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
> [3,] 33 34.56  0  0.00 36.00  0.00  0.00 36.56  0.00 34.83  0.00
> [4,] 22  0.00  0 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
> [5,] 25  0.00  0  0.00 36.11 34.24  0.00 34.07 32.72  0.00  0.00
> [6,]  9 33.63  0 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
> [7,] 87 35.22  0  0.00 35.31  0.00  0.00 34.55 35.14 38.12  0.00
>
> For three T's and three C's, I got
>
>> t3all <- apply(xy,1,function(x) any(( (x[2]>0&x[3]>0&x[4]>0)|
> + (x[2]>0&x[3]>0&x[5]>0)|(x[2]>0&x[3]>0&x[6]>0)|
> + (x[2]>0&x[4]>0&x[5]>0)|
> + (x[2]>0&x[4]>0&x[6]>0)|(x[2]>0&x[5]>0&x[6]>0)|
> + (x[3]>0&x[4]>0&x[5]>0)|(x[3]>0&x[4]>0&x[6]>0)|
> + (x[3]>0&x[5]>0&x[6]>0)|(x[4]>0&x[5]>0&x[6]>0) )
> +
> + &( (x[7]>0&x[8]>0&x[9]>0)|
> + (x[7]>0&x[8]>0&x[10]>0)|(x[7]>0&x[8]>0&x[11]>0)|
> + (x[7]>0&x[9]>0&x[10]>0)|
> + (x[7]>0&x[9]>0&x[11]>0)|(x[7]>0&x[10]>0&x[11]>0)|
> + (x[8]>0&x[9]>0&x[10]>0)|(x[8]>0&x[9]>0&x[11]>0)|
> + (x[8]>0&x[10]>0&x[11]>0)|(x[9]>0&x[10]>0&x[11]>0) ) ))
>>
>> mat.t3all <- xy[t3all,]
>> mat.t3all
>      x    T1 T2    T3    T4    T5    C1    C2    C3    C4    C5
> [1,] 13 34.41  0  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
> [2,] 14 35.85  0 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
> [3,] 22  0.00  0 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
> [4,]  9 33.63  0 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
>
>
> Can someone help me with better, more efficient code that will  
> handle this? Thank you in advance for your help.
> JN

 > xy <- data.matrix(read.table(textConnection("
+       x    T1    T2    T3    T4    T5    C1    C2    C3    C4    C5
+  50  0.00  0.00 33.75  0.00  0.00  0.00 36.76  0.00 35.26  0.00
+  13 34.41  0.00  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
+  14 35.85  0.00 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
+  33 34.56  0.00  0.00 36.00  0.00  0.00 36.56  0.00 34.83  0.00
+  66 36.38 37.42  0.00 32.47 34.05  0.00  0.00  0.00  0.00  0.00
+  22  0.00  0.00 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
+  25  0.00  0.00  0.00 36.11 34.24  0.00 34.07 32.72  0.00  0.00
+   9 33.63  0.00 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
+  87 35.22  0.00  0.00 35.31  0.00  0.00 34.55 35.14 38.12  0.00
+  99  0.00  0.00 34.94  0.00  0.00 33.54  0.00 34.39 34.54  0.00"),  
header=TRUE) )

Counting the nonzero entries per row, once over the T columns and once  
over the C columns, gives two vectors that are more economical to work with:

 > rowSums(xy[, grep("T", colnames(xy))] > 0)
  [1] 1 3 4 2 4 3 2 3 2 1
 > rowSums(xy[, grep("C", colnames(xy))] > 0)
  [1] 2 3 3 2 0 3 2 5 3 3

Or if you want to see them side by side:

 > cbind(rowSums(xy[, grep("T", colnames(xy))] > 0),
            rowSums(xy[, grep("C", colnames(xy))] > 0) )
       [,1] [,2]
  [1,]    1    2
  [2,]    3    3
  [3,]    4    3
  [4,]    2    2
  [5,]    4    0
  [6,]    3    3
  [7,]    2    2
  [8,]    3    5
  [9,]    2    3
[10,]    1    3
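
The per-row counts above can be turned into a row selector directly, which
might look something like the sketch below. The helper name
rows_with_at_least and its argument k are made up for illustration; they
are not from the original post. The sketch also anchors the patterns as
"^T" and "^C", which assumes the column names start with those letters,
and it rebuilds xy with read.table(text = ...) so that it runs on its own.

```r
# Rebuild the example matrix so this sketch is self-contained.
xy <- data.matrix(read.table(text = "
  x    T1    T2    T3    T4    T5    C1    C2    C3    C4    C5
 50  0.00  0.00 33.75  0.00  0.00  0.00 36.76  0.00 35.26  0.00
 13 34.41  0.00  0.00 36.64 32.86 34.11 35.80 37.74  0.00  0.00
 14 35.85  0.00 33.88 36.68 34.88 34.58  0.00 32.75 37.45  0.00
 33 34.56  0.00  0.00 36.00  0.00  0.00 36.56  0.00 34.83  0.00
 66 36.38 37.42  0.00 32.47 34.05  0.00  0.00  0.00  0.00  0.00
 22  0.00  0.00 31.07 31.63 37.51  0.00 39.34 34.91 35.51  0.00
 25  0.00  0.00  0.00 36.11 34.24  0.00 34.07 32.72  0.00  0.00
  9 33.63  0.00 38.43  0.00 35.72 32.95 36.40 38.57 34.19 32.47
 87 35.22  0.00  0.00 35.31  0.00  0.00 34.55 35.14 38.12  0.00
 99  0.00  0.00 34.94  0.00  0.00 33.54  0.00 34.39 34.54  0.00",
  header = TRUE))

# Hypothetical helper (not from the original post): keep the rows of m
# that have at least k nonzero T columns and at least k nonzero C columns.
rows_with_at_least <- function(m, k) {
  t_cols <- grep("^T", colnames(m))   # assumes T columns are named T1, T2, ...
  c_cols <- grep("^C", colnames(m))   # assumes C columns are named C1, C2, ...
  n_t <- rowSums(m[, t_cols, drop = FALSE] > 0)   # nonzero T's per row
  n_c <- rowSums(m[, c_cols, drop = FALSE] > 0)   # nonzero C's per row
  m[n_t >= k & n_c >= k, , drop = FALSE]          # drop = FALSE keeps a matrix
}

mat.t1all <- rows_with_at_least(xy, 1)
mat.t2all <- rows_with_at_least(xy, 2)
mat.t3all <- rows_with_at_least(xy, 3)
```

With the data as posted, this should pick out the same rows as the three
apply() calls in the question, and changing k covers any other threshold
without writing out the combinations by hand.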

> --

David Winsemius, MD
Heritage Laboratories
West Hartford, CT


