[R] high and lowest with names

Thu Oct 13 17:48:14 CEST 2011

On Oct 13, 2011, at 10:42 AM, Ben qant wrote:

> Here is a more R'sh solution (speed unknown).

Really? The intermediate, potentially large, objects seem to be  
proliferating.

> Courtesy of Mark Leeds (I modified it a bit to generalize it for a cnt
> input and get min and max).
> Again, getting cnt highest and lowest values in the entire matrix and
> display the data point row and column names with each:

1) For max (or min) I would have thought that one could have much more  
easily gathered the maximum and minimum locations with:

  which(x == max(x), arr.ind=TRUE)   # Bert Gunter's discarded suggestion

... and used the results as indices into x or rownames(x) or  
colnames(x). But I made no earlier comments because it did not appear  
that you had provided the swiss$Education object in a form that could  
be easily extracted for testing. I see now that setting up a similar  
object was fairly easy, but would encourage you to consider the `dput`  
function for such problem construction in the future:

dat2 <- matrix(sample(1:25, 25), 5,5)
colnames(dat2) = c('a','b','c','d','e')
rownames(dat2) = c('z','y','x','w','v')
arrns <- which(dat2 == max(dat2), arr.ind=TRUE)
 > arrns
   row col
v   5   1
 > colnames(dat2)[arrns[,2]] ; rownames(dat2)[arrns[,1]]
[1] "a"
[1] "v"
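
As a rough sketch of the `dput` suggestion, assuming the matrix to be shared is the poster's `dat` (the name `dat_copy` is only illustrative): dput() prints an R expression that rebuilds the object, dimnames included, so a reader can paste it straight into a session.

```r
# Rebuild the poster's example matrix (swiss ships with base R's datasets)
dat <- matrix(swiss$Education[1:25], 5, 5,
              dimnames = list(c('z','y','x','w','v'),
                              c('a','b','c','d','e')))

# dput() writes a plain-text expression that recreates the object;
# this is the text one would paste into a message.
dput(dat)

# Round trip: evaluating the deparsed text should give back the same object.
dat_copy <- eval(parse(text = deparse(dat)))
identical(dat, dat_copy)
```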

2) For display of all results with row/column labels :

rbind(c(dat2), rownames(dat2)[row(dat2)], colnames(dat2)[col(dat2)])
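
An alternative layout for the same "every value with its labels" display, offered only as a sketch: as.data.frame(as.table(...)) gives one row per cell. The seed and the names `long`, `Row`, `Col` and `Value` are my own assumptions, not anything from the thread.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible
dat2 <- matrix(sample(1:25, 25), 5, 5,
               dimnames = list(c('z','y','x','w','v'),
                               c('a','b','c','d','e')))

# One row per cell: row label, column label, value
long <- as.data.frame(as.table(dat2), stringsAsFactors = FALSE)
names(long) <- c("Row", "Col", "Value")

# Sorting the long form puts the lowest values first and the highest last
long <- long[order(long$Value), ]
head(long, 5)
tail(long, 5)
```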

3) For display of values of "bottom five" and top five:

  dat2five <- which(dat2 <= c(dat2)[order(dat2)][5], arr.ind=TRUE)
  rbind( dat2LT5= dat2[dat2five],
           Rows = rownames(dat2)[ dat2five[,1] ],
           Cols = colnames(dat2)[ dat2five[,2] ])
#--------------
         [,1] [,2] [,3] [,4] [,5]
dat2LT5 "2"  "3"  "5"  "1"  "4"
Rows    "x"  "w"  "y"  "y"  "x"
Cols    "a"  "a"  "c"  "d"  "d"

dat2topfive <- which(dat2 >= c(dat2)[rev(order(dat2))][5], arr.ind=TRUE)
  rbind( dat2top5= dat2[dat2topfive],
           Rows = rownames(dat2)[ dat2topfive[,1] ],
           Cols = colnames(dat2)[ dat2topfive[,2] ])
#---------------
          [,1] [,2] [,3] [,4] [,5]
dat2top5 "24" "25" "23" "22" "21"
Rows     "z"  "v"  "y"  "w"  "v"
Cols     "a"  "a"  "b"  "e"  "e"
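
The two displays above could be folded into one small function. What follows is only a sketch: `extremes` is a hypothetical helper name, and it assumes a matrix that has both row and column names. It uses order() on the whole matrix and arrayInd() to turn the linear positions back into row/column indices.

```r
# Hypothetical helper (not from the thread): the n smallest or largest
# entries of a matrix, each with its row and column name.
extremes <- function(m, n = 5, largest = FALSE) {
  stopifnot(is.matrix(m), n >= 1, n <= length(m))
  idx <- order(m, decreasing = largest)[seq_len(n)]  # linear positions
  pos <- arrayInd(idx, dim(m))                       # row/col indices
  data.frame(value = m[idx],
             row   = rownames(m)[pos[, 1]],
             col   = colnames(m)[pos[, 2]],
             stringsAsFactors = FALSE)
}

dat <- matrix(swiss$Education[1:25], 5, 5,
              dimnames = list(c('z','y','x','w','v'),
                              c('a','b','c','d','e')))
extremes(dat, 10)                  # bottom ten
extremes(dat, 10, largest = TRUE)  # top ten
```

Unlike the which(dat2 <= ...) form, this returns exactly n rows even when several cells tie with the cutoff value, so some tied cells are left out.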

>
>> x <- swiss$Education[1:25]
>> dat = matrix(x,5,5)
>> colnames(dat) = c('a','b','c','d','e')
>> rownames(dat) = c('z','y','x','w','v')
>> cnt = 10
>> #===============================================
>> print(dat)
>   a  b  c  d  e
> z 12  7  6  2 10
> y  9  7 12  8  3
> x  5  8  7 28 12
> w  7  7 12 20  6
> v 15 13  5  9  1
>>
>> # MAKE IT A VECTOR FOR EASIER ORDERING
>> datasvec <- as.vector(dat)
>> # ORDER IT
>> datasvecordered<- order(datasvec)
>> # RECYCLE ROWS AND COLUMNS NAMES FOR EASIER MAPPING
>> recycledcols <- rep(colnames(dat),each=nrow(dat))
>> recycledrows <- rep(rownames(dat),times=ncol(dat))
>>
>> # GET THE VALUES, THE ROW NAMES AND THE COLUMN NAMES
>> len = length(datasvecordered)
>> rr_len = length(recycledrows)
>>
> rbind(datasvec[datasvecordered][(len-cnt+1):len],
>       recycledrows[datasvecordered][(rr_len-cnt+1):rr_len],
>       recycledcols[datasvecordered][(rr_len-cnt+1):rr_len])
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,] "9"  "10" "12" "12" "12" "12" "13" "15" "20" "28"
> [2,] "v"  "z"  "z"  "y"  "w"  "x"  "v"  "v"  "w"  "x"
> [3,] "d"  "e"  "a"  "c"  "c"  "e"  "b"  "a"  "d"  "d"
>>
> rbind(datasvec[datasvecordered][1:cnt],
>       recycledrows[datasvecordered][1:cnt],
>       recycledcols[datasvecordered][1:cnt])
>     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
> [1,] "1"  "2"  "3"  "5"  "5"  "6"  "6"  "7"  "7"  "7"
> [2,] "v"  "z"  "y"  "x"  "v"  "z"  "w"  "w"  "z"  "y"
> [3,] "e"  "d"  "e"  "a"  "c"  "c"  "e"  "a"  "b"  "b"
>
> enjoy
>
> ben
>
> On Wed, Oct 12, 2011 at 11:47 AM, Ben qant <ccquant at gmail.com> wrote:
>
>> Hello,
>>
>> This is my solution. This is pretty fast (tested with a larger data set)!
>> If you have a more elegant way to do it (of similar speed), please reply.
>> Thanks for the help!
>>
>> ################## get highest and lowest values and names of a matrix
>> # create sample data
>>
>> x <- swiss$Education[1:25]
>> dat = matrix(x,5,5)
>> colnames(dat) = c('a','b','c','d','e')
>>
>> rownames(dat) = c('z','y','x','w','v')
>>
>> #my solution
>>
>> nms = dimnames(dat) #get matrix row and col names
>> cnt = 10 # number of max and mins to get
>>
>> tmp = dat
>> mxs = vector("list", cnt) # preallocate a list with cnt empty slots
>> mns = vector("list", cnt)
>> for(i in 1:cnt){
>>  #get maxes
>>  # get max dims for entire matrix; note: which.max also removes NA's
>>  mx_dims = arrayInd(which.max(tmp), dim(tmp))
>>  mx_nm = c(nms[[1]][mx_dims[1]],nms[[2]][mx_dims[2]]) #get names
>>  mx = tmp[mx_dims] # get max value
>>  mxs[[i]] = c(mx,mx_nm) # add max and dim names to list of maxes
>>  tmp[mx_dims] = NA #removes last max so new one is found
>>
>>  #get mins (basically same as above)
>>  mn_dims = arrayInd(which.min(tmp), dim(tmp))
>>  mn_nm = c(nms[[1]][mn_dims[1]],nms[[2]][mn_dims[2]])
>>  mn = tmp[mn_dims]
>>  mns[[i]] = c(mn,mn_nm)
>>  tmp[mn_dims] = NA
>> }
>>
>> mxs
>> mns
>>
>> # end
>>
>> Regards,
>>
>> Ben
>>
>>
>> On Tue, Oct 11, 2011 at 5:32 PM, "Dénes TÓTH" <tdenes at cogpsyphy.hu> wrote:
>>
>>>
>>> which.max is even faster:
>>>
>>> dims <- c(1000,1000)
>>> tt <- array(rnorm(prod(dims)),dims)
>>> # which
>>> system.time(
>>> replicate(100, which(tt==max(tt), arr.ind=TRUE))
>>> )
>>> # which.max (& arrayInd)
>>> system.time(
>>> replicate(100, arrayInd(which.max(tt), dims))
>>> )
>>>
>>> Best,
>>> Denes
>>>
>>>> But it's simpler and probably faster to use R's built-in capabilities
>>>> (see ?which, and note the arr.ind argument!).
>>>>
>>>> As an example:
>>>>
>>>> test <- matrix(rnorm(24), nr = 4)
>>>> which(test==max(test), arr.ind=TRUE)
>>>>     row col
>>>> [1,]   2   6
>>>>
>>>> So this gives the row and column indices of the max, from which row and
>>>> column names can easily be obtained from the dimnames attribute of the
>>>> matrix.
>>>>
>>>> Note: This assumes that the object in question is a matrix, NOT a data
>>>> frame, for which it would be slightly more complicated.
>>>>
>>>> -- Bert
>>>>
>>>>
>>>> On Tue, Oct 11, 2011 at 3:06 PM, Carlos Ortega <cof at qualityexcellence.es> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> With this code you can find row and col names for the largest value
>>>>> applied to your example:
>>>>>
>>>>> r.m.tmp<-apply(dat,1,max)
>>>>> r.max<-names(r.m.tmp)[r.m.tmp==max(r.m.tmp)]
>>>>>
>>>>> c.m.tmp<-apply(dat,2,max)
>>>>> c.max<-names(c.m.tmp)[c.m.tmp==max(c.m.tmp)]
>>>>>
>>>>> It's immediate how to get the same for the smallest and build a function
>>>>> to calculate everything and return a list.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Carlos Ortega
>>>>> www.qualityexcellence.es
>>>>>
>>>>> 2011/10/11 Ben qant <ccquant at gmail.com>
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I'm looking to get the values, row names and column names of the largest
>>>>>> and smallest values in a matrix.
>>>>>>
>>>>>> Example (except it does not include the names):
>>>>>>
>>>>>>> x <- swiss$Education[1:25]
>>>>>>> dat = matrix(x,5,5)
>>>>>>> colnames(dat) = c('a','b','c','d','c')
>>>>>>> rownames(dat) = c('z','y','x','w','v')
>>>>>>> dat
>>>>>>  a  b  c  d  c
>>>>>> z 12  7  6  2 10
>>>>>> y  9  7 12  8  3
>>>>>> x  5  8  7 28 12
>>>>>> w  7  7 12 20  6
>>>>>> v 15 13  5  9  1
>>>>>>
>>>>>>> #top 10
>>>>>>> n <- length(dat)
>>>>>>> sort(dat,partial=(n-9):n)[(n-9):n]
>>>>>> [1]  9 10 12 12 12 12 13 15 20 28
>>>>>>> # bottom 10
>>>>>>> sort(dat,partial=1:10)[1:10]
>>>>>> [1] 1 2 3 5 5 6 6 7 7 7
>>>>>>
>>>>>> ...except I need the rownames and colnames to go along for the ride with
>>>>>> the values...because of this, I am guessing the return value will need to
>>>>>> be a list since all of the values have different row and col names (which
>>>>>> is fine).
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Ben

David Winsemius, MD
West Hartford, CT