[R] finding the most frequent row

Gabor Grothendieck ggrothendieck at myway.com
Sat Dec 11 01:42:48 CET 2004


Jean Eid <jeaneid <at> chass.utoronto.ca> writes:

: 
: You can do the following also
: 
: 
: X <- matrix(c(1,2,1,2,1,3,1,4), ncol=2)
: Y<-unique(X)
: Y[which.max(apply(Y, 1,function(i) sum(apply(matrix(1:nrow(X)),
: 1,function(x) identical(X[x,], i[1:2]))))),]
: 
: I do not know what your strategy is when there are multiple maxima, i.e. two
: different rows appear at the same frequency. The above will get the first
: row that appears to be the maximum  as an example suppose that we have

The two calls to apply in this last statement can be simplified 
by eliminating the indices:

Y[which.max(apply(Y, 1, function(y) sum( apply(X, 1, identical, y)))),]
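
As a quick check, here is a minimal sketch that runs the simplified
expression on the X from the quoted message and keeps the intermediate
counts. The names counts, most_frequent and all_modes are only for
illustration and are not from the original posts. The last line shows one
possible way to keep every tied row, since which.max() returns only the
first maximum.

```r
# Example data from the quoted message: rows (1,1), (2,3), (1,1), (2,4)
X <- matrix(c(1,2,1,2,1,3,1,4), ncol=2)
Y <- unique(X)

# For each distinct row y of Y, count the rows of X identical to it
counts <- apply(Y, 1, function(y) sum(apply(X, 1, identical, y)))

# which.max() picks the first maximum, so ties go to the earliest row of Y
most_frequent <- Y[which.max(counts), ]

# One way to keep all tied rows (drop = FALSE keeps the result a matrix)
all_modes <- Y[counts == max(counts), , drop = FALSE]
```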

Also, although less efficient, it's interesting to note that
the two apply calls without the sum can be regarded as multiplying 
Y times X transpose under the identical() inner product:

# multiply a times b' under the inner product f.
# With f <- function(x,y)sum(x*y) it corresponds to ordinary 
# matrix multiplication of a times b transpose.
xyt <- function(a,b,f) apply(b,1,function(x)apply(a,1,function(y)f(x,y)))

# factoring out sum, this generalized multiplication gives us
# (rows of the result index Y and columns index X, so the counts are row sums):
Y[which.max(rowSums( xyt(Y,X,identical) )),]
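
To see the matrix-multiplication analogy concretely, the following sketch
compares xyt() under the ordinary dot product with a %*% t(b). The matrices
A and B and the helper dot are made up for illustration; only xyt comes
from this message.

```r
# Generalized "a times b transpose" as defined in this message
xyt <- function(a,b,f) apply(b,1,function(x)apply(a,1,function(y)f(x,y)))

# Small matrices with arbitrary values, chosen only for illustration
A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)        # 3 x 2
B <- matrix(c(1, 0, 2, 1, 3, 5, 7, 9), nrow = 4)  # 4 x 2

# Ordinary dot product of two vectors
dot <- function(x, y) sum(x * y)

# Entry [i, j] is dot(B[j, ], A[i, ]): rows index a, columns index b
P <- xyt(A, B, dot)

# Compare with the built-in product of A and the transpose of B
same <- isTRUE(all.equal(P, A %*% t(B)))
dims <- dim(P)
```

Note that this relies on a having more than one row; with a single-row a
the inner apply returns a scalar and the result collapses to a vector.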



