[R] Tagging identical rows of a matrix

Gabor Grothendieck ggrothendieck at myway.com
Fri May 14 23:03:45 CEST 2004


Waichler, Scott R <Scott.Waichler <at> pnl.gov> writes:

> 
> Thanks to all of you who responded to my help request.
> Here is the very efficient upshot of your advice:
> 
> > mat2 <- apply(mat, 1, paste, collapse=":")
> > vec <- match(mat2, unique(mat2))
> > vec
> [1] 1 2 1 1 2 3
> 
> 
> P.S.  I found that Andy Liaw's method didn't preserve the
> index order that I wanted; it yields
> 
> 2 3 2 2 3 1
> 
> To get the order of integers I was looking for required an
> invocation of unique:
> 
> as.numeric(factor(apply(mat, 1, paste, collapse=":"),
>                   levels=unique(apply(mat, 1, paste, collapse=":"))))
> 
> But the first method above is obviously cleaner and is twice
> as fast, only 9 seconds for a 100000 row matrix on an ordinary PC.  
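
To make the two numberings quoted above concrete, here is a minimal sketch on a made-up matrix. The poster's mat is not shown in this message, so the values below are an assumption, chosen only so that the two tag vectors come out as quoted; the names key, tag_first and tag_sorted are likewise illustrative.

```r
# Hypothetical 6 x 2 matrix (NOT the original poster's data).
mat <- matrix(c(2, 7,
                5, 1,
                2, 7,
                2, 7,
                5, 1,
                1, 3), ncol = 2, byrow = TRUE)

# Collapse each row into a single string key such as "2:7".
key <- apply(mat, 1, paste, collapse = ":")

# match() gives the position of each key among the unique keys,
# so tags are numbered in order of first appearance.
tag_first <- match(key, unique(key))
tag_first
# [1] 1 2 1 1 2 3

# A plain factor sorts its levels, so the codes follow sort order instead.
tag_sorted <- as.integer(factor(key))
tag_sorted
# [1] 2 3 2 2 3 1
```

Both vectors describe the same grouping of rows; only the numbering of the groups differs.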

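The interaction solution is shorter and one or two orders of magnitude
faster: in the run below it takes 0.1 seconds against roughly 5 seconds
for the paste-based versions, and all.equal reports the same result.
Here is a comparison of the three: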

R> set.seed(1)
R> mat <- matrix(sample(20,100000,replace=TRUE),50000)
R> 
R> f0 <- function(mat) {
+ mat2 <- apply(mat, 1, paste, collapse=":")
+ match(mat2, unique(mat2))
+ }
R> 
R> f1 <- function(mat) {
+ z <- apply(mat, 1, paste, collapse=":")
+ as.numeric(factor(z, levels=unique(z)))
+ }
R> 
R> f2 <- function(mat) as.numeric(interaction(mat[,1], mat[,2], drop=TRUE))
R> 
R> dummy <- gc(); system.time(z0 <- f0(mat))
[1] 5.24 0.02 5.52   NA   NA
R> dummy <- gc(); system.time(z1 <- f1(mat))
[1] 5.18 0.00 5.52   NA   NA
R> dummy <- gc(); system.time(z2 <- f2(mat))
[1] 0.1 0.0 0.1  NA  NA
R> all.equal(z0,z1)
[1] TRUE
R> all.equal(z0,z2)
[1] TRUE
R> all.equal(z2,z1)
[1] TRUE
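
f2 above is written for a two-column matrix. A minimal sketch of how it might be extended to any number of columns follows; tag_rows is a hypothetical helper name, not something from this thread, and the small matrix is made up for illustration. The integer codes of interaction() follow the order of the factor levels, so the sketch passes them through match(x, unique(x)) in case tags numbered by first appearance are required.

```r
# Hypothetical helper (name and design are assumptions, not from the thread).
tag_rows <- function(mat) {
  # One vector per column; interaction() accepts a list of factors/vectors.
  cols <- lapply(seq_len(ncol(mat)), function(j) mat[, j])
  # drop = TRUE discards combinations that never occur in the data.
  g <- as.integer(interaction(cols, drop = TRUE))
  # Renumber the groups in order of first appearance.
  match(g, unique(g))
}

# Made-up 6 x 3 matrix with three distinct rows.
m <- matrix(c(2, 7, 1,
              5, 1, 1,
              2, 7, 1,
              2, 7, 1,
              5, 1, 1,
              1, 3, 9), ncol = 3, byrow = TRUE)

tags <- tag_rows(m)
tags
# [1] 1 2 1 1 2 3
```

The extra match() call costs one pass over an integer vector, so it should not change the timing picture much, though that has not been measured here.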



