[R] Tagging identical rows of a matrix
Gabor Grothendieck
ggrothendieck at myway.com
Fri May 14 23:03:45 CEST 2004
Waichler, Scott R <Scott.Waichler <at> pnl.gov> writes:
>
> Thanks to all of you who responded to my help request.
> Here is the very efficient upshot of your advice:
>
> > mat2 <- apply(mat, 1, paste, collapse=":")
> > vec <- match(mat2, unique(mat2))
> > vec
> [1] 1 2 1 1 2 3
>
>
> P.S. I found that Andy Liaw's method didn't preserve the
> index order that I wanted; it yields
>
> 2 3 2 2 3 1
>
> To get the order of integers I was looking for required an
> invocation of unique:
>
> as.numeric(factor(apply(mat, 1, paste, collapse=":"),
> levels=unique(apply(mat, 1, paste, collapse=":"))))
>
> But the first method above is obviously cleaner and is twice
> as fast, only 9 seconds for a 100000 row matrix on an ordinary PC.
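The two orderings differ because of how each approach numbers the row strings. factor() sorts its levels, so its codes follow the sorted order of the pasted row strings. match(x, unique(x)) instead numbers rows by first appearance.

The sketch below uses a small hypothetical 6 x 2 matrix. It is not the poster's data; it was chosen only so that it reproduces both index vectors quoted above.

```r
# Hypothetical example matrix: rows 1, 3, 4 are identical, as are rows 2 and 5.
mat <- rbind(c(2, 1), c(3, 1), c(2, 1), c(2, 1), c(3, 1), c(1, 1))

# One key string per row, e.g. "2:1".
keys <- apply(mat, 1, paste, collapse = ":")

# factor() sorts its levels ("1:1", "2:1", "3:1"), so codes follow sorted order.
sorted_ids <- as.integer(factor(keys))
sorted_ids   # 2 3 2 2 3 1

# match() against unique() numbers the rows by order of first appearance.
first_ids <- match(keys, unique(keys))
first_ids    # 1 2 1 1 2 3
```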
The interaction solution gives an identical result, is shorter, and is
roughly fifty times faster in the timings below (about 0.1 seconds versus
about 5.5 seconds elapsed on a 50000 x 2 matrix). Here is a comparison of the three:
R> set.seed(1)
R> mat <- matrix(sample(20, 100000, replace = TRUE), 50000)
R>
R> f0 <- function(mat) {
+ mat2 <- apply(mat, 1, paste, collapse = ":")
+ match(mat2, unique(mat2))
+ }
R>
R> f1 <- function(mat) {
+ z <- apply(mat, 1, paste, collapse = ":")
+ as.numeric(factor(z, levels = unique(z)))
+ }
R>
R> f2 <- function(mat) as.numeric(interaction(mat[, 1], mat[, 2], drop = TRUE))
R>
R> dummy <- gc(); system.time(z0 <- f0(mat))
[1] 5.24 0.02 5.52 NA NA
R> dummy <- gc(); system.time(z1 <- f1(mat))
[1] 5.18 0.00 5.52 NA NA
R> dummy <- gc(); system.time(z2 <- f2(mat))
[1] 0.1 0.0 0.1 NA NA
R> all.equal(z0,z1)
[1] TRUE
R> all.equal(z0,z2)
[1] TRUE
R> all.equal(z2,z1)
[1] TRUE
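As written, f2 only looks at the first two columns. That is enough for the two-column test matrix above, but not for a general matrix.

The following is a sketch, not taken from the thread, that generalizes f2 in two ways:

- It passes every column to interaction() by converting the matrix with as.data.frame(). interaction() accepts a single list of factors, and a data frame is such a list.
- It then renumbers the codes by first appearance with match(codes, unique(codes)). This step is there because the level order produced by interaction(..., drop = TRUE) is not guaranteed to follow first appearance in every R version.

The sketch assumes integer-valued entries, as in the example, so that the "."-joined level labels of different rows cannot coincide. The name tag_rows is hypothetical.

```r
# Hypothetical helper: tag identical rows of a matrix with integers numbered
# in order of first appearance. Assumes integer-valued entries.
tag_rows <- function(mat) {
  # Each matrix column becomes one factor; drop = TRUE discards
  # combinations that never occur.
  codes <- as.integer(interaction(as.data.frame(mat), drop = TRUE))
  # Renumber so the first distinct row gets 1, the next new row 2, and so on.
  match(codes, unique(codes))
}

# Compare against the paste/match approach (f0) on a 4-column matrix.
set.seed(1)
mat <- matrix(sample(20, 20000, replace = TRUE), ncol = 4)
z <- apply(mat, 1, paste, collapse = ":")
identical(tag_rows(mat), match(z, unique(z)))   # TRUE
```

The renumbering costs one extra match() on an integer vector. That should be cheap next to the row-wise apply() and paste() calls it replaces.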