[R] Tagging identical rows of a matrix

Waichler, Scott R Scott.Waichler at pnl.gov
Fri May 14 22:12:08 CEST 2004


Thanks to all of you who responded to my help request.
Here is the efficient solution that came out of your advice:

> mat2 <- apply(mat, 1, paste, collapse=":")
> vec <- match(mat2, unique(mat2))
> vec
[1] 1 2 1 1 2 3
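
For anyone who wants to rerun this, here is a minimal self-contained sketch of the same two steps, using the example matrix from the original question. The helper name tag_rows and its sep argument are hypothetical, introduced only for illustration. The sketch assumes the separator never occurs inside a matrix entry; that holds for numeric data, but a character matrix whose entries contain ":" could make different rows produce the same key.

```r
# Example matrix from the original question.
mat <- rbind(a = c(1, 2), b = c(1, 3), c = c(1, 2),
             d = c(1, 2), e = c(1, 3), f = c(2, 1))

# tag_rows() is a hypothetical helper name, not part of base R.
tag_rows <- function(m, sep = ":") {
  # Build one string key per row, e.g. "1:2".
  key <- apply(m, 1, paste, collapse = sep)
  # match() gives each key's position in unique(key), so groups are
  # numbered in order of first appearance.
  match(key, unique(key))
}

vec <- tag_rows(mat)
print(vec)
```

Because unique() keeps values in order of first appearance, the first row always gets group 1, the next previously unseen row gets group 2, and so on.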


P.S.  I found that Andy Liaw's method didn't preserve the
index order that I wanted; it yields

2 3 2 2 3 1

To get the integers in the order I was looking for (numbered by
first appearance), the factor levels have to be set with unique():

as.numeric(factor(apply(mat, 1, paste, collapse=":"),
                  levels=unique(apply(mat, 1, paste, collapse=":"))))
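
The role of the levels argument can be seen in a small sketch. By default factor() sorts its levels, so the integer codes follow sorted order rather than order of first appearance; passing levels = unique(key) restores first-appearance order. The variable names here are made up for illustration, and the key is computed once rather than twice.

```r
# Three rows where the "largest" row comes first.
mat <- rbind(c(2, 1), c(1, 2), c(2, 1))
key <- apply(mat, 1, paste, collapse = ":")   # "2:1" "1:2" "2:1"

# Default levels are sorted: "1:2" < "2:1", so the first row gets code 2.
sorted_codes <- as.integer(factor(key))

# Levels in order of first appearance: the first row gets code 1.
ordered_codes <- as.integer(factor(key, levels = unique(key)))

# The match() approach gives the same first-appearance numbering.
match_codes <- match(key, unique(key))

print(sorted_codes)
print(ordered_codes)
print(match_codes)
```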

But the first method above is cleaner and twice as fast:
only 9 seconds for a 100000-row matrix on an ordinary PC.
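
A rough way to repeat the comparison is sketched below. The test matrix is randomly generated and is only an assumption about what the real data might look like; absolute timings will depend on the machine and R version, so none are quoted here.

```r
set.seed(1)
n <- 100000
# Hypothetical test data: two columns of small integers, so many rows repeat.
mat <- matrix(sample(1:3, 2 * n, replace = TRUE), ncol = 2)

# Method 1: build the keys once, then match() against the unique keys.
t_match <- system.time({
  key <- apply(mat, 1, paste, collapse = ":")
  v_match <- match(key, unique(key))
})

# Method 2: factor() with levels from unique(), building the keys twice
# exactly as written in the post.
t_factor <- system.time({
  v_factor <- as.integer(factor(apply(mat, 1, paste, collapse = ":"),
                                levels = unique(apply(mat, 1, paste, collapse = ":"))))
})

# Both methods should agree; print elapsed seconds for each.
stopifnot(identical(v_match, v_factor))
print(c(match = t_match[["elapsed"]], factor = t_factor[["elapsed"]]))
```

Much of the extra cost in the second method comes from calling apply() twice; storing the keys in a variable first removes that difference.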

Regards,
Scott Waichler

> > I would like to generate a vector having the same length
> > as the number of rows in a matrix.  The vector should contain an
> > integer indicating the "group" of the row, where identical matrix rows
> > are in a group, and a unique row has a unique integer. Thus, for
> >
> > a <- c(1,2)
> > b <- c(1,3)
> > c <- c(1,2)
> > d <- c(1,2)
> > e <- c(1,3)
> > f <- c(2,1)
> > mat <- rbind(a,b,c,d,e,f)
> >
> > I would like to get the vector c(1,2,1,1,2,3).  I know dist() gives 
> > part of the answer, but I can't figure out how to use it for this 
> > purpose without doing a lot of looping.  I need to apply this to 
> > matrices up to ~100000 rows.



