[R] Count unique rows/columns in a matrix

Sat Jan 12 20:15:31 CET 2008

On Sat, Jan 12, 2008 at 12:35:47PM -0500, John Kane wrote:
> I definitely did not read it that way, but that may
> have been my fault.  That table approach is quite
> nice!
> 
> Using it, you could just rebuild the vectors from the
> names. Does this do more or less what you want?

John, thanks. Still not good enough. :( The problem is not that the
result was in string format, but that the real values are not
compared, only their string representations, which paste() rounds to
15 significant digits (it goes through as.character()). I know this is
only the default and more digits could be kept by formatting the
numbers explicitly (options(digits) does not affect paste(), but e.g.
sprintf() does), but then it is not very efficient: instead of
comparing 8-byte doubles I'll be comparing quite long strings for
every single number in the matrix. This seems quite a hack to me.
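A small sketch of the precision problem with string keys, assuming
base R only: two rows that differ as doubles get the same paste() key,
while 17 significant digits (enough to tell any two doubles apart)
keep them distinct, at the cost of even longer strings.

```r
# Two rows that differ only in the last bits of the second element
X <- rbind(c(1, 0.1 + 0.2),
           c(1, 0.3))
X[1, ] == X[2, ]                 # TRUE FALSE: the doubles differ

# paste() converts with as.character(), i.e. 15 significant digits
keys <- apply(X, 1, paste, collapse = ",")
keys                             # "1,0.3" "1,0.3"
length(unique(keys))             # 1: the string keys merge the two rows

# 17 significant digits keep distinct doubles apart, but the keys get longer
keys17 <- apply(X, 1, function(row) paste(sprintf("%.17g", row), collapse = ","))
length(unique(keys17))           # 2
```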

I'm thinking about the following solution: hash every row/column of
the matrix, sort the hash values, and compare only those rows/columns
whose hash values are equal (with a proper comparison, i.e. via "=="
or all.equal()).
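A minimal sketch of the hash/sort/compare idea, under some assumptions
of mine: base R has no built-in hash for numeric rows, so the "hash"
here is a random linear projection computed with rowSums() (identical
rows should get identical hashes, since every row is summed in the
same order), and unique_rows_hashed() is a hypothetical helper name.
Rows whose hashes collide are compared exactly with "=="; with
all.equal() instead, nearly equal rows would usually land in different
hash buckets and would not be compared at all. NAs are not handled.

```r
# Hypothetical helper: unique rows of a numeric matrix via
# hash -> sort -> exact comparison within groups of equal hash
unique_rows_hashed <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X), !anyNA(X))
  n <- nrow(X)
  r <- runif(ncol(X))                      # assumed "hash": random projection weights
  h <- rowSums(X * rep(r, each = n))       # one double per row
  ord <- order(h)                          # sort rows by hash (stable for ties)
  runs <- rle(h[ord])$lengths              # sizes of groups with exactly equal hash
  ends <- cumsum(runs)
  starts <- ends - runs + 1L
  is_dup <- logical(n)
  for (g in seq_along(runs)) {
    if (runs[g] == 1L) next                # unique hash, so the row is unique
    idx <- ord[starts[g]:ends[g]]
    for (a in seq_len(length(idx) - 1L)) {
      if (is_dup[idx[a]]) next
      for (b in (a + 1L):length(idx)) {
        # exact comparison only for rows that share a hash value
        if (!is_dup[idx[b]] && all(X[idx[a], ] == X[idx[b], ]))
          is_dup[idx[b]] <- TRUE
      }
    }
  }
  X[!is_dup, , drop = FALSE]               # first occurrences, original order
}

set.seed(42)
X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1), 6, 3, byrow = TRUE)
U <- unique_rows_hashed(X)
nrow(U)                                    # number of unique rows
```

For unique columns, call it on t(X). The pairwise loop only runs
inside hash groups, so it stays cheap unless many different rows
collide on the same hash.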

Of course I'm not completely sure that this is faster than comparing
long strings, but I'll give it a try. I have quite big matrices;
that's why I need an efficient solution.
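A sort-based variant that drops the hashing step altogether may also
be worth timing on big matrices: order the rows lexicographically with
order() on the columns, then compare each sorted row only with its
predecessor, fully vectorized and without strings. This is a swapped-in
technique, not the one proposed above, and unique_rows_with_counts() is
a hypothetical name; its output mimics the cbind(df1, xx) result in the
quoted code (unique rows plus a count column), but keeps the exact
double values. NAs are not handled.

```r
# Hypothetical helper: unique rows plus their counts, via lexicographic sort
unique_rows_with_counts <- function(X) {
  stopifnot(is.matrix(X), is.numeric(X), !anyNA(X), nrow(X) > 0L)
  n <- nrow(X)
  # lexicographic row order: column 1 first, ties broken by column 2, ...
  ord <- do.call(order, lapply(seq_len(ncol(X)), function(j) X[, j]))
  S <- X[ord, , drop = FALSE]
  # a sorted row starts a new group if it differs from its predecessor anywhere
  new_group <- c(TRUE, rowSums(S[-1L, , drop = FALSE] != S[-n, , drop = FALSE]) > 0)
  grp <- cumsum(new_group)
  cbind(S[new_group, , drop = FALSE], count = tabulate(grp))
}

X <- matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1), 6, 3, byrow = TRUE)
res <- unique_rows_with_counts(X)
res
```

For columns, use t(X). The rows come back in sorted order rather than
in order of first appearance.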

(I'm sending this to the list because someone else was also
interested, but I lost his email address.)

Gabor

> X<-matrix(c(1,2,3,1,2,3,4,5,6,1,3,2,4,5,6,1,1,1),6,3,byrow=TRUE)
> xx <-table(apply(X, 1, paste, collapse=","))
> hh <- names(xx)
> nnk  <-(strsplit(hh, ","))
> kkn  <- lapply(nnk, as.numeric)
> df1 <-t(as.data.frame(kkn))
> cbind(df1,xx)
> 
[...]

-- 
Csardi Gabor <csardi at rmki.kfki.hu>    UNIL DGM