[R] Proportion of equal entries in dist()?

Tue Jan 20 19:10:26 CET 2015

Jorge,

I have not used it myself, but you might find the dist() function in the
proxy package to be useful.

http://cran.r-project.org/web/packages/proxy/index.html

Jean

On Mon, Jan 19, 2015 at 7:38 AM, Jorge I Velez <jorgeivanvelez at gmail.com>
wrote:

> Dear all,
>
> Given vectors "x" and "y", I would like to compute the proportion of
> entries that are equal, that is, mean(x == y).
>
> Now, suppose I have the following matrix:
>
> n <- 1e2
> m <- 1e4
> X <- matrix(sample(0:2, m*n, replace = TRUE), ncol = m)
>
> I am interested in calculating the above proportion for every pairwise
> combination of rows.  I came up with the following:
>
> myd <- function(X, p = NROW(X)){
> D <- matrix(NA, p, p)
> for(i in 1:p) for(j in 1:p) if(i > j) D[i, j] <- mean(X[i, ] == X[j,])
> D
> }
>
> system.time(d <- myd(X))
>
> However, in my application n and m are much more larger than in this
> example and the computational time might be an issue.  I would very much
> appreciate any suggestions on how to speed the "myd" function.
>
> Note:  I have done some experiments with the dist() function and despite
> being much, much, much faster than "myd", none of the default distances
> fits my needs.  I would also appreciate any suggestions on how to include
> "my own" distance function in dist().
>
> Thank you very much for your time.
>
> Best regards,
> Jorge Velez.-
>
>         [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

	[[alternative HTML version deleted]]