[R] self-defined distance function to be computed on matrix

Peter Langfelder peter.langfelder at gmail.com
Thu Aug 30 21:17:23 CEST 2012


On Thu, Aug 30, 2012 at 10:48 AM, zz <czhang at uams.edu> wrote:
> Hello,
>
> I have a self-defined function to be computed on each column in a matrix.
> The basic idea is to ignore the elements that have value of 0 during
> computation.
>
> I should be able to write my own function but it could be computationally
> expensive, so I'd love to ask if anyone may have suggestions on how to
> implement it more efficiently.  Thanks in advance.
>
> For example, there are three vectors in the matrix, which are
> A       B       C
> 1       0       1
> -1      1       1
> -1      -1      1
> 1       0       -1
>
> Distance(AB) = (-1X1+(-1)X(-1))/de(AB) , and
> de(AB) = sqrt(square(-1)+square(-1)) X sqrt(square(1)+square(-1))
>
> Distance(BC) = (1X1+(-1)X1)/de(BC) ,and
> de(BC) = sqrt(square(1)+square(-1)) X sqrt(square(1)+square(1))
>
> Distance(AC) = (1X1+(-1)X1+(-1)X1+1X(-1))/de(AC), and
> de(AC) = sqrt(square(1)+square(-1)+square(-1)+square(1)) X
> sqrt(square(1)+square(1)+square(1)+square(-1))
>
> As you may see, the numerator is basically the dot product of the two
> vectors; this function actually is more like the cosine function in R, but
> with some variations.
>

If I understand it correctly, you are trying to calculate the "cosine
correlation" while excluding all rows where one of the two columns has
a zero? There may be other ways to do it, but (shameless plug) my
package WGCNA defines a replacement for the usual correlation function
cor() that lets you specify the argument cosine  = TRUE to calculate
cosine correlation (i.e., Pearson correlation without centering). To
ignore the zeroes, turn them into NA, and specify argument
use = "pairwise.complete.obs" (or just use = "p") to the function cor.

So define a matrix (say ABC), set all zero values to NA

ABC[ABC==0] = NA

then issue

library(WGCNA)
sim = cor(ABC, cosine = TRUE, use = 'p')

Note that the correlation gives you a similarity; to turn it into a
dissimilarity or distance you have to subtract it from 1

dissim = 1-sim
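
If you would rather not depend on WGCNA, the same quantity can be
sketched in base R with a few matrix products. This is only a minimal
sketch under my reading of the definition in the question: the helper
name cosineNoZero is made up for illustration, and it assumes a numeric
matrix with no NA values in which 0 marks the entries to ignore.

```r
# Example matrix from the question (columns A, B, C).
ABC <- cbind(A = c(1, -1, -1, 1),
             B = c(0,  1, -1, 0),
             C = c(1,  1,  1, -1))

# cosineNoZero is a hypothetical helper name, not part of any package.
cosineNoZero <- function(x) {
  # 1 where an entry is nonzero, 0 otherwise (keeps dim and dimnames)
  nonzero <- (x != 0) * 1
  # Pairwise dot products; a row with a zero in either column adds nothing
  num <- crossprod(x)
  # ss[i, j] = sum of squares of column i over rows where column j is nonzero
  ss <- crossprod(x^2, nonzero)
  # Denominator: product of the two restricted norms for each column pair
  num / sqrt(ss * t(ss))
}

sim <- cosineNoZero(ABC)
dissim <- 1 - sim
```

For the example matrix this gives a similarity of 0 for A-B and B-C and
-0.5 for A-C, matching the hand calculation in the question. A pair of
columns with no rows where both are nonzero comes out as NaN (0/0), so
you may want to handle that case separately.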

HTH,

Peter



