[R] Sampling the Distance Matrix

Lorenzo Isella lorenzo.isella at gmail.com
Thu Sep 24 21:36:42 CEST 2015


Hi,
And thanks for your reply.
Essentially, your script gets the job done.
For instance, if I run

mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst <- dist(mm)
dst2 <- as.matrix(dst)
diag(dst2) <- NA
idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))

then it correctly detects the first two rows, where all the values are
larger than 0.9.
In other words, it detects the points that are more than 0.9 units away
from *all* the other points.
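Put differently, a point qualifies exactly when its nearest neighbour
is more than 0.9 units away. A minimal sketch of that equivalent check
(the names nearest and idx_alt are only illustrative, they are not part
of the script I was given):

```r
mm <- cbind(5/(1:5), -2*sqrt(1:5))
dst2 <- as.matrix(dist(mm))
diag(dst2) <- NA                      # ignore the zero self-distances

# distance from each point to its nearest neighbour
nearest <- apply(dst2, 1, min, na.rm = TRUE)

# points whose nearest neighbour is more than 0.9 units away
idx_alt <- which(nearest > 0.9)
```

This should select the same rows as idx above.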
My other question (I did not realize this until I got your answer) is
the following: I have the distance matrix of a set of N points.
You gave me an algorithm to find all the points that are more than 0.9
units away from every other point.
However, in some cases a weaker condition is enough for me: find
a subset of k points (with k tunable) whose distances *from each other*
are all greater than 0.9 units (even if their distances from some of
the remaining points may be smaller than 0.9).
Any idea about how to tackle that? Is it simply a matter of detecting
the row and column numbers of all the entries of the distance matrix
larger than 0.9?
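My guess is that the list of pairs alone is not enough, since the k
points must be far from each other *simultaneously*. Finding the
largest such subset looks like a maximum clique problem, which is hard
in general, so a greedy pass may be a reasonable first attempt. A rough
sketch under that assumption (pick_separated is a hypothetical helper
name, and the result depends on the order of the rows, so it can miss a
valid subset that does exist):

```r
# Hypothetical helper: greedily pick up to k points (k >= 1) whose
# pairwise distances all exceed thresh. This is a heuristic only.
pick_separated <- function(d, k, thresh = 0.9) {
  m <- as.matrix(d)
  chosen <- integer(0)
  for (i in seq_len(nrow(m))) {
    # keep point i only if it is far enough from every point kept so far
    if (all(m[i, chosen] > thresh)) chosen <- c(chosen, i)
    if (length(chosen) == k) break
  }
  chosen
}

set.seed(42)
mm <- cbind(rnorm(20), rnorm(20))
sel <- pick_separated(dist(mm), k = 4)
as.matrix(dist(mm[sel, , drop = FALSE]))   # inspect the off-diagonal entries
```

If fewer than k points come back, the greedy pass ran out of
candidates; trying a different row order might still find a subset.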
Many thanks

Lorenzo



On Wed, Sep 23, 2015 at 09:23:04PM +0000, David L Carlson wrote:
>I think the OP wanted rows where all values were greater than .9.
>If so, this works:
>
>> set.seed(42)
>> dst <- dist(cbind(rnorm(20), rnorm(20)))
>> dst2 <- as.matrix(dst)
>> diag(dst2) <- NA
>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>> idx
>13 18 19
>13 18 19
>> dst2[idx, idx]
>         13       18       19
>13       NA 2.272407 3.606054
>18 2.272407       NA 1.578150
>19 3.606054 1.578150       NA
>
>-------------------------------------
>David L Carlson
>Department of Anthropology
>Texas A&M University
>College Station, TX 77840-4352
>
>
>
>-----Original Message-----
>From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>Sent: Wednesday, September 23, 2015 3:23 PM
>To: Lorenzo Isella
>Cc: r-help at r-project.org
>Subject: Re: [R] Sampling the Distance Matrix
>
>> mm <- cbind(1/(1:5), sqrt(1:5))
>> d <- dist(mm)
>> d
>          1         2         3         4
>2 0.6492864
>3 0.9901226 0.3588848
>4 1.2500000 0.6369033 0.2806086
>5 1.4723668 0.8748970 0.5213550 0.2413050
>> which(as.matrix(d)>0.9, arr.ind=TRUE)
>  row col
>3   3   1
>4   4   1
>5   5   1
>1   1   3
>1   1   4
>1   1   5
>I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9
>
>The as.matrix(d) is needed because dist returns only the lower triangle
>of the distance matrix, as an object of class "dist", and
>as.matrix.dist converts that into a full matrix.
>
>Bill Dunlap
>TIBCO Software
>wdunlap tibco.com
>
>
>On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella
><lorenzo.isella at gmail.com> wrote:
>> Dear All,
>> Suppose you have a distance matrix stored like a dist object, for
>> instance
>>
>> x<-rnorm(20)
>> y<-rnorm(20)
>>
>> mm<-as.matrix(cbind(x,y))
>>
>> dst<-(dist(mm))
>>
>> Now, my problem is the following: I would like to get the rows of mm
>> corresponding to points whose distance is always larger than, let's say,
>> 0.9.
>> In other words, if I were to compute the distance matrix on those
>> selected rows of mm, apart from the diagonal, I would get all entries
>> larger than 0.9.
>> Any idea about how I can efficiently code that?
>> Regards
>>
>> Lorenzo
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
