[R] Sampling the Distance Matrix

David Winsemius dwinsemius at comcast.net
Thu Sep 24 22:30:02 CEST 2015


On Sep 24, 2015, at 12:36 PM, Lorenzo Isella wrote:

> Hi,
> And thanks for your reply.
> Essentially, your script gets the job done.
> For instance, if I run
> 
> mm <- cbind(5/(1:5), -2*sqrt(1:5))
> dst <- dist(mm)
> dst2 <- as.matrix(dst)
> diag(dst2) <- NA
> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
> 
> then it correctly detects the first two rows, where all the values are
> larger than 0.9.
> In other words, it detects the points that are at least 0.9 units away
> from *all* the other points.
> My other question (I did not realize this until I got your answer) is
> the following: I have the distance matrix of a set of N points.
> You gave me an algorithm to find all the points that are at least 0.9
> units away from every other point.
> However, in some cases a weaker condition is enough for me: find
> a subset of k points (with k tunable) whose distances *from each other*
> are greater than 0.9 units (even if their distance from some other
> points may be smaller than 0.9).

If I understand ..... Make a matrix of unique combinations (one combination per column), then apply over its columns to flag the combinations that satisfy the distance criterion. Here x and y are the coordinate vectors from your first post:

mtxcomb <- combn(1:20, 5)
goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx], y[idx]) ) > 0.9))
mtxcomb [ , goodcls]

In my sample, around 9% of all 5-item combinations qualified.

snipped a lot of output:
.....
    [,1440] [,1441]
[1,]      12      13
[2,]      13      16
[3,]      16      17
[4,]      19      19
[5,]      20      20
> dim( mtxcomb)
[1]     5 15504
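
The same idea can be wrapped up so that k and the threshold are tunable. This is only a minimal sketch: the function name spread_subsets and the seed are my own assumptions, not something from the thread. It computes the distance matrix once and indexes into it, rather than calling dist() again for every combination.

```r
# Hypothetical helper (name and arguments are illustrative assumptions):
# exhaustive search for k-point subsets whose pairwise distances all exceed thr.
spread_subsets <- function(pts, k, thr = 0.9) {
  d <- as.matrix(dist(pts))        # full symmetric distance matrix, computed once
  combs <- combn(nrow(pts), k)     # one candidate subset of row indices per column
  ok <- apply(combs, 2, function(idx) {
    sub <- d[idx, idx]
    all(sub[upper.tri(sub)] > thr) # look at each off-diagonal pair once
  })
  combs[, ok, drop = FALSE]        # keep a matrix even if 0 or 1 columns qualify
}

set.seed(42)                       # arbitrary seed, only for reproducibility
x <- rnorm(20)
y <- rnorm(20)
good <- spread_subsets(cbind(x, y), k = 5)
ncol(good) / choose(20, 5)         # fraction of qualifying 5-point subsets
```

Each column of good is one qualifying set of row indices. The search is exhaustive, so the number of candidates is choose(N, k) (15504 for N = 20, k = 5) and grows quickly with N; for much larger N this approach is probably not practical.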


-- 
David

> Any idea about how to tackle that? Is it simply a matter of detecting
> the row and column numbers of all the entries of the distance matrix
> larger than 0.9?
> Many thanks
> 
> Lorenzo
> 
> 
> 
> On Wed, Sep 23, 2015 at 09:23:04PM +0000, David L Carlson wrote:
>> I think the OP wanted rows where all values were greater than .9.
>> If so, this works:
>> 
>>> set.seed(42)
>>> dst <- dist(cbind(rnorm(20), rnorm(20)))
>>> dst2 <- as.matrix(dst)
>>> diag(dst2) <- NA
>>> idx <- which(apply(dst2, 1, function(x) all(na.omit(x)>.9)))
>>> idx
>> 13 18 19
>> 13 18 19
>>> dst2[idx, idx]
>>        13       18       19
>> 13       NA 2.272407 3.606054
>> 18 2.272407       NA 1.578150
>> 19 3.606054 1.578150       NA
>> 
>> -------------------------------------
>> David L Carlson
>> Department of Anthropology
>> Texas A&M University
>> College Station, TX 77840-4352
>> 
>> 
>> 
>> -----Original Message-----
>> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of William Dunlap
>> Sent: Wednesday, September 23, 2015 3:23 PM
>> To: Lorenzo Isella
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Sampling the Distance Matrix
>> 
>>> mm <- cbind(1/(1:5), sqrt(1:5))
>>> d <- dist(mm)
>>> d
>>         1         2         3         4
>> 2 0.6492864
>> 3 0.9901226 0.3588848
>> 4 1.2500000 0.6369033 0.2806086
>> 5 1.4723668 0.8748970 0.5213550 0.2413050
>>> which(as.matrix(d)>0.9, arr.ind=TRUE)
>>   row col
>> 3   3   1
>> 4   4   1
>> 5   5   1
>> 1   1   3
>> 1   1   4
>> 1   1   5
>> I.e., the distances between mm's rows 3 & 1, 4 & 1, and 5 & 1 are more than 0.9.
>> 
>> The as.matrix(d) is needed because dist returns only the lower triangle
>> of the distance matrix, as an object of class "dist"; as.matrix.dist
>> converts that into a full matrix.
>> 
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>> 
>> 
>> On Wed, Sep 23, 2015 at 12:15 PM, Lorenzo Isella
>> <lorenzo.isella at gmail.com> wrote:
>>> Dear All,
>>> Suppose you have a distance matrix stored like a dist object, for
>>> instance
>>> 
>>> x<-rnorm(20)
>>> y<-rnorm(20)
>>> 
>>> mm<-as.matrix(cbind(x,y))
>>> 
>>> dst<-(dist(mm))
>>> 
>>> Now, my problem is the following: I would like to get the rows of mm
>>> corresponding to points whose distance is always larger than, let's say,
>>> 0.9.
>>> In other words, if I were to compute the distance matrix on those
>>> selected rows of mm, apart from the diagonal, I would get all entries
>>> larger than 0.9.
>>> Any idea about how I can efficiently code that?
>>> Regards
>>> 
>>> Lorenzo
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> 
> 

David Winsemius
Alameda, CA, USA


