[R] Sampling the Distance Matrix
David Winsemius
dwinsemius at comcast.net
Fri Sep 25 22:56:56 CEST 2015
On Sep 25, 2015, at 12:54 PM, Lorenzo Isella wrote:
> Apologies for not letting this thread rest in peace.
> The small script
>
> #########################################################
> set.seed(1234)
>
> x <- rnorm(20)
> y <- rnorm(20)
>
> mtxcomb <- combn(20, 5)  # every 5-point index set; definition missing from the script as quoted
> goodcls <- apply(mtxcomb , 2, function(idx) all( dist( cbind( x[idx],
> y[idx]) ) > 0.9))
>
> mycomb <- mtxcomb [ , goodcls]
> #########################################################
>
>
> is perfect for detecting groups of 5 points whose distances to each other
> are always above 0.9.
> However, in my practical case I have about 500 points and I am looking
> for subsets of several tens of points whose pairwise distances are all
> above a given threshold.
> Unfortunately, the approach above does not scale, so I wonder if
> anybody is aware of an alternative approach.
Find the center of the distribution, eliminate all the points within some reasonable radius of it, perhaps sqrt( sd(x)^2 + sd(y)^2 ), and then work on the reduced set. If you needed to reduce it even further, I could imagine sampling in sectors defined by the angle of each point about the center, i.e. atan2(y, x) rather than tan(x/y).
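
A minimal sketch of that suggestion, under assumptions not stated above: 500 simulated points, the mean as the center, and eight equal angular sectors (n_sectors is a hypothetical choice) with one point drawn at random from each. It only produces one candidate subset and checks it; it does not guarantee that the threshold is met.

```r
set.seed(1234)

n <- 500                      # assumed problem size, as in the question
x <- rnorm(n)
y <- rnorm(n)

# Center of the cloud (the mean is an assumption; the median would also do)
cx <- mean(x)
cy <- mean(y)

# Radius proposed in the reply
r <- sqrt(sd(x)^2 + sd(y)^2)

# Keep only the points lying outside that radius
d_center <- sqrt((x - cx)^2 + (y - cy)^2)
keep <- which(d_center > r)

# Assign each remaining point to an angular sector about the center
n_sectors <- 8                # hypothetical choice
angle <- atan2(y[keep] - cy, x[keep] - cx)
sector <- cut(angle,
              breaks = seq(-pi, pi, length.out = n_sectors + 1),
              include.lowest = TRUE, labels = FALSE)

# Draw one point at random from each non-empty sector
pick <- unname(vapply(split(keep, sector),
                      function(i) i[sample.int(length(i), 1)],
                      integer(1)))

# Check the candidate subset against the 0.9 threshold
ok <- all(dist(cbind(x[pick], y[pick])) > 0.9)
```

If ok is FALSE, the draw can simply be repeated, or the radius increased, since points in neighbouring sectors may still lie close together.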
--
David Winsemius
Alameda, CA, USA