[R] finding euclidean proximate points in two datasets

Alexander Shenkin ashenkin at ufl.edu
Thu May 20 17:12:19 CEST 2010


On 5/20/2010 9:18 AM, David Winsemius wrote:
> 
> On May 20, 2010, at 10:02 AM, Alexander Shenkin wrote:
> 
>> Hello all,
>>
>> I've been pouring through the various spatial packages, but haven't come
>> across the right thing yet.
> 
> There is a SIG for such questions.

thanks - just joined it.

>> Given a set of points in 2-d space X, i'm trying to find the subset of
>> points in Y proximate to each point in X.  Furthermore, the proximity
>> threshold of each point in X differs (X$threshold).  I've constructed
>> this myself already, but it's horrificly slow with a dataset of 40k+
>> points in one set, and a 700 in the other.
>>
>> A very inefficient example of what I'm looking for:
> 
> Not really a reproducible example. If euclidean_dist is a function ,
> then it is not one in any of the packages I have installed.

it's not reproducible - i'll make a better effort to include
reproducible code in the future.  and that function is just one i would
have written, but didn't want to clog the email with code.  Anyway, here
is a reproducible example:

X = data.frame(x=c(1,2,3), y=c(2,3,1), threshold=c(1,2,4))
Y = data.frame(x=c(5,2,3,4,2,5,2,3), y=c(5,2,2,4,1,2,3,1))
proximate=list()
i=1
for (pt in 1:length(X$x)) {
    proximate[[i]] <- sqrt((X[pt,]$x - Y$x)^2 + (X[pt,]$y - Y$y)^2) >
X[pt,]$threshold
    i=i+1
}
proximate

>>
>>    for (pt in X$idx) {
>>        proximity[i] = euclidian_dist(X[pt]$x, X[pt]$y, Y$x, Y$y) <
>> X$threshold
>>     i = i+1
>>    }
>>
> 
> Have you considered first creating a subset of candidate points that are
> within "threshold" of each reference point on both coordinates. That
> might sidestep a lot of calculations on points that are easily
> eliminated on a single comparison. Then you could calculate distances
> within that surviving subset of points. On average that should give you
> an over 50% "hit rate":
> 
>> (4/3)*pi*0.5^3
> [1] 0.5235988

That's a nice idea.  I'll still be waiting quite a while while my
machine cranks, but not as long.  Still - I suspect there would be much
bigger gains if there were tailored packages.  I'll re-post over on
sig-geo about that.  thanks.

>> Perhaps crossdist() in spatstat is what I should use, and then code a
>> comparison with X$threshold after the cross-distances are computed.
>> However, I was wondering if there was another tool I should be
>> considering.  Any and all thoughts are very welcome.  Thanks in advance.
>>
>> Thanks,
>> Allie
>> -- 
>> Alexander Shenkin
>> PhD Candidate
>> School of Natural Resources and Environment
>> University of Florida



More information about the R-help mailing list