[R] Spatial join - optimizing code

Monica Pisica pisicandru at hotmail.com
Tue Sep 16 21:01:16 CEST 2008


Hi Dan,

This is fantastic. I've just run your code with the same data as before and the results are:

BEFORE:

   user  system  elapsed
8166.07    2.98  8194.43

AFTER (with Dan's code):

   user  system elapsed 
  18.53    0.03   18.59 

So with my "real" data this code is over 440 times faster.
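
As a quick check of that figure, the ratio of the two elapsed times can be computed directly. The variable names below are only illustrative and are not part of either function.

```r
# Elapsed times in seconds, copied from the BEFORE and AFTER timings above
elapsed_before <- 8194.43
elapsed_after  <- 18.59

# Speed-up factor: how many times shorter the new run is
speedup <- elapsed_before / elapsed_after
print(round(speedup, 1))
```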

Thank you so much!

Monica




> Date: Tue, 16 Sep 2008 14:10:34 -0400
> From: davison at stats.ox.ac.uk
> To: pisicandru at hotmail.com
> CC: r-help at r-project.org
> Subject: Re: [R] Spatial join - optimizing code
>
> Hi Monica,
>
> I think the key to speeding this up is, for every point in 'track', to
> compute the distance to all points in 'classif' 'simultaneously',
> using vectorized calculations. Here's my function. On my laptop it's
> about 160 times faster than the original for the case I looked at
> (10,000 observations in track and 500 in classif). I get around 18
> seconds for the 30,000 and 4,000 example (2 GHz processor running
> linux).
>
> Dan
>
> dist.merge2 <- function(x, y, xeast, xnorth, yeast, ynorth) {
>     ## construct data frame d in which d[i,] contains information
>     ## associated with the closest point in y to x[i,]
>     xpos <- as.matrix(x[,c(xeast, xnorth)])
>     xposl <- lapply(seq.int(nrow(x)), function(i) xpos[i,])
>     ypos <- t(as.matrix(y[,c(yeast, ynorth)]))
>     ## drop=FALSE keeps yinfo a data frame even if only one column remains
>     yinfo <- y[,! colnames(y) %in% c(yeast,ynorth), drop=FALSE]
>
>     get.match.and.dist <- function(point) {
>         ## squared distances from 'point' to every point in y at once
>         sqdists <- colSums((point - ypos)^2)
>         ind <- which.min(sqdists)
>         c(ind, sqrt(sqdists[ind]))
>     }
>     match <- sapply(xposl, get.match.and.dist)
>     cbind(xpos, mindist=match[2,], yinfo[match[1,],,drop=FALSE])
> }
>
> It's marginally faster to convert xpos to a list followed by sapply as
> I do here, than to leave it as a matrix and use apply to get the
> matches.
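
A minimal sketch of the two variants mentioned above (list plus sapply versus matrix plus apply), run on made-up toy data. The data frames `track` and `classif`, their column names, and the helper `nearest` are assumptions for illustration only; the sketch just checks that both variants pick the same nearest point and agree with a brute-force distance matrix. It makes no claim about which variant is faster.

```r
# Toy data (invented for illustration): 5 track points, 3 classified points
set.seed(1)
track   <- data.frame(east = runif(5, 0, 100), north = runif(5, 0, 100))
classif <- data.frame(east = runif(3, 0, 100), north = runif(3, 0, 100),
                      class = c("a", "b", "c"))

xpos <- as.matrix(track[, c("east", "north")])
# 2 x n matrix, one column per classified point, so that a length-2 point
# is recycled down each column when subtracted
ypos <- t(as.matrix(classif[, c("east", "north")]))

nearest <- function(point) {
  sqdists <- unname(colSums((point - ypos)^2))  # squared distance to every y point
  ind <- which.min(sqdists)
  c(index = ind, dist = sqrt(sqdists[ind]))
}

# Variant 1: leave xpos as a matrix and use apply over its rows
m_apply <- apply(xpos, 1, nearest)

# Variant 2: convert the rows to a list first, then use sapply
xposl <- lapply(seq_len(nrow(xpos)), function(i) xpos[i, ])
m_sapply <- sapply(xposl, nearest)

# Brute-force check using the full pairwise distance matrix
d <- as.matrix(dist(rbind(xpos, t(ypos))))[1:5, 6:8]
brute <- unname(apply(d, 1, which.min))

# Attach the attributes of the nearest classified point to each track point
joined <- cbind(track, mindist = m_sapply["dist", ],
                classif[m_sapply["index", ], "class", drop = FALSE])
print(joined)
```

On larger inputs, each variant can be wrapped in `system.time()` to compare the two on your own machine.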
>
>
>
>
>
>
> On Tue, Sep 16, 2008 at 04:23:33PM +0000, Monica Pisica wrote:
>>
>> Hi,
>>
>> A few days ago I asked about a spatial join on the minimum distance between 2 sets of points with coordinates and attributes in 2 different data frames.
>>
>> Simon Knapp sent code to do it when calculating distance on a sphere using lat, long coordinates, and I've changed his code to use Euclidean distances since my data had UTM coordinates.
>>
>> Typically one data frame has around 30 000 points and the classification data frame has around 4000 points, and the aim is to add to each point from the first data frame all the attributes from the second data frame of the point that is closest to it.
>>
>> On my PC (Dell OptiPlex GX620, x86-based PC, 4 GB RAM, 3192 MHz processor) it took quite a long time to do the join:
>>
>> user system elapsed
>> 8166.07 2.98 8194.43
>>
>> Sys.info()
>> sysname release
>> "Windows" "XP"
>> version nodename
>> "build 2600, Service Pack 2"
>> machine
>> "x86"
>> I am running R 2.7.1 patched.
>> I wonder if any of you can suggest how to optimize this code to make it run faster, or have time to help with it. My programming skills are not high enough to do it.
>>
>> Thanks,
>>
>> Monica
>>
>> #### code follows:
>> #### x a data frame with over 30000 points with coord in UTM, xeast, xnorth
>> #### y a data frame with over 4000 points with UTM coord (yeast, ynorth) and
>> ##### classification
>> ### calculating Euclidean distance
>>
>> dist <- function(xeast, xnorth, yeast, ynorth) {
>>     ((xeast-yeast)^2 + (xnorth-ynorth)^2)^0.5
>> }
>>
>> ### doing the merge by location with minimum distance
>>
>> dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
>>     tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
>>         dists <- apply(y, 1, function(x, y) dist(x[2], x[1], y[2], y[1]), x)
>>         cbind(1:nrow(y), dists)[dists == min(dists),,drop=FALSE][1,]
>>     }, y[,c(yeast, ynorth)]))
>>     tmp <- cbind(x, min.dist=tmp[,2],
>>                  y[tmp[,1],-match(c(yeast, ynorth), names(y))])
>>     row.names(tmp) <- NULL
>>     tmp
>> }
>>
>> #### code end
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> http://www.stats.ox.ac.uk/~davison
