[R] Join data by minimum distance

Monica Pisica pisicandru at hotmail.com
Mon Sep 15 15:13:13 CEST 2008


Hi,

First of all thanks to everybody who came with suggestions and solutions, but Simon came with perfect code ;-)

We work with data that usually is projected in locat UTM coordinates, so i've changed the code as following and it works like a charm:

## code begins
## Euclidian distance 
##this has advantage that i can use a z coordinate if i have it and change the dist function accordingly

dist <- function(x1, y1, x2, y2) {
((x1-x2)^2 + (y1-y2)^2)^0.5
}

dist.merge <- function(x, y, xeast, xnorth, yeast, ynorth){
tmp <- t(apply(x[,c(xeast, xnorth)], 1, function(x, y){
dists <- apply(y, 1, function(x, y) dist(x[2],
x[1], y[2], y[1]), x)
cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
}
, y[,c(yeast, ynorth)]))
tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(yeast,
ynorth), names(y))])
row.names(tmp) <- NULL
tmp
}
 

## code ends

#demo

track <- data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90))
classif <- data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90), v1=letters[1:20], v2=1:20)
dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')

Again,

Thanks for all the help,
Monica



> Date: Sun, 14 Sep 2008 23:06:22 +1000
> From: sleepingwell at gmail.com
> To: pisicandru at hotmail.com
> Subject: Re: [R] Join data by minimum distance
> CC: r-help at r-project.org
>
>> I am wondering if there is a function which will do a join between 2 data.frames by minimum distance, as it is done in ArcGIS for example. For people who are not familiar with ArcGIS here it is an explanation:
>>
>> Suppose you have a data.frame with x, y, coordinates called track, and a second data frame with different x, y coordinates and some other attributes called classif. The track data.frame has a different number of rows than classif. I want to join the rows from classif to track in such a way that for each row in track I add only the row from classif that has coordinates closest to the coordinates in the track row (and hence minimum distance in between the 2 rows), and also add a new column which will record this minimum distance. Even if the coordinates in the 2 data.frames have same name, the values are not identical between the data.frames, so a merge by column is not what I am after.
>
>
>
> #-----------------------------------------------------------------------
> # get the distance between two points on the globe.
> #
> # args:
> # lat1 - latitude of first point.
> # long1 - longitude of first point.
> # lat2 - latitude of first point.
> # long2 - longitude of first point.
> # radius - average radius of the earth in km
> #
> # see: http://en.wikipedia.org/wiki/Great_circle_distance
> #-----------------------------------------------------------------------
> greatCircleDistance <- function(lat1, long1, lat2, long2, radius=6372.795){
> sf <- pi/180
> lat1 <- lat1*sf
> lat2 <- lat2*sf
> long1 <- long1*sf
> long2 <- long2*sf
> lod <- abs(long1-long2)
> radius * atan2(
> sqrt((cos(lat1)*sin(lod))**2 +
> (cos(lat2)*sin(lat1)-sin(lat2)*cos(lat1)*cos(lod))**2),
> sin(lat2)*sin(lat1)+cos(lat2)*cos(lat1)*cos(lod)
> )
> }
>
> #-----------------------------------------------------------------------
> # Calculate the nearest point using latitude and longitude.
> # and attach the other args and nearest distance from the
> # other data.frame.
> #
> # args:
> # x as you describe 'track'
> # y as you describe 'classif'
> # xlongnme name of longitude variable in x
> # xlatnme name of latitude location variable in x
> # ylongnme name of longitude location variable on y
> # ylatnme name of latitude location variable on y
> #-----------------------------------------------------------------------
> dist.merge <- function(x, y, xlongnme, xlatnme, ylongnme, ylatnme){
> tmp <- t(apply(x[,c(xlongnme, xlatnme)], 1, function(x, y){
> dists <- apply(y, 1, function(x, y) greatCircleDistance(x[2],
> x[1], y[2], y[1]), x)
> cbind(1:nrow(y), dists)[dists == min(dists),,drop=F][1,]
> }
> , y[,c(ylongnme, ylatnme)]))
> tmp <- cbind(x, min.dist=tmp[,2], y[tmp[,1],-match(c(ylongnme,
> ylatnme), names(y))])
> row.names(tmp) <- NULL
> tmp
> }
>
> # demo
> track <- data.frame(xt=runif(10,0,360), yt=rnorm(10,-90, 90))
> classif <- data.frame(xc=runif(10,0,360), yc=rnorm(10,-90, 90),
> v1=letters[1:20], v2=1:20)
> dist.merge(track, classif, 'xt', 'yt', 'xc', 'yc')

_________________________________________________________________

 Live.



More information about the R-help mailing list