[R] nested for() loops for returning a nearest point

Liaw, Andy andy_liaw at merck.com
Wed Jul 30 19:09:46 CEST 2003


> From: Steve Sullivan [mailto:ssullivan at qedgroupllc.com] 
> 
> I'm trying to do the following:
> 
> For each ordered pair of a data frame (D1) containing 
> longitudes and latitudes and unique point IDs, calculate the 
> distance to every point in another data frame (D2) also 
> containing longitudes, latitudes and point IDs, and return to 
> a new variable in D1 the point ID of the nearest element of D2.
> 
> Dramatis personae (mostly self-explanatory):
>
> D1$long
> D1$lat
> D1$point.id
> neighbor.id (to be created; for each ordered pair in D1, the
> point ID of the nearest ordered pair in D2)
> D2$long
> D2$lat
> D2$point.id
> dist.geo (to be created)
>
> I've been attempting this with nested for() loops that step
> through each ordered pair in D1 and, for each ordered pair [i]
> in D1, create a vector (dist.geo) the length of D2$lat (say)
> that contains the distance calculated from every ordered pair
> in D2 to the current ordered pair [i] of D1, assign a value to
> D1$neighbor.id[i] based on D2$point.id[which.min(dist.geo)],
> and move on to the next ordered pair of D1 to create another
> dist.geo, assign another neighbor.id, etc.
>
> There are no missings/NAs in any of the longs, lats or 
> point.ids, although advice on generalizing this to deal with 
> them would be appreciated.
>
> What I've been trying:
>
> neighbor.id <- vector(length=length(D1$lat))
> dist.geo <- vector(length=length(D2$lat))
> for(i in 1:length(neighbor.id)){
>   for(j in 1:length(dist.geo)){
>     dist.geo[j] <- D1$lat[i]-D2$lat[j]
>   }
>   # Yes, I know that isn't the right formula, this is just a test
>   neighbor.id[i] <- D2$point.id[which.min(dist.geo)]
> }
>
> What I get is a neighbor.id of the appropriate length, but
> which consists only of the same value repeated.  Should I
> instead pass which.min(dist.geo) to a variable before exiting
> the inner (j) loop, and reference that variable in place of
> which.min(dist.geo) in the last line?  Or is this whole
> approach wrongheaded?
> 
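
The repeated value comes from the test formula rather than from the loop
structure. D1$lat[i] - D2$lat[j] is a signed difference, so which.min()
always lands on the D2 row with the largest latitude, whatever i is. A
minimal sketch of this, using made-up toy data (the values and the names
`signed` and `nearest` are only illustrative) and sapply() in place of the
inner loop:

```r
# Toy data: hypothetical values, for illustration only
D1 <- data.frame(long = c(0, 1, 2), lat = c(10, 20, 30),
                 point.id = c("a", "b", "c"))
D2 <- data.frame(long = c(0, 1, 2), lat = c(11, 19, 31),
                 point.id = c("x", "y", "z"))

# Signed difference, as in the test formula: the minimum is always at
# the largest D2$lat, so every row of D1 gets the same index
signed <- sapply(seq_len(nrow(D1)),
                 function(i) which.min(D1$lat[i] - D2$lat))

# Absolute difference: the minimum is at the closest D2$lat
nearest <- sapply(seq_len(nrow(D1)),
                  function(i) which.min(abs(D1$lat[i] - D2$lat)))

# Look up all the IDs in one step from the vector of row positions
D1$neighbor.id <- D2$point.id[nearest]
```

Any true distance (non-negative by construction) avoids the problem; abs()
merely stands in for one here. Indexing D2$point.id with the whole vector of
positions also sidesteps element-by-element assignment of IDs.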

For finding nearest neighbors, try the following:

set.seed(1)
d1 <- data.frame(long=rnorm(10), lat=rnorm(10), point.id=factor(1:10))
d2 <- data.frame(long=rnorm(5), lat=rnorm(5), point.id=factor(1:5))

## For each point in D1, find nearest neighbor in D2.
library(class)
d1$neighbor.id <- knn1(as.matrix(d2[,1:2]), as.matrix(d1[,1:2]),
                       d2$point.id)

If you really want to, you could modify knn1() (and the C code it calls) so
that the distance is also returned.  Otherwise, you can just compute the
distance "by hand" in R once the nearest neighbors are found.

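Computing the distance "by hand" might look like the sketch below. It
rebuilds d1 and d2 as above; `idx` and `neighbor.dist` are names introduced
here for illustration. knn1() works with Euclidean distance on the raw
coordinates, so the sketch does the same; for real longitudes and latitudes
that is only an approximation, and a great-circle formula could be
substituted on the last line.

```r
library(class)

set.seed(1)
d1 <- data.frame(long = rnorm(10), lat = rnorm(10), point.id = factor(1:10))
d2 <- data.frame(long = rnorm(5), lat = rnorm(5), point.id = factor(1:5))

# Nearest neighbor in d2 for each point in d1, as above
d1$neighbor.id <- knn1(as.matrix(d2[, 1:2]), as.matrix(d1[, 1:2]),
                       d2$point.id)

# Row of d2 that corresponds to each neighbor ID
idx <- match(d1$neighbor.id, d2$point.id)

# Euclidean distance from each d1 point to its nearest d2 point
d1$neighbor.dist <- sqrt((d1$long - d2$long[idx])^2 +
                         (d1$lat - d2$lat[idx])^2)
```

Since only one distance per row of d1 is computed, this stays cheap even
when d2 is large.
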
HTH,
Andy

>  
> 
> This should be elementary, I know, so I appreciate everyone's 
> forbearance.
> 
>  
> 
> Steven Sullivan, Ph.D.
