[R-sig-Geo] A one to many merge involving a spatial data frame object

Dan Putler dan.putler at sauder.ubc.ca
Tue Aug 9 22:23:37 CEST 2011


All,

I'm working with the 2010 TIGER/Line shapefile road data, and I'm 
dealing with alternative road names for the same road segment. There are
two relevant files in the TIGER collection. The first is the road edge
shapefile, which has a unique record (and geometry) for each road
segment. The other is a featnames dbf file, which can have multiple
records for the same road segment (each with an ID that identifies the
road segment). One of the reasons this second file was created was to deal
with situations where a portion of a road is known by two or more 
different names (for example, Hwy 50 and Main Street). My goal is to 
create a SpatialLinesDataFrame object that contains the unique road 
segment / road name combinations, which will result in a set of line 
geometries that are not unique. I've looked at the spCbind methods, but 
my reading of the documentation suggests it will not address this case 
directly since the feature IDs would not be unique.
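
To make the one-to-many relation concrete, here is a minimal sketch using
plain data frames and base R's merge(). The table names (edges, featnames,
seg_names), the FULLNAME column, and all values are made up for
illustration; only TLID comes from the description above.

```r
# Toy stand-in for the edge attribute table: one record per road segment
edges <- data.frame(TLID = c(101, 102))

# Toy stand-in for the featnames table: segment 101 carries two names
featnames <- data.frame(TLID = c(101, 101, 102),
                        FULLNAME = c("Hwy 50", "Main St", "Oak Ave"),
                        stringsAsFactors = FALSE)

# One-to-many merge: each segment is repeated once per name it carries
seg_names <- merge(edges, featnames, by = "TLID")
print(seg_names)
```

The merged table has one row per segment / name pair, so a TLID no longer
identifies a row uniquely, which is why the geometries need new feature IDs.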

I can create a new SpatialLinesDataFrame that has a row for each 
possible unique road segment and road name combination, and I can then 
use spCbind to attach the needed attribute information to this object. 
Unfortunately, the only way I can think of to create the
SpatialLinesDataFrame object is a great example of what *not* to do in S
language programming: using a for-loop to "grow" a data-frame-like object
with rbind. Below is a snippet of my present code:

   geom1 <- readOGR(dsn=roads_dsn, layer=roads_layer)
   geom1 <- geom1[, "TLID"]
   # Let the nastiness begin. Time to build-up the needed geometries
   geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[1],], "1")
   for(i in 2:nrow(roadsf)) {
     new_geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[i],],
                          as.character(i))
     geom <- rbind(geom, new_geom)
   }

In the code, the data frame "roadsf" contains the attribute data for 
each unique road segment / road name pair, and the variable TLID is the 
unique ID for the road segment geometries. This works, but it is really 
slow (it is CPU bound, not I/O bound). My question is: have I missed
an easier solution? If I haven't, how could I go about doing this more
cleverly? Given the need to alter the feature IDs, I don't see a nice
way to use one of the apply family of functions. Alternatively, is this
just something that is better suited to the use of a spatial database 
tool like PostGIS or SpatiaLite? I would guess the query looks something like:
SELECT a.*, b.the_geom FROM table1 AS a, table2 AS b WHERE a.id = b.id
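
One possible loop-free construction, offered only as a sketch: look up all
geometry positions at once with match(), rebuild each Lines object under a
new ID, and assemble the result in a single call rather than growing it with
rbind. The toy geom1 and roadsf below are stand-ins with made-up values (the
FULLNAME column is an assumption), and the sketch assumes every TLID in
roadsf occurs in geom1. It has not been timed against the real TIGER data.

```r
library(sp)

# Toy stand-ins for the objects in the snippet above (not TIGER data)
l1 <- Lines(list(Line(cbind(c(0, 1), c(0, 1)))), ID = "a")
l2 <- Lines(list(Line(cbind(c(1, 2), c(1, 0)))), ID = "b")
geom1 <- SpatialLinesDataFrame(
  SpatialLines(list(l1, l2)),
  data = data.frame(TLID = c(101, 102), row.names = c("a", "b"))
)
roadsf <- data.frame(TLID = c(101, 101, 102),
                     FULLNAME = c("Hwy 50", "Main St", "Oak Ave"),
                     stringsAsFactors = FALSE)

# Position in geom1 of the segment behind each segment / name pair
idx <- match(roadsf$TLID, geom1$TLID)
stopifnot(!anyNA(idx))  # assumption: no unmatched TLIDs

# New feature IDs "1", "2", ... as in the for-loop version
new_ids <- as.character(seq_along(idx))

# Copy each referenced geometry into a Lines object carrying its new ID
lines_list <- mapply(
  function(k, id) Lines(geom1@lines[[k]]@Lines, ID = id),
  idx, new_ids, SIMPLIFY = FALSE, USE.NAMES = FALSE
)

# Assemble everything in one step, keeping the original CRS
sl <- SpatialLines(lines_list, proj4string = geom1@proj4string)
row.names(roadsf) <- new_ids
geom <- SpatialLinesDataFrame(sl, data = roadsf, match.ID = TRUE)
```

Building the list of Lines objects first and calling SpatialLines() once
avoids the repeated copying that rbind inside a loop incurs. The attribute
data is attached in the same step, so a separate spCbind call would not be
needed in this variant.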

Dan
