[R-sig-Geo] A one to many merge involving a spatial data frame object

Paul Hiemstra paul.hiemstra at knmi.nl
Mon Aug 15 10:17:22 CEST 2011


 On 08/09/2011 08:23 PM, Dan Putler wrote:
> All,
>
> I'm working with the 2010 TIGER/Line shapefile road data, and I'm
> dealing with alternative road names for the same road segment. There
> are two relevant files in the TIGER collection, the first is the road
> edge shapefile, which has a unique record (and geometry) for each road
> segment. The other is a featnames dbf file which can have multiple
> records for the same road segment (but an ID that identifies the road
> segment). One of the reasons this second file was created was to deal
> with situations where a portion of a road is known by two or more
> different names (for example, Hwy 50 and Main Street). My goal is to
> create a SpatialLinesDataFrame object that contains the unique road
> segment / road name combinations, which will result in a set of line
> geometries that are not unique. I've looked at the spCbind methods,
> but my reading of the documentation suggests it will not address this
> case directly since the feature IDs would not be unique.
>
> I can create a new SpatialLinesDataFrame that has a row for each
> possible unique road segment and road name combination, and I can then
> use spCbind to attach the needed attribute information to this object.
> Unfortunately, the way I can think of creating the
> SpatialLinesDataFrame object is a great example of what *not* to do in
> S language programming, specifically, use a for-loop to "grow" a data
> frame like object using rbind. Below is a snippet of my present code:
>
>   geom1 <- readOGR(dsn=roads_dsn, layer=roads_layer)
>   geom1 <- geom1[, "TLID"]
>   # Let the nastiness begin. Time to build-up the needed geometries
>   geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[1],], "1")
>   for(i in 2:nrow(roadsf)) {
>     new_geom <- spChFIDs(geom1[geom1$TLID == roadsf$TLID[i],],
>                          as.character(i))
>     geom <- rbind(geom, new_geom)

In general, concatenating objects this way in R is slow, and it gets
worse as nrow(roadsf) grows. Each call to rbind creates a new, larger
object and copies everything accumulated so far into it, so the total
amount of copying grows roughly quadratically with the number of rows.
Growing an object inside a loop like this should be avoided.
Pre-allocating the space needed for geom, or collecting the pieces in a
list and combining them once at the end, could speed this code up quite
a bit. Alternatively, I often use functions from the plyr package to
circumvent this problem. However, without a working example from your
side I cannot provide sample code that uses the plyr package.
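
For what it is worth, here is a minimal sketch of the "build it once"
idea using only base R and sp (no plyr). It is untested against the real
TIGER data: geom1 and roadsf below are made-up toy stand-ins, the
FULLNAME column and the helper mk_lines are only illustrative, and it
assumes every TLID in roadsf occurs exactly once in geom1. It looks up
the matching row for each segment/name pair with match(), copies the
Lines objects in one step, relabels their IDs, and constructs the
SpatialLinesDataFrame a single time.

```r
library(sp)

# Toy stand-ins (made up): two road segments, three segment/name pairs.
mk_lines <- function(id, xy) Lines(list(Line(xy)), ID = id)
geom1 <- SpatialLinesDataFrame(
  SpatialLines(list(mk_lines("a", cbind(c(0, 1), c(0, 1))),
                    mk_lines("b", cbind(c(1, 2), c(1, 0))))),
  data = data.frame(TLID = c(101, 102), row.names = c("a", "b"))
)
roadsf <- data.frame(TLID = c(101, 101, 102),
                     FULLNAME = c("Hwy 50", "Main Street", "Oak Avenue"),
                     stringsAsFactors = FALSE)

# Row of geom1 that each row of roadsf refers to (one-to-many lookup).
idx <- match(roadsf$TLID, geom1$TLID)
stopifnot(!anyNA(idx))

# Copy the Lines objects in one go, then give each copy a unique ID.
new_ids <- as.character(seq_along(idx))
lines_list <- mapply(function(ln, id) { ln@ID <- id; ln },
                     geom1@lines[idx], new_ids, SIMPLIFY = FALSE)

# Build the result once instead of rbind-ing inside a loop.
row.names(roadsf) <- new_ids
geom <- SpatialLinesDataFrame(
  SpatialLines(lines_list, proj4string = geom1@proj4string),
  data = roadsf, match.ID = TRUE
)
```

The geometries for rows "1" and "2" are then identical copies of the
same segment, each carrying its own road name.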

regards,
Paul

>   }
>
> In the code, the data frame "roadsf" contains the attribute data for
> each unique road segment / road name pair, and the variable TLID is
> the unique ID for the road segment geometries. This works, but it is
> really slow (it is CPU bound, but not I/O bound). My question is have
> I missed an easier solution? If I haven't, how could I go about doing
> this more cleverly? Given the need to alter the feature IDs, I don't
> see a nice way to use one of the apply family of functions.
> Alternatively, is this just something that is better suited to the use
> of a spatial database tool like PostGIS or SpatiaLite? I can guess the
> query looks like: SELECT a.*, b.the_geom FROM table1 AS a, table2 AS b
> WHERE a.id = b.id
>
> Dan
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo


-- 
Paul Hiemstra, Ph.D.
Global Climate Division
Royal Netherlands Meteorological Institute (KNMI)
Wilhelminalaan 10 | 3732 GK | De Bilt | Kamer B 3.39
P.O. Box 201 | 3730 AE | De Bilt
tel: +31 30 2206 494

http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770


