[R-sig-Geo] "Merge" shapefiles

Robert J. Hijmans r.hijmans at gmail.com
Fri Nov 14 21:49:52 CET 2014


Steven,

Please do provide a self-contained example when you ask a question
(see my example below) and try to use correct terms. You are dealing
with SpatialPolygonsDataFrame objects (shpA, shpB); these objects are
perhaps derived from shapefiles, but that is not what they are. Also,
providing a motivation is very important, particularly if you are a
self-proclaimed newbie. I think that merge should be able to deal with
this case if you really want it, through an argument in the function
whose default value gives you an error, because in most practical
cases duplicated keys truly reveal an error.

Here is an example of a route you might take. Please carefully check
if it does what you need (I did not do that).

Robert


library(raster)

# Get a SpatialPolygonsDataFrame
p <- shapefile(system.file("external/lux.shp", package="raster"))

# make a subset
s <- p[1:3, ]

# merge (combines a Spatial* object with a data.frame); ID_2 is a unique ID
m <- merge(p, data.frame(s), all=TRUE, by='ID_2')

# makes a subset with multiple instances of the same polygon
ss <- bind(s, s[1:2,])
# add something unique to each record
ss$newvar <- 1:nrow(ss)

# merge fails now
# You should have shown us something like this. There is very little
# value in talking about shpA and shpB, as we do not have (or want)
# access to the data.

m <- merge(p, data.frame(ss), all=TRUE, by='ID_2')

# this is the error.
# Error in .local(x, y, ...) :
#  'y' has multiple records for one or more 'by.y' key(s)


# so let's merge the attribute tables instead (plain data.frames
# taken from the two Spatial objects); the result is a data.frame
d <- merge(data.frame(p), data.frame(ss), all=TRUE, by='ID_2')

# link table d to the SpatialPolygons object with all records, p
i <- match(d$ID_2, p$ID_2)

# get the polygons we need
x <- p[i, ]

# link the polygons to the merged table
y <- SpatialPolygonsDataFrame(as(x, 'SpatialPolygons'), d, match.ID=FALSE)

# inspect
p
y
data.frame(p)
data.frame(ss)
data.frame(y)
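
If you do not have the raster package at hand, the table side of the
steps above can be sketched in base R alone. This is only a minimal
sketch under my own assumptions: 'polys' and 'attrs' are hypothetical
toy tables standing in for data.frame(p) and data.frame(ss), and it
involves no geometries at all. It shows how to list the repeated keys
before merging, and how match() gives the row index used above to pick
the polygons.

```r
# Hypothetical toy data: one row per polygon, and an attribute table
# in which some keys are repeated
polys <- data.frame(ID_2 = 1:4, name = c("a", "b", "c", "d"))
attrs <- data.frame(ID_2 = c(1L, 2L, 3L, 1L, 2L), newvar = 1:5)

# keys that occur more than once in the attribute table
dup_keys <- unique(attrs$ID_2[duplicated(attrs$ID_2)])

# keys in attrs without a partner in polys (ideally none)
orphans <- setdiff(attrs$ID_2, polys$ID_2)

# one-to-many merge of the tables; all=TRUE keeps unmatched polygons
d <- merge(polys, attrs, by = "ID_2", all = TRUE)

# for every row of the merged table, the matching row in polys
i <- match(d$ID_2, polys$ID_2)
```

With real Spatial* objects, i is what you would use to repeat the
polygons (as in x <- p[i, ] above) so that there is one geometry per
row of the merged table.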

On Fri, Nov 14, 2014 at 2:23 AM, Roger Bivand <Roger.Bivand at nhh.no> wrote:
> On Fri, 14 Nov 2014, Tyler Frazier wrote:
>
>> Not to belittle the spatial capabilities of R, but this sounds like a
>> function that would be better addressed with PostgreSQL/postgis. Integrating
>> r & pgsql can be a good combination.
>
>
> Maybe, but clarity of thinking is perhaps what is needed, it always helps
> more than guesswork. If you already know PostGIS, you'd also need clarity of
> thinking, and the steps would be very similar, although with the possibility
> to link identical objects.
>
>>
>> Sent from my iPhone
>>
>>> On Nov 14, 2014, at 1:11 AM, Steven Ranney <steven.ranney at gmail.com>
>>> wrote:
>>>
>>> All -
>>>
>>> I am slowly learning more about spatial data in R.  However, I am still a
>>> relative neophyte.
>>>
>>> What I want to do:
>>>
>>> I have two shapefiles, shpA has ~401,000 individual polygons with
>>> attributes.  shpB is a subset of those polygons with different attribute
>>> data.  Even though shpB is a subset of those data, there may be multiple
>>> rows for a given polygon, thus giving shpB more total rows (~780,000).
>>>
>
> You must decide what you want to do in detail, for instance whether these
> representations make any sense. You do not provide a motivation or an
> affiliation, which make it hard to guess your application domain (ecology,
> real estate, whatever).
>
> You have ~401,000 individual polygons with IDs and some data, are they
> unique? Do they overlap? Are they home ranges (which may overlap), census
> blocks (which shouldn't)?
>
> Then you have extra data that happens to be in a messy shapefile with
> repeated geometries, all of which match some of those in the first data
> set (it never needed to be a shapefile, and probably never should have
> been). Can you match them by ID (match() is much stronger than merge(),
> because it shows you what is matching)?
>
> Note that you expect to get >=0 matches on each geometry from the first
> object, you need to control what is going on, because the maximum number of
> matches will determine the number of columns in the output (with lots of
> missing values where there are fewer than this). Are the repeat geometries
> there because the repeats are at different times? Should you be trying to
> construct an appropriate space-time object if this is the case?
>
>
>>> Effectively, I want to merge these two shapefiles.  With two dataFrame
>>> objects in R, I would merge them like
>>>
>>> merge(shpA, shpB, by = "APN_LABEL", all = TRUE)
>>>
>>> but apparently, this doesn't work with shapefiles.  I have tried
>>>
>>> merge(shpA@data, shpB@data, by = "APN_LABEL", all = TRUE)
>>>
>>> which creates a dataFrame of the two files but drops all of the
>>> spatial
>>> geometries.
>
>
> Yes, of course, what did you expect? The only references available say that
> there is no merge method for Spatial* objects, and you are anyway taking
> their data slots, which are data frames. If the output object has the same
> number of rows as shpA, and its row.names() matches that of shpA, you may
> have what you want (create a new SPDF object with the SpatialPolygons from
> shpA, and the output from merge as its data slot), but beware of merge()
> re-ordering rows. This is, however, dependent on prior checking for
> consistency in the IDs.
>
>>>
>>> I've looked into gUnion() as it seems like that may be what I'm looking
>>> for, but I get the following error:
>
>
> Just fishing without understanding is always pretty hopeless. Why would you
> expect that a function that is declared to only handle geometries could sort
> out your data cleaning problem?
>
>>>
>>> tmp <- gUnion(shpA, shpB)
>>> Error in RGEOSBinTopoFunc(spgeom1, spgeom2, byid, id, drop_lower_td,
>>> "rgeos_union") :
>>>  std::bad_alloc
>>>
>>> Ultimately, I want a shapeFile of all ~401,000 geometries in shpA that
>>> includes ALL of the attribute data from shpB that may exist in multiple
>>> rows for a given polygon.
>
>
> Yes, but you need to think first; I'm not even sure why these polygons might
> be meaningful anyway - you didn't say. Guessing by function name really
> doesn't help. Did reading the "combine_maptools" vignette help?
>
> http://cran.r-project.org/web/packages/maptools/vignettes/combine_maptools.pdf
>
> Hope this clarifies,
>
> Roger
>
>>>
>>> Is this possible?  Is this simple?
>>>
>>> Steven H. Ranney
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>
>>
>
> --
> Roger Bivand
> Department of Economics, Norwegian School of Economics,
> Helleveien 30, N-5045 Bergen, Norway.
> voice: +47 55 95 93 55; fax +47 55 95 91 00
> e-mail: Roger.Bivand at nhh.no
>