[R-sig-Geo] [Qgis-user] spatial join, many to 1
Edzer Pebesma
edzer.pebesma at uni-muenster.de
Tue Oct 23 21:11:54 CEST 2012
As Barry tried to point out, you cannot do this.
Each area in a SpatialPolygonsDataFrame has attributes that are in a
simple table, with the number of records equal to the number of
areas. Records are rows in a data.frame, meaning that individual fields
cannot contain (varying length) vectors, as you seem to want.
I believe the information you have is complete; you now either continue
living with these two objects, or you write your own class to merge them
into one (which is fairly trivial, btw).
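If one row per polygon is enough, a possible middle way (a minimal sketch, not
tested on the real data) is to collapse the town names of each list element
into a single string, so that the result is again a plain table with one
record per area. The toy list below only mimics the structure shown in the
str() output further down; poly_ids, collapse_names and town_table are
hypothetical names, and the polygon IDs are assumed to be available from
row.names(ea_map).

```r
# Toy stand-in for over(ea_map, towns, returnList = TRUE):
# one data.frame of matching towns per polygon.
spatial_join <- list(
  data.frame(NAME = c("ABINAKROM", "WOSSU MANU WADRAGO", "AMOATENGKROM"),
             EA_NOS = "956", REG = "01", stringsAsFactors = FALSE),
  data.frame(NAME = c("ABOTAREYE", "KWARFO"),
             EA_NOS = "827", REG = "01", stringsAsFactors = FALSE),
  data.frame(NAME = character(0), EA_NOS = character(0), REG = character(0),
             stringsAsFactors = FALSE)  # a polygon containing no towns
)

# Hypothetical polygon IDs; with the real data these are assumed to come
# from row.names(ea_map).
poly_ids <- c("0", "1", "2")

# Collapse the town names of one polygon into a single string (NA if none).
collapse_names <- function(df, sep = "; ") {
  if (nrow(df) == 0) NA_character_ else paste(df$NAME, collapse = sep)
}

# One record per polygon, with row names equal to the polygon IDs.
town_table <- data.frame(
  n_towns   = vapply(spatial_join, nrow, integer(1)),
  towns     = vapply(spatial_join, collapse_names, character(1)),
  row.names = poly_ids,
  stringsAsFactors = FALSE
)
print(town_table)

# With the real objects, the table could then be attached, e.g.
#   ea_map$towns <- town_table$towns
# or
#   map <- SpatialPolygonsDataFrame(as(ea_map, "SpatialPolygons"), town_table)
# The second form requires row.names(town_table) to match the polygon IDs,
# which is what the error message quoted below complains about.
```

The individual names can be recovered later with strsplit(town_table$towns, "; "),
provided the separator does not occur in any town name.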
On 10/23/2012 08:29 PM, Frazier, Tyler James wrote:
> Hi Edzer, Barry and list,
>
> I made some progress on this today, when I set readOGR() to stringsAsFactors=FALSE as so
>
> ea_map <- readOGR(".","ea_r1", stringsAsFactors=FALSE)
> towns <- readOGR(".","towns_r1", stringsAsFactors=FALSE)
>
> and then execute over() as so
>
> spatial_join <- over(ea_map, towns,returnList=TRUE)
>
> I get the following results ...
>
>> str(spatial_join)
> List of 1251
> $ :'data.frame': 3 obs. of 3 variables:
> ..$ NAME : chr [1:3] "ABINAKROM" "WOSSU MANU WADRAGO" "AMOATENGKROM"
> ..$ EA_NOS: chr [1:3] "956" "956" "956"
> ..$ REG : chr [1:3] "01" "01" "01"
> $ :'data.frame': 2 obs. of 3 variables:
> ..$ NAME : chr [1:2] "ABOTAREYE" "KWARFO"
> ..$ EA_NOS: chr [1:2] "827" "827"
> ..$ REG : chr [1:2] "01" "01"
> $ :'data.frame': 1 obs. of 3 variables:
> ..$ NAME : chr "JINIJINI KOFIKROM (NYAMEBEKYERE NO.1"
> ..$ EA_NOS: chr "826"
> ..$ REG : chr "01"
>
>> head(spatial_join)
> [[1]]
> NAME EA_NOS REG
> 1680 ABINAKROM 956 01
> 1681 WOSSU MANU WADRAGO 956 01
> 1682 AMOATENGKROM 956 01
>
> [[2]]
> NAME EA_NOS REG
> 1644 ABOTAREYE 827 01
> 1645 KWARFO 827 01
>
> [[3]]
> NAME EA_NOS REG
> 1967 JINIJINI KOFIKROM (NYAMEBEKYERE NO.1 826 01
>
> [[4]]
> NAME EA_NOS REG
> 1657 MANGOASE 817 01
> 1660 KOO BANIER 817 01
> 1661 TEMA OKRAKROM 817 01
>
> [[5]]
> NAME EA_NOS REG
> 1691 KWASAREKROM 823 01
>
> [[6]]
> NAME EA_NOS REG
> 1966 KWABENA NKETEAKROM 824 01
>
> which appears to be correct. BUT, when I attempt to rejoin with the original ea_map, I get an error
>
> map <- SpatialPolygonsDataFrame(ea_map,spatial_join)
>
>> map <- SpatialPolygonsDataFrame(ea_map,spatial_join)
> Error in SpatialPolygonsDataFrame(ea_map, spatial_join) :
> row.names of data and Polygons IDs do not match
>
> Setting stringsAsFactors=FALSE in readOGR() permitted the fields to receive the names of each town, but how do I join back to the original ea_map? Maybe I am using the wrong command or need to add an index first?
>
> I could export a truncated dataset if that would be helpful.
>
> At the moment, I'm not interested in the population data. I want to join the names of all towns and villages to the enumeration areas in order to spatially disaggregate as locally as possible. The reason is that names of localities have often been recorded in different local dialects or languages (imagine the name of the city Berlin also appearing as Bearlin, Bärlin, Berlim, Buhrlin, etc.). If I have all the names from the two sides of the join, I have a better chance to determine which combinations are correct or to eliminate incorrect ones.
>
> Thanks!
> Ty
>
>
> On Oct 23, 2012, at 8:14 AM, Edzer Pebesma wrote:
>
>
>
> On 10/22/2012 10:54 PM, Frazier, Tyler James wrote:
> Hi Barry,
>
> Really what I am trying to do is link raw census DATA to an enumeration area MAP, which spatially represents different aggregations of those observations. In order to do so, on the MAP side, I first want to join a point shape file which has all of the town names to an enumeration area shape file, which is missing those names.
>
> Previously, when I did this on the DATA side I used the following command ---
>
> enum_areas <- aggregate(localities_region1,by=list(localities_region1$ea_code), FUN=unique)
>
> and when an enumeration area had more than one town located within it, R created a field within a row as follows ---
>
> c("NEW-TOWN WHARF", "BALIBANGARA", "BOLENWO").
>
> I don't think R can hold such a structure in a data.frame. Of course,
> with lists it can do any nesting, but they are not tables.
>
>
> On the MAP side, it appears when I use the over() command, like so ---
>
> spatial_join <- over(ea_map, towns)
>
> over() supplies only the name of the first town listed within that enumeration area and ignores/omits the remainder. If over() created all of the names in a similar manner as aggregate(), couldn't I then use
>
> As I mentioned before, when you supply over with the returnList = TRUE
> argument, it will return all matches, just not in a table (as a table is
> not very useful in this case).
>
>
> map <- SpatialPolygonsDataFrame(ea_map,spatial_join)
>
> in order to create my map? What do you think?
>
> Which property of towns do you want in ea_map? If, for instance, you
> want the population of towns in ea_map, you could do (untested, we don't
> have your data):
>
> pop = aggregate(towns["population"], ea_map, sum)
>
>
> I suppose it would be possible to achieve the same using PostGIS and some commands there, but I would rather take a route within R. I haven't thought about how to use tapply() yet.
>
> Showing us the PostGIS route might help us (and maybe yourself)
> understand what you want.
>
>
> Thanks!
> Ty
>
>
> On Oct 22, 2012, at 10:02 PM, Barry Rowlingson wrote:
>
> On Mon, Oct 22, 2012 at 7:52 PM, Frazier, Tyler James
> <tyler.j.frazier at tu-berlin.de> wrote:
> Hi Edzer and list,
>
> Thanks for the package, the command over() essentially achieves my intended result, which is to overlay a set of (2500) points onto a set of (1000) polygons where each polygon receives all of the attribute data from each of the points located within its boundary.
>
> Each polygon can't [easily] 'receive' the attributes from a differing
> number of points. The SpatialPolygonsDataFrame has one polygonal
> feature related to one row in the data frame. You could have one
> polygonal feature for each point, but that would end up as a massive
> duplication of the polygonal data.
>
> It would seem to be better to create a new *point* data set, where
> each row of the point data frame is augmented with the information
> from the polygon in which it resides. This gives you something like an
> ID variable that relates the points to the polygons in the manner of a
> database id field.
>
> Overall, it really depends on what you want to do. If you want to
> draw maps of polygons based on some aggregate function of the points
> within, then that's a question of doing something like a 'tapply' on
> the points using the polygons each point is in as a factor.
>
> Barry
>
>
> -----
> Tyler Frazier
> Department of Transportation Planning and Telematics
> Technical University Berlin
> http://www.vsp.tu-berlin.de/
>
--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany. Phone: +49 251
8333081, Fax: +49 251 8339763 http://ifgi.uni-muenster.de
http://www.52north.org/geostatistics e.pebesma at wwu.de