[R-sig-Geo] Merging shapefiles and csv

Thu Jul 31 18:46:16 CEST 2014

On Thu, 31 Jul 2014, Rolando Valdez wrote:

> Hi,
>
> I have used this procedure:
>
> spatial.data <- readOGR(….)
> stat.data <- read.csv(….)
> spatial.data@data <- data.frame(stat.data)
>
> This will merge your statistical data into your spatial data. Make sure
> the rows are in the same order in both objects.

No, this only holds under very specific assumptions (the same observations in
both data objects, in the same order). Please do refer to the vignette in
the maptools package, and to previous threads which have advised that
merge() should not be used, and that the row.names of the data frames should
be used as ID keys. Typically, using match() on the row.names of the two
objects will show which rows are not correctly aligned.
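A minimal base-R sketch of that match() check, assuming both tables carry the region IDs as row.names. `spatial_df` and `stat_df` are hypothetical stand-ins (with made-up values) for the data slot of a Spatial*DataFrame and a table read with read.csv():

```r
# Hypothetical tables keyed by row.names (region IDs); values are made up
spatial_df <- data.frame(area = c(10, 20, 30), row.names = c("A", "B", "C"))
stat_df <- data.frame(pop = c(300, 100, 200), row.names = c("C", "A", "B"))

# Position of each spatial row.name within the statistical table
idx <- match(row.names(spatial_df), row.names(stat_df))
idx  # 2 3 1: same IDs, different order

# IDs present in the spatial data but missing from the table would show as NA
which(is.na(idx))  # integer(0) here

# Reorder the table to follow the spatial rows, check, then bind
aligned <- stat_df[idx, , drop = FALSE]
stopifnot(identical(row.names(aligned), row.names(spatial_df)))
joined <- cbind(spatial_df, aligned)
```

With a real Spatial*DataFrame the same index vector would be used to reorder the csv table before attaching it; maptools also provided spCbind(), which checks that the row.names agree before binding.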

Hope this clarifies,

Roger

>
> Hope this helps, greetings.
>
> On 31/07/2014, at 09:10, sam cruickshank <sam_l_cruickshank at hotmail.com> wrote:
>
>> Hi, thank you Lyndon and Rafael for your thoughts. After the sp::merge comment I followed the code below, but again it failed at the writeOGR step, this time with "Error in writeOGR(spatial.data, dsn = "C:/Users/Laptop/Documents/Rworkspace/Shape",  :   Creating Name field failed"
>> This could be because of the "new_layer" argument: does it have to be named anything in particular? Does it have to match the file name, etc.? Lyndon, I'll try yours next, but I must admit it has confused me a little.
>>
>> Joining New Data to an Existing sp Object
>> # use to read in some vector data
>> library(rgdal)
>>
>> # read something in, rows are identified by a column called 'id'
>> spatial.data <- readOGR(...)
>>
>> # read in some tabular data, rows are identified by a column called 'id'
>> new_table <- read.csv(...)
>>
>> # 'join' the new data with merge()
>> # all.x=TRUE keeps every row of the spatial data after the join,
>> # even if the new table has fewer rows
>> merged <- merge(x=spatial.data@data, y=new_table, by.x='id', by.y='id', all.x=TRUE)
>>
>> # generate a vector that represents the original ordering of rows in the sp object
>> correct.ordering <- match(spatial.data@data$id, merged$id)
>>
>> # overwrite the original dataframe with the new merged dataframe, in the correct order
>> spatial.data@data <- merged[correct.ordering, ]
>>
>> # check the ordering of the merged data, with the original spatial data
>> cbind(spatial.data@data$id, merged$id[correct.ordering])
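The merge-then-match recipe above can be tried on toy data frames without any shapefile. In this sketch `attrs` and `new_table` are hypothetical stand-ins with made-up values; it shows merge() sorting the rows by the key, and match() restoring the original order:

```r
# Hypothetical stand-ins: 'attrs' plays spatial.data@data, 'new_table' the csv
attrs <- data.frame(id = c("c", "a", "b", "d"), area = c(3, 1, 2, 4),
                    stringsAsFactors = FALSE)
new_table <- data.frame(id = c("a", "b", "c"), pop = c(10, 20, 30),
                        stringsAsFactors = FALSE)

# all.x = TRUE keeps 'd' (no csv row) with NA; the result is sorted by id
merged <- merge(x = attrs, y = new_table, by.x = "id", by.y = "id", all.x = TRUE)
merged$id  # "a" "b" "c" "d"

# Put the merged rows back into the original order of attrs
correct.ordering <- match(attrs$id, merged$id)
result <- merged[correct.ordering, ]
result$pop  # 30 10 20 NA
```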
>>
>> Correctly Write 'NA' Values to Shapefile [bug in writeOGR()]
>> # libraries we need
>> require(rgdal)
>> require(foreign)
>>
>> # pass 1: write the shapefile
>> writeOGR(spatial.data, dsn='new_folder', driver='ESRI Shapefile', layer='new_layer')
>>
>> # re-make the DBF:
>> write.dbf(spatial.data@data, file='new_folder/new_layer.dbf')
>>
>>
>>> Date: Thu, 31 Jul 2014 15:19:01 +0200
>>> From: rafael.wueest at gmail.com
>>> To: r-sig-geo at r-project.org
>>> Subject: Re: [R-sig-Geo] Merging shapefiles and csv
>>>
>>> Hi there
>>>
>>> have a look at
>>>
>>> ?sp::merge
>>>
>>> Should do what you need.
>>>
>>> HTH, Rafael
>>>
>>> On 31/07/2014 15:14, Lyndon Estes wrote:
>>>> I am not sure about the mismatch issue, but I think merging the
>>>> data slot of a SpatialPolygonsDataFrame with a data frame produces
>>>> undesirable results.
>>>>
>>>> I wrote a function a while back that does the merge in a way that
>>>> avoids these problems; perhaps it will help. I think there are other,
>>>> more recent and undoubtedly better solutions than this one that you
>>>> could find (in fact I recall seeing a very recent thread about this,
>>>> but I'm not sure where).
>>>>
>>>> joinAttributeTable <- function(x, y, xcol, ycol) {
>>>>   # Merges a data frame into a SpatialPolygonsDataFrame, keeping the
>>>>   # correct order. Code from suggestions at:
>>>>   # https://stat.ethz.ch/pipermail/r-sig-geo/2008-January/003064.html
>>>>   # Args:
>>>>   #   x: SpatialPolygonsDataFrame
>>>>   #   y: data.frame to merge
>>>>   #   xcol: merge column name in x
>>>>   #   ycol: merge column name in y
>>>>   # Returns: SpatialPolygonsDataFrame with the merged attribute table
>>>>
>>>>   # Column containing the original row order, for later sorting
>>>>   x$sort_id <- seq_len(nrow(as(x, "data.frame")))
>>>>
>>>>   x.dat <- as(x, "data.frame")  # Attribute table as a plain data.frame
>>>>   x.dat2 <- merge(x.dat, y, by.x = xcol, by.y = ycol)  # Merge
>>>>   x.dat2.ord <- x.dat2[order(x.dat2$sort_id), ]  # Reorder back to original
>>>>   # Make a new set of polygons, dropping those which aren't in the merge
>>>>   x2 <- x[x$sort_id %in% x.dat2$sort_id, ]
>>>>   x2.dat <- as(x2, "data.frame")  # Attribute table of the subset
>>>>   # Reassign row.names from the original data.frame
>>>>   row.names(x.dat2.ord) <- row.names(x2.dat)
>>>>   x2@data <- x.dat2.ord  # Assign the new data.frame to the object
>>>>   return(x2)
>>>> }
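The sort_id idea in joinAttributeTable() can be checked on plain data frames, without sp. A minimal sketch with made-up values; `x.dat` and `y` mirror the function's names, and the key column `RSA` is borrowed from the question below. Because merge() here is an inner join, rows without a match are dropped, which is why the function also subsets the polygons with %in%:

```r
# Made-up attribute table (standing in for as(x, "data.frame")) and csv table
x.dat <- data.frame(RSA = c("r2", "r1", "r3"), area = c(20, 10, 30),
                    stringsAsFactors = FALSE)
y <- data.frame(RSA = c("r1", "r2"), pop = c(100, 200),
                stringsAsFactors = FALSE)

# Remember the original row order, as the function does with x$sort_id
x.dat$sort_id <- seq_len(nrow(x.dat))

# Inner join: 'r3' has no partner in y, so it is dropped; rows come back sorted by RSA
x.dat2 <- merge(x.dat, y, by = "RSA")

# Restore the original order of the surviving rows
x.dat2.ord <- x.dat2[order(x.dat2$sort_id), ]
x.dat2.ord$RSA  # "r2" "r1"

# Logical index of the original rows (polygons) that survive the merge
keep <- x.dat$sort_id %in% x.dat2$sort_id
keep  # TRUE TRUE FALSE
```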
>>>>
>>>> Hope it helps.
>>>>
>>>> Best, Lyndon
>>>>
>>>>
>>>> On Thu, Jul 31, 2014 at 8:32 AM, HallS <sam_l_cruickshank at hotmail.com> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm not sure how clearly this will come across, as my data is confidential.
>>>>>
>>>>> Basically, I have a shapefile (.shp) and a csv file which contain the same
>>>>> regions (i.e. a column which has the same information). Using this link:
>>>>> https://sites.google.com/site/eospansite/alobotips/spatial_r_tips/rshp_xls
>>>>> I managed to get quite far, but once I got to the writeOGR command, I got the
>>>>> error
>>>>>  Error in writeOGR(RSANHS, dsn = "C:/Users/Laptop/Documents/Rworkspace/",  :
>>>>>   number of objects mismatch
>>>>>
>>>>> shape1@data <- merge(shape1@data,csv,by.x="RSA",
>>>>> +                           by.y="RSA", all.x=T, sort=F)
>>>>>>
>>>>>> ###Checking it
>>>>>> dim(shape@data)
>>>>> [1] 1745    2
>>>>>> dim(shape1@data)
>>>>> [1] 1747    5
>>>>>
>>>>> This shows a discrepancy of two rows between the original shapefile and the
>>>>> new merged one. When I looked at the merged file in full, there were a
>>>>> number of NA rows at the bottom, where there was no data corresponding to the
>>>>> shapefile. I tried shape1@data <- na.exclude(shape1@data), and also na.omit,
>>>>> and this did reduce the number of rows to 1690, but the problem persists.
>>>>>
>>>>> Sorry if this is a really unhelpful question; I'm not sure how to ask it when
>>>>> the data is confidential.
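A plausible cause of the 1747 rows, inferred from how merge() behaves rather than confirmed in this thread: with all.x = TRUE every shapefile row is kept at least once, so the result can only exceed 1745 rows if some RSA values present in the shapefile occur more than once in the csv. Removing NA rows does not fix this, because it drops shapefile regions as well. A small sketch with made-up data (`shp` and `csv` are hypothetical):

```r
# Made-up shapefile attributes (3 regions) and a csv with a repeated key
shp <- data.frame(RSA = c("r1", "r2", "r3"), stringsAsFactors = FALSE)
csv <- data.frame(RSA = c("r1", "r2", "r2"), val = c(1, 2, 5),
                  stringsAsFactors = FALSE)

merged <- merge(shp, csv, by = "RSA", all.x = TRUE, sort = FALSE)
nrow(merged)  # 4: 'r2' matched twice, 'r3' kept with NA

# Find keys that are repeated in the csv before merging
dup_keys <- unique(csv$RSA[duplicated(csv$RSA)])
dup_keys  # "r2"

# Regions with no csv data: these become the NA rows
setdiff(shp$RSA, csv$RSA)  # "r3"
```

Once the repeated keys are resolved (kept, dropped or aggregated, depending on the data), the merged table should again have one row per region.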
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context: http://r-sig-geo.2731867.n2.nabble.com/Merging-shapefiles-and-csv-tp7586839.html
>>>>> Sent from the R-sig-geo mailing list archive at Nabble.com.
>>>>>
>>>>> _______________________________________________
>>>>> R-sig-Geo mailing list
>>>>> R-sig-Geo at r-project.org
>>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>>
>>>
>>> --
>>> Rafael Wüest
>>> rafael.wueest at gmail.com
>>> http://www.rowueest.net
>>>
>>
>
> Rolando Valdez
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no