[R-sig-Geo] Merging data frame for SpPolDF

Thu Mar 19 21:44:18 CET 2009

Hi

It might be we are talking about different things. What I understand is that 
you have an original shp-file with unique IDs associated with each Polygon. 
In the shape-file these are named "letras". In addition to this you have a 
data.frame with variables "letters" which matches the ones in "letras". 

However, in the letters column there are fewer IDs than in letras. This is what 
is generated by:
# First we extract the CNTY_ID (let this be letras) to get some similar IDs
# for the dummy data.frame. At this stage it should contain one column named
# ID with the IDs, and one named Ndata with some random values.
extra <- data.frame(ID=slot(nc, 'data')$CNTY_ID,
                    Ndata=runif(length(slot(nc, 'data')$CNTY_ID)))

first:
> str(slot(nc, 'data'))
'data.frame':   100 obs. of  14 variables:
.
.
 $ CNTY_ID  : num  1825 1827 1828 1831 1832 ...                                  
.
. 
$ NWBIR79  : num  19 12 260 145 1197 ...
.

# The next step is to remove some of these IDs (rows 1-3 and 68-100). At this
# stage we have removed quite a number of IDs.
extra <- extra[4:67, 1:2]

# In addition we want to sort them in a different way, to make things more
# realistic.
extra <- extra[order(extra$ID, decreasing=TRUE),]

# And finally we change one of the IDs to a value that is not present in the
# original CNTY_ID, also to make it more realistic.
extra[1,1] <- 342

> str(extra)
'data.frame':   64 obs. of  2 variables:
 $ ID   : num  342 2039 2034 2032 2030 ...
 $ Ndata: num  0.8272 0.0255 0.5633 0.1834 0.8208 ...

> str(slot(nc, 'data')$CNTY_ID)
 num [1:100] 1825 1827 1828 1831 1832 ...

So we see they look different, just like a1 and a2 in the example provided. So 
far we have only been worried about making the dummy data. 

Skip the merge() function and move directly to the match() function. This is 
what you want to do (only the next steps):

extra <- extra[match(slot(nc, 'data')$CNTY_ID, extra$ID), 1:2]

> str(extra)
'data.frame':   100 obs. of  2 variables:
 $ ID   : num  NA NA NA 1831 1832 ...
 $ Ndata: num  NA NA NA 0.252 0.842 ...

We now have the data frame you wanted to add with the same number of rows, and 
ordered the same way as the data in the SpatialPolygonsDataFrame.
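
The same idea on the small a1/a2 example from the quoted message, as a minimal 
sketch in base R (idx is just a name picked for illustration). match() returns, 
for each row of a1, the position of the matching row in a2, and NA where there 
is none, so the row order of a1 is never touched:

```r
a1 <- data.frame(letras = c("A", "B", "C", "D"), nums = c("1", "2", "3", "4"))
a2 <- data.frame(letters = c("A", "C", "D"), cods = c("10", "30", "40"))

# Position in a2 of each a1$letras value; NA for "B", which a2 lacks
idx <- match(a1$letras, a2$letters)

# Pull the a2 values into a1, row by row, in a1's own order
a1$cods <- a2$cods[idx]
a1
```

The row for "B" stays in second place and simply gets NA in cods, instead of 
being appended at the end as merge() does.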

We add it to the data slot in the SpatialPolygonsDataFrame.
slot(nc, 'data')$Ndata <- extra$Ndata

> str(slot(nc, 'data')) 
'data.frame':   100 obs. of  15 variables:      
.
.
 $ CNTY_ID  : num  1825 1827 1828 1831 1832 ...                                  
 .
.
 $ NWBIR79  : num  19 12 260 145 1197 ...
 $ Ndata    : num  NA NA NA 0.252 0.842 ...
.

The point is that you sort the data by the IDs in the original shape file, and 
hence you can simply add the data back to the data slot and they are then 
located in the right place.
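
If you are worried about forgetting a step, a check before the assignment might 
help. What follows is only a sketch: polydata is a made-up stand-in for 
slot(nc, 'data'), the values are invented, and aligned, ok and unmatched are 
names chosen for illustration.

```r
# Stand-in for slot(nc, 'data'); IDs and values are invented for illustration
polydata <- data.frame(CNTY_ID = c(1825, 1827, 1828, 1831, 1832))
extra <- data.frame(ID = c(1832, 342, 1828), Ndata = c(0.1, 0.2, 0.3))

# Reorder (and pad with NA rows) so that extra follows the polygon order
aligned <- extra[match(polydata$CNTY_ID, extra$ID), ]

# Every non-missing ID must equal the polygon ID in the same row,
# and the row counts must agree; stop with an error otherwise
ok <- is.na(aligned$ID) | aligned$ID == polydata$CNTY_ID
stopifnot(nrow(aligned) == nrow(polydata), all(ok))

polydata$Ndata <- aligned$Ndata

# IDs in extra that matched no polygon and were therefore dropped
unmatched <- setdiff(extra$ID, polydata$CNTY_ID)
```

With stopifnot() in place the script fails loudly when the rows are out of 
step, and unmatched lists the IDs (342 in this toy case) that found no polygon.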

Finally export the data, and the new shapefile has one more variable. 
writeOGR(nc,dsn="/home/lunde/MMAMBmuni2",layer="MMAMBmuni2",
driver="ESRI Shapefile")

Am I still wrong? If so, could someone else assist?

Best wishes 
Torleif

On Thursday 19 March 2009 08:21:25 pm Agustin Lobo wrote:
> Thanks. I might be wrong, but I think that your example is different.
> The problem comes up when the second dataframe does not
> have values for all cases  that are present in the first one. For example
>
>  > a1 <-data.frame(letras=c("A","B","C","D"),nums=c("1","2","3","4"))
>  > a2 <-data.frame(letras=c("A","C","D"),nums=c("10","30","40"))
>  > a1
>
>   letras nums
> 1      A    1
> 2      B    2
> 3      C    3
> 4      D    4
>
>  > a2
>
>   letras nums
> 1      A   10
> 2      C   30
> 3      D   40
>
>  > a2 <-data.frame(letters=c("A","C","D"),cods=c("10","30","40"))
>  > merge(a1,a2,by.x="letras",by.y="letters",all.x=T,sort=F)
>
>   letras nums cods
> 1      A    1   10
> 2      C    3   30
> 3      D    4   40
> 4      B    2 <NA>
>
> which disrupts the ordering in a1 and thus creates a risk when
> putting the merged dataframe in the SpPolDF
>
> And what you say would be:
>  > a2[match(a1$letras, a2$letters), ]
>
>    letters cods
> 1        A   10
> NA    <NA> <NA>
> 2        C   30
> 3        D   40
>
> which would not solve the problem.
>
> Perhaps I did not correctly interpret your solution?
>
> Agus
>
> Torleif Markussen Lunde wrote:
> > Hi
> >
> > Maybe this can help? Please correct me if this is not what you wanted.
> >
> > require(maptools)
> >
> > nc <- readShapePoly(system.file("shapes/sids.shp",
> > package="maptools")[1], proj4string=CRS("+proj=longlat +datum=NAD27"))
> >
> > #Create dummy data. Do some changes to make it look different (subset and
> > order)
> > extra <- data.frame(ID=slot(nc, 'data')$CNTY_ID,
> > Ndata=runif(length(slot(nc, 'data')$CNTY_ID)))
> > extra <- extra[4:67, 1:2]
> > extra <- extra[order(extra$ID, decreasing=TRUE),]
> > extra[1,1] <- 342
> >
> > #add the dummy data(.frame) (this part is what you want to do)
> > extra <- extra[match(slot(nc, 'data')$CNTY_ID, extra$ID), 1:2]
> >
> > slot(nc, 'data')$Ndata <- extra$Ndata
> > #or for the data frame
> > slot(nc, 'data') <- cbind(slot(nc, 'data'), extra[-1])
> >
> > Best wishes
> > Torleif
> >
> > On Thursday 19 March 2009 01:08:15 pm Agustin Lobo wrote:
> >> Hi!
> >>
> >> I often have to add more information to the data slot of
> >> a SpPolDF imported from a shp file. I do it in this way, don't like
> >> it too much and would like feed-back on a better way-
> >>
> >> #Import shp
> >> MMAMBmuni <- readOGR("C:/Pruebas/DUNS/MMAMBmuni", layer="MMAMBmuni")
> >> #Extract the DF
> >> MMAMBmuniDFori <- MMAMBmuni@data
> >>
> >> #Make a new dataframe by merging with another DF
> >> MMAMBmuniDFnew <-
> >> merge(MMAMBmuniDFori,MMAMBempleados,by.x="MUNICIPI",by.y="CODMUN",
> >> all.x=T,sort=F)
> >>
> >> The problem here is that there are a couple of towns in the by.x field
> >> for which we do not have any match in by.y.
> >> As we have set all.x=T, we get a line for which the values from the
> >> second dataframe are NA. But, despite stating sort=F, those cases are
> >> not in the same row as they are in the first data.frame but appended at
> >> the end of the new dataframe. This is bad news for us, as it breaks
> >> the order required for including the new dataframe as the data slot
> >> of a new SpPolDF. Therefore, I have to reorder the new dataframe, thanks
> >> to another field, IDgrafic:
> >>
> >> MMAMBmuniDFnew<- MMAMBmuniDFnew[order(MMAMBmuniDFnew$ID_GRAFIC),]
> >>
> >> and then copy the original row.names, required because the row.names are
> >> the ones
> >> making the link to the polygons in the future SpPolDF:
> >>
> >> row.names(MMAMBmuniDFnew) <- row.names(MMAMBmuniDFori)
> >>
> >> #Now we put the new DF in lieu of the older one:
> >> MMAMBmuni2@data <- MMAMBmuniDFnew
> >>
> >> #and finally save as shp
> >> writeOGR(MMAMBmuni2,dsn="C:/Pruebas/DUNS/MMAMBmuni2",layer="MMAMBmuni2",
> >> driver="ESRI Shapefile")
> >>
> >> Any suggestions on a better procedure? The problem is that sometimes I
> >> forget reordering and get a wrong shp. Until now, I have always realized
> >> the error, but I'm terrified by the idea of not realizing the error
> >> sometime and using true garbage after that point...
> >>
> >> Thanks
> >>
> >> Agus
> >>
> >> _______________________________________________
> >> R-sig-Geo mailing list
> >> R-sig-Geo at stat.math.ethz.ch
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-geo