[R-sig-Geo] How to extract coordinates values from a shapefile?

Fri Jun 11 11:43:34 CEST 2010

On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
> On Wednesday 09 June 2010, Hadley Wickham wrote:
>    
>> On Wed, Jun 9, 2010 at 3:24 PM, Edzer Pebesma
>>
>> <edzer.pebesma at uni-muenster.de>  wrote:
>>      
>>> The example provided by Matt assumes that each polygon consists of a
>>> single ring, and doesn't have islands, lakes etc. The function below
>>> pasts all coordinates to a single 2-column matrix. For clarity's sake, I
>>> avoided nested sapply's.
>>>
>>> library(maptools)
>>> xx<- readShapePoly(system.file("shapes/sids.shp",
>>> package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
>>> +ellps=clrk66")) allcoordinates = function(x) {
>>>     ret = NULL
>>>     polys = x at polygons
>>>     for(i in 1:length(polys)) {
>>>         pp = polys[[i]]@Polygons
>>>         for (j in 1:length(pp))
>>>             ret = rbind(ret, coordinates(pp[[j]]))
>>>     }
>>>     ret
>>> }
>>> q = allcoordinates(xx)
>>>
>>> # check:
>>> plot(xx)
>>> lines(q, col='blue')
>>>        
>> And that's basically what fortify does, except it covers a few more cases.
>>
>> Hadley
>>      
> Not to be nit-picky, but wouldn't it be safer to pre-allocated "ret" instead
> of dynamically appending? This would allow the suggested function to scale to
> very large geometries.
>    
In addition to my previous suggestion, I tried using lapply + do.call 
instead of the for loop based implementation of edzer:

library(maptools)
xx<- readShapePoly(system.file("shapes/sids.shp", 
package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat 
+ellps=clrk66"))

allcoordinates = function(x) {
     ret = NULL
     polys = x at polygons
     for(i in 1:length(polys)) {
         pp = polys[[i]]@Polygons
         for (j in 1:length(pp))
             ret = rbind(ret, coordinates(pp[[j]]))
     }
     ret
}

allcoordinates_lapply = function(x) {
     polys = x at polygons
     return(do.call("rbind", lapply(polys, function(pp) {
     do.call("rbind", lapply(pp at Polygons, coordinates))
     })))
}

q = allcoordinates(xx)
z = allcoordinates_lapply(xx)
all.equal(q,z)

So far it produces the same output. And now comes the nice part, doing a 
test with a large polygon set and timing the performance of both functions:

 > system.time(x <- allcoordinates(nuts))
    user  system elapsed
  22.781   8.572  31.422
 > system.time(y <- allcoordinates_lapply(nuts))
    user  system elapsed
   0.248   0.004   0.250
 > all.equal(x,y)
[1] TRUE

So apart from the imo nicer looking function in terms of syntax, it is 
also several orders of magnitude faster. This effect will only become 
stronger if the polygon set becomes larger. So, lapply + do.call is your 
friend :).

cheers,
Paul

 > sessionInfo()
R version 2.11.0 (2010-04-22)
i486-pc-linux-gnu

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] maptools_0.7-34 lattice_0.18-5  sp_0.9-62       foreign_0.8-40

loaded via a namespace (and not attached):
[1] grid_2.11.0

>
> Cheers,
> Dylan
>
>
>
>    

-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770