[R-sig-Geo] How to extract coordinates values from a shapefile?

Paul Hiemstra p.hiemstra at geo.uu.nl
Sat Jun 12 20:29:23 CEST 2010


On 06/11/2010 11:43 AM, Paul Hiemstra wrote:
> On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
>> On Wednesday 09 June 2010, Hadley Wickham wrote:
>>> On Wed, Jun 9, 2010 at 3:24 PM, Edzer Pebesma
>>>
>>> <edzer.pebesma at uni-muenster.de>  wrote:
>>>> The example provided by Matt assumes that each polygon consists of a
>>>> single ring, and doesn't have islands, lakes etc. The function below
>>>> pasts all coordinates to a single 2-column matrix. For clarity's 
>>>> sake, I
>>>> avoided nested sapply's.
>>>>
>>>> library(maptools)
>>>> xx<- readShapePoly(system.file("shapes/sids.shp",
>>>> package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
>>>> +ellps=clrk66")) allcoordinates = function(x) {
>>>>     ret = NULL
>>>>     polys = x at polygons
>>>>     for(i in 1:length(polys)) {
>>>>         pp = polys[[i]]@Polygons
>>>>         for (j in 1:length(pp))
>>>>             ret = rbind(ret, coordinates(pp[[j]]))
>>>>     }
>>>>     ret
>>>> }
>>>> q = allcoordinates(xx)
>>>>
>>>> # check:
>>>> plot(xx)
>>>> lines(q, col='blue')
>>> And that's basically what fortify does, except it covers a few more 
>>> cases.
>>>
>>> Hadley
>> Not to be nit-picky, but wouldn't it be safer to pre-allocated "ret" 
>> instead
>> of dynamically appending? This would allow the suggested function to 
>> scale to
>> very large geometries.
> In addition to my previous suggestion, I tried using lapply + do.call 
> instead of the for loop based implementation of edzer:
>
> library(maptools)
> xx<- readShapePoly(system.file("shapes/sids.shp", 
> package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat 
> +ellps=clrk66"))
>
> allcoordinates = function(x) {
>     ret = NULL
>     polys = x at polygons
>     for(i in 1:length(polys)) {
>         pp = polys[[i]]@Polygons
>         for (j in 1:length(pp))
>             ret = rbind(ret, coordinates(pp[[j]]))
>     }
>     ret
> }
>
> allcoordinates_lapply = function(x) {
>     polys = x at polygons
>     return(do.call("rbind", lapply(polys, function(pp) {
>     do.call("rbind", lapply(pp at Polygons, coordinates))
>     })))
> }
More information on why this is so much faster can be found in the R 
Inferno Chapter 2, this gives a good description of what goes wrong in 
the background when growing an R object.

http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

The following basic R code also provides some info:

# Just letting the object grow
# without preallocation
meth1 = function(n) {
   vec <- numeric(0);
   for(i in 1:n) vec <- c(vec, i)
}

# preallocation
meth2 = function(n) {
   vec <- numeric(n)
   for(i in 1:n) vec[i] <- i
}

# Direct creation
# Not relevant for the
# above case
meth3 = function(n) {
   vec <- 1:n
}

# The suggestion I did
# with do.call and lapply
meth4 = function(n) {
   do.call("c", lapply(1:n, function(x) x))
}

n = 100000
# Speed test
system.time(meth1(n)) # Slooooow
system.time(meth2(n))
system.time(meth3(n))
system.time(meth4(n))

cheers,
Paul

>
> q = allcoordinates(xx)
> z = allcoordinates_lapply(xx)
> all.equal(q,z)
>
> So far it produces the same output. And now comes the nice part, doing 
> a test with a large polygon set and timing the performance of both 
> functions:
>
> > system.time(x <- allcoordinates(nuts))
>    user  system elapsed
>  22.781   8.572  31.422
> > system.time(y <- allcoordinates_lapply(nuts))
>    user  system elapsed
>   0.248   0.004   0.250
> > all.equal(x,y)
> [1] TRUE
>
> So apart from the imo nicer looking function in terms of syntax, it is 
> also several orders of magnitude faster. This effect will only become 
> stronger if the polygon set becomes larger. So, lapply + do.call is 
> your friend :).
>
> cheers,
> Paul
>
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i486-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] maptools_0.7-34 lattice_0.18-5  sp_0.9-62       foreign_0.8-40
>
> loaded via a namespace (and not attached):
> [1] grid_2.11.0
>
>>
>> Cheers,
>> Dylan
>>
>>
>>
>
>


-- 
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone:  +3130 274 3113 Mon-Tue
Phone:  +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770



More information about the R-sig-Geo mailing list