[R-sig-Geo] How to extract coordinates values from a shapefile?
Paul Hiemstra
p.hiemstra at geo.uu.nl
Sat Jun 12 20:29:23 CEST 2010
On 06/11/2010 11:43 AM, Paul Hiemstra wrote:
> On 06/09/2010 11:01 PM, Dylan Beaudette wrote:
>> On Wednesday 09 June 2010, Hadley Wickham wrote:
>>> On Wed, Jun 9, 2010 at 3:24 PM, Edzer Pebesma
>>>
>>> <edzer.pebesma at uni-muenster.de> wrote:
>>>> The example provided by Matt assumes that each polygon consists of a
>>>> single ring, and doesn't have islands, lakes etc. The function below
>>>> pasts all coordinates to a single 2-column matrix. For clarity's
>>>> sake, I
>>>> avoided nested sapply's.
>>>>
>>>> library(maptools)
>>>> xx<- readShapePoly(system.file("shapes/sids.shp",
>>>> package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
>>>> +ellps=clrk66")) allcoordinates = function(x) {
>>>> ret = NULL
>>>> polys = x at polygons
>>>> for(i in 1:length(polys)) {
>>>> pp = polys[[i]]@Polygons
>>>> for (j in 1:length(pp))
>>>> ret = rbind(ret, coordinates(pp[[j]]))
>>>> }
>>>> ret
>>>> }
>>>> q = allcoordinates(xx)
>>>>
>>>> # check:
>>>> plot(xx)
>>>> lines(q, col='blue')
>>> And that's basically what fortify does, except it covers a few more
>>> cases.
>>>
>>> Hadley
>> Not to be nit-picky, but wouldn't it be safer to pre-allocated "ret"
>> instead
>> of dynamically appending? This would allow the suggested function to
>> scale to
>> very large geometries.
> In addition to my previous suggestion, I tried using lapply + do.call
> instead of the for loop based implementation of edzer:
>
> library(maptools)
> xx<- readShapePoly(system.file("shapes/sids.shp",
> package="maptools")[1], IDvar="FIPSNO", proj4string=CRS("+proj=longlat
> +ellps=clrk66"))
>
> allcoordinates = function(x) {
> ret = NULL
> polys = x at polygons
> for(i in 1:length(polys)) {
> pp = polys[[i]]@Polygons
> for (j in 1:length(pp))
> ret = rbind(ret, coordinates(pp[[j]]))
> }
> ret
> }
>
> allcoordinates_lapply = function(x) {
> polys = x at polygons
> return(do.call("rbind", lapply(polys, function(pp) {
> do.call("rbind", lapply(pp at Polygons, coordinates))
> })))
> }
More information on why this is so much faster can be found in the R
Inferno Chapter 2, this gives a good description of what goes wrong in
the background when growing an R object.
http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
The following basic R code also provides some info:
# Just letting the object grow
# without preallocation
meth1 = function(n) {
vec <- numeric(0);
for(i in 1:n) vec <- c(vec, i)
}
# preallocation
meth2 = function(n) {
vec <- numeric(n)
for(i in 1:n) vec[i] <- i
}
# Direct creation
# Not relevant for the
# above case
meth3 = function(n) {
vec <- 1:n
}
# The suggestion I did
# with do.call and lapply
meth4 = function(n) {
do.call("c", lapply(1:n, function(x) x))
}
n = 100000
# Speed test
system.time(meth1(n)) # Slooooow
system.time(meth2(n))
system.time(meth3(n))
system.time(meth4(n))
cheers,
Paul
>
> q = allcoordinates(xx)
> z = allcoordinates_lapply(xx)
> all.equal(q,z)
>
> So far it produces the same output. And now comes the nice part, doing
> a test with a large polygon set and timing the performance of both
> functions:
>
> > system.time(x <- allcoordinates(nuts))
> user system elapsed
> 22.781 8.572 31.422
> > system.time(y <- allcoordinates_lapply(nuts))
> user system elapsed
> 0.248 0.004 0.250
> > all.equal(x,y)
> [1] TRUE
>
> So apart from the imo nicer looking function in terms of syntax, it is
> also several orders of magnitude faster. This effect will only become
> stronger if the polygon set becomes larger. So, lapply + do.call is
> your friend :).
>
> cheers,
> Paul
>
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i486-pc-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] maptools_0.7-34 lattice_0.18-5 sp_0.9-62 foreign_0.8-40
>
> loaded via a namespace (and not attached):
> [1] grid_2.11.0
>
>>
>> Cheers,
>> Dylan
>>
>>
>>
>
>
--
Drs. Paul Hiemstra
Department of Physical Geography
Faculty of Geosciences
University of Utrecht
Heidelberglaan 2
P.O. Box 80.115
3508 TC Utrecht
Phone: +3130 274 3113 Mon-Tue
Phone: +3130 253 5773 Wed-Fri
http://intamap.geo.uu.nl/~paul
http://nl.linkedin.com/pub/paul-hiemstra/20/30b/770
More information about the R-sig-Geo
mailing list