[R-sig-Geo] large shapefiles; zip code data

Virgilio Gomez Rubio Virgilio.Gomez@uclm.es
Thu Mar 12 10:50:33 CET 2009


Ben,

How many areas will you have if you are able to read in all your
shapefiles? Are you sure that you will be able to handle that once your
data are ready? If you have too many areas, perhaps you should consider
aggregating your data in some way and try running a model on that.
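
For instance, something along these lines (an untested sketch: it assumes
you can build a grouping factor, here the 3-digit prefix of a hypothetical
ZCTA column, and that your outcome column is called 'cases'):

library(maptools)  # for unionSpatialPolygons(); needs gpclib or rgeos
# group zip codes by the first three digits of the ZCTA code
grp <- substr(as.character(spatdata$ZCTA), 1, 3)
# dissolve the zip code boundaries within each group
agg <- unionSpatialPolygons(spatdata, IDs = grp)
# aggregate the outcome counts to the same groups
counts <- tapply(spatdata$cases, grp, sum)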

I know that this is not an answer, but if you are trying to fit a model
with WinBUGS, it may not be able to handle a very large data set. And even
if it does, you will need to run it for a long time before you get an
answer.

Best,

Virgilio


On Thu, 12-03-2009 at 00:51 -0700, Ben Fissel wrote:
> Hello,
> 
> I am attempting to fit a CAR count-data model of the Besag-York-Mollié
> form to US zip code data (the entire US minus Hawaii and Alaska).  However,
> I'm running out of memory reading in the zip code data.  The zip code data
> I obtained from the Census at
> http://www2.census.gov/geo/tiger/TIGER2008/tl_2008_us_zcta500.zip and they
> are shapefiles.  I've allocated 4GB of memory to R, which is the max my OS
> (Vista) will give it.  Despite this, when I attempt to load the shapefiles
> I run out of memory using readOGR or readShapePoly.  I had a similar
> problem in Stata and worked around it by reading in the shapefiles for the
> lower 48 states (http://www.census.gov/geo/www/cob/z52000.html) separately
> and concatenating them together, relabeling the ID in the process.  I'm
> trying to do the same thing in R, but relabeling the ID is not as
> straightforward for me given my novice R programming ability.  Luckily I
> found some help at
> http://help.nceas.ucsb.edu/R:_Spatial#Understanding_spatial_data_formats_in_R
> which I adapted to my code.
> 
> spatdata <- readOGR(".", "zt01_d00")
> # spatdata <- readShapePoly("zt01_d00")
> names(spatdata)[3] <- "ZT00_D00"
> names(spatdata)[4] <- "ZT00_D00_I"
> 
> for (j in 2:2){    # just loop over one file until I get it to work
>    # statelist is a vector of the form c("01","04",...,"56"), with
>    # numbers that correspond to the 48 state shapefiles
>    filename <- paste("zt", statelist[j], "_d00", sep = "")
> 
>    spatdf <- readOGR(".", filename)
>    # spatdf <- readShapePoly(filename)
>    names(spatdf)[3] <- "ZT00_D00"
>    names(spatdf)[4] <- "ZT00_D00_I"
>    mergedata <- rbind(spatdata@data, spatdf@data)
>    mergepolys <- c(spatdata@polygons, spatdf@polygons)
>    mergepolysp <- SpatialPolygons(mergepolys,
>                                   proj4string = CRS(proj4string(spatdf)))
>    rm("spatdata", "spatdf", "filename")
> 
>    for (i in 1:length(mergepolys)){
>      sNew <- as.character(i)
>      mergepolys[i]@ID <- sNew
>    }
>    ID <- as.character(1:length(mergepolys))
>    mergedataID <- cbind(ID, mergedata)
>    spatdata <- SpatialPolygonsDataFrame(mergepolysp, data = mergedataID,
>                                         match.ID = FALSE)
>    rm("mergepolys", "mergedata", "mergepolysp", "mergedataID", "ID")
> 
>    gc()
> }
> 
> However, in the for loop over "i" I get an error when trying to relabel
> the IDs:
> 
>    Error in validObject(.Object) : invalid class "SpatialPolygons" object:
>      non-unique Polygons ID slot values
> 
> I've tried a number of different ways to change the IDs in 'mergepolys'
> but haven't been successful yet.
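> 
> For what it's worth, one variant I am considering would use spChFIDs()
> from maptools to give the two objects disjoint polygon IDs before
> combining them, instead of poking at the ID slots directly; a sketch
> (untested on the full data):
> 
> library(maptools)
> # relabel polygon IDs so SpatialPolygons() does not see duplicated
> # ID slot values when the two sets are concatenated
> spatdata <- spChFIDs(spatdata, paste("a", row.names(spatdata), sep = ""))
> spatdf   <- spChFIDs(spatdf,   paste("b", row.names(spatdf), sep = ""))
> mergepolys <- c(spatdata@polygons, spatdf@polygons)
> 
> Is that the right tool here?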
> 
> Ultimately, I just want to get the shapefiles into R so I can identify
> contiguous zip codes for the spatial regression.  Whether I get there by
> loading one big zip code shapefile or by concatenating 48 state files is
> irrelevant to me.  Perhaps the Census shapefiles have superfluous data
> that I can get rid of to free up memory and still achieve my objective; I
> don't know enough about shapefiles and how R reads them to know what I
> can throw away.  Maybe I'm going about this all wrong.
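> 
> In case dropping columns is the way to go, here is the sort of thing I
> have in mind (the column name is a guess; I would check names(spatdf)
> for the real one):
> 
> # keep only the zip code identifier and discard the other attribute
> # columns right after reading each state file
> spatdf <- spatdf[, "ZCTA"]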
> 
> After getting the shapefiles in, I plan to identify contiguous zip codes
> and use R2WinBUGS to fit the model as outlined in "Applied Spatial Data
> Analysis with R".  However, given the memory issues I'm having, I am
> concerned that forming the spatial weights matrix won't be possible: will
> R try to store it as an n x n matrix?  Furthermore, I have 50+ other
> covariates that I need to merge in with the zip code data, which will
> take up memory as well.  Simply put, is the memory bottleneck just in the
> function(s) loading the shapefiles, or am I going to have trouble fitting
> this model with the covariates in R?
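> 
> For concreteness, the neighbour-finding step I have in mind, following
> the book, is roughly this (untested at this scale):
> 
> library(spdep)
> # poly2nb() stores contiguity as a list of integer neighbour indices
> # (class "nb"), not as a dense n x n matrix
> nb <- poly2nb(spatdata, queen = TRUE)
> # nb2WB() converts the nb object into the adj/num/weights vectors that
> # the GeoBUGS car.normal() prior expects
> bugsdata <- nb2WB(nb)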
> 
> I've seen the thread "mapping by zip codes"
> (https://stat.ethz.ch/pipermail/r-sig-geo/2009-March/005194.html), which
> provides very useful information but hasn't helped me get around the
> problems I'm having.
> 
> I've tried to be complete yet concise.  If there is any other information
> you need please let me know.
> 
> Thanks for any help and/or suggestions you can provide.
> 
> -Ben
> -- 
> Benjamin Earl Fissel
> Economics Graduate Student
> University of California, San Diego
> bfissel.googlepages.com/home
> bfissel@ucsd.edu
> bfissel@gmail.com