[R-sig-Geo] large shapefiles; zip code data

Virgilio Gomez Rubio Virgilio.Gomez at uclm.es
Thu Mar 12 11:46:22 CET 2009


Hi Ben,

> There are 31625 zipcodes which correspond to a geographic area in the
> US.  I'm expecting it to take a while to run, as long as it takes less

I know that some people in Spain were using similar models with around
8500 areas and it took 18 hours to run. So not sure if in your case
WinBUGS will be able to give you an answer in a reasonable time. You may
try to load your shapefiles using a GIS and then try to make the
adjacency matrix with that. Depending on the software, you may get a
better memory management.

When fitting your models, I believe that you should try INLA if you have
a large number of areas. It is a software for Approximate Bayesian
Infenrece that we will give you the marginals of your parameters of
interest. Basically, you do not MCMC any more but approximate the
marginal distributions of your parameters of interest, which is faster.
In disease mapping most of the time that is what you want (i.e.,
posterior estimates of your relative risks).

The code is very well written and I am sure that it will handle large
data sets better than WinBUGS and you will get similar results. There is
a full course on this here:

http://www.bias-project.org.uk/GMRFCourse/

There is an R interface (R-INLA) that works pretty well.  The
installation is a bit tricky in Linux, but if you are using Windows I
believe that it is also possible to have it installed. 

Another option is to parallelize the Gibss sampler. I did some work on
this some time ago using package snow, but I found that, for very large
problems, it is not very reliable. For some reason, the connection
between the nodes was lost, although it could also be related to my
(lack of) configuration of the cluster.

Hope this helps.

Virgilio

PS: Fitting a spatial model on the NC Sids Data with R-INLA is something
like

library(spdep)
library(INLA)

source("nb2inla.R")

#Read and display map
xx <- readShapePoly(system.file("shapes/sids.shp",
package="maptools")[1],
      IDvar="FIPSNO", proj4string=CRS("+proj=longlat +ellps=clrk66"))
nbsids<-poly2nb(xx)

xx$region<-0:99
xx$region.struct<-0:99
xx$Exp<-xx$BIR74*sum(xx$SID74)/sum(xx$BIR74)
xx$SMR<-xx$SID74/xx$Exp

nb2inla(file="ncsids.graph", nbsids)


formula  <-
SID74~f(region.struct,model="besag",graph.file="ncsids.graph",
param=c(1,0.00005),initial=2.8)+f(NWBIR74,model="rw2",param=c(1,0.05))+
f(region,model="iid")

mod  <-  inla(formula,family="poisson",data=xx,E=Exp,
   control.inla=list(h=0.01),verbose=TRUE,
   inla.call="/home/virgil/usr/bin/inla")

xx$RR<-mod$summary.fitted.values[,1]
spplot(xx, c("SMR", "RR"))



More information about the R-sig-Geo mailing list