[R-sig-Geo] Help with creating a weight matrix for point dataset of 78, 000 observations

Roger Bivand Roger.Bivand at nhh.no
Mon Jun 17 10:12:46 CEST 2013

On Sun, 16 Jun 2013, Susan Gorelick wrote:

> Hi,
> I need to create a weight matrix in R  for point dataset of 78,000
> observations from a shapefile( from ArcGIS) which has both sets of
> longitudes and latitudes and X-Y coordinates (based on state plane feet).
> So far I projected the dataset in ESRI coordinates using rgdal maptools.

Your data are point data. Did you look at a summary() of the object after 
reading it in? You would look at summary in Stata, so do the same in R.

> I  keep getting error message as below:
> shapeR_SP <-spTransform(shapeR, CRS("+init=ESRI:102730"))

Did you check what the CRS of the imported object was?

>> IDs <-row.names(as(shapeR_SP, "data.frame"))
>> coords<-coordinates(shapeR_SP)
>> shapeR_nbq <-poly2nb(shapeR_SP)
> Error: extends(class(pl), "SpatialPolygons") is not TRUE

You did read ?poly2nb, didn't you? You have data with point support, so if 
you really want to treat them as polygons, you would go through a 
triangulation first - then tri2nb?

>> shapeR_kn6 <-knn2nb(knearneigh(coords, k=6), row.names=IDs)
> Warning message:
> In library(package, lib.loc = lib.loc, character.only = TRUE, logical.return
> = TRUE,  :
>  there is no package called 'RANN'
> where is 'RANN' from?!

Please do understand how R packages work. Do visit the spdep page on CRAN 
- you'll see that RANN (a clever implementation of fast approximate 
nearest neighbours) is "Suggests:". If spdep had "Depends:" on RANN, it 
would have been installed automatically when you installed spdep; when 
packages are "Suggests:", they are optional extras. In your case with 
point support, RANN will be faster than using the fallback, so install it. 
This is just as in using additional, user-contributed code in Stata, the 
user chooses what to install.

> The outcome goal is to address the spatial dependence  in hedonic model with
> house sales transaction study.  I like to run the spatial dependence test so
> that I can refine the spatial regession analysis.
> I would appreciate any suggestions  and/or  answers to the following
> questions:
> 1.       What is the sample size capacity of R (weight matrix size)?  This
> is the main reason I switched back to R from Stata.  Is there any means to
> achieve the outcomes I am looking for?


> 2.       Do I need to install.view ("Spatial)? When I tried to
> install.view("Spatial"), I've been getting a message that that package is
> not available for R 3.0 which I am using - R studio. Is "Spatial" different
> from "spatial" package?  Do I need to find a different version of R?

No, but you should read up on and understand how R handles contributed 
packages. It may be that R Studio is getting in the way adding an extra 
layer of stuff between you and the console - it will work fine when you 
know what you are doing, but will not help you to learn, it is intended 
to support programming in R when the programmer knows what to do.

> 3.       Is there a better to achieve the outcome?  For instance, using
> GeoDa to create a weight matrix?

Generating neighbour objects for 78k points is not a problem:

xy <- matrix(runif(78000*2), ncol=2)
system.time(nb6 <- knn2nb(knearneigh(xy, k=6)))
#   user  system elapsed
#  7.093   0.047   7.142

You may of course check this by creating k6 neighbours in GeoDa from the 
projected point shapefile, and reading the resulting GAL file into R.

However, this is an asymmetric neighbour list, so fitting ML models will 
be done accurately with method="LU", which will be slower than 
method="Matrix" using updating spare Cholesky log determinants for 
symmetric neighbours. You may also use method="MC" for Monte Carlo log 
determinant approximations. The default method="eigen" will have memory 
problems with your dense 78k by 78k matrix.

You should also be able to use the full range of GM estimators (as you 
would in Stata). Check by exporting the neighbours generated in R to Stata 

lw <- nb2listw(nb, style="W")
# for row standardised weights
sn <- listw2sn(lw)
write.sn2gwt(sn, "nb_w.gwt")

spmat import W using nb_w.gwt, geoda replace
spreg gs2sls Y <your Xs>, id(ID) dlmat(W) elmat(W)

See for details:


Hope this clarifies,


> Thank you.
> Susan Shim Gorelick
> 	[[alternative HTML version deleted]]
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo

Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no

More information about the R-sig-Geo mailing list