[R-sig-Geo] Regression - large neighbour matrix - poor performance

Roger Bivand Roger.Bivand at nhh.no
Mon Mar 7 21:09:37 CET 2016


On Mon, 7 Mar 2016, Guilherme Ottoni wrote:

> Dear list,
>
> I'm working on a hedonic pricing model for land at the county level,
> to determine how much a few urban facilities would increase land
> values. The literature and the exploratory spatial data analysis show
> that spatial effects should be considered in the modelling.
>
> The shapefile I'm using contains points, not polygons, so I took the
> coordinates of the land points and generated the neighbour matrix
> (distance threshold of 500 m - file size 400 MB). However, the matrix
> turned out too big (as shown below).
>
> I managed to run the Moran's tests. When I tried to run the SAR,
> SEM and other spatial regression models, I got an error message
> saying R had reached the total memory of the computer (8 GB).
>
> I tried changing the "method" argument in the *sarlm functions from
> "default" to "LU", but the estimation has now been running for 3 h and
> the Hessian optimisation seems to be looping around a certain value.
>
> I have no clue whether I'm doing this the right way or whether there
> is a smarter way of doing it.
>
> Any help would be very welcome!
>
> ------------------------------------ ROUTINE ------------------------------------
> mapa <- readShapePoints("OUC-ACLO_ITBI5500.shp")
> mapa <- readOGR(".", "OUC-ACLO_ITBI5500")
> OGR data source with driver: ESRI Shapefile
> Source: ".", layer: "OUC-ACLO_ITBI5500"
> with 25857 features
> It has 42 fields
>
> coords<-coordinates(mapa)
> vizinhos <- dnearneigh(coords, d1=0, d2=500, row.names=IDs)
> matriz_vizinhos <- nb2listw(vizinhos)
>
> summary(vizinhos)
> Neighbour list object:
> Number of regions: 25857
> Number of nonzero links: 15642996
> Percentage nonzero weights: 2.339719

This weights matrix is not very sparse, so all such calculations will end 
the same way (exhausting memory or running for hours). Use either a 
shorter distance threshold, or, if the point density varies a great deal, 
a variant of triangulation (sphere-of-influence (SOI) neighbours usually 
work well; see the sketch below). If the weights are sparse, as with the 
Lucas County house price data from the Spatial Econometrics toolbox and 
included in spdep, everything runs very much faster.
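
As an illustration only, a minimal sketch (not from the original thread) 
of building SOI neighbours with spdep, reusing the poster's coords object; 
the object names tri_nb, soi_nb and matriz_vizinhos_soi are illustrative:

library(spdep)
tri_nb <- tri2nb(coords)                       # Delaunay triangulation neighbours
soi_nb <- graph2nb(soi.graph(tri_nb, coords))  # prune to the sphere-of-influence graph
summary(soi_nb)                                # mean neighbour counts are usually in single digits
matriz_vizinhos_soi <- nb2listw(soi_nb)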

> Average number of links: 604.9811

This alone should have alerted you to the problem - a mean neighbour 
count of 605 implies that, on average, each value of y is modelled as 
being affected by its 605 nearest neighbours.
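
A quick check of the neighbour cardinalities before building the listw 
object would catch this early; a hedged sketch using spdep's card(), with 
nlinks as an illustrative name:

library(spdep)
nlinks <- card(vizinhos)   # number of neighbours for each of the 25857 points
mean(nlinks)               # about 605 here; sparse schemes give single-digit means
max(nlinks)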

>
> lag.fit<-lagsarlm(formula, data=mapa, listw=matriz_vizinhos,
> method = "Matrix", quiet = FALSE)

library(spdep)
data(house)   # Lucas County, OH house sales; also loads the LO_nb neighbour list
hform <- formula(log(price) ~ age + I(age^2) + I(age^3) + log(lotsize) +
   rooms + beds + syear)
hlw <- nb2listw(LO_nb)
system.time(hlag_ML_Matrix <- lagsarlm(hform, data=house, listw=hlw,
   method="Matrix"))
#   user  system elapsed
#  1.331   0.007   1.338

on a four-year-old laptop. But:

> LO_nb
Neighbour list object:
Number of regions: 25357
Number of nonzero links: 74874
Percentage nonzero weights: 0.01164489
Average number of links: 2.952794
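
If a triangulation-based scheme does not suit the data, one alternative 
not discussed above - sketched here only as an assumption-laden example - 
is k-nearest-neighbour weights, which stay sparse whatever the point 
density (the choice k=6 is purely illustrative):

library(spdep)
knn_nb <- knn2nb(knearneigh(coords, k=6))   # each point gets exactly k neighbours
summary(knn_nb)
matriz_knn <- nb2listw(knn_nb)              # sparse listw, so *sarlm fitting stays feasible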

Hope this clarifies,

Roger

> ----------------------------------------------------------------------------------------------------------------------
>
> Cheers
>
> _______________________________________________
> R-sig-Geo mailing list
> R-sig-Geo at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>

-- 
Roger Bivand
Department of Economics, Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 91 00
e-mail: Roger.Bivand at nhh.no
http://orcid.org/0000-0003-2392-6140
https://scholar.google.no/citations?user=AWeghB0AAAAJ&hl=en
http://depsy.org/person/434412


