[R-sig-Geo] Memory problems with dnearneigh in spdep

Roger Bivand Roger.Bivand at nhh.no
Sun Apr 7 21:15:40 CEST 2013


I do prefer an affiliation, a real name, and a motivation; a
yahoo or gmail address tells me nothing about what you might be expected
to know. If I (or a list helper) cannot "place" you, it is much harder to
know why you are having problems.

You are also in breach of list rules because you are posting in HTML 
rather than just plain text. Removing the HTML will shorten your posting 
radically.


On Sun, 7 Apr 2013, Empty Empty wrote:

> Hi,
>
> I'm running into memory problems calculating Gi* using dnearneigh and 
> wonder if someone has any suggestions for either my script or another 
> work around.
>
> I have 1km data for Africa. My colleague calculated the appropriate 
> distance band (9333m) using incremental spatial autocorrelation in 
> ArcGIS but she was only able to run a subset of the data or with the 
> data aggregated to 5km due to memory issues (fixed distance, Euclidean). 
> I tried it in both ArcGIS (memory error after 45 hours) and R (within 2 
> hours it was using more than 245 GB on the server and then crashed). This 
> is what I used:

Please always include the output of sessionInfo(). Here also include 
summary(r) output.

>
> library(spdep)
> library(rgdal)   # readOGR() comes from rgdal
>
> r <- readOGR(".", "r")
>
> coords <- coordinates(r)

This is not needed.

> nb9333 <- dnearneigh(coords, 0, 9333, longlat = F)

nb9333 <- dnearneigh(r, 0, 9333)

would work, and would not override longlat=. I assume in fact that r is in 
longlat, and that you would then mean to use a km threshold of 9.333. If 
I'm right, your 9333 has been read as 9333 km, so every observation becomes 
a neighbour of every other observation, which is what is exhausting memory.
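
A minimal sketch of that longlat check (assuming r was read with readOGR() 
as above; is.projected() comes from sp, which spdep loads):

library(spdep)
is.projected(r)                    # FALSE means the coordinates are longlat
nb9333 <- dnearneigh(r, 0, 9.333)  # for longlat data dnearneigh() reads the
                                   # CRS itself and takes thresholds in km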

For a 1km grid, a threshold of 9333m means that each of the about 30M 
gridcells has of the order of 100 neighbours (rather strong smoothing). Is 
the threshold sensibly chosen? The neighbour object alone will be about 
30M x 100 entries x 4 bytes = 1.2e+10 bytes (12G); the labels of the 
neighbour object at least the same; the weights 24G in addition, so 
nb9333.listw is about 50G. With a more sensible threshold (1.5km?), the 
nb9333.listw weights object becomes more tractable, maybe about 5G.
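
The same back-of-envelope arithmetic in R (the counts are the rough 
estimates above, not measured object sizes):

ncell <- 30e6            # roughly 30M 1km gridcells for Africa
nnb   <- 100             # order-of-magnitude neighbour count at 9333m
ncell * nnb * 4 / 1e9    # integer neighbour indices: ~12 GB
ncell * nnb * 8 / 1e9    # double-precision weights:  ~24 GB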

Is any of the data observed on this grid? For localG* (or localG) to make 
any sense, the variable must be observed at this resolution (satellite 
imagery?) - if interpolated, all bets are off.

Finally, the output involves so many multiple comparisons that no 
statistical inference is possible under normal assumptions. Where is the 
3.886 coming from - is this an attempt to adjust for multiple comparisons?
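
For scale, a simple Bonferroni-style two-sided cutoff over that many tests 
would be (a sketch of the arithmetic, not a recommendation):

n <- 30e6                     # one test per gridcell
alpha <- 0.05
qnorm(1 - alpha / (2 * n))    # about 6.0, well beyond 3.886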

I'm unsure why anyone would do this, as you could use standard filtering 
techniques on imaging data. The raster package also has functions for 
computing the local Moran and Geary measures on RasterLayer objects.
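
A sketch of that route (the filename and the 9x9 window are made up for 
illustration; MoranLocal() uses a 3x3 window by default):

library(raster)
rr <- raster("grid_code.tif")             # hypothetical 1km input raster
lI <- MoranLocal(rr)                      # local Moran's I as a RasterLayer
sm <- focal(rr, w = matrix(1/81, 9, 9))   # plain 9x9 mean filter as smoothing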

Hope this clarifies,

Roger

> nb9333.listw <- nb2listw(include.self(nb9333), style="B")
> G9333 <- localG(r$GRID_CODE, nb9333.listw)
>
>
> library(classInt)   # classIntervals() and findColours()
> sig3 <- classIntervals(G9333, n=3, style="fixed",
>                        fixedBreaks=c(min(G9333), -3.886, 3.886, max(G9333)))  # I don't think 3.886 is right
> cs <- c('blue', 'white', 'red')  # blue = below -3.886, white = intermediate, red = above 3.886
> sigcol <- findColours(sig3, cs)
>
> png("r.png")
> plot(r, col=sigcol, axes=TRUE)
> dev.off()

-- 
Roger Bivand
Department of Economics, NHH Norwegian School of Economics,
Helleveien 30, N-5045 Bergen, Norway.
voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no


