[R-sig-Geo] Creating very large spatial weight matrix

Roger Bivand Roger.Bivand at nhh.no
Fri Nov 19 09:08:27 CET 2010

On Thu, 18 Nov 2010, Aleksandr Andreev wrote:

> Yes, sorry, I'm running R 2.12.0 on Ubuntu 64-bit (kernel 2.6.32-25-generic)
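
For context on why the dist() call quoted below fails, here is a rough calculation. It assumes 8-byte doubles and the 2^31 - 1 element limit on vector length that applied to R versions before 3.0.0 (so including 2.12.0). The variable names are illustrative only.

```r
# Back-of-envelope storage for all pairwise distances among n points.
n <- 120000
n_pairs <- n * (n - 1) / 2                    # entries held by a dist object
bytes_per_double <- 8
dist_gb <- n_pairs * bytes_per_double / 1e9   # dist object, in GB
full_gb <- n^2 * bytes_per_double / 1e9       # full square matrix from as.matrix(), in GB
max_len <- 2^31 - 1                           # vector length limit before R 3.0.0
exceeds_limit <- n_pairs > max_len            # TRUE means failure regardless of RAM
cat(round(dist_gb, 1), round(full_gb, 1), exceeds_limit, "\n")
```

On these assumptions the requirement is tens of GB rather than 6 GB, and the element count alone is over the vector length limit, which would explain an error that appears before RAM is exhausted.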

The actual answer is to use the function needed for this operation:

library(spdep)
coords <- cbind(Lon, Lat)  # longitude first, then latitude, in decimal degrees
dnb <- dnearneigh(coords, 0, dmax, longlat=TRUE)  # neighbours within dmax km

where dmax is a small great-circle distance in km. Of course, if you really
need all the distances, all bets are off, but that would be an unusual
specification of the underlying spatial process. I suggest not worrying
about ensuring that every observation has at least one neighbour: for a
global measure such as Moran's I with N=120,000, dropping a few cannot
matter much. Go with a tight dmax, and it should just work. If dmax is
loose, the average number of neighbours creeps up and the nb object (and
the listw object built from it) gets denser, possibly leaving some
observations with thousands of neighbours and so oversmoothing the process.
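
A minimal sketch of the whole sequence, from distance-band neighbours to the Moran test, might look as follows. The coordinates, the variable y, and the 50 km threshold are made-up stand-ins (500 simulated points rather than the real 120,000), so treat it as an illustration of the calls rather than a recommended setting.

```r
library(spdep)

set.seed(1)
n <- 500                          # small stand-in for the real data
Lon <- runif(n, -84, -75)         # hypothetical longitudes, decimal degrees
Lat <- runif(n, 33, 37)           # hypothetical latitudes, decimal degrees
y <- rnorm(n)                     # hypothetical variable to be tested

coords <- cbind(Lon, Lat)
dmax <- 50                        # assumed threshold in km; tune to the data
dnb <- dnearneigh(coords, 0, dmax, longlat = TRUE)

nb_counts <- card(dnb)            # number of neighbours per observation
print(summary(nb_counts))         # watch the mean grow as dmax is loosened

# zero.policy = TRUE permits observations with no neighbours
# (their spatially lagged value is set to zero).
lw <- nb2listw(dnb, style = "W", zero.policy = TRUE)
mt <- moran.test(y, lw, zero.policy = TRUE)
print(mt)
```

Re-running the card() summary for a few candidate values of dmax is a cheap way to see how quickly the neighbour lists fill up before committing to the full data set.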

If this is continental rather than whole-world, consider projecting to the 
plane and using graph-based neighbours (?graph2nb).
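
As one possible illustration of the graph-based route, the sketch below builds Gabriel graph neighbours. It assumes the points have already been projected to planar coordinates in metres; the xy matrix here is simulated and purely hypothetical, and the Gabriel graph is only one of the options documented under ?graph2nb.

```r
library(spdep)

set.seed(1)
n <- 500
# Hypothetical projected coordinates in metres (stand-in for projected Lon/Lat)
xy <- cbind(runif(n, 0, 5e5), runif(n, 0, 5e5))

# Gabriel graph neighbours; sym = TRUE makes the neighbour relation symmetric
gnb <- graph2nb(gabrielneigh(xy), sym = TRUE)

gabriel_counts <- card(gnb)       # neighbours per observation
print(summary(gabriel_counts))    # stays small, with no dmax to choose
```

The appeal of this approach is that the number of neighbours is set by the point configuration rather than by a distance threshold, so the nb object stays sparse however unevenly the points are spread.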

Hope this helps,


> Thanks for pointing out ff.
> ------------------------
> Aleksandr Andreev
> Graduate Student - Department of Economics
> University of North Carolina at Chapel Hill
> Mobile: +1 303 507 93 88
> Skype: typiconman
> 2010/11/18 Michael Sumner <mdsumner at gmail.com>:
>> And, please report your OS and version of R (64-bit presumably?).
>> On Fri, Nov 19, 2010 at 10:39 AM, Michael Sumner <mdsumner at gmail.com> wrote:
>>> In general you need at least twice the required memory, and it has to
>>> be contiguous. Try with a fresh instance of R and try to create a
>>> single vector of that size, that might show that you *could* do it.
>>> Otherwise, check out the ff package, and see other options in the High
>>> Performance Computing Task View on CRAN.
>>> There may be other techniques you can use to solve the problem, but
>>> those two things are my direct answers to your questions.
>>> Cheers, Mike.
>>> On Fri, Nov 19, 2010 at 10:28 AM, Aleksandr Andreev
>>> <aleksandr.andreev at gmail.com> wrote:
>>>> Hello list,
>>>> I have 120,000 geocoded observations, for which I'm trying to create a
>>>> distance-based spatial weighting matrix so that I can perform a Moran
>>>> test.
>>>> Each observation has Lat and Lon.
>>>> Unfortunately, when I run
>>>> dists <- as.matrix(dist(cbind(Lon, Lat)))
>>>> I get the message:
>>>> Error in vector("double", length) : vector size specified is too large
>>>> Now I realize that 120,000^2 / 2 is on the order of 6 GB. However, I
>>>> seem to be running into software limitations on the vector size before
>>>> I hit RAM limitations. Also, in principle, it should be possible
>>>> (though slow) to use hard disk space to store this matrix. Does anyone
>>>> have any ideas on how to do this in R?
>>>> Thanks,
>>>> ------------------------
>>>> Aleksandr Andreev
>>>> Graduate Student - Department of Economics
>>>> University of North Carolina at Chapel Hill
>>>> _______________________________________________
>>>> R-sig-Geo mailing list
>>>> R-sig-Geo at stat.math.ethz.ch
>>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>> --
>>> Michael Sumner
>>> Institute for Marine and Antarctic Studies, University of Tasmania
>>> Hobart, Australia
>>> e-mail: mdsumner at gmail.com

Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no
