[R-sig-Geo] Memory usage setting

Roger Bivand Roger.Bivand at nhh.no
Tue Sep 11 16:50:03 CEST 2007


On Tue, 11 Sep 2007, elw at stderr.org wrote:

>
>> These days in GIS one may have to manipulate big datasets or arrays.
>>
>> Here I am on Windows with 4Gb of RAM;
>> my aim was to have an array of dim 298249 x 12 x 10 x 22, but that's 2.9Gb
>

Assuming double precision (R has no single-precision storage), that is
about 5.8Gb.
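
The arithmetic, checked in R itself:

    prod(c(298249, 12, 10, 22)) * 8 / 1024^3  # 787377360 doubles, ~5.87 GiB
    prod(c(298249, 12, 10, 22)) * 4 / 1024^3  # ~2.93 GiB at 4 bytes each,
                                              # hence the 2.9Gb estimate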

>
> It used to be (maybe still is?) the case that a single process could only
> claim a chunk of at most 2GB on 32-bit Windows.
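
On the R side, Windows builds of R ship memory.limit() and memory.size()
to query and (within the OS cap) raise that limit -- a minimal sketch:

    memory.limit()             # current cap in Mb (Windows-only)
    memory.limit(size = 3000)  # ask for a larger cap, still bounded by the OS
    memory.size()              # Mb currently allocated by this R session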
>
>
> Also remember to compute overhead for R objects... 58 bytes per object, I
> think it is.
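
That overhead is easy to measure directly with object.size():

    object.size(numeric(0))    # header of an empty vector, a few dozen bytes
    object.size(numeric(100))  # header plus 100 * 8 bytes of data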
>
>
>> It is also strange that at one point dd needed 300.4Mb and then 600.7Mb
>> (?), even though I had made some room by removing ZZ?
>
>
> Approximately double the size - many things the interpreter does involve
> making an additional copy of the data and then working with *that*.  This
> might be happening here, though I didn't read your code carefully enough
> to be certain.
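
A rough way to see that doubling is to watch gc() statistics (a sketch;
exact figures vary by platform and R version):

    x <- numeric(5e7)  # about 400Mb of doubles
    gc(reset = TRUE)   # reset the "max used" counters
    y <- x             # cheap so far: no copy has been made yet
    y[1] <- 0          # the write forces a full copy of the 400Mb
    gc()               # "max used" now shows roughly twice the data size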
>
>
>> which I don't really know whether it was taken into account, as the limit
>> is greater than the physical RAM of 4GB. ...?
>
> :)
>
>> would it be easier using Linux?
>
> possibly a little bit - on a Linux machine you can at least run a PAE
> kernel (letting the OS address more than 4GB of physical RAM, though each
> 32-bit process still gets at most a 4GB address space) and turn on a bit
> more virtual memory.
>
> Usually with data of the size you're trying to work with, I try to find a
> way to preprocess the data a bit more before applying R's tools to it.
> Sometimes we stick it into a database (Postgres) and select out the bits
> we want our inferences to be sourced from.  ;)
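
A minimal sketch of that pattern through R's DBI interface (the package
choice, connection details, and the 'pixels' table are illustrative, not
from the original post):

    library(DBI)
    library(RPostgreSQL)
    con <- dbConnect(PostgreSQL(), dbname = "gisdata", host = "localhost")
    sub <- dbGetQuery(con, "SELECT x, y, value FROM pixels WHERE band = 1")
    dbDisconnect(con)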
>
> It might be simplest to just hunt up a machine with 8 or 16GB of memory in
> it, and run the bits of the analysis that really need memory on that
> machine...

Yes, if there is no other way, a 64-bit machine with lots of RAM would not
be so constrained, but maybe this is a matter of first deciding why doing
statistics on that much data is worth the effort. It may be, but just
trying to read large amounts of data into memory is perhaps not justified
in itself.

Can you tile or subset the data, accumulating intermediate results? This 
is the approach the biglm package takes, and the R/GDAL interface also 
supports subsetting from an external file.
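
A sketch of the biglm pattern - read a chunk, update the fit, discard the
chunk (the file name, chunk size, and formula here are illustrative):

    library(biglm)
    con <- file("big.csv", open = "r")
    chunk <- read.csv(con, nrows = 10000)
    cols <- names(chunk)
    fit <- biglm(y ~ x1 + x2, data = chunk)
    repeat {
        chunk <- try(read.csv(con, nrows = 10000, header = FALSE,
                              col.names = cols), silent = TRUE)
        if (inherits(chunk, "try-error")) break  # input exhausted
        fit <- update(fit, chunk)                # accumulate, then discard
    }
    close(con)
    summary(fit)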

Depending on the input format of the data, you should be able to do all 
you need provided that you do not try to keep all the data in memory. 
Using a database may be a good idea, or, if the data are multiple remote
sensing images, subsetting them and accumulating results.
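
For the image case, rgdal's readGDAL() can pull just a window of a large
file rather than the whole thing ("image.tif" and the window size are
illustrative):

    library(rgdal)
    tile <- readGDAL("image.tif", offset = c(0, 0),
                     region.dim = c(512, 512))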

Roger

>
> --e
>

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no



