[R-sig-Geo] memory Usage setting

Tomislav Hengl tomislav.hengl at jrc.it
Thu Sep 13 14:50:24 CEST 2007


Edzer,

This is also my biggest frustration with R at the moment. I cannot load
grids bigger than 2M pixels and run geostatistics on them (if I run the
same analysis in GIS software such as SAGA GIS, I do not get these
problems). I often get messages such as:

"Reached total allocation of 1536Mb: see help(memory.size)"
"Error: cannot allocate vector of size 101.1Mb"

I still have not figured out how to avoid this obstacle.
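
The most I have managed is to check and nudge the allocation cap on the
Windows build, which only goes so far on a 32-bit machine; something along
these lines (the 3000 Mb below is just an example, pick a value that suits
your machine):

  memory.size()              # Mb currently used by this R session
  memory.limit()             # current allocation cap in Mb
  memory.limit(size = 3000)  # try to raise the cap (Windows only)
  gc()                       # free unused objects and report memory use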

FYI, I just came back from the GeoComputation conference in Ireland, where I
met some colleagues from the Centre for e-Science in Lancaster
(http://e-science.lancs.ac.uk). They are just about to release an R package
called MultiR, which should significantly speed up R calculations by
employing grid computing facilities. Daniel Grose
(http://e-science.lancs.ac.uk/personnel.html#daniel) mentioned that they
plan to organize a workshop on MultiR in November 2007.

Grose, D. et al., 2006. sabreR: Grid-Enabling the Analysis of MultiProcess
Random Effect Response Data in R. In: P. Halfpenny (Editor), Second
International Conference on e-Social Science. National Centre for e-Social
Science, Manchester, UK, pp. 12.
http://epubs.cclrc.ac.uk/work-details?w=36493

Tom Hengl
http://spatial-analyst.net

-----Original Message-----
From: r-sig-geo-bounces at stat.math.ethz.ch
[mailto:r-sig-geo-bounces at stat.math.ethz.ch] On Behalf Of Edzer J. Pebesma
Sent: Thursday, September 13, 2007 12:15 PM
To: didier.leibovici at nottingham.ac.uk
Cc: r-sig-geo at stat.math.ethz.ch
Subject: Re: [R-sig-Geo] memory Usage setting

I think R will never do its own memory swapping, as that is a typical OS
task. There are, however, several developments (provided in add-on
packages) that do not load all data into memory at start-up, but instead
query a database whenever a data element is needed. You might search
r-help for RSQLite or biglm, and there are others; also look at the award
winners at useR! this year.
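
As a rough illustration of that chunked approach (the file name, chunk size
and model formula below are made up), biglm lets you initialise a fit on one
block of rows and then update it block by block, so only one chunk ever sits
in memory:

  library(biglm)

  con <- file("points.csv", open = "r")      # hypothetical CSV with z, x, y
  invisible(readLines(con, n = 1))           # skip the header line
  nxt <- function()                          # read the next 50000 rows
    try(read.csv(con, header = FALSE, nrows = 50000,
                 col.names = c("z", "x", "y")), silent = TRUE)

  fit <- biglm(z ~ x + y, data = nxt())      # fit on the first chunk
  repeat {
    chunk <- nxt()
    if (inherits(chunk, "try-error") || nrow(chunk) == 0) break
    fit <- update(fit, chunk)                # fold further chunks into the fit
  }
  close(con)
  summary(fit)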

Here, we have run pretty successful R sessions needing 10-11 Gb of memory on
an 8 Gb RAM, 64-bit Linux machine with lots of swap space. It needs some
patience, and R may still crash other parts of the system when memory usage
becomes too excessive.

Best regards,
--
Edzer

Didier Leibovici wrote:
> Thanks Roger
>
> I feel we've got a low RAM machine which would need a bit of an uplift 
> (recent server though)!
> The Linux machine unfortunately also has only 4Gb of RAM. But I maintain
> that it would be interesting to have, within R, a way of automatically 
> swapping memory to disk when needed ...
>
> Didier
>
> Roger Bivand wrote:
>   
>> On Tue, 11 Sep 2007, elw at stderr.org wrote:
>>
>>     
>>>> These days in GIS one may have to manipulate big datasets or arrays.
>>>>
>>>> Here I am on Windows with 4Gb of RAM;
>>>> my aim was to have an array of dim 298249 x 12 x 10 x 22, but that's 2.9Gb
>>>>         
>> Assuming double precision (there is no single precision in R), that is
>> 298249 x 12 x 10 x 22 x 8 bytes, i.e. about 5.8Gb.
>>
>>     
>>> It used to be (maybe still is?) the case that a single process could 
>>> only 'claim' a chunk of max size 2GB on Windows.
>>>
>>>
>>> Also remember to compute the overhead for R objects... 58 bytes per
>>> object, I think it is.
>>>
>>>
>>>       
>>>> It is also strange that at one point dd needed 300.4Mb and then 600.7Mb
>>>> (?), even though I had also made some room by removing ZZ?
>>>>         
>>> Approximately double the size - many things the interpreter does involve
>>> making an additional copy of the data and then working with *that*. This
>>> might be happening here, though I didn't read your code carefully enough
>>> to be certain.
>>>
>>>
>>>       
>>>> which I don't really know whether it took into account, as the limit is
>>>> greater than the physical RAM of 4GB ...?
>>>>         
>>> :)
>>>
>>>       
>>>> would it be easier using Linux ?
>>>>         
>>> possibly a little bit - on a linux machine you can at least run a PAE
>>> kernel (giving you a lot more address space to work with) and have the
>>> ability to turn on a bit more virtual memory.
>>>
>>> usually with data of the size you're trying to work with, I try to find
>>> a way to preprocess the data a bit more before I apply R's tools to it.
>>> sometimes we stick it into a database (postgres) and select out the bits
>>> we want our inferences to be sourced from.  ;)
>>>
>>> it might be simplest to just hunt up a machine with 8 or 16GB of memory
>>> in it, and run those bits of the analysis that really need memory on that
>>> machine...
>>>       
>> Yes, if there is no other way, a 64-bit machine with lots of RAM would
>> not be so constrained, but maybe this is a matter of first deciding why
>> doing statistics on that much data is worth the effort? It may be, but
>> just trying to read large amounts of data into memory is perhaps not
>> justified in itself.
>>
>> Can you tile or subset the data, accumulating intermediate results?
>> This is the approach the biglm package takes, and the R/GDAL interface
>> also supports subsetting from an external file (see the sketch at the
>> end of this message).
>>
>> Depending on the input format of the data, you should be able to do 
>> all you need provided that you do not try to keep all the data in 
>> memory. Using a database may be a good idea, or if the data are 
>> multiple remote sensing images, subsetting and accumulating results.
>>
>> Roger
>>
>>     
>>> --e
>>>
>>> _______________________________________________
>>> R-sig-Geo mailing list
>>> R-sig-Geo at stat.math.ethz.ch
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-geo
>>>
>>>       
>
>
>
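
To illustrate the tiled approach Roger describes above: a minimal sketch that
reads a large grid strip by strip with rgdal's readGDAL and accumulates a
running sum and cell count, so only one strip is in memory at a time (the
file name and strip height are made up, and this is an untested sketch of the
idea rather than a finished script):

  library(rgdal)

  fn    <- "dem.tif"                     # hypothetical large grid
  info  <- GDALinfo(fn)
  nr    <- info[["rows"]]
  nc    <- info[["columns"]]
  strip <- 512                           # strip height in rows

  s <- 0; n <- 0                         # running sum and cell count
  for (off in seq(0, nr - 1, by = strip)) {
    rows <- min(strip, nr - off)
    sg <- readGDAL(fn, offset = c(off, 0), region.dim = c(rows, nc))
    v  <- sg@data[[1]]                   # first band of this strip
    s  <- s + sum(v, na.rm = TRUE)
    n  <- n + sum(!is.na(v))
  }
  s / n                                  # overall mean, without loading the whole grid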

_______________________________________________
R-sig-Geo mailing list
R-sig-Geo at stat.math.ethz.ch
https://stat.ethz.ch/mailman/listinfo/r-sig-geo