[R-sig-Geo] memory Usage setting

Edzer J. Pebesma e.pebesma at geo.uu.nl
Thu Sep 13 14:05:28 CEST 2007


Agus, I'm not 100% sure, but I believe the PAE kernel allows the OS to use
more than 4 Gb of RAM in total, while no single process (such as R) can use
more than 4 Gb. So two R processes could each use at most 4 Gb of RAM
without swapping on an 8 Gb RAM machine with a PAE kernel.

No, I installed a full 64-bit kernel, where the 4 Gb address space limit
is not present because pointers are 8 bytes. It was as simple as anything
else; there are just some things (OpenOffice?) that may not work without a
large amount of effort. Since this was a server, that didn't matter.
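
One quick way to check which build of R you are actually running, from
inside R itself, is:

  .Machine$sizeof.pointer   # 8 on a 64-bit build of R, 4 on a 32-bit one
  R.version$arch            # e.g. "x86_64"
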
--
Edzer

Agustin Lobo wrote:
> Any particular advice for setting up the kernel
> (or other things) for such a machine (i.e., the PAE kernel)?
>
> Agus
>
> Edzer J. Pebesma wrote:
>> I think R will never do its own memory swapping, as that is a 
>> typical OS task. There are, however, several developments (provided in 
>> add-on packages) that do not load all data into memory at start-up, 
>> but instead query a database whenever a data element is needed. 
>> You might search r-help for RSQLite or biglm, and there are others; 
>> also look at the award winners at useR! this year.
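>>
>> For illustration, a minimal sketch of that chunked approach with biglm
>> (the file name, formula and chunk size are only placeholders):
>>
>>   library(biglm)
>>   chunk1 <- read.csv("pts.csv", nrows = 1e5)      # first 100000 rows
>>   fit <- biglm(z ~ x + y, data = chunk1)
>>   chunk2 <- read.csv("pts.csv", skip = 1e5 + 1, nrows = 1e5,
>>                      header = FALSE, col.names = names(chunk1))
>>   fit <- update(fit, chunk2)   # feed further blocks the same way
>>   summary(fit)                 # based on all rows seen so far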
>>
>> Here, we've run pretty successful R sessions needing 10-11 Gb of 
>> memory on an 8 Gb RAM, 64-bit Linux machine with lots of swap space. 
>> It needs some patience, and R might still crash other parts of the 
>> system when memory usage becomes too excessive.
>>
>> Best regards,
>> -- 
>> Edzer
>>
>> Didier Leibovici wrote:
>>> Thanks Roger
>>>
>>> I feel we've got a low-RAM machine which would need a bit of an 
>>> uplift (a recent server, though)!
>>> The Linux machine unfortunately also has only 4 Gb of RAM.
>>> But I still maintain it would be interesting to have within R a way 
>>> of automatically swapping memory to disk when needed ...
>>>
>>> Didier
>>>
>>> Roger Bivand wrote:
>>>  
>>>> On Tue, 11 Sep 2007, elw at stderr.org wrote:
>>>>
>>>>    
>>>>>> These days in GIS one may have to manipulate big datasets or arrays.
>>>>>>
>>>>>> Here I am on Windows with 4 Gb of RAM;
>>>>>> my aim was to have an array of dim 298249 x 12 x 10 x 22, but that's
>>>>>> 2.9 Gb
>>>>>>         
>>>> Assuming double precision (there is no single precision in R), that
>>>> is 5.8 Gb.
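>>>>
>>>> The arithmetic, for reference (the 2.9 Gb figure above presumably
>>>> assumed 4 bytes per element):
>>>>
>>>>   298249 * 12 * 10 * 22    # 787,377,360 elements
>>>>   787377360 * 8 / 2^30     # ~5.9 Gb stored as doubles
>>>>   787377360 * 4 / 2^30     # ~2.9 Gb at 4 bytes per element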
>>>>
>>>>    
>>>>> It used to be (maybe still is?) the case that a single process could
>>>>> only 'claim' a chunk of at most 2 GB on Windows.
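>>>>>
>>>>> On a Windows build the current cap can be inspected, and sometimes
>>>>> raised, from within R (whether the request is granted depends on the
>>>>> OS):
>>>>>
>>>>>   memory.limit()             # reports the cap in Mb
>>>>>   memory.limit(size = 3000)  # ask for ~3 Gb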
>>>>>
>>>>>
>>>>> Also remember to account for the overhead of R objects... 58 bytes
>>>>> per object, I think it is.
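>>>>>
>>>>> That overhead is easy to inspect directly:
>>>>>
>>>>>   object.size(numeric(0))     # an empty vector: pure per-object overhead
>>>>>   object.size(numeric(1000))  # overhead plus 8000 bytes of data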
>>>>>
>>>>>
>>>>>      
>>>>>> It is also strange that at one point dd needed 300.4 Mb and then
>>>>>> 600.7 Mb (?), even though I had also made some room by removing ZZ?
>>>>>>         
>>>>> Approximately double the size - many things the interpreter does
>>>>> involve making an additional copy of the data and then working with
>>>>> *that*.  This might be happening here, though I didn't read your code
>>>>> carefully enough to be certain.
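>>>>>
>>>>> One way to watch this happen, on builds where memory profiling is
>>>>> enabled (a small sketch):
>>>>>
>>>>>   x <- rnorm(1e6)
>>>>>   tracemem(x)   # report whenever x gets duplicated
>>>>>   y <- x        # no copy yet: two names, one block of memory
>>>>>   y[1] <- 0     # duplication reported here; usage doubles briefly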
>>>>>
>>>>>
>>>>>      
>>>>>> which I don't really know whether it was taken into account, as the
>>>>>> limit is greater than the physical RAM of 4 GB ...?
>>>>>>         
>>>>> :)
>>>>>
>>>>>      
>>>>>> would it be easier using Linux?
>>>>>>         
>>>>> possibly a little bit - on a Linux machine you can at least run a PAE
>>>>> kernel (giving you a lot more address space to work with) and have the
>>>>> ability to turn on a bit more virtual memory.
>>>>>
>>>>> Usually, with data of the size you're trying to work with, I try to
>>>>> find a way to preprocess the data a bit more before I apply R's tools
>>>>> to it. Sometimes we stick it into a database (Postgres) and select out
>>>>> the bits we want our inferences to be sourced from.  ;)
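>>>>>
>>>>> As a sketch of that pattern with a DBI back end (RSQLite here, but a
>>>>> PostgreSQL driver works the same way; the file, table and column names
>>>>> are invented):
>>>>>
>>>>>   library(DBI)
>>>>>   library(RSQLite)
>>>>>   con <- dbConnect(SQLite(), dbname = "survey.db")
>>>>>   sub <- dbGetQuery(con,
>>>>>                     "SELECT x, y, value FROM obs WHERE region = 'A'")
>>>>>   dbDisconnect(con)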
>>>>>
>>>>> It might be simplest to just hunt up a machine with 8 or 16 GB of
>>>>> memory in it, and run those bits of the analysis that really need
>>>>> memory on that machine...
>>>>>       
>>>> Yes, if there is no other way, a 64-bit machine with lots of RAM 
>>>> would not be so constrained, but maybe this is a matter of first 
>>>> deciding why doing statistics on that much data is worth the 
>>>> effort? It may be, but just trying to read large amounts of data 
>>>> into memory is perhaps not justified in itself.
>>>>
>>>> Can you tile or subset the data, accumulating intermediate results? 
>>>> This is the approach the biglm package takes, and the R/GDAL 
>>>> interface also supports subsetting from an external file.
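>>>>
>>>> For example, rgdal can pull a window of a large raster rather than the
>>>> whole file (the file name and window here are only illustrative):
>>>>
>>>>   library(rgdal)
>>>>   tile <- readGDAL("image.tif", offset = c(5000, 2000),
>>>>                    region.dim = c(1000, 1000))  # 1000 x 1000 cells
>>>>   summary(tile$band1)  # accumulate per-tile results, then combine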
>>>>
>>>> Depending on the input format of the data, you should be able to do 
>>>> all you need provided that you do not try to keep all the data in 
>>>> memory. Using a database may be a good idea, or if the data are 
>>>> multiple remote sensing images, subsetting and accumulating results.
>>>>
>>>> Roger
>>>>
>>>>    
>>>>> --e
>>>>>
>>>
>>>
>>