[R-sig-Geo] memory Usage setting
Edzer J. Pebesma
e.pebesma at geo.uu.nl
Thu Sep 13 14:05:28 CEST 2007
Agus, I'm not 100% sure, but I believe the PAE kernel lets the OS address
more than 4 Gb of RAM, while no single process (such as R) can use more
than 4 Gb. So, two R processes could each use at most 4 Gb of RAM without
swapping on an 8 Gb RAM machine with a PAE kernel.
No, I installed a full 64-bit kernel, where the 4 Gb address space limit
is not present because pointers are 8 bytes. It was as simple to install
as anything else; there are just some things (OpenOffice?) that may not
work without a lot of effort. Since this was a server, that didn't matter.
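
For what it's worth, a quick way to check whether a given R build is
64-bit (just an aside):

  .Machine$sizeof.pointer   # pointer size in bytes: 8 on 64-bit, 4 on 32-bit
  R.version$arch            # reports the architecture, e.g. x86_64
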
--
Edzer
Agustin Lobo wrote:
> Any particular advice for setting up the kernel
> (or other things) for such a machine (i.e., the PAE kernel)?
>
> Agus
>
> Edzer J. Pebesma wrote:
>> I think R will never do its own memory swapping, as that is a
>> typical OS task. There are, however, several developments (provided in
>> add-on packages) that do not load all data into memory at start-up,
>> but instead query a database whenever a data element is needed.
>> You might search r-help for RSQLite or biglm, and there are others;
>> also look at the award winners at useR! this year.
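>> A minimal sketch of the biglm idea, with chunk1 and chunk2 standing in
>> for whatever pieces you read from disk or a database:
>>
>>   library(biglm)
>>   # fit on the first chunk, then fold in further chunks;
>>   # only the cross-products are kept in memory, not the raw data
>>   fit <- biglm(y ~ x1 + x2, data = chunk1)
>>   fit <- update(fit, chunk2)
>>   summary(fit)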
>>
>> Here, we've run pretty successful R sessions needing 10-11 Gb of
>> memory on an 8 Gb RAM, 64-bit Linux machine with lots of swap space.
>> That needs some patience, and R might still crash other parts of the
>> system when memory usage becomes excessive.
>>
>> Best regards,
>> --
>> Edzer
>>
>> Didier Leibovici wrote:
>>> Thanks Roger
>>>
>>> I feel we've got a low-RAM machine which would need a bit of an
>>> uplift (a recent server, though)!
>>> The Linux machine unfortunately also has only 4 Gb of RAM.
>>> But I still maintain it would be interesting to have within R a way
>>> of automatically swapping memory to disk when needed ...
>>>
>>> Didier
>>>
>>> Roger Bivand wrote:
>>>
>>>> On Tue, 11 Sep 2007, elw at stderr.org wrote:
>>>>
>>>>
>>>>>> These days in GIS one may have to manipulate big datasets or arrays.
>>>>>>
>>>>>> Here I am on Windows with 4 Gb of RAM;
>>>>>> my aim was to have an array of dim 298249 x 12 x 10 x 22, but that's 2.9 Gb
>>>>>>
>>>> Assuming double precision (there is no single precision in R), that's 5.8 Gb.
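>>>>
>>>> Back-of-the-envelope, assuming 8-byte doubles:
>>>>
>>>>   298249 * 12 * 10 * 22            # 787,377,360 elements
>>>>   298249 * 12 * 10 * 22 * 8 / 2^30 # ~5.87, i.e. the 5.8 Gb above
>>>>   298249 * 12 * 10 * 22 * 4 / 2^30 # the 2.9 Gb figure assumes 4-byte values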
>>>>
>>>>
>>>>> It used to be (maybe still is?) the case that a single process could
>>>>> only 'claim' a chunk of max size 2 GB on Windows.
>>>>>
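>>>>> On Windows builds of R you can at least query (and sometimes raise) that
>>>>> limit; something along the lines of:
>>>>>
>>>>>   memory.limit()             # current limit in Mb (Windows only)
>>>>>   memory.limit(size = 3500)  # request a higher limit, if the OS allows it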
>>>>>
>>>>> Also remember to compute overhead for R objects... 58 bytes per
>>>>> object, I think it is.
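>>>>> You can check that overhead yourself with object.size(); roughly:
>>>>>
>>>>>   object.size(numeric(0))    # bytes used by an empty numeric vector
>>>>>   object.size(numeric(1e6))  # the fixed overhead is negligible for big vectors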
>>>>>
>>>>>
>>>>>
>>>>>> It is also strange that at one point dd needed 300.4 Mb and then
>>>>>> 600.7 Mb (?), even though I had made some room by removing ZZ?
>>>>>>
>>>>> Approximately double the size - many things the interpreter does
>>>>> involve making an additional copy of the data and then working with
>>>>> *that*. This might be happening here, though I didn't read your code
>>>>> carefully enough to be certain.
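>>>>> One way to watch the copying happen is tracemem(), on builds where
>>>>> memory profiling is enabled:
>>>>>
>>>>>   x <- matrix(0, 1000, 1000)
>>>>>   tracemem(x)     # report whenever x gets duplicated
>>>>>   y <- x          # no copy yet, just another reference
>>>>>   y[1, 1] <- 1    # modifying the shared object forces a full copy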
>>>>>
>>>>>
>>>>>
>>>>>> which I don't really know if it was taken into account, as the limit
>>>>>> is greater than the physical RAM of 4 GB. ...?
>>>>>>
>>>>> :)
>>>>>
>>>>>
>>>>>> would it be easier using Linux?
>>>>>>
>>>>> Possibly a little bit - on a Linux machine you can at least run a PAE
>>>>> kernel (giving you a lot more address space to work with) and have the
>>>>> ability to turn on a bit more virtual memory.
>>>>>
>>>>> Usually, with data of the size you're trying to work with, I try to
>>>>> find a way to preprocess the data a bit more before I apply R's tools
>>>>> to it. Sometimes we stick it into a database (Postgres) and select out
>>>>> the bits we want our inferences to be sourced from. ;)
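>>>>> e.g. with DBI the pattern is just (table and column names made up):
>>>>>
>>>>>   library(DBI)
>>>>>   library(RSQLite)   # or a postgres driver for a postgres backend
>>>>>   con <- dbConnect(SQLite(), dbname = "gisdata.sqlite")
>>>>>   sub <- dbGetQuery(con,
>>>>>     "SELECT x, y, value FROM measurements WHERE year = 2006")
>>>>>   dbDisconnect(con)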
>>>>>
>>>>> It might be simplest to just hunt up a machine with 8 or 16 GB of
>>>>> memory in it, and run those bits of the analysis that really need
>>>>> memory on that machine...
>>>>>
>>>> Yes, if there is no other way, a 64-bit machine with lots of RAM
>>>> would not be so constrained, but maybe this is a matter of first
>>>> deciding why doing statistics on that much data is worth the
>>>> effort? It may be, but just trying to read large amounts of data
>>>> into memory is perhaps not justified in itself.
>>>>
>>>> Can you tile or subset the data, accumulating intermediate results?
>>>> This is the approach the biglm package takes, and the R/GDAL
>>>> interface also supports subsetting from an external file.
>>>>
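>>>> For instance, with rgdal you can read a raster tile by tile and
>>>> accumulate a summary as you go (file name and tile size here are only
>>>> illustrative):
>>>>
>>>>   library(rgdal)
>>>>   info <- GDALinfo("huge_image.tif")
>>>>   nr <- info[["rows"]]; nc <- info[["columns"]]
>>>>   total <- 0; n <- 0
>>>>   for (r0 in seq(0, nr - 1, by = 1000)) {
>>>>     tile <- readGDAL("huge_image.tif", offset = c(r0, 0),
>>>>                      region.dim = c(min(1000, nr - r0), nc), silent = TRUE)
>>>>     total <- total + sum(tile$band1, na.rm = TRUE)
>>>>     n <- n + sum(!is.na(tile$band1))
>>>>   }
>>>>   total / n   # overall mean, computed without holding the full image
>>>>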
>>>> Depending on the input format of the data, you should be able to do
>>>> all you need provided that you do not try to keep all the data in
>>>> memory. Using a database may be a good idea, or if the data are
>>>> multiple remote sensing images, subsetting and accumulating results.
>>>>
>>>> Roger
>>>>
>>>>
>>>>> --e
>>>>>
>>>
>>>
>>