[Rd] Reducing RAM usage using UKSM

Gregory R. Warnes greg at warnes.net
Wed Jul 16 15:07:14 CEST 2014


Hi Varadharajan,

Linux uses copy-on-write for the memory image of forked processes.  Thus, you may also get significant memory savings by launching a single R process, loading your large data object, and then using fork::fork() to split off the other worker processes.
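
Something along these lines should work (a minimal sketch using base R's parallel package, whose mcparallel()/mccollect() fork the current process so the loaded data stays shared copy-on-write; fork::fork() is the lower-level equivalent, and the file path and column name below are placeholders):

    library(parallel)  # ships with R; mcparallel() forks on unix-alikes

    # Load the large object once, in the parent process.
    big <- readRDS("/data/big_table.rds")   # placeholder path

    # Each forked child shares the parent's memory pages copy-on-write,
    # so read-only work does not duplicate the object in RAM.
    jobs <- lapply(1:4, function(i) mcparallel(sum(big$value > i)))
    results <- mccollect(jobs)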

-Greg

Sent from my iPad

> On Jul 16, 2014, at 5:07 AM, Varadharajan Mukundan <srinathsmn at gmail.com> wrote:
> 
> [Sending it again in plain text mode]
> 
> Greetings,
> 
> We have a fairly large dataset (around 60 GB) that must be loaded
> and crunched in real time. The operations performed on this data are
> simple read-only aggregates after filtering the data.table instance
> on parameters passed in at run time. We need more than one such R
> process running to serve different testing environments (each
> environment has a nearly identical dataset, with a *small amount of
> changes*). As we all know, data.table loads the entire dataset into
> memory for processing, so we face a constraint on the number of such
> processes we can run on the machine. On a 128 GB RAM machine, we are
> looking for ways to reduce the memory footprint so that we can spawn
> more instances and use the resources efficiently. One of the
> approaches we tried was memory de-duplication using UKSM
> (http://kerneldedup.org/en/projects/uksm/introduction), given that
> we had a few idle CPU cores. The outcome of the experiment was quite
> impressive, considering that the setup effort was low and the entire
> approach treats the application layer as a black box.
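> 
> To give a concrete picture, the queries are of this shape
> (illustrative only; the column names and parameters below are made
> up):
> 
>     library(data.table)
> 
>     # dt is the ~60GB data.table already loaded in this instance.
>     # Filter on the run-time parameters, then compute read-only
>     # aggregates on the matching subset.
>     query <- function(dt, env_id, day) {
>       dt[environment == env_id & date == day,
>          .(total = sum(amount), rows = .N),
>          by = region]
>     }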
> 
> Quick snapshot of the results:
> 1 instance (without UKSM): ~60 GB RAM in use
> 1 instance (with UKSM): ~53 GB RAM in use
> 
> 2 instances (without UKSM): ~125 GB RAM in use
> 2 instances (with UKSM): ~81 GB RAM in use
> 
> We can see that around 44 GB of RAM was saved once UKSM merged
> similar pages, all at the cost of 1 CPU core on a 48-core machine.
> We did not notice any degradation in performance, because the data
> is refreshed by a batch job only once a day (every morning); UKSM
> kicks in at that point and performs the page merging, and for the
> rest of the day it is just read-only analysis. The queries we fire
> on the dataset scan at most 2-3 GB of it, so the memory spike from
> query subsets was low as well.
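> 
> To verify the merging from within R, one can poll UKSM's sysfs
> counters (the file names under /sys/kernel/mm/uksm below are our
> assumption and may differ across patch versions):
> 
>     uksm_stat <- function(name) {
>       # Assumed sysfs layout on a UKSM-patched kernel; verify locally.
>       as.numeric(readLines(file.path("/sys/kernel/mm/uksm", name),
>                            warn = FALSE))
>     }
>     # pages_sharing * 4KB page size ~ RAM reclaimed by de-duplication
>     saved_gb <- uksm_stat("pages_sharing") * 4096 / 2^30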
> 
> We're interested in knowing whether this is a plausible solution to
> the problem. Are there any other points/solutions we should be
> considering?
> 
> -- 
> Thanks,
> M. Varadharajan
> 
> ------------------------------------------------
> 
> "Experience is what you get when you didn't get what you wanted"
>              -By Prof. Randy Pausch in "The Last Lecture"
> 
> My Journal :- www.thinkasgeek.wordpress.com
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


