[Rd] Q: R 2.2.1: Memory Management Issues?
Simon Urbanek
simon.urbanek at r-project.org
Fri Jan 6 02:38:31 CET 2006
On Jan 5, 2006, at 7:33 PM, Karen.Green at sanofi-aventis.com wrote:
> The empirically derived limit on my machine (under R 1.9.1) was
> approximately 7500 data points.
> I have been able to successfully run the script that uses package
> MCLUST on several hundred smaller data sets.
>
> I had even written a work-around for the case of more than 9600
> data points. My work-around first orders the points by their value,
> then takes a sample (e.g. every other point, or 1 point every n
> points) to bring the count under 9600. No problems with the
> computations were observed, but you are correct that a deconvolution
> on that larger dataset of 9600 points takes almost 30 minutes.
> However, for our purposes we do not have many datasets over 9600
> points, so the time is not a major constraint.
>
> Unfortunately, my management does not like using a work-around and
> really wants to operate on the larger data sets.
> I was told to find a way to make it work on the larger data sets,
> or to avoid R and find another solution.
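For reference, a minimal R sketch of the thinning work-around Karen
describes (the function name and the 9600 cut-off are illustrative;
the original script is not shown in the thread):

# Down-sample a numeric vector to at most max_n points: sort it, then
# keep every k-th value, with k chosen so no more than max_n remain.
thin_points <- function(x, max_n = 9600) {
  x <- sort(x)
  if (length(x) <= max_n) return(x)
  k <- ceiling(length(x) / max_n)
  x[seq(1, length(x), by = k)]
}

# e.g. length(thin_points(rnorm(25000)))  # 8334 points, under the cap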
Well, sure, if your only concern is memory, then moving to unix will
give you several hundred more data points to work with. I would
recommend a 64-bit unix, preferably, because then there is
practically no software limit on the size of virtual memory.
Nevertheless there is still a limit of ca. 4GB for a single vector,
so that should give you around 32500 rows that mclust can handle as-
is (I don't want to see the runtime, though ;)). For anything larger
you'll really have to think about another approach.
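As a rough sanity check, assuming the limiting object is a single
vector of n*(n-1)/2 doubles (an assumption about mclust's internals,
not something verified here), a 4GB cap works out to roughly that
row count:

# Largest n such that n*(n-1)/2 doubles (8 bytes each) fit in 4GB.
bytes_per_double <- 8
limit_bytes <- 4 * 2^30
max_n <- floor((1 + sqrt(1 + 8 * limit_bytes / bytes_per_double)) / 2)
max_n  # about 32768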
Cheers,
Simon