[Rd] Q: R 2.2.1: Memory Management Issues?
Simon Urbanek
simon.urbanek at r-project.org
Fri Jan 6 02:38:31 CET 2006
On Jan 5, 2006, at 7:33 PM, Karen.Green at sanofi-aventis.com wrote:
> The empirically derived limit on my machine (under R 1.9.1) was
> approximately 7500 data points.
> I have been able to successfully run the script that uses package
> MCLUST on several hundred smaller data sets.
>
> I had even written a work-around for the case of more than 9600
> data points. My work-around first orders the points by their value,
> then takes a sample (e.g. every other point, or 1 point every n
> points) to bring the count under 9600. No problems with the
> computations were observed, but you are correct that a deconvolution
> on that larger dataset of 9600 points takes almost 30 minutes.
> However, for our purposes we do not have many datasets over 9600
> points, so the time is not a major constraint.
>
> Unfortunately, my management does not like using a work-around and
> really wants to operate on the larger data sets.
> I was told to find a way to make it work on the larger data sets,
> or to avoid R and find another solution.
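For reference, a minimal R sketch of the thinning work-around Karen
describes (the function name and the 9600 cut-off are illustrative;
the original script is not shown in the thread):

# Down-sample a numeric vector to at most max_n points: sort it, then
# keep every k-th value, with k chosen so no more than max_n remain.
thin_points <- function(x, max_n = 9600) {
  x <- sort(x)
  if (length(x) <= max_n) return(x)
  k <- ceiling(length(x) / max_n)
  x[seq(1, length(x), by = k)]
}

# e.g. length(thin_points(rnorm(25000)))  # 8334 points, under the cap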
Well, sure, if your only concern is memory, then moving to unix will
give you several hundred more data points to work with. I would
recommend a 64-bit unix, preferably, because then there is
practically no software limit on the size of virtual memory.
Nevertheless there is still a limit of ca. 4GB for a single vector,
so that should give you around 32500 rows that mclust can handle as-
is (I don't want to see the runtime, though ;)). For anything larger
you'll really have to think about another approach.
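As a rough sanity check, assuming the limiting object is a single
vector of n*(n-1)/2 doubles (an assumption about mclust's internals,
not something verified here), a 4GB cap works out to roughly that
row count:

# Largest n such that n*(n-1)/2 doubles (8 bytes each) fit in 4GB.
bytes_per_double <- 8
limit_bytes <- 4 * 2^30
max_n <- floor((1 + sqrt(1 + 8 * limit_bytes / bytes_per_double)) / 2)
max_n  # about 32768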
Cheers,
Simon