[R] R on Large Data Sets (again)
Duncan Murdoch
murdoch at stats.uwo.ca
Sun Nov 29 14:55:39 CET 2009
On 28/11/2009 6:53 PM, Lars Bishop wrote:
> Dear R users,
>
> I’ve searched the R site for help on this topic, but it is hard to find a
> precise answer to my questions.
>
> What are the best options for overcoming RAM limitations
> when using R on “large” data sets (such as 2 or 3 million records)?
There are several packages for handling datasets without keeping them in
RAM: bigmemory, ff, etc. You may find that you need to write functions
to handle your data a block at a time, or you may find they have already
been written, e.g. biglm. You can also keep your data in a database and
just retrieve it a block at a time for processing.
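
As a rough illustration of the block-at-a-time approach, here is a minimal
sketch in base R. The demo file, its columns x and y, the function name
block_mean and the block size are all made up for the example; it is not how
biglm, ff or bigmemory work internally, only the general pattern of reading a
chunk, updating a running summary, and discarding the chunk.

```r
# Write a small demo file so the sketch is self-contained (hypothetical data).
path <- tempfile(fileext = ".csv")
set.seed(1)
write.csv(data.frame(x = rnorm(1000), y = rnorm(1000)), path, row.names = FALSE)

# Mean of one column, holding at most `block_size` rows in RAM at a time.
block_mean <- function(path, column, block_size = 100) {
  con <- file(path, open = "r")
  on.exit(close(con))

  # The first line holds the column names; write.csv quotes them.
  header <- strsplit(readLines(con, n = 1), ",")[[1]]
  header <- gsub('"', "", header)

  total <- 0
  n <- 0
  repeat {
    # The open connection remembers its position, so each call reads the next block.
    lines <- readLines(con, n = block_size)
    if (length(lines) == 0) break
    block <- read.csv(text = lines, header = FALSE, col.names = header)
    total <- total + sum(block[[column]])
    n <- n + nrow(block)
  }
  total / n
}

result <- block_mean(path, "x")
```

The same loop shape should carry over to any statistic that can be updated
incrementally (sums, counts, cross-products for a regression), and to blocks
fetched from a database query instead of a file.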
>
> - Is the freely available version of R (as opposed to the one
> provided by REvolution Computing) compatible with a 64-bit Windows machine?
> And if I add enough RAM on Win64, would this virtually
> solve my memory limitation problems?
It is compatible with Win64, but it is a 32-bit application. It
benefits from running on 64-bit Windows (because Windows can get out of
the way and give it most of 4 GB to work in), but not as much as a true
64-bit application would. So adding RAM probably doesn't solve your problem.
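
To see roughly where a 4 GB address space starts to pinch, a back-of-envelope
estimate may help. The sketch below assumes an all-numeric table stored as
doubles (8 bytes per value); the column counts are invented for illustration,
and the helper name estimate_gib is hypothetical. Real usage will be higher,
since R often makes temporary copies during computation.

```r
# Rough footprint of an all-numeric table: rows * columns * 8 bytes per double.
estimate_gib <- function(rows, cols, bytes_per_value = 8) {
  rows * cols * bytes_per_value / 2^30
}

narrow <- estimate_gib(3e6, 10)   # 3 million records, 10 columns: about 0.22 GiB
wide   <- estimate_gib(3e6, 100)  # 3 million records, 100 columns: about 2.2 GiB
```

Under these assumptions, a narrow table of a few million records fits
comfortably, while a wide one already takes more than half of what a 32-bit
process can address before any modelling starts.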
> - Is a Unix-like platform a better option than win-64? Again, would
> this solve my memory limitation problems?
There are builds available for 64-bit Linux and MacOS (and maybe
others); they'd likely help more than running 32-bit R in Win64. I
don't know how they compare to running REvolution's 64-bit R in Win64.
Duncan Murdoch
>
>
>
> - Any better option?
> Thanks in advance for your help,
> Lars.