[R] Best Hardware & OS For Large Data Sets
Allan Engelhardt
allane at cybaea.com
Sun Feb 28 09:59:50 CET 2010
On 27/02/10 17:47, J. Daniel wrote:
> Greetings,
>
> I am acquiring a new computer in order to conduct data analysis. I
> currently have a 32-bit Vista OS with 3G of RAM and I consistently run into
> memory allocation problems. I will likely be required to run Windows 7 on
> the new system, but have flexibility as far as hardware goes. Can people
> recommend the best hardware to minimize memory allocation problems? I am
> leaning towards dual core on a 64-bit system with 8G of RAM. Given the
> Windows constraint, is there anything I am missing here?
>
> I know that Windows limits the RAM that a single application can access.
> Does this fact over-ride many hardware considerations? Any way around this?
>
You are right on the RAM limit: the way around it is to move to a 64-bit
operating system. There is an experimental build of core R for 64-bit
Windows [1] and there is at least one commercial version available [2].
(You can run the 32-bit version of R on 64-bit Windows, but it will
only use up to 3.5G of memory [3].)

How much memory you should have really depends on your data sets and
what you do with them. I have 16G on my 4-core workstation and
frequently use it up, but I do marketing analysis on tens of millions
of telco customers. I overflow to AWS, which has instances with 7.5G,
15G, 17G, 34G, and 68G of memory [4]; you may consider those sizes as
guides for your system(s).
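
As a rough way to size memory for a given data set, here is a
back-of-envelope sketch (in Python, used only for the arithmetic). It
assumes a purely numeric table stored as 8-byte doubles; the function
name and the row and column counts are made-up illustrations, not
figures from any real workload. Analysis steps often make temporary
copies, so treat the result as a lower bound rather than a target.

```python
BYTES_PER_DOUBLE = 8  # assumption: every value is stored as an 8-byte double


def numeric_table_gib(rows, cols):
    """Approximate size in GiB of a rows x cols table of doubles (data only)."""
    return rows * cols * BYTES_PER_DOUBLE / 2**30


# Hypothetical example: 20 million rows and 50 numeric columns
size = numeric_table_gib(20_000_000, 50)
print(f"{size:.2f} GiB")
```

With these made-up numbers a single copy of the data already takes most
of an 8G machine, which is why headroom matters.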
I would reconsider the operating system constraint. A Unix-like 64-bit
operating system (I'm a Fedora guy but anything should work well) may
be a better long-term solution and is likely to give you easier
access to cloud computing (e.g. AWS or your own cluster) when your
processing requirements grow. Also, 64-bit seems to be better supported
in that environment.
In all instances you are still going to be constrained by R limiting a
vector to 2^31-1 elements and, worse, representing a matrix as a vector,
which means the product of the dimensions is limited to 2^31-1. What
you gain is the ability to hold many more vectors in memory at once,
each of them still below the 2^31-1 element limit.
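
To make the element limit concrete, here is a small arithmetic sketch
(again in Python, purely illustrative; the function names and the
example dimensions are hypothetical). It checks whether a matrix stays
within 2^31-1 elements and shows how much memory one maximal vector of
8-byte doubles would occupy.

```python
R_MAX_LENGTH = 2**31 - 1  # vector length limit described above


def fits_in_r_vector(nrow, ncol=1):
    """True if an nrow x ncol matrix stays within the 2^31-1 element limit."""
    return nrow * ncol <= R_MAX_LENGTH


def max_vector_gib(bytes_per_element=8):
    """Size in GiB of one maximal-length vector (default: 8-byte doubles)."""
    return R_MAX_LENGTH * bytes_per_element / 2**30


# Hypothetical dimensions: 50 million rows with 40 or 50 columns
print(fits_in_r_vector(50_000_000, 40))  # 2.0e9 elements, within the limit
print(fits_in_r_vector(50_000_000, 50))  # 2.5e9 elements, over the limit
print(max_vector_gib())                  # just under 16 GiB
```

So a single maximal vector of doubles needs just under 16G by itself;
with more RAM than that, it is the element limit rather than the memory
that constrains the largest single object.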
Hope this helps a little.
Allan
[1] http://cran.r-project.org/bin/windows64/contrib/
[2] http://www.revolution-computing.com/
[3] See FAQ 2.9 at http://cran.r-project.org/bin/windows/base/rw-FAQ.html
[4] http://aws.amazon.com/ec2/instance-types/
> Thanks,
>
> JD
>
>
>