[R-sig-hpc] What to Experiment With?

Sat Apr 21 06:28:34 CEST 2012

I have just one general comment (I don't find linear models useful for large data in my line of work, so I can't comment on that specific use): one often overlooked fact is that probably the most important part of parallel computing (in particular with R) is the amount of RAM. If you start any kind of explicit parallelization, you need several times more memory just for the computation than you needed for one R process (even if you save memory by sharing data via copy-on-write (COW) in multicore). For example, my desktop is a 16 HT Xeon, but it has only 16GB of memory, so parallelizing even very simple tasks often leads to swapping, since you have less than 1GB of private working memory per process. If you load a machine with many cores, you must also account for the additional memory that you have to put in, and that can raise the cost significantly because cheaper machines can't hold much memory. I think the result is that you can get either a few machines with enough memory and speed but not necessarily many cores, or one big machine with many cores and lots of memory (but $5k may be just a little thin for the latter).
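To put rough numbers on the memory argument, here is a minimal back-of-the-envelope sketch. The function names and the example figures for shared data and OS reserve are hypothetical and purely illustrative; only the 16GB / 16 process case comes from the desktop described above.

```python
import math

def private_memory_per_worker_gb(total_ram_gb, n_workers,
                                 shared_gb=0.0, os_reserve_gb=0.0):
    """Private working memory left for each worker process, in GB.

    shared_gb is data held once and shared between workers (e.g. via
    copy-on-write); os_reserve_gb is memory set aside for the OS.
    Both are illustrative assumptions, not measured values.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    usable = total_ram_gb - shared_gb - os_reserve_gb
    return usable / n_workers

def max_workers(total_ram_gb, need_per_worker_gb,
                shared_gb=0.0, os_reserve_gb=0.0):
    """Largest number of workers that fits in RAM without swapping."""
    usable = total_ram_gb - shared_gb - os_reserve_gb
    return max(0, math.floor(usable / need_per_worker_gb))

# 16GB shared evenly by 16 processes: an upper bound of 1GB each.
print(private_memory_per_worker_gb(16, 16))
# Hypothetical: 3GB of shared data plus 1GB reserved for the OS.
print(private_memory_per_worker_gb(16, 16, shared_gb=3, os_reserve_gb=1))
# If each worker needs 2GB of private memory, how many fit?
print(max_workers(16, 2.0, shared_gb=3, os_reserve_gb=1))
```

The point of the second function is that the usable worker count is often set by memory rather than by the number of cores: in the hypothetical case above only 6 of the 16 hardware threads could be kept busy without swapping.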

With the small data sizes you mentioned I would not worry about networking too much - it is easy to pre-load the data on all machines, disk is cheap, and a few GB is really tiny even if you need to send it across the network for some reason.
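As a rough illustration of why a few GB is small on a network, the sketch below estimates the transfer time. The gigabit Ethernet link speed and the efficiency factor are assumptions for illustration (the thread does not say what network is available), and the function name is hypothetical.

```python
def transfer_seconds(size_gb, link_gbit_per_s=1.0, efficiency=1.0):
    """Estimated time to send size_gb (decimal GB) over a network link.

    link_gbit_per_s is the nominal link speed in Gbit/s (assumed to be
    gigabit Ethernet by default); efficiency is the fraction of that
    speed actually achieved (assumed, not measured).
    """
    if link_gbit_per_s <= 0 or not (0 < efficiency <= 1):
        raise ValueError("link speed must be > 0 and efficiency in (0, 1]")
    size_gbit = size_gb * 8  # 8 bits per byte
    return size_gbit / (link_gbit_per_s * efficiency)

# The largest data set mentioned in the question (5GB) at nominal speed.
print(transfer_seconds(5))
# The same transfer assuming only 80% of the nominal speed is reached.
print(transfer_seconds(5, efficiency=0.8))
```

Under these assumptions, sending the full data set takes well under a minute, which is a one-time cost if the data is then kept on each machine's local disk.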

As a side note re. GPUs - I'm still not convinced that they are in the ballpark yet - in my tests so far only Teslas give a significant boost to DP (double-precision) computation, and they are still way too expensive for the performance compared to CPUs. The future may lie therein, but IMHO the prices need to go down a bit more.

On Apr 20, 2012, at 6:16 PM, ivo welch wrote:

> Dear R HPC experts:
> 
> I have about $5,000 to spend on building fast computer hardware to run
> our problems.  if it works well, I may be able to scrounge up another
> $10k/year to scale it up.  I do not have the resources to program very
> complex algorithms, administer a full cluster, etc.  (the effective
> programmer's rate here is about $50/hour and up, and I have severe
> restrictions against hiring outsiders.)  the programs basically have
> to work with minimum special tweaking.
> 
> There are no real-time needs.  Typically, I operate on historical CRSP
> and Compustat data, which are about 1-5GB (depending on subset).  most
> of what I am doing involves linear regressions.  I often need to
> calculate Newey-West/Hansen-Hodrick/White adjusted standard errors,
> and I often do need to sort and rank, calculate means and covariances.
> these are not highly sophisticated stats, but it entails lots of it.
> most of what I do is embarrassingly parallel.
> 
> Now, I think in the $5k price range, I have a couple of options.
> Roughly, the landscape seems to be:
> 
> * 1 dual-socket xeon i7 computers.
> * 5 (desktop) i7 computers, networked (socket snow?).
> * 1 i7 computer, with 1 nvidia Tesla card
> * 1 i7 computers with 2-3 commodity graphics cards
>     --- apparently, nvidia cripples the DP performance of its gamer
> cards, so AMD should be a *lot* faster
>     at the same price, but I only see the lm() routine in
> nvidia-specific gputools.  then again, for Newey-West,
>     I may have to resort to my own calculations, anyway.  is there
> newey-west OLS code for AMD GPUs?
> 
> I would presume that an internal PCI bus is a lot faster than an
> ethernet network, and a GPU could be faster than a CPU, but a GPU is
> also less flexible.  Sigh...not sure.
> what should I try?
> 
> /iaw
> 
> 
> ----
> Ivo Welch (ivo.welch at gmail.com)
> 