[R-sig-hpc] What to Experiment With?

Sat Apr 21 07:03:16 CEST 2012

thx, simon.  this is very helpful.  ok, let me scratch the GPU.  this
eliminates half the decision tree.  you are right that networking is
not critical.

for swap, an SSD should do very nicely these days.  swapping used to
be very costly, but I suspect that in the age of 500MB/s SSDs, this is
no longer true.  in any case, an X79 motherboard with 8 ram slots
costs about $250.  8GB costs about $50, so getting 64GB is about $400.
the i7 Sandy Bridge is another $400.  drives (SSD + HD) another
$250.  so, around $1,500 per computer.  3 of those seem like a good
idea.
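
The arithmetic behind those figures, as a rough sketch in R. The prices are the approximate ones quoted above. The gap between the itemized parts and the round $1,500 figure is assumed here to cover case, power supply, and cooling, which the message does not itemize. The variable names are illustrative only.

```r
# Approximate prices quoted in the message (USD).
parts <- c(motherboard = 250,
           ram_64gb    = 8 * 50,   # eight 8GB modules at about $50 each
           cpu         = 400,
           drives      = 250)

itemized    <- sum(parts)  # parts actually listed in the message
per_machine <- 1500        # the message's round figure (assumed to include case, PSU, cooling)
n_machines  <- 3
budget      <- 5000        # the budget stated in the original question

total <- n_machines * per_machine
cat("itemized parts per machine:", itemized, "\n")
cat("total for", n_machines, "machines:", total, "of", budget, "\n")
```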

to do this, though, I also still need to find a simple example of
socket snow-type use of library(parallel).  I posted a question on
plain r-help.  see it as a suggestion of something to add to the
vignette or to ?parallel .  and thanks for parallel.  it's great.
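
For reference, a minimal sketch of the kind of socket (snow-style) use of library(parallel) asked about here. It is not taken from the vignette. The toy data and the helper name fit_slope are made up for illustration, and the two workers run on the local machine. For several machines, makeCluster() can instead be given a character vector of host names, which assumes R is installed on each host and that they can be reached by ssh without a password prompt.

```r
library(parallel)

# Hypothetical toy data: 8 independent regression problems,
# standing in for an embarrassingly parallel workload.
set.seed(1)
datasets <- lapply(1:8, function(i) {
  x <- rnorm(100)
  data.frame(x = x, y = 2 * x + rnorm(100))
})

# Worker function (illustrative name): fit y ~ x and return the slope.
# It is shipped to the workers and runs in each worker's own R process.
fit_slope <- function(d) unname(coef(lm(y ~ x, data = d))["x"])

# Start two local socket workers.
# For multiple machines, something like makeCluster(c("host1", "host2"))
# could be used instead; those host names are placeholders.
cl <- makeCluster(2, type = "PSOCK")

# Distribute the list over the workers; always shut the cluster down.
slopes <- tryCatch(
  unlist(parLapply(cl, datasets, fit_slope)),
  finally = stopCluster(cl)
)

print(slopes)
```

Each socket worker is a separate R process with its own copy of whatever data it receives, so memory use grows with the number of workers.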

is standard ubuntu linux R and its main libraries now compiled with
Intel AVX?  I hear this can double or triple the vector performance
vs. SSE.

A long-run suggestion: with parallel in the core R, and Whit's cloud
interface, we may soon be able to build a shared R community cluster.
see, I wish I could rent my own computers out to an R community
cluster when I do not need them, not in exchange for money but in
exchange for computation credits when I need to use this R community
cluster to do calculations myself.

best,

/iaw

On Fri, Apr 20, 2012 at 9:28 PM, Simon Urbanek
<simon.urbanek at r-project.org> wrote:
> I have just one general comment (I don't find linear models useful for large data in my line of work so I can't comment on that specific use): one often overlooked fact is that probably the most important part in parallel computing (in particular with R) is the amount of RAM. If you start any kind of explicit parallelization, you need several times more memory just for the computation than what you needed for one R process (even if you save memory using shared data with COW in multicore). For example my desktop is a 16 HT Xeon, but it has only 16GB of memory, so parallelizing even very simple tasks often leads to swapping since you have less than 1GB of private working memory per process. If you load a machine with many cores, you must also account for the additional memory that you have to put in and that can raise the cost significantly because cheaper machines can't hold much memory. I think the result is that you can either get few machines with enough memory and speed but not necessarily many cores or one big machine with many cores and lots of memory (but $5k may be just a little thin for the latter).
>
> With the small data sizes you mentioned I would not worry about networking too much - it is easy to pre-load the data on all machines, disk is cheap and a few GB is really tiny even if you need to send it across the network for some reason.
>
> As a side note re. GPUs - I'm still not convinced that they are in the ballpark yet - in my tests so far only Teslas give significant boost to DP computation and they are still way too expensive for the performance compared to CPUs. The future may lie therein, but IMHO the prices need to go down a bit more.
>
>
> On Apr 20, 2012, at 6:16 PM, ivo welch wrote:
>
>> Dear R HPC experts:
>>
>> I have about $5,000 to spend on building fast computer hardware to run
>> our problems.  if it works well, I may be able to scrounge up another
>> $10k/year to scale it up.  I do not have the resources to program very
>> complex algorithms, administer a full cluster, etc.  (the effective
>> programmer's rate here is about $50/hour and up, and I have severe
>> restrictions against hiring outsiders.)  the programs basically have
>> to work with minimum special tweaking.
>>
>> There are no real-time needs.  Typically, I operate on historical CRSP
>> and Compustat data, which are about 1-5GB (depending on subset).  most
>> of what I am doing involves linear regressions.  I often need to
>> calculate Newey-West/Hansen-Hodrick/White adjusted standard errors,
>> and I often do need to sort and rank, calculate means and covariances.
>> these are not highly sophisticated stats, but there are lots of them.
>> most of what I do is embarrassingly parallel.
>>
>> Now, I think in the $5k price range, I have a couple of options.
>> Roughly, the landscape seems to be:
>>
>> * 1 dual-socket xeon i7 computer.
>> * 5 (desktop) i7 computers, networked (socket snow?).
>> * 1 i7 computer, with 1 nvidia Tesla card
>> * 1 i7 computer with 2-3 commodity graphics cards
>>     --- apparently, nvidia cripples the DP performance of its gamer
>> cards, so AMD should be a *lot* faster
>>     at the same price, but I only see the lm() routine in
>> nvidia-specific gputools.  then again, for Newey-West,
>>     I may have to resort to my own calculations, anyway.  is there
>> newey-west OLS code for AMD GPUs?
>>
>> I would presume that an internal PCI bus is a lot faster than an
>> ethernet network, and a GPU could be faster than a CPU, but a GPU is
>> also less flexible.  Sigh...not sure.
>> what should I try?
>>
>> /iaw
>>
>>
>> ----
>> Ivo Welch (ivo.welch at gmail.com)