[R-sig-hpc] What to Experiment With?

Sat Apr 21 06:56:13 CEST 2012

I really agree with Simon's points.

The issue of doing parallel processing to circumvent RAM limitations is,
as Simon points out, not generally known.  Of course, it's still a
difficult problem even on multiple machines.

Regarding GPUs, I not only agree that they are not quite ready for prime
time, but would also raise the possibility that they may never be, at
least not in their current form.  In that form, the range of applications
amenable to dramatic speedup is pretty narrow, which is already a problem;
there is also the question of whether the current NVIDIA architecture
will stay dominant.

There's a lot of speculation as to whether Intel's Knights Ferry chip
will become big once it emerges (if it does).  The center of gravity in
GPGPU could shift to Intel, and with it the scope of applications that
can be done well on a GPU.

Plus, there is the question of how the standard way to program these
things will evolve.  I'll go out on a limb here and speculate that if
CUDA ceases to be dominant (because NVIDIA does, or vice versa),
OpenCL will not be the one to take its place; instead, I'm guessing that
OpenACC, a new API set that is supposedly going to be folded into OpenMP,
will become the new standard.

So GPU is really up in the air right now, I think.  I'd be curious to
see what others think about this.

Norm Matloff

----- Forwarded message from Simon Urbanek <simon.urbanek at r-project.org> -----

Date: Sat, 21 Apr 2012 00:28:34 -0400
From: Simon Urbanek <simon.urbanek at r-project.org>
To: ivo welch <ivowel at gmail.com>
Cc: r-sig-hpc at r-project.org
Subject: Re: [R-sig-hpc] What to Experiment With?

I have just one general comment (I don't find linear models useful for large data in my line of work, so I can't comment on that specific use): one often overlooked fact is that probably the most important factor in parallel computing (in particular with R) is the amount of RAM. If you start any kind of explicit parallelization, you need several times more memory just for the computation than you needed for one R process (even if you save memory by sharing data with copy-on-write (COW) in multicore). For example, my desktop is a 16 HT Xeon, but it has only 16GB of memory, so parallelizing even very simple tasks often leads to swapping, since you have less than 1GB of private working memory per process. If you load a machine with many cores, you must also account for the additional memory you have to put in, and that can raise the cost significantly because cheaper machines can't hold much memory. I think the result is that you can either get a few machines with enough memory and speed but not necessarily many cores, or one big machine with many cores and lots of memory (but $5k may be just a little thin for the latter).
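
To make the memory arithmetic above concrete, here is a rough back-of-the-envelope sketch (in Python, not R, and not something from the thread). The function name and the figures in the usage line, 2GB reserved for the OS and master process and 1.5GB of private memory per worker, are illustrative assumptions; COW sharing under multicore only lowers the per-worker figure for data that no worker modifies.

```python
import math


def max_parallel_workers(total_ram_gb, reserved_gb, per_worker_gb, cores):
    """Hypothetical helper: rough count of workers that fit in RAM without swapping.

    total_ram_gb  -- physical memory in the machine
    reserved_gb   -- memory assumed to be taken by the OS and the master process
    per_worker_gb -- private (non-shared) working memory each worker needs
    cores         -- logical cores available
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    usable = max(total_ram_gb - reserved_gb, 0.0)
    fit_in_ram = math.floor(usable / per_worker_gb)
    # More workers than cores doesn't help; more than RAM allows means swapping.
    return min(cores, fit_in_ram)


# 16GB machine, 16 logical cores, assumed 2GB reserved and 1.5GB per worker:
print(max_parallel_workers(16, 2, 1.5, 16))  # 9, so RAM, not cores, is the limit
# Conversely, running all 16 workers leaves (16 - 2) / 16 = 0.875GB each.
```

The point of the sketch is simply that the answer is the minimum of two limits, and on a machine like the one described the memory limit binds well before the core count does.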

With the small data sizes you mentioned, I would not worry about networking too much: it is easy to pre-load the data on all machines, disk is cheap, and a few GB is really tiny even if you need to send it across the network for some reason.
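
For a sense of scale, a rough transfer-time estimate (a Python sketch, not from the thread). The link speed in the example and the 70% efficiency factor standing in for protocol and disk overhead are assumptions, and decimal units are used (1 GB = 8 Gbit).

```python
def transfer_seconds(data_gb, link_gbit_per_s, efficiency=0.7):
    """Hypothetical helper: rough seconds to send data_gb gigabytes over a link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    gigabits = data_gb * 8  # gigabytes -> gigabits (decimal units)
    return gigabits / (link_gbit_per_s * efficiency)


# 4GB over gigabit Ethernet: 32 s at full line rate, about 46 s at 70%.
print(transfer_seconds(4, 1, efficiency=1.0))
print(round(transfer_seconds(4, 1)))
```

Under these assumptions, even copying the data to a handful of machines one after another costs only a few minutes, which is consistent with the advice to simply pre-load it everywhere.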

As a side note re. GPUs: I'm still not convinced that they are in the ballpark yet. In my tests so far, only Teslas give a significant boost to double-precision (DP) computation, and they are still way too expensive for the performance compared to CPUs. The future may lie therein, but IMHO the prices need to go down a bit more.