[R-sig-hpc] How to configure R to work with GPUs transparently

Fri Mar 19 22:20:39 CET 2010

> I took Brad's posting to mean that he is proposing that many of the more
> computation-intensive R functions be extended, so that the code in lm(),
> say, would first check to see if a GPU and the GPU software are present.
> The code would then take different actions in the two cases (present and
> nonpresent).  This is in contrast to a situation in which any R code
> would automatically use GPUs, which as Dirk points out, is not possible.
>
> By the way, I should also add that while GPUs are great for the
> "embarrassingly parallel" applications, it's hard to make them work well
> for other kinds of parallel apps.

One of the goals of the "gputools" package is to learn just what roles  
this type of coprocessor can play in R.  We've found significant  
speedups (> 10x) in the "lm()" command, for example, but please note a  
couple of caveats:

i) This compares a double-precision CPU implementation with a  
single-precision GPU implementation.

ii) Data-transfer rates to and from the card are much slower than
to/from RAM.  In particular, matrices need to be larger than roughly
1000x1000 just to break even in performance.

As for the first problem, it should be noted that much faster
double-precision arithmetic will be supported in future generations of
GPUs.  For now, though, the more dramatic speedups are limited to
single-precision implementations.
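
To get a feel for how much single precision matters on a given problem
without any GPU at all, one can round the inputs to 4-byte floats and
refit.  This is only a rough sketch: "to_single" is a hypothetical helper,
not part of gputools, and it rounds the data only.  The arithmetic inside
"lm.fit()" still runs in double precision, so this understates what a
fully single-precision implementation would lose.

```r
# Hypothetical helper: round doubles to single precision by round-tripping
# through 4-byte floats. Base R has no native single-precision numeric type.
to_single <- function(x) {
  readBin(writeBin(as.numeric(x), raw(), size = 4L),
          what = "numeric", n = length(x), size = 4L)
}

set.seed(1)
n <- 200
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)

# Same least-squares fit on the original and on the rounded inputs
fit_double <- lm.fit(cbind(1, x), y)
fit_single <- lm.fit(cbind(1, to_single(x)), to_single(y))

# Largest coefficient discrepancy caused by rounding the inputs
max_diff <- max(abs(fit_double$coefficients - fit_single$coefficients))
print(max_diff)
```

On a well-conditioned problem like this one the discrepancy should be
tiny; ill-conditioned design matrices are where it is worth checking.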

As for the second problem, the flip side is that you can now call,  
say, "lm()" on much larger matrices and get an answer in something  
approaching acceptable "user time".  If you need to call "lm()" on  
smaller matrices many times in a loop, however, a GPU probably will  
not buy you much unless you do the added work of implementing the
enclosing loop on the GPU.
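
The trade-off can be made concrete with a toy cost model.  This is a
sketch under loud assumptions: the function names are made up, the
overhead and speedup figures are placeholders rather than measurements,
and the transfer cost is treated as a fixed charge per call even though
in reality it grows with the amount of data moved.

```r
# Toy cost model, arbitrary time units. All defaults are hypothetical.
# "work" is the CPU time one call would need.
cpu_time <- function(work, calls = 1) {
  calls * work
}

# Each offloaded call pays a fixed transfer/launch overhead, then runs
# "speedup" times faster than on the CPU.
gpu_time <- function(work, calls = 1, overhead = 1, speedup = 10) {
  calls * (overhead + work / speedup)
}

# Per-call work at which the two costs are equal:
#   work = overhead + work / speedup
break_even_work <- function(overhead = 1, speedup = 10) {
  overhead * speedup / (speedup - 1)
}

# Same total work, split differently:
many_small <- c(cpu = cpu_time(0.1, calls = 1000),
                gpu = gpu_time(0.1, calls = 1000))
one_large  <- c(cpu = cpu_time(100), gpu = gpu_time(100))
print(many_small)
print(one_large)
```

Under these made-up numbers the thousand small calls are slower on the
GPU because the overhead is paid a thousand times, while the single
large call is far faster.  Moving the loop onto the card amounts to
paying the overhead once.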

More directly to the points you raise, though:

Right now GPU support for R is limited to a small subset of  
command-level implementations.  These ports are somewhat hard to  
implement but, once implemented, they may benefit a large number of  
users.  They are also relatively easy to maintain as newer, more  
powerful GPU hardware becomes available.

It is true that the GPU really shines on embarrassingly parallel
applications and that communication and synchronization costs subtract
from its potential.  The same could be said of a parallel cluster or
a multicore chip, though.  If "someone else" has done the work to
provide the tools that make it useful, it seems that the GPU,
like these other types of hardware, may have a long-term niche to fill.

Ultimately (pie in the sky?) it would be nice if R itself sniffed out  
the user's resident hardware and pulled in libraries built to take  
advantage of that particular configuration.  In other words, the  
details of mapping command to library are kept hidden from the user.   
Nicer still if, at run time, R could choose the library into which to  
call based on the characteristics of the data - e.g., scalar for a  
certain size of matrix, GPU for a very large matrix, some mix of GPU  
and multicore for an iterative problem with a large matrix.  This is
a really tough thing to get right, though.
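
Just to illustrate the shape such run-time dispatch might take, here is
a minimal sketch.  Everything in it is hypothetical: "choose_backend"
and "fit_linear" are invented names, the hardware flags are passed in by
hand rather than detected, and the 1e6-cell threshold is simply the
rough 1000x1000 break-even mentioned above.  Only the scalar path
actually does anything different; the GPU paths are stubs that fall back
to "lm.fit()".

```r
# Hypothetical backend chooser; names and threshold are illustrative only.
choose_backend <- function(n_row, n_col, has_gpu = FALSE, n_cores = 1L,
                           iterative = FALSE, gpu_min_cells = 1e6) {
  cells <- as.numeric(n_row) * as.numeric(n_col)
  if (has_gpu && cells >= gpu_min_cells) {
    if (iterative && n_cores > 1L) {
      "gpu+multicore"
    } else {
      "gpu"
    }
  } else {
    "scalar"
  }
}

# Hypothetical wrapper: the user calls one function and never sees which
# library was picked. A real version would call into a GPU library here.
fit_linear <- function(x, y, ...) {
  backend <- choose_backend(nrow(x), ncol(x), ...)
  fit <- lm.fit(x, y)  # stub: every backend currently uses the CPU path
  fit$backend <- backend
  fit
}

x <- cbind(1, seq_len(10))
y <- 3 + 2 * seq_len(10)
small_fit <- fit_linear(x, y, has_gpu = TRUE)
print(small_fit$backend)
```

The hard part is, of course, not the "if" statement but picking
thresholds that hold up across cards, drivers and problem shapes.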