[Rd] Compiling R for the Sony Playstation 3?
Marcus G. Daniels
mgd at santafe.edu
Fri Aug 3 20:57:09 CEST 2007
Hi,
>> Beyond that, there may be a few more things that can be done to make R run
>> "stupidly fast" on ps3 or IBM Cell blades.
>>
>>
>
> Wouldn't the right way to go here be to make it use the PS3 graphics
> hardware, in a http://www.gpgpu.org/ kind of way? Or are the Cell
> processors on the PS3 graphics processors too?
>
The Cell can be thought of as a mini cluster on a chip. It uses messaging
along the lines of the way one might program a distributed application
using MPI, or organize a program for remote procedure calls. However
the latency is about 5 nanoseconds instead of 5 microseconds (as one
might hope to get with typical high performance networking fabrics).
Applications for the Cell are typically multithreaded on the PPU
controlling synchronous, non-timeshared activity on the SPUs.
The PS3 primary processor, the Cell PPU, is on the same silicon as the
coprocessors, the SPUs. The PPU is basically like what's in a Mac G5.
The SPUs have less smarts in terms of out of order execution lookahead
and they have only 256K local store. Messaging is done using a DMA
controller, and there are some C routines for that. Newer versions of
the GNU toolchain have an overlay manager so that the 256K local store
can be automatically managed for the most part.
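To make the local store constraint concrete, here is a minimal sketch in plain C of streaming a large array through a small fixed buffer. It is only an illustration under stated assumptions: `fetch_block` is a hypothetical placeholder that uses `memcpy` where real SPU code would call the DMA routines, and the 16 KB data budget is an arbitrary guess at how much of the 256K is left over after code and stack.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: of the 256K local store, only a 16 KB slice is free for data;
   code and stack share the rest. The figure is illustrative, not measured. */
#define LOCAL_BUF_BYTES (16 * 1024)
#define LOCAL_BUF_ELEMS (LOCAL_BUF_BYTES / sizeof(double))

/* Hypothetical stand-in for a DMA "get" from main memory into local store.
   Real Cell code would use the DMA routines here; memcpy is a placeholder. */
static void fetch_block(double *local, const double *remote, size_t n)
{
    memcpy(local, remote, n * sizeof(double));
}

/* Sum data[0..n) by pulling it through the small local buffer one block
   at a time. The static buffer makes this function non-reentrant. */
double sum_streamed(const double *data, size_t n)
{
    static double local[LOCAL_BUF_ELEMS];
    double total = 0.0;

    assert(data != NULL || n == 0);

    for (size_t off = 0; off < n; off += LOCAL_BUF_ELEMS) {
        /* The last block may be shorter than the buffer. */
        size_t len = n - off < LOCAL_BUF_ELEMS ? n - off : LOCAL_BUF_ELEMS;

        fetch_block(local, data + off, len);
        for (size_t i = 0; i < len; i++)
            total += local[i];
    }
    return total;
}
```

The point is only the shape of the loop: the working set never exceeds the buffer, no matter how large the input array is.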
The SPU instruction set is new, but GCC has a cross compiler that works
fine for it.
I think a lot of the work to make R take advantage of the Cell would be
pretty general, e.g. localizing the scope of operations and making
operations multithreaded... It's desirable to keep computations on the
local store as much as possible, but it doesn't seem to be crucial.
The messaging is extremely fast. It's almost like worrying about
processor affinity on an SMP system. Also, there's more need to profile
and optimize the operations on the SPUs (e.g. by explicit prefetching),
as they are dumb compared to a modern microprocessor.
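One common reading of explicit prefetching on hardware like this is double buffering: request the next block before computing on the current one, so the transfer can overlap with the arithmetic. The sketch below shows that structure in plain C. It is an assumption-laden illustration, not Cell SDK code: `start_fetch` is a hypothetical placeholder built on `memcpy`, which completes synchronously, whereas a real DMA request would return immediately and need a completion wait before the data is used. `BLOCK_ELEMS` is likewise an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical block size; two blocks must fit in the local store. */
#define BLOCK_ELEMS 1024

/* Placeholder for starting a transfer of one block. On real hardware this
   would be an asynchronous DMA request; memcpy finishes before returning. */
static void start_fetch(double *dst, const double *src, size_t n)
{
    memcpy(dst, src, n * sizeof(double));
}

/* Length of the block starting at offset off in an array of n elements. */
static size_t block_len(size_t n, size_t off)
{
    return n - off < BLOCK_ELEMS ? n - off : BLOCK_ELEMS;
}

/* Sum of squares over data[0..n) using two alternating buffers.
   The static buffers make this function non-reentrant. */
double sumsq_double_buffered(const double *data, size_t n)
{
    static double buf[2][BLOCK_ELEMS];
    double total = 0.0;
    int cur = 0;

    assert(data != NULL || n == 0);
    if (n == 0)
        return 0.0;

    /* Prime the pipeline with the first block. */
    start_fetch(buf[cur], data, block_len(n, 0));

    for (size_t off = 0; off < n; off += BLOCK_ELEMS) {
        size_t len = block_len(n, off);
        size_t next = off + BLOCK_ELEMS;

        /* Request the next block into the idle buffer before computing. */
        if (next < n)
            start_fetch(buf[cur ^ 1], data + next, block_len(n, next));

        /* Real code would wait here for the current block's transfer. */
        for (size_t i = 0; i < len; i++)
            total += buf[cur][i] * buf[cur][i];

        cur ^= 1;  /* swap buffers */
    }
    return total;
}
```

With a synchronous placeholder there is no actual overlap, so this only shows where the fetch and the wait would sit relative to the compute loop.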
One nuisance with the PS3 itself (as opposed to IBM blades) is the
limited RAM. There's only 256MB. It's pretty painful to bootstrap
GCC, for example.
The RAM itself (Rambus XDR) is several times faster latency-wise than
DDR2.
Marcus