[R] x86 SSE* Pointer Favors
Ivan Adzhubey
iadzhubey at rics.bwh.harvard.edu
Fri Jun 13 08:42:41 CEST 2008
Hi Ivo,
On Friday 13 June 2008 12:23:06 am ivo welch wrote:
> Dear Statisticians--- This is not even an R question, so please
> forgive me. I have so much ignorance in this matter that I do not
> know where to begin. I hope someone can point me to documentation
> and/or a sample.
You will surely find some answers to your questions in the "Building from
source" section of the R-admin.html file. Search it for BLAS and you will be
presented with some options. Searching the R web site for the same keyword
will give you even more food for thought.
> I want to compute a covariance as quickly as non-humanly possible on
> an Intel core processor (up to SSE4) under linux. Alas, I have no
> idea how to engage CPU vectorization. Do I need to use special data
> types, or is "double" correct? Does SSE* understand NaN? Should I
> rely on gcc autodetection of the vectorized meaning of my code, or are
> there specific libraries that I should call?
I use the Goto BLAS library and it works great. It usually runs 3 to 30 times
faster than the stock R BLAS library, depending on your code. Enabling SSE
instructions as well while building R (yes, you have to enable them
explicitly, see man gcc) is possible but does not help much, since most of
the maths is done in BLAS.
That said, optimized BLAS libraries give the biggest speed increase on older
processors. The newer crop of multi-core CPUs with large shared caches is much
more difficult to hand-tune code for. You may want to subscribe to the Goto
BLAS mailing list for an in-depth discussion. The ATLAS community is also very
helpful (I use their code with our AMD CPUs).
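If you want to see what using SSE by hand looks like, here is a minimal sketch
in C with the SSE2 intrinsics from emmintrin.h. It assumes gcc on x86, and
dot_sse2 is a name I made up for illustration, not something from R or from any
BLAS. Plain double is the right element type (the packed type __m128d holds two
of them), and SSE arithmetic follows IEEE 754, so a NaN in the input propagates
to the result. An optimized BLAS will usually do this better than a
hand-written loop.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d holds two doubles */
#endif

/* Hypothetical helper: returns the sum of x[i] * y[i] for i in [0, n). */
double dot_sse2(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL);

    double total = 0.0;
    size_t i = 0;

#ifdef __SSE2__
    /* Process two doubles per iteration, keeping two partial sums. */
    __m128d acc = _mm_setzero_pd();
    for (; i + 2 <= n; i += 2) {
        __m128d a = _mm_loadu_pd(x + i);  /* unaligned load of x[i], x[i+1] */
        __m128d b = _mm_loadu_pd(y + i);  /* unaligned load of y[i], y[i+1] */
        acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
    }
    /* Add the two partial sums together. */
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    total = lanes[0] + lanes[1];
#endif

    /* Scalar tail (or the whole loop when SSE2 is not available). */
    for (; i < n; i++)
        total += x[i] * y[i];

    return total;
}
```

Something like gcc -O2 -msse2 should build it; on x86-64 SSE2 is enabled by
default. The unaligned loads are a deliberate simplification so that the
arrays do not need 16-byte alignment.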
> What I want to learn about is as simple as it gets:
> typedef double Double; // or whatever SSE* needs as close equivalent
> Double vector1[N], vector2[N];
> // then fill them with stuff.
R has no single-precision type: any number that is not stored as an integer
is treated as a double, and all floating-point arithmetic is done in double
precision.
> vector3= vector_mult(vector1,vector2, N);
> vector4= sum(vector1, N);
>
> I just need a pointer and/or primer. PS: If someone knows of a
> superfast vectorized implementation of Gentleman's WLS algorithm,
> please point me to it, too. I am still using my old non-vectorized C
> routines.
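For the covariance itself, a plain C loop that gcc can vectorize on its own
may be all you need. This is only a sketch: covariance is a hypothetical
helper, it uses the simple two-pass formula with the n - 1 denominator, and it
does no special NaN handling. As far as I know, gcc will only vectorize
floating-point sums like these with -O3 plus -ffast-math, because
vectorizing changes the order of the additions. Note that -ffast-math also
lets the compiler assume there are no NaNs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sample covariance of x and y (n - 1 denominator). */
double covariance(const double *x, const double *y, size_t n)
{
    assert(x != NULL && y != NULL && n >= 2);

    /* Pass 1: compute the two means. */
    double sum_x = 0.0, sum_y = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum_x += x[i];
        sum_y += y[i];
    }
    const double mean_x = sum_x / (double)n;
    const double mean_y = sum_y / (double)n;

    /* Pass 2: sum the products of the deviations from the means. */
    double sxy = 0.0;
    for (size_t i = 0; i < n; i++)
        sxy += (x[i] - mean_x) * (y[i] - mean_y);

    return sxy / (double)(n - 1);
}
```

For x = {1, 2, 3} and y = {2, 4, 6} the means are 2 and 4, the products of the
deviations sum to 4, and the result is 4 / 2 = 2.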
HTH,
Ivan