[R-SIG-Mac] restricting the number of threads used on a dual-core Intel machine

Douglas Bates bates at stat.wisc.edu
Wed Mar 8 15:51:50 CET 2006


A few days ago I wrote to r-devel about a curious timing result where
a dual-core Athlon 64 was much slower than a single-core Athlon 64 on
the same task in R.  I noticed that the accelerated BLAS, whether
Goto's BLAS or ACML (AMD Core Math Library), was using two
threads on the dual-core machine.  It turns out that multithreading
was the cause of the slow performance of R on this task.  Setting
OMP_NUM_THREADS=1 in the environment slows down a BLAS-bound
calculation but gives much faster performance on the R task.  The use
of this environment variable is mentioned in an appendix of the R
Installation and Administration manual.

So my question is: how does one set this environment variable for the R
Console on an Intel Mac, or even for R running in a terminal on an Intel
Mac?  I tried setting OMP_NUM_THREADS=1 in the environment before
running R in a terminal on a Mac but that did not seem to have an
effect.
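
One thing worth ruling out is that the variable never reached the R
process at all.  A minimal sketch, using only base R's Sys.getenv(),
that reports what the running session sees; whether a particular BLAS
actually honors the value is a separate question that this does not
answer:

```r
# Report the value of OMP_NUM_THREADS as seen by this R session.
# unset = NA distinguishes "not set" from "set to an empty string".
omp <- Sys.getenv("OMP_NUM_THREADS", unset = NA)
if (is.na(omp)) {
  message("OMP_NUM_THREADS is not set in this R session")
} else {
  message("OMP_NUM_THREADS = ", omp)
}
```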

To check whether you are using multiple threads you can run

mm <- matrix(rnorm(1e6), ncol = 1000)
for (i in 1:10) print(system.time(crossprod(mm)))

I do the timing multiple times because sometimes it will use only 1
thread for the first few cases and then switch to multiple threads.  If
the elapsed time (third element of the timing result) is less than the
user time (first element) you are using multiple threads, since user
time is summed over all threads.  For example

> for (i in 1:30) print(system.time(crossprod(mm)))
[1] 0.65 0.02 0.35 0.00 0.00
[1] 0.65 0.03 0.36 0.00 0.00
[1] 0.65 0.03 0.35 0.00 0.00
[1] 0.65 0.02 0.35 0.00 0.00
[1] 0.65 0.02 0.35 0.00 0.00
[1] 0.66 0.03 0.35 0.00 0.00
[1] 0.65 0.02 0.35 0.00 0.00
[1] 0.65 0.03 0.36 0.00 0.00
[1] 0.65 0.03 0.35 0.00 0.00
[1] 0.66 0.02 0.35 0.00 0.00
[1] 0.65 0.03 0.36 0.00 0.00
[1] 0.65 0.03 0.36 0.00 0.00
[1] 0.66 0.02 0.35 0.00 0.00
[1] 0.66 0.03 0.35 0.00 0.00
[1] 0.65 0.03 0.36 0.00 0.00
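
The comparison of user time against elapsed time can be wrapped in a
small helper.  This is only a sketch: the function name and the `tol`
margin are made up for illustration, and it relies on the `user.self`
and `elapsed` names that system.time() attaches to its result.

```r
# Hypothetical helper (not part of R): TRUE when user CPU time clearly
# exceeds wall-clock time, which suggests more than one thread was busy.
# 'tol' is an arbitrary margin to absorb timer rounding.
looks_multithreaded <- function(timing, tol = 1.1) {
  timing[["user.self"]] > tol * timing[["elapsed"]]
}

# First row of the output above: user 0.65, system 0.02, elapsed 0.35.
two_threads <- c(user.self = 0.65, sys.self = 0.02, elapsed = 0.35)
print(looks_multithreaded(two_threads))  # TRUE, since 0.65 > 1.1 * 0.35

# On a live session the helper accepts system.time() output directly.
mm <- matrix(rnorm(1e4), ncol = 100)
print(looks_multithreaded(system.time(crossprod(mm))))
```

Very short timings can round to zero, so a matrix as large as the one
in the original test gives a more reliable answer than the small one
used here.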

To see that this slows down some computations, install the Matrix and
mlmRev packages and try

library(Matrix)
data(star, package = 'mlmRev')
system.time(fm1 <-
  lmer(math ~ sx + eth + gr + cltype + (yrs | id) + (1 | tch) + (yrs | sch),
       star, control = list(nit = 0, grad = 0, msV = 1)))

The iterations should converge around

 37      238799.:  3.01178 0.134283  1.48933 0.701769 0.303707 0.134235  1.84660
 38      238799.:  3.01173 0.134308  1.48939 0.701726 0.303810 0.134202  1.84648

and give a timing like

[1] 119.86 165.42 285.71   0.00   0.00

The very large system time (second element) is indicative of problems
with multiple threads.

I got a similar result on the dual-core Athlon 64.  After setting the
number of threads to 1 the timing is

[1] 34.74  2.48 37.22  0.00  0.00
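
To put the two timings side by side, here is a short arithmetic sketch
using only the numbers reported above (the variable names are just for
illustration).  Elapsed time drops by a factor of roughly 7.7, and
system time falls from over half of the elapsed time to under a tenth.

```r
# Timings reported above: user, system, elapsed (seconds).
two_threads <- c(user = 119.86, system = 165.42, elapsed = 285.71)
one_thread  <- c(user = 34.74,  system = 2.48,   elapsed = 37.22)

# How many times longer the two-thread run took in wall-clock time.
slowdown <- two_threads[["elapsed"]] / one_thread[["elapsed"]]

# Fraction of elapsed time spent in system time for each run.
sys_share <- c(two_threads = two_threads[["system"]] / two_threads[["elapsed"]],
               one_thread  = one_thread[["system"]]  / one_thread[["elapsed"]])

print(round(slowdown, 1))
print(round(sys_share, 2))
```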


