[R-sig-hpc] Trying to change OPENBLAS_NUM_THREADS from within R

Claudia Beleites claudia.beleites at ipht-jena.de
Mon Jul 2 18:39:37 CEST 2012


Dear Simons,

I gave it a quick try, moving the random matrix generation outside the
benchmarked code (i.e. looking at crossprod only).

I have R 2.15.0 on two different OSs (one Ubuntu 12.04, one CentOS 5). On
neither of them do I seem to be able to change the number of threads used
via the environment variables.

The Ubuntu machine is stuck at 2 cores, the CentOS machine uses 6. So I
think there's more going on here.

Here's the code; it can be run rather conveniently with R CMD BATCH:

## 5000 x 5000 random matrix used as the second argument of crossprod
qq <- matrix(rnorm(25e6), 5e3, 5e3)

## list of 12 further 5000 x 5000 random matrices
li <- list()
for (i in 1:12) {
   li[[i]] <- matrix(rnorm(25e6), 5e3)
}

library(microbenchmark)

## reset the CPU affinity of this R process to all CPUs
system(sprintf('taskset -p 0xffffffff %d', Sys.getpid()))

## time only the crossprod calls; the random data are generated above
timing <- microbenchmark(lapply(li, crossprod, qq), times = 10)

cat(file = "timings.txt", timing$time, "\n", append = TRUE)
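
For reference, a sweep over several thread settings can be driven by
starting a fresh R process per setting, so that the variable is already in
place when OpenBLAS initialises. A minimal sketch (the file name
"benchmark.R" is just a placeholder for the script above):

for (nt in c(1, 2, 4)) {
  ## each child process gets OPENBLAS_NUM_THREADS set on its command line
  system(sprintf("OPENBLAS_NUM_THREADS=%d R CMD BATCH --no-save benchmark.R benchmark_%d.Rout",
                 nt, nt))
}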

Best,

Claudia

On 30.06.2012 17:19, Simon Urbanek wrote:
> Simon,
> 
> what are you trying to do? In your example func2 also generates the random matrix inside the timed code, so obviously that part can't be parallelized and won't be affected by OPENBLAS_NUM_THREADS...
> 
> Cheers,
> Simon
> 
> 
> 
> On Jun 30, 2012, at 9:37 AM, Simon Fuller wrote:
> 
>> Hello,
>>
>> I posted a couple of weeks ago about trying to change
>> OPENBLAS_NUM_THREADS from within R. I have been on holiday since and
>> have not made much progress.
>>
>> To clarify my problem, I am aware that many people have had issues
>> combining implicit and explicit parallelization, and that there are
>> useful discussions on this issue already posted on the mailing list;
>> however I am experiencing performance problems with OPENBLAS for
>> functions that make no use of explicit parallelization.
>>
>> The problem arises for me in cases where a function that uses BLAS is
>> called in an apply statement.
>>
>> While any one such operation speeds up with an increase in
>> OPENBLAS_NUM_THREADS, this is not the case overall when the calls are
>> run e.g. under lapply.
>>
>> I have some basic results for a crossprod below - individual
>> operations improve with added threads, but under lapply it flattens
>> out, and htop shows a lot of red bars at work on the processors
>> (indicating heavy system usage?). With other more complex functions
>> the performance deteriorates further and it is positively undesirable
>> to use more than 1 thread for OPENBLAS, but the results below should
>> be sufficient to illustrate the issue.
>>
>> It is because of this specific problem that I want to be able to
>> control the number of threads at runtime, and hence other previously
>> explained approaches are not applicable, at least as far as I can
>> tell. But any other suggestions on how to overcome the problem would
>> be very welcome.
>>
>> I intend to test on different systems and BLAS implementations, but I
>> was wondering whether anyone had encountered this kind of problem
>> before, whether there is a workaround, whether it is OpenBLAS specific,
>> or indeed whether it is specific to my system or processor (i7-2630QM).
>>
>> Thank you in advance. A basic illustration of the issue follows.
>>
>> Simon
>>
>> CS Dept
>> NUIM
>>
>> func2 <- function(x){
>>   qq <- matrix(rnorm(250000),500,500)
>>   return( crossprod(x,qq) )
>> }
>>
>> li<-list()
>> for(i in 1:500){
>>   li[[i]] <- matrix(rnorm(250000),500,500)
>> }
>>
>> #OPENBLAS_NUM_THREADS=1
>> microbenchmark(lapply(li,func2),times=10)
>> Unit: seconds
>>              expr      min       lq   median       uq     max
>> 1 lapply(li, func2) 23.04498 23.05062 23.07796 23.39843 32.7897
>>
>> microbenchmark(crossprod(li[[1]],li[[2]]))
>> Unit: milliseconds
>>                        expr      min       lq   median       uq      max
>> 1 crossprod(li[[1]], li[[2]]) 22.87143 23.32093 23.45628 24.34411 26.13419
>>
>>
>> #OPENBLAS_NUM_THREADS=2
>>
>> microbenchmark(lapply(li,func2),times=10)
>> Unit: seconds
>>              expr      min       lq  median       uq      max
>> 1 lapply(li, func2) 20.95075 22.29843 23.2581 23.71557 24.21765
>>
>> ## Clearly the extra BLAS threads help the single crossprod, but not the lapply
>>
>> microbenchmark(crossprod(li[[1]],li[[2]]))
>> Unit: milliseconds
>>                        expr      min       lq   median       uq      max
>> 1 crossprod(li[[1]], li[[2]]) 12.23434 13.25925 14.12305 14.54331 19.47039
>>
>>
>>
>> #OPENBLAS_NUM_THREADS=4
>>
>> microbenchmark(lapply(li,func2),times=10)
>> Unit: seconds
>>              expr     min       lq   median       uq      max
>> 1 lapply(li, func2) 19.0154 20.17587 22.27971 23.56876 24.40342
>>
>>
>> microbenchmark(crossprod(li[[1]],li[[2]]))
>> Unit: milliseconds
>>                        expr      min       lq   median       uq      max
>> 1 crossprod(li[[1]], li[[2]]) 7.301697 8.105116 8.346089 8.670551 10.41987
>>
>>
>>
>>
>>
>> On Mon, Jun 11, 2012 at 4:17 PM, Simon Fuller <simonfuller9 at gmail.com> wrote:
>>> Hello,
>>>
>>> I hope this is the relevant mailing list for my enquiry.
>>>
>>> I have googled the above point but have not been able to find a
>>> solution which works. I would be very grateful if anyone had any
>>> suggestions based on their own knowledge and experience.
>>>
>>> I have installed OpenBLAS and have it linked to R as the shared BLAS.
>>>
>>> So I can do "export OPENBLAS_NUM_THREADS=x" and then when I start R,
>>> it runs with x threads, no problem - the extra threads are visible in
>>> htop and there is a speed improvement for crossprod etc.
>>>
>>> However, for some operations I find that the increased threads have a
>>> negative effect on my code.
>>>
>>> Therefore I would like to be able to change the number of threads from
>>> within the R session - the idea being that I can pair certain functions
>>> with an appropriate thread setting.
>>>
>>> However, Sys.setenv() does not work, probably not least because
>>> Sys.getenv() does not even list the relevant variable(s), e.g.
>>> OPENBLAS_NUM_THREADS.
>>>
>>> I have also tried running a C function with getenv / setenv. Here,
>>> getenv can access the value of OPENBLAS_NUM_THREADS set before R was
>>> called, but setenv's changes have no effect on the operation of the
>>> BLAS, and they are lost when R is closed. I thought this might have to
>>> do with the way UNIX's child processes receive only copies of the
>>> parent's variables, and that the BLAS must be called at a level where
>>> the parent's variables inform the operation. However I do not know
>>> enough about Unix and the way that R works on it to know where to go
>>> from here.
>>>
>>> I can see a few possibilities, all or none of which might work.
>>>
>>> 1) A different install of R, built explicitly against OpenBLAS.
>>>
>>> 2) The number of threads might be manipulable through another value or
>>> function (e.g. another environment variable - although I could not find
>>> a likely candidate). There could be something elementary here that I am
>>> missing.
>>>
>>> 3) Use some kind of hack to send system() calls from a C function -
>>> but this seems messy, if it is possible at all.
>>>
>>> 4) Start R with different options.
>>>
>>> 5) There is an openblas_set_num_threads function in the OpenBLAS
>>> source, but it was not clear to me how, or indeed whether, it can be
>>> used to alter the number of threads, and I could find no instances of
>>> people using it or similar functions. If anyone has managed to link
>>> some of the OpenBLAS code into a project in a way that implements this
>>> function, I would be very glad to hear about it.
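>>>
>>> If it can be used from R at all, I imagine it would need a small C shim
>>> compiled with R CMD SHLIB and loaded with dyn.load() - something along
>>> the lines of this untested sketch (file names made up; the compile step
>>> may additionally need PKG_LIBS=-lopenblas):
>>>
>>> ## write a one-line C wrapper that forwards to openblas_set_num_threads()
>>> writeLines(c(
>>>   "void set_blas_threads(int *n) {",
>>>   "    extern void openblas_set_num_threads(int);",
>>>   "    openblas_set_num_threads(*n);",
>>>   "}"), "set_blas_threads.c")
>>> system("R CMD SHLIB set_blas_threads.c")
>>> dyn.load(paste0("set_blas_threads", .Platform$dynlib.ext))
>>>
>>> set_blas_threads <- function(n)
>>>   invisible(.C("set_blas_threads", as.integer(n)))
>>>
>>> set_blas_threads(1)   # e.g. before an lapply() over crossprod()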
>>>
>>> My apologies if this question has been covered before.
>>>
>>> Any help is much appreciated.
>>>
>>> Best wishes,
>>>
>>> Simon
>>


-- 
Claudia Beleites
Spectroscopy/Imaging
Institute of Photonic Technology
Albert-Einstein-Str. 9
07745 Jena
Germany

email: claudia.beleites at ipht-jena.de
phone: +49 3641 206-133
fax:   +49 2641 206-399


