[R-sig-hpc] parallel and openblas
Dirk Eddelbuettel
edd at debian.org
Wed Apr 25 16:50:01 CEST 2012
[ Sylvestre: I am tossing you in the middle of a thread here. We may have a
buglet in OpenBLAS where NO_AFFINITY=1 might be a good config value to add. ]
Steve,
Nice work. I can mostly confirm it against OpenBLAS, Atlas and a local (old)
GotoBLAS2.
On 25 April 2012 at 10:24, Stephen Weston wrote:
 I was able to confirm that when I built R using OpenBLAS on my
 Linux machine, my CPU affinity was modified right at the
 beginning of the R session:

 $ grep Cpus_allowed /proc/self/status
 Cpus_allowed: ffffffff,ffffffff
 Cpus_allowed_list: 0-63
 $ bin/R
 > readLines('/proc/self/status')[32]
 [1] "Cpus_allowed:\t00000000,00000001"
What kernel is that? On 3.0.0-17 (Ubuntu 11.10, infrequently rebooted) I get
edd at max:~$ grep Cpus_allowed /proc/self/status
Cpus_allowed: ff
Cpus_allowed_list: 0-7
edd at max:~$
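To compare masks like these across machines, a small shell helper can count how many CPUs a given Cpus_allowed value permits. This is only a minimal sketch: the function name count_allowed_cpus is made up for illustration, and it assumes the comma-separated hex format shown above.

```shell
# Count the CPUs enabled in a Cpus_allowed hex mask such as "ff" or
# "00000000,00000001" (comma-separated hex words, as printed in /proc).
# count_allowed_cpus is a hypothetical helper name, not an existing tool.
count_allowed_cpus() {
    # Drop the commas between words and normalise to lower case.
    mask=$(printf '%s' "$1" | tr -d ',' | tr 'A-F' 'a-f')
    count=0
    while [ -n "$mask" ]; do
        digit=${mask%"${mask#?}"}   # first hex digit
        mask=${mask#?}              # remaining digits
        # Add the number of set bits in this hex digit.
        case $digit in
            0) ;;
            1|2|4|8) count=$((count + 1)) ;;
            3|5|6|9|a|c) count=$((count + 2)) ;;
            7|b|d|e) count=$((count + 3)) ;;
            f) count=$((count + 4)) ;;
            *) echo "unexpected character in mask: $digit" >&2; return 1 ;;
        esac
    done
    echo "$count"
}
```

On Linux it could be fed from the running shell with something like count_allowed_cpus "$(awk '/^Cpus_allowed:/ {print $2}' /proc/self/status)". A result of 1 on a multi-core machine would match the pinned case Steve reports.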
 I then confirmed that this causes problems for parallel packages
 such as "parallel" by trying to use all six cores of my machine
 using the "mclapply" function:

 > library(parallel)
 > cores <- detectCores()
 > mclapply(1:cores, function(i) repeat sqrt(3.14159), mc.cores=cores)

 When I executed "top" from another window and pressed "1", it
 showed that only one core was being used, and there were six R
 sessions, each getting 17% of the CPU.
When I run these three commands as a single line with r (from the littler package)
edd at max:~$ r -e 'library(parallel); cores <- detectCores(); print(cores); mclapply(1:cores, function(i) repeat sqrt(3.14159), mc.cores=cores)'
[1] 8
^C
edd at max:~$
I also see just one core in use. That is with
edd at max:~$ COLUMNS=94 dpkg -l | grep "blas\|atlas" | cut -c-78
ii  gotoblas2-helper  0.112.local.1      GotoBLAS2 helper
ii  libblas-dev       1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, st
ii  libblas-test      1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, te
ii  libblas3gf        1.2.20110419-2ubu  Basic Linear Algebra Reference impleme
ii  libopenblas-base  0.1alpha2.2-3      Optimized BLAS (linear algebra) librar
ii  libopenblas-dev   0.1alpha2.2-3      Optimized BLAS (linear algebra) librar
edd at max:~$
where OpenBLAS provides BLAS as the default.
That was after I had removed Atlas, which is still my usual default. So if I
reinstall Atlas (which "ranks higher" in the defaults and hence replaces
OpenBLAS) everything is fine: all eight cores are used.
edd at max:~$ COLUMNS=94 dpkg -l | grep "blas\|atlas" | cut -c-78
ii  gotoblas2-helper  0.112.local.1      GotoBLAS2 helper
ii  libatlas-base-dev 3.8.4-3build1      Automatically Tuned Linear Algebra Sof
ii  libatlas-dev      3.8.4-3build1      Automatically Tuned Linear Algebra Sof
ii  libatlas3gf-base  3.8.4-3build1      Automatically Tuned Linear Algebra Sof
ii  libblas-dev       1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, st
ii  libblas-test      1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, te
ii  libblas3gf        1.2.20110419-2ubu  Basic Linear Algebra Reference impleme
ii  libopenblas-base  0.1alpha2.2-3      Optimized BLAS (linear algebra) librar
ii  libopenblas-dev   0.1alpha2.2-3      Optimized BLAS (linear algebra) librar
edd at max:~$
 I also confirmed that "Cpus_allowed" was being set to the same
 value for each of the workers:

 > mclapply(1:cores, function(i) readLines('/proc/self/status')[32],
 mc.cores=cores)
 [[1]]
 [1] "Cpus_allowed:\t00000000,00000001"

 [[2]]
 [1] "Cpus_allowed:\t00000000,00000001"

 [[3]]
 [1] "Cpus_allowed:\t00000000,00000001"

 [[4]]
 [1] "Cpus_allowed:\t00000000,00000001"

 [[5]]
 [1] "Cpus_allowed:\t00000000,00000001"

 [[6]]
 [1] "Cpus_allowed:\t00000000,00000001"

 That is definitely not what you want to see, and explains why
 "mclapply" is only able to use one core.

 When I rebuilt and reinstalled OpenBLAS after editing
 Makefile.rule so that it contained the line:

 NO_AFFINITY = 1

 and then restarted R, the problem went away:

 $ bin/R
 > readLines('/proc/self/status')[32]
 [1] "Cpus_allowed:\tffffffff,ffffffff"

 This time when I ran "mclapply", "top" confirmed that I was
 using all six cores at about 100%.

 I didn't try this experiment with the older GotoBLAS2, but I
 believe the results would be the same.
I can confirm this. Using the packages
edd at max:~$ COLUMNS=94 dpkg -l | grep "blas\|atlas" | cut -c-78
ii  gotoblas2         1.13-1             GotoBLAS2
ii  gotoblas2-helper  0.112.local.1      GotoBLAS2 helper
ii  libblas-dev       1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, st
ii  libblas-test      1.2.20110419-2ubu  Basic Linear Algebra Subroutines 3, te
ii  libblas3gf        1.2.20110419-2ubu  Basic Linear Algebra Reference impleme
edd at max:~$
where GotoBLAS2 (locally built, using the gotoblas2-helper package) now
provides BLAS, everything sticks to one core when running the mclapply call.
I guess I'd need to fix gotoblas2-helper and rebuild gotoblas2. Or stick
with / hope for a corrected OpenBLAS build.
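The Makefile.rule edit Steve describes could be scripted ahead of such a rebuild. A minimal sketch, relying on make letting a later assignment override an earlier one; the helper name enable_no_affinity is made up for illustration and is not part of OpenBLAS or GotoBLAS2.

```shell
# Ensure a Makefile.rule-style file contains an active "NO_AFFINITY = 1" line.
# enable_no_affinity is a hypothetical helper name used only for this sketch.
enable_no_affinity() {
    rule_file=$1
    if [ ! -f "$rule_file" ]; then
        echo "no such file: $rule_file" >&2
        return 1
    fi
    # Nothing to do if an uncommented NO_AFFINITY = 1 line is already there.
    if grep -Eq '^[[:space:]]*NO_AFFINITY[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$rule_file"; then
        return 0
    fi
    # Append the setting; a later assignment wins over any earlier one in make.
    printf '\nNO_AFFINITY = 1\n' >> "$rule_file"
}
```

It would be called as enable_no_affinity Makefile.rule in the source tree, followed by rebuilding and reinstalling the library and restarting R, as in Steve's test.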
Dirk
  Steve


 On Tue, Apr 24, 2012 at 8:37 PM, Dirk Eddelbuettel <edd at debian.org> wrote:
 >
 > On 24 April 2012 at 15:45, Stephen Weston wrote:
 >  There's an interesting discussion entitled "all processes run on
 >  one CPU core" at:
 > 
 >  https://github.com/ipython/ipython/issues/840
 > 
 >  Someone was experiencing a very similar problem to the one that
 >  Claudia described using GotoBLAS2 with IPython and NumPy.
 >  Apparently it was fixed by recompiling GotoBLAS2 with the
 >  "NO_AFFINITY" parameter set to "1" in Makefile.rule, and then
 >  rebuilding "NumPy".
 > 
 >  It seems pretty strange, but GotoBLAS2/OpenBLAS may be modifying
 >  the affinity of the R process by calling sched_setaffinity() when
 >  it is initialized, and that is causing the problems that Claudia
 >  and Martin have seen.
 > 
 >  So perhaps the solution is to recompile GotoBLAS2/OpenBLAS with
 >  NO_AFFINITY=1, and then rebuild R with it.
 >
 > Good discussion, but one important nit: there is never a need to rebuild R
 > (provided you have an external / dynamically linked BLAS).
 >
 > Just restart R.
 >
 > Dirk
 >
 > 
