[R-sig-hpc] socket cluster/gotoblas2 configuration confusion

Stephen Weston stephen.b.weston at gmail.com
Wed Nov 23 16:50:45 CET 2011


You don't say how you set GOTO_NUM_THREADS to 6.  You
might want to verify that it did get set in each of the snow worker
processes by using the command:

    clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS'))

If it returns any empty strings in the resulting list, then the
environment variable is not set in the corresponding worker.
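
To spot the affected workers at a glance, something like the sketch
below could help. The helper name unset_workers is made up for
illustration; it assumes its argument is the list returned by the
clusterEvalQ() call above, with one string per worker.

```r
# Hypothetical helper: return the indices of the workers whose
# GOTO_NUM_THREADS came back as an empty string (i.e. unset).
unset_workers <- function(values) {
  which(!nzchar(unlist(values)))
}

# Example with a made-up result for three workers:
unset_workers(list("6", "", "6"))
```

With a real cluster you would call
unset_workers(clusterEvalQ(cl, Sys.getenv('GOTO_NUM_THREADS')));
an empty result means the variable is set everywhere.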

You probably should set this variable through an appropriate
shell startup file, so that the worker processes inherit it when
they are launched, but you could at least temporarily use:

    clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS=6))
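
Put together, a set-then-verify round trip might look like the sketch
below. It assumes the parallel package that ships with R from 2.14.0 on
(its makeCluster() and clusterEvalQ() mirror the snow functions used in
this thread) and two local workers. It only confirms that the variable
is visible inside each worker; whether the BLAS honours a value set
after the worker has started is a separate question, so re-run your
timing afterwards.

```r
library(parallel)  # assumption: used here in place of snow

cl <- makeCluster(2)  # two local socket workers

# Set the variable in every worker process...
clusterEvalQ(cl, Sys.setenv(GOTO_NUM_THREADS = "6"))

# ...and read it back, one value per worker.
vals <- unlist(clusterEvalQ(cl, Sys.getenv("GOTO_NUM_THREADS")))

stopCluster(cl)

# TRUE if every worker reports the value we set.
all_set <- all(vals == "6")
```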

- Steve


On Wed, Nov 23, 2011 at 9:44 AM, Claudia Beleites
<claudia.beleites at ipht-jena.de> wrote:
> Dear all,
>
> I'm just taking my first steps with parallelized calculations and got quite
> confused.
>
> Here's what I want, what I have and what I did:
>
> - I want to parallelize calculations on a Centos server with 2 x 6 cores and
> 8 GB RAM (it is actually part of a cluster, but I have access only to this
> node, and the other nodes do not (yet) have R installed).
>
> - My data is too large to work with in one piece.
> But it comes in separate files of suitable size: I can work nicely with 2 to
> 3 samples in memory at the same time.
>
> - So my idea was to start up a snow socket cluster with 2 or 3 workers.
>
> - In addition I want to use an optimized BLAS. Linear algebra is only a
> small part of the analysis so it does make sense to have the socket cluster
> with as many workers as possible and have the linear algebra parts use up to
> n / nworkers cores.
>
> So I built R 2.14.0 using gotoblas2 and set $GOTO_NUM_THREADS to 6. Matrix
> multiplication in a fresh R session now is much faster and CPU usage shows
> the expected 6 cores working:
>
>> system.time ({m <- matrix (1:9e6, 3e3); m%*%m; NULL})
>       User      System     elapsed
>      5.219       0.126       1.111
>
> However, the socket cluster workers do not seem to use GOTO_NUM_THREADS:
>> library (snow)
>> cl <- makeCluster(2,type="SOCK")
>> tm <- snow.time(clusterEvalQ(cl, {m <- matrix (1:9e6, 3e3); m%*%m; NULL}))
>> tm
> elapsed    send receive  node 1  node 2
>  9.553   0.001   0.010   9.510   9.543
>> tm$data
> [[1]]
>     send_start send_end recv_start recv_end exec
> [1,]          0    0.001      9.511    9.512 9.51
>
> [[2]]
>     send_start send_end recv_start recv_end  exec
> [1,]      0.001    0.001      9.544    9.553 9.543
>
>> tm$elapsed
> elapsed
>  9.553
>>
>
> CPU usage shows 2 cores working, and the times correspond to that.
>
> What configuration do I need to do in order to make the blas use more
> threads for the worker processes? Anything else I should do differently?
>
>
>> sessionInfo ()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=de_DE.UTF-8        LC_COLLATE=de_DE.UTF-8
>  [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8
>  [7] LC_PAPER=C                 LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] snow_0.3-8
>
> Additional questions:
> - Is there some command like sessionInfo () that yields information about
> the blas (particularly NUM_THREADS)?
> - Is there some command that I can use to tell the blas how many threads to
> use during an R session? Can I set environment variables from within R?
> (Searching didn't help, as I got only info about R environments...) Would that
> actually help here?
>
> Thanks a lot for your help.
>
> Claudia
>
> --
> Claudia Beleites
> Spectroscopy/Imaging
> Institute of Photonic Technology
> Albert-Einstein-Str. 9
> 07745 Jena
> Germany
>
> email: claudia.beleites at ipht-jena.de
> phone: +49 3641 206-133
> fax:   +49 2641 206-399
>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc
>


