[R-sig-hpc] Parallel multi-core processing for complex R functions

Simon Urbanek simon.urbanek at r-project.org
Mon Jul 15 15:32:46 CEST 2013

On Jul 14, 2013, at 11:13 PM, Alistair Perry wrote:

> As you know R doesn't explicitly allow parallel processing on multiple
> cores.
> I have two complex functions called "test" and "rcc". I did not write these
> codes.
> The basic premise of the function code "test" is to calculate the rich-club
> members of a weighted matrix (a certain group of nodes [one node = one
> matrix definition] are more connected to each other than other nodes- see
> http://toreopsahl.com/tnet/weighted-networks/weighted-rich-club-effect/).
> In short, the code is designed to calculate the strength of the ties
> between prominent nodes. However, to test whether there is a significant
> effect, the input matrix (argument "net" in the code "test") must be
> compared to a randomized matrix (the links of the input matrix are
> reshuffled), repeated 1000 times. The randomized matrix is created by the
> function code "rcc". This code is embedded in the function code "test". You
> can set how many times the input matrix is to be reshuffled using the
> argument "NR" (i.e., NR = 1000 means the input matrix will be reshuffled 1000
> times).
> The issue here is that R does not explicitly allow multi-core parallel
> processing, so the computation for one matrix (500x500) using the "test"
> code can take over a week. I am using a quad-core processor with a Linux
> OS, but only one core is being used. I am aware that there is the base
> package "parallel" and "mclapply" to multi-thread the function, but these
> commands require an input argument ("X"), along with the function I wish to
> process using all cores. However, the functions I am using require an input
> (equivalent to X) within the argument variables, so it would mean setting
> the input matrix twice as an input argument variable.

That's not true: you can use the matrix directly from the parallel code, because mclapply forks the R process and everything in the workspace is shared with the workers. If you had written your code using lapply/sapply instead of a loop, you would have seen that all you need to do is replace lapply with mclapply. You have

 rphi <- matrix(data = 0, nrow = nrow(ophi), ncol = NR)
 for (i in 1:NR) {
    rnet <- rcc(net, option = reshuffle)
    rphi[, i] <- phi(rnet)
 }

which can be simplified to

rphi <- sapply(seq.int(NR), function(i) phi(rcc(net, option=reshuffle)))

and thus the parallel version is simply

rphi <- simplify2array(mclapply(seq.int(NR), function(i) phi(rcc(net, option=reshuffle))))
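
For anyone who wants to try the pattern without the original functions, here is a minimal self-contained sketch. The names rcc_stub and phi_stub are hypothetical stand-ins invented for this illustration (they are not the tnet code from the question), and the 5x5 matrix, NR = 8 and two cores are arbitrary choices to keep it quick.

```r
library(parallel)

# Hypothetical stand-ins for the poster's functions, for illustration only:
# rcc_stub() reshuffles the off-diagonal weights of a matrix,
# phi_stub() returns one summary value per row.
rcc_stub <- function(net) {
  off <- row(net) != col(net)
  net[off] <- sample(net[off])
  net
}
phi_stub <- function(net) rowSums(net)

# Small toy input; "net" is used directly inside the worker function,
# it does not have to be passed as the X argument.
set.seed(1)
net <- matrix(runif(25), nrow = 5)
diag(net) <- 0
NR <- 8

# "L'Ecuyer-CMRG" gives each forked worker its own reproducible RNG stream.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# mclapply() relies on fork, so use a single core on Windows.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

rphi <- simplify2array(
  mclapply(seq.int(NR),
           function(i) phi_stub(rcc_stub(net)),
           mc.cores = n_cores)
)

dim(rphi)  # one row per node, one column per reshuffle
```

Note that mc.cores defaults to getOption("mc.cores", 2L), so on a quad-core machine you would probably want to pass mc.cores = 4 (or the value of detectCores()) explicitly, or set options(mc.cores = 4) once at the start of the session.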


> Does anyone have an idea of how I could multi-thread the code?
> Particularly, the section of the code where the matrix is reshuffled, so
> the matrix could be reshuffled 250 times on one core (if there were 4
> cores)? This would speed up the computation dramatically.
> The codes are available to be viewed on
> http://stackoverflow.com/questions/17646190/parallel-and-multicore-processing-for-complex-r-function
