[R-sig-hpc] Parallel multi-core processing for complex R functions

Mon Jul 15 15:32:46 CEST 2013

On Jul 14, 2013, at 11:13 PM, Alistair Perry wrote:

> As you know, R doesn't explicitly allow parallel processing on multiple
> cores.
> 
> I have two complex functions called "test" and "rcc". I did not write this
> code.
> 
> The basic premise of the function code "test" is to calculate the rich-club
> members of a weighted matrix (a certain group of nodes [one node = one
> matrix definition] are more connected to each other than other nodes- see
> http://toreopsahl.com/tnet/weighted-networks/weighted-rich-club-effect/).
> 
> In short, the code is designed to calculate the strength of the ties
> between prominent nodes. However, to test whether there is a significant
> effect, the input matrix (argument "net" in the code "test") must be
> compared to randomized matrices, in which the links of the input matrix
> are reshuffled, 1000 times. Each randomized matrix is created by the
> function "rcc", which is called from within the function "test". You
> can set how many times the input matrix is to be reshuffled using the
> argument "NR" (e.g., NR = 1000 means the input matrix will be reshuffled
> 1000 times).
> 
> The issue here is that, because R does not explicitly allow multi-core
> parallel processing, the computation for one matrix (500x500) using the
> "test" code can take over a week. I am using a quad-core processor with a
> Linux OS, but only one core is being used. I am aware that there is the base
> package "parallel" and its function "mclapply" to multi-thread the function,
> but these commands require an input argument ("X"), along with the function
> I wish to process using all cores. However, the functions I am using require
> an input (equivalent to X) within the argument variables, so it would mean
> setting the input matrix twice as an input argument variable.
> 

That's not true: you can use the matrix directly from the parallel code, because everything is shared (mclapply forks the R process, so each worker sees the same objects as the parent). If you had written your code using an apply-style function instead of a loop, you would have seen that all you need to do is replace lapply with mclapply. You have

rphi <- matrix(data = 0, nrow = nrow(ophi), ncol = NR)
for (i in 1:NR) {
    rnet <- rcc(net, option = reshuffle)
    rphi[, i] <- phi(rnet)
}

which can be simplified to

rphi <- sapply(seq.int(NR), function(i) phi(rcc(net, option=reshuffle)))

and thus the parallel version is simply

rphi <- simplify2array(mclapply(seq.int(NR), function(i) phi(rcc(net, option=reshuffle))))
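
To see the pattern end to end without the original functions, here is a minimal self-contained sketch. Note that rcc_stub() and phi_stub() are hypothetical stand-ins invented for illustration (they are not the tnet code from the question), and the 5x5 matrix, NR = 8 and the core count are arbitrary assumptions. The point is only that the worker function reads "net" from the enclosing environment, so nothing has to be passed twice:

```r
library(parallel)

# Hypothetical stand-ins, NOT the real functions from the question:
# rcc_stub() permutes the off-diagonal weights of the matrix,
# phi_stub() returns one number per node (here simply the row sums).
rcc_stub <- function(net) {
  off <- row(net) != col(net)
  net[off] <- sample(net[off])
  net
}
phi_stub <- function(net) rowSums(net)

# Small toy input; the real case would be the 500x500 matrix.
set.seed(1)
net <- matrix(runif(25), nrow = 5)
diag(net) <- 0
NR <- 8

# mclapply relies on forking, which is unavailable on Windows,
# where mc.cores must be 1.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# L'Ecuyer-CMRG streams keep the random numbers reproducible
# across the forked workers.
RNGkind("L'Ecuyer-CMRG")
set.seed(42)

# "net" is not an argument of the worker function; it is found in the
# enclosing environment, which the forked workers share with the parent.
res <- mclapply(seq_len(NR), function(i) phi_stub(rcc_stub(net)),
                mc.cores = cores)

# Combine the list of per-iteration vectors into a matrix with one
# column per reshuffle, as the original loop did.
rphi <- simplify2array(res)
print(dim(rphi))
```

One thing to watch: mc.cores defaults to getOption("mc.cores", 2L), so on a quad-core machine you would pass mc.cores = 4 (or set options(mc.cores = 4)) to use all cores.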

Cheers,
Simon

> Does anyone have an idea of how I could multi-thread the code?
> In particular, the section of the code where the matrix is reshuffled, so
> that the matrix could be reshuffled 250 times on each core (if there were 4
> cores)? This would speed up the computation dramatically.
> 
> The code can be viewed at
> http://stackoverflow.com/questions/17646190/parallel-and-multicore-processing-for-complex-r-function