[R] more efficient way to parallel
Martin Morgan
mtmorgan at fhcrc.org
Mon Aug 6 18:50:50 CEST 2012
On 08/06/2012 09:41 AM, Jie wrote:
> After searching online, I found that clusterCall or foreach might be the
> solution.
Rewrite your outer loop as an lapply; then on non-Windows systems use
parallel::mclapply, or on Windows use makePSOCKcluster and parLapply. I
ended up with
library(parallel)
library(MASS)
Maxi <- 10
Maxj <- 1000
doit <- function(i, Maxi, Maxj)
{
    ## initialization, not of interest
    Sigmahalf <- matrix(sample(10000, replace=TRUE), 100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    x <- mvrnorm(n=Maxj, rep(0, 100), Sigma)
    xlist <- lapply(seq_len(nrow(x)), function(i, x) matrix(x[i,], 10), x)
    ## end of initialization
    fun <- function(x) {
        v <- eigen(x, symmetric=FALSE, only.values=TRUE)$values
        min(abs(v))
    }
    dd1 <- sapply(xlist, fun)
    dd2 <- dd1 + dd1 / sum(dd1)
    sum(dd1 * dd2)
}
> system.time(lapply(1:8, doit, Maxi, Maxj))
   user  system elapsed
  6.677   0.016   6.714
> system.time(mclapply(1:64, doit, Maxi, Maxj, mc.cores=8))
   user  system elapsed
 68.857   1.032  10.398
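For the Windows route mentioned above (makePSOCKcluster and parLapply), a
minimal sketch might look like the following. The function work() and the
choice of 2 workers are hypothetical stand-ins, not part of the original
code; work() is built so that its smallest absolute eigenvalue is simply i.

```r
library(parallel)

## Hypothetical stand-in for doit(): an n x n matrix of ones plus i on the
## diagonal has eigenvalues i (n - 1 times) and i + n, so the result is i.
work <- function(i, n) {
    m <- matrix(1, n, n) + diag(i, n)
    v <- eigen(m, symmetric=FALSE, only.values=TRUE)$values
    min(abs(v))
}

cl <- makePSOCKcluster(2)              # 2 workers is an assumption; adjust
## PSOCK workers start as fresh R sessions. For doit() above, packages must
## be loaded on each worker first, e.g. clusterEvalQ(cl, library(MASS)).
res <- tryCatch(
    parLapply(cl, 1:8, work, n=10),    # extra arguments are passed to work()
    finally = stopCluster(cl))         # always shut the workers down
```

Unlike mclapply, which forks the current process, the workers here do not
inherit the master's loaded packages or global variables, so anything the
function needs has to be passed as an argument or set up on the workers.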
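Note that the parallel run does eight times the work (64 tasks versus 8)
in roughly 1.5 times the elapsed time. The extra arguments to eigen
(symmetric=FALSE, only.values=TRUE) are important, as is avoiding
unnecessary repeated calculations (your code computed the same eigenvalues
twice, once for dd1 and again for dd2). The strategy of allocate-and-grow
(result.vec = numeric(); result.vec[i] <- ...) is very inefficient
(result.vec is copied in its entirety for each new value of i); it is better
to preallocate-and-fill (result.vec = numeric(Maxi); result.vec[i] <- ...)
or let lapply manage the allocation.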
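The three allocation strategies can be compared side by side in a small
sketch. The function f() and the size n are hypothetical placeholders for
the real per-iteration computation; all three approaches give the same
result and differ only in how the result vector is allocated.

```r
n <- 1e4
f <- function(i) i^2                  # hypothetical per-iteration work

## allocate-and-grow: the vector is extended on every assignment
grow <- numeric()
for (i in seq_len(n)) grow[i] <- f(i)

## preallocate-and-fill: one allocation of the final size up front
fill <- numeric(n)
for (i in seq_len(n)) fill[i] <- f(i)

## let lapply manage the allocation, then simplify to a vector
viaapply <- unlist(lapply(seq_len(n), f))
```

Wrapping each variant in system.time() for larger n is one way to see the
difference on your own machine.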
Martin
>
> Best wishes,
> Jie
>
> On Sun, Aug 5, 2012 at 10:23 PM, Jie <jimmycloud at gmail.com> wrote:
>
>> Dear All,
>>
>> Suppose I have a program like the one below. The outer loop runs a simulation
>> (with randomly generated data); inside it there are several sapply() calls
>> (10~100) over the data, plus some other work, and these sapply calls have to
>> run sequentially. Each sapply does not involve very intensive calculation
>> (a few seconds only), so one iteration of the outer loop takes minutes to
>> finish.
>> I guess the better way is to parallelize not the sapply calls but the outer
>> loop, but I have no idea how to modify it. Here is some simple code, with
>> only two sapply calls for simplicity. The logic inside the sapply calls is
>> not important.
>> Thank you for your attention and suggestions.
>>
>> library(parallel)
>> library(MASS)
>> result.seq=c()
>> Maxi <- 100
>> for (i in 1:Maxi)
>> {
>>   ## initialization, not of interest
>>   Sigmahalf <- matrix(sample(1:10000, size = 10000, replace = TRUE), 100)
>>   Sigma <- t(Sigmahalf) %*% Sigmahalf
>>   ## the mean vector must match the 100 x 100 Sigma
>>   x <- mvrnorm(n=1000, rep(0, 100), Sigma)
>>   xlist <- list()
>>   for (j in 1:1000)
>>   {
>>     ## 100 values per row give a square 10 x 10 matrix, as eigen() requires
>>     xlist[[j]] <- list(X = matrix(x[j, ], 10))
>>   }
>>   ## end of initialization
>>
>>   dd1 <- sapply(xlist, function(s) {min(abs((eigen(s$X))$values))})
>>   ##
>>   sumdd1 = sum(dd1)
>>   for (j in 1:1000)
>>   {
>>     xlist[[j]]$dd1 <- dd1[j]/sumdd1
>>   }
>>   ## Assume dd2 and dd1 can not be combined in one sapply()
>>   dd2 <- sapply(xlist, function(s) {min(abs((eigen(s$X))$values)) + s$dd1})
>>   result.seq[i] <- sum(dd1*dd2)
>>
>> }
>>
>>
>
--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793