[R-sig-hpc] About Multicore: mclapply

Simon Urbanek simon.urbanek at r-project.org
Mon Jan 16 17:00:02 CET 2012


On Jan 16, 2012, at 9:02 AM, Prashantha Hebbar wrote:

> Hello friends,
> I was trying to parallelize a function using mclapply(), but I find that lapply() executes in less time than mclapply(). I have given here the system time taken for both functions.
>> library(ShortRead)
>> library(multicore)
>> fqFiles <- list.files("./test")
>> system.time(lapply(fqFiles, function(fqFiles){
>   readsFq <- readFastq(dirPath="./test",pattern=fqFiles)
>   }))
>    user  system elapsed 
>   0.399   0.021   0.419 
>> system.time(mclapply(fqFiles, function(fqFiles){
>    readsFq <- readFastq(dirPath="./test",pattern=fqFiles)},mc.cores=3))
>    user  system elapsed 
>   0.830   0.151   0.261 
> 
> Since the ./test directory contains three fastq files, I have used mc.cores = 3.
> 
> here is my mpstat output for mclapply()
> 
> 04:47:55 PM  CPU   %user   %nice    %sys %iowait    %irq   %soft  %steal   %idle    intr/s
> 04:47:56 PM  all   13.86    0.00    1.37    0.00    0.00    0.00    0.00   84.77   1023.23
> 04:47:56 PM    0   21.21    0.00    2.02    0.00    0.00    0.00    0.00   76.77   1011.11
> 04:47:56 PM    1   33.00    0.00    2.00    0.00    0.00    0.00    0.00   65.00      9.09
> 04:47:56 PM    2    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
> 04:47:56 PM    3    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      3.03
> 04:47:56 PM    4    3.03    0.00    2.02    0.00    0.00    0.00    0.00   94.95      0.00
> 04:47:56 PM    5    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
> 04:47:56 PM    6    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00      0.00
> 04:47:56 PM    7   53.00    0.00    4.00    0.00    0.00    0.00    0.00   43.00      0.00
> 
> Hence, can you please suggest why mclapply() has taken more time than lapply()?
> 

multicore is designed for parallel *computing*, which is not what you are doing. For serial tasks (like yours) it will always be slower, because it needs to a) spawn processes, b) read the data (serially, since you use the same location), c) serialize all the data and send it to the master process, and d) unserialize and concatenate all the data into a list in the master process. If you run lapply() it does only b), which in your case is not the slowest part. Using multicore only makes sense if you actually perform computations (or any other parallel task).
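To illustrate (a minimal sketch, not from the original exchange): slow_sq below is a made-up CPU-bound toy function. With work like this, the fork and serialization overhead is amortized over real computation, so mclapply() should show a lower elapsed time than lapply().

    ## Hypothetical compute-bound toy: each call burns CPU rather than doing I/O
    library(multicore)          # parallel::mclapply() behaves the same way on current R
    slow_sq <- function(x) {
        for (i in seq_len(1e6)) x <- sqrt(x^2)
        x
    }
    system.time(lapply(1:8, slow_sq))                  # serial baseline
    system.time(mclapply(1:8, slow_sq, mc.cores = 4))  # forked workers; elapsed time should drop

For an I/O-bound job like reading three fastq files from the same disk, the workers mostly wait on the same device, so there is little to parallelize in the first place.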

Cheers,
Simon


> Thanking you in anticipation.
> Regards,
> Prashantha
> Prashantha Hebbar Kiradi,
> 
> E-mail: prashantha.hebbar at dasmaninstitute.org
> 
> 
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


