[Bioc-devel] BiocParallel load balancing and runtime

Jiefei Wang @zwj|08 @end|ng |rom gm@||@com
Wed Aug 9 03:53:06 CEST 2023


Hello Anna,

The speed of parallel computing depends on many factors. To rule out
potential confounders, please try the following code for timing (assuming you
still have all the variables from your example):

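```
library(BiocParallel)

## One task per list element; skip automatic export of global options and
## variables to the workers
parallel_param <- SnowParam(workers = ncores, type = "SOCK",
                            tasks = length(my_list),
                            exportglobals = FALSE, exportvariables = FALSE)
## Start the workers before timing, so start-up cost is not counted
bpstart(parallel_param)
system.time({
  res2 <- bplapply(my_list, FUN, BPPARAM = parallel_param)
})
bpstop(parallel_param)
```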
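Timings of a few seconds or less are noisy, so it can help to repeat each measurement and compare medians rather than single runs. A minimal sketch in base R; `median_elapsed` is a hypothetical helper, not part of BiocParallel or parallel:

```r
# Hypothetical helper: run `run()` `reps` times and return the median
# wall-clock ("elapsed") time in seconds, so a single unusually slow or
# fast run carries less weight when comparing back ends.
median_elapsed <- function(run, reps = 5L) {
  times <- vapply(seq_len(reps), function(i) {
    system.time(run())[["elapsed"]]
  }, numeric(1))
  median(times)
}

# Usage, assuming `cl` (from makeCluster) and `parallel_param` (already
# started with bpstart) exist, together with my_list and FUN:
# median_elapsed(function() parallel::clusterApplyLB(cl, my_list, FUN))
# median_elapsed(function() BiocParallel::bplapply(my_list, FUN,
#                                                  BPPARAM = parallel_param))
```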

Also, I encourage you to open a GitHub issue with your question and a
reproducible example here:

https://github.com/Bioconductor/BiocParallel/issues

It will help us manage the discussion and pinpoint the problem. Thanks!

Best,
Jiefei

On Tue, Aug 8, 2023 at 8:21 AM Anna Plaxienko <anna using plaxienko.com> wrote:

> My motivation for using distributed memory was that my package should also
> work on Windows. Is it better to use shared memory by default, check the
> user's system, and switch to sockets only if necessary?
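One way to do exactly that is sketched below; `choose_param` is a hypothetical helper, not a BiocParallel function. It uses fork-based shared-memory workers (MulticoreParam) where forking is available and falls back to socket workers (SnowParam) on Windows, where it is not. BiocParallel's own default, `bpparam()`, already makes a similar platform-dependent choice, so another common pattern is to give your function a `BPPARAM = bpparam()` argument and let users override it.

```r
library(BiocParallel)

# Hypothetical helper: prefer fork-based shared-memory workers
# (MulticoreParam) where forking is available, and fall back to socket
# workers (SnowParam) on Windows, which does not support forking.
# Extra arguments such as `tasks` are passed on to the chosen constructor.
choose_param <- function(workers, ...) {
  if (.Platform$OS.type == "windows") {
    SnowParam(workers = workers, type = "SOCK", ...)
  } else {
    MulticoreParam(workers = workers, ...)
  }
}

# e.g. one task per chromosome matrix
param <- choose_param(workers = 2L, tasks = 22L)
```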
>
> Regarding the real data: I have 68 samples (rows) of methylation EPIC array
> data (850K columns), which I split by chromosome. That gives 22 matrices
> ranging from 10K to 80K columns each, which is why I need load balancing.
> With *clusterApplyLB*, my method runs in 38 minutes; with *bplapply* it
> takes 42 minutes. In other examples the difference is similar, about 10-15%.
> It's of course not dramatic; if you've already waited 38 minutes, you can
> wait an extra 4 :) But I'm curious why, and whether it's something I can fix.
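With `tasks = length(my_list)`, each element becomes its own task and, as far as I understand it, tasks are handed to whichever worker becomes free next, similar to `clusterApplyLB`. When task sizes are this uneven (10K to 80K columns), the order of the list still matters: if a large chromosome is dispatched last, one worker may end up running alone at the end. A common heuristic is to dispatch the largest tasks first. A minimal sketch, where `largest_first_apply` is a hypothetical helper (not part of BiocParallel) and whether it closes the gap to `clusterApplyLB` is untested:

```r
# Hypothetical helper (not part of BiocParallel): hand the most expensive
# elements to the workers first, then restore the original input order.
# `cost` maps one element to a number (e.g. ncol for a matrix);
# `apply_fun` is lapply-like (e.g. lapply or BiocParallel::bplapply), and
# `...` is forwarded to it (e.g. BPPARAM = ...).
largest_first_apply <- function(X, FUN, cost, apply_fun = lapply, ...) {
  ord <- order(vapply(X, cost, numeric(1)), decreasing = TRUE)
  res <- apply_fun(X[ord], FUN, ...)
  res[order(ord)]  # order(ord) is the inverse permutation of ord
}

# Toy example: three "chromosome" matrices of very different widths
mats <- list(chr1 = matrix(0, nrow = 2, ncol = 5),
             chr2 = matrix(0, nrow = 2, ncol = 80),
             chr3 = matrix(0, nrow = 2, ncol = 20))
widths <- largest_first_apply(mats, ncol, cost = ncol)
```

With the real data this might be called as `largest_first_apply(chrom_list, FUN, cost = ncol, apply_fun = bplapply, BPPARAM = parallel_param)`, where `chrom_list` is a hypothetical name for the list of 22 per-chromosome matrices.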
>
> On Tue, Aug 8, 2023 at 15:04, Waldir Leoncio Netto <w.l.netto using medisin.uio.no>
> wrote:
>
> > Dear Anna,
> >
> > According to the documentation of "BiocParallelParam", SnowParam() is a
> > subclass suited to distributed-memory (e.g. cluster) computing. If you're
> > running your code on a simpler machine with shared memory (e.g. your PC),
> > you're probably better off using MulticoreParam() instead. Here's a
> > modified example based on yours:
> >
> > # Setup
> > library(parallel)
> > library(BiocParallel)
> > my_list <- list(1:10, 11:20, 21:30, 31:40, 41:50, 51:60, 61:70, 71:80,
> >                 81:90)
> > FUN <- function(x) return(x ^ 10)
> > ncores <- min(detectCores() - 1L, 10L)
> >
> > # Parallel
> > cl <- makeCluster(ncores)
> > print(system.time(res <- clusterApplyLB(cl, my_list, FUN)))
> > stopCluster(cl)
> >
> > # BiocParallel
> > parallel_param_1 <- SnowParam(workers = ncores, tasks = length(my_list))
> > print(system.time(res2 <- bplapply(my_list, FUN,
> >                                    BPPARAM = parallel_param_1)))
> > parallel_param_2 <- MulticoreParam(workers = ncores,
> >                                    tasks = length(my_list))
> > print(system.time(res3 <- bplapply(my_list, FUN,
> >                                    BPPARAM = parallel_param_2)))
> >
> > On my machine, the output is as follows (notice that the last column, the
> > total elapsed time, shows MulticoreParam() performing better than parallel):
> >
> >   user  system elapsed
> >  0.000   0.004   0.088
> >   user  system elapsed
> >  0.114   0.001   1.336
> >   user  system elapsed
> >  0.074   0.124   0.060
> >
> > How does that work on your actual data?
> >
> > Best,
> > Waldir
> >
> > On Tue, 08.08.2023 at 13:10 +0200, Anna Plaxienko wrote:
> >
> > Hi all!
> >
> > I'm switching from the base R *parallel* package to *BiocParallel* for my
> > Bioconductor submission and I have two questions. First, I'd like advice
> > on whether I've implemented load balancing correctly. Second, I've noticed
> > that the running time is about 15% longer with BiocParallel. Any ideas why?
> >
> >
> > Parallel code
> >
> > cl <- makeCluster(ncores)
> > res <- clusterApplyLB(cl, my_list, FUN)
> > stopCluster(cl)
> >
> > BiocParallel
> >
> > parallel_param <- SnowParam(workers = ncores, type = "SOCK",
> >                             tasks = length(my_list))
> > res2 <- bplapply(my_list, FUN, BPPARAM = parallel_param)
> >
> > Thank you!
> >
> > Best regards,
> > Anna Plaksienko
> >
> > _______________________________________________
> > Bioc-devel using r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/bioc-devel
> >
> >
> >
>
