[R-sig-hpc] speed up a function containing resampling

Sven E Templer sven.templer at gmail.com
Thu May 4 09:47:14 CEST 2017


Dear John,


There are two things I noticed:

a) Do not use do.call within mclapply. do.call(tii.calc, config) is evaluated immediately, so tii.calc runs once and its return value, which is not a function, is what gets passed to mclapply. Use the ... argument of mclapply instead to pass your config on to tii.calc.
b) tii.calc needs an argument that receives the iteration integer; this can also be ..., see below.

Here is a simple reproducible example:

library(parallel)
config <- list(a = 2, b = 2)
f <- function(a = 1, b = 1, ...) a + b
iter <- 1:3
# This breaks: do.call(f, config) calls f right away and returns a value, which is not a function.
mclapply(iter, do.call(f, config), mc.cores = 3)
# This runs: a = 2 and b = 2 are forwarded to f for each element of iter. It breaks only if f() has no argument (such as ...) to receive that element.
mclapply(iter, f, a = 2, b = 2, mc.cores = 3)

So this needs to change:
a) mclapply(iters, tii.calc, treelist=miltrees, taxnames=nom, full.trees=fulltree, outgroup=outgroup, burnin=burnin, mc.cores=ncores)
b) tii.calc <- function(treelist, taxnames, full.trees, outgroup=NULL, burnin=NULL, ...) {...}
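To make the two changes concrete, here is a minimal sketch of both call patterns. It assumes a hypothetical stand-in, tii.calc.demo, with made-up file and taxon names, because the real tii.calc reads tree files that are not available here. The first variant passes the settings through the ... of mclapply; the second keeps the config list and defers do.call by wrapping it in a function, which should work as well if you prefer to keep the list.

```r
library(parallel)

# Hypothetical stand-in for tii.calc (assumption: the real function reads tree
# files). It only returns a number so that the call pattern can be checked.
tii.calc.demo <- function(treelist, taxnames, burnin = NULL, ...) {
  length(treelist) + length(taxnames) + burnin
}

# Made-up settings, for illustration only
config <- list(treelist = c("minusA.t", "minusB.t"),
               taxnames = c("taxon1", "taxon2", "taxon3"),
               burnin = 0.5)
iters <- 1:4

# mclapply cannot fork on Windows, so fall back to a single core there
ncores <- if (.Platform$OS.type == "windows") 1L else 2L

# Variant 1: pass the settings through the ... of mclapply.
# Each element of iters arrives unnamed and is absorbed by the ... of the function.
res1 <- mclapply(iters, tii.calc.demo,
                 treelist = config$treelist,
                 taxnames = config$taxnames,
                 burnin = config$burnin,
                 mc.cores = ncores)

# Variant 2: keep the config list; do.call now runs once per iteration,
# inside the wrapper, instead of once before mclapply starts.
res2 <- mclapply(iters, function(i) do.call(tii.calc.demo, config),
                 mc.cores = ncores)
```

In both variants FUN is a function, so mclapply has something to call for every element of iters.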


Hope this helps,

Sven

> On 3. May 2017, at 22:39, John Denton <jdenton at amnh.org> wrote:
> 
> Hi, all.
> 
> I’m trying to assemble a set of functions to do a resampling of a value that is calculated by a combination of subsampling and sums. Each iteration involves reading in a number of large files, sampling from the entries in the files, and doing calculations based on these subsamples. The way I have it set up right now is through a combination of lapply and for. The outer resampling is then done with mclapply.
> 
> I have two problems. First, the code used to run, but now doesn’t. I get the error
> 
> In parallel:::mclapply(iters, do.call(tii.calc, config), mc.cores = ncores) :
>   all scheduled cores encountered errors in user code
> > 
> 
> Any ideas on how to fix this problem? I am running R 3.3.1 in the terminal on a personal OS X Yosemite machine.
> 
> 
> Second, I’m hoping to speed up the function. It is currently very slow, and I need to do a number of outer resamplings in the 100s or 1000s.
> 
> Here is the outer resampling code that calls the function itself (attached):
> 
> source("tree_collapser.R")
> source("~/taxon_influence/tii_calc.r")
> library(phangorn)
> library(SnowballC)
> library(ape)
> library(stringr)
> 
> outgroup <- "Hemichordata"
> burnin <- 0.5
> n.iters <- 2
> ncores <- parallel:::detectCores()/4
> 
> d <- read.nexus.data(as.character("tully.nex"))
> nom <- sort(names(d)[-which(names(d) == outgroup)])
> 
> miltrees <- list.files(pattern="^minus[A-Z].*\\.t$")
> fulltree <- read.nexus("tully.nex.t_alltaxa")
> ##iter.list <- rep(list(miltrees), n.iters)
> iters <- 1:n.iters
> 
> ##tii.rep <- parallel:::mclapply(iters, tii.calc(iter.list, taxnames=nom, 
> ## full.trees=fulltree, outgroup=outgroup, burnin=burnin), mc.cores=ncores)
> 
> 
> config <- list(treelist=miltrees, taxnames=nom, full.trees=fulltree, 
> outgroup=outgroup, burnin=burnin)
> 
> tii.rep <- parallel:::mclapply(iters, do.call(tii.calc, config), mc.cores=ncores)
> 
> Thanks!
> 
> ~John
> 
> 
> <tii_calc.r>
> _______________________________________________
> R-sig-hpc mailing list
> R-sig-hpc at r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-hpc


