[R-sig-hpc] speed up a function containing resampling
Sven E Templer
sven.templer at gmail.com
Thu May 4 09:47:14 CEST 2017
Dear John,
I noticed two things:
a) Do not use do.call inside mclapply. Written that way, tii.calc is run once and its return value, which is not a function, is passed to mclapply as FUN. Pass your config through the ... argument of mclapply instead.
b) tii.calc needs an argument that receives the iteration integer; this can also be ... (see below).
Here is a simple reproducible example:
library(parallel)
config <- list(a=2, b=2)
f <- function (a = 1, b = 1, ...) a + b
iter <- 1:3
mclapply(iter, do.call(f, config), mc.cores = 3) # this breaks: f is called immediately and returns a value, which is not a function
mclapply(iter, f, a = 2, b = 2, mc.cores = 3) # this runs: a=2 and b=2 are forwarded to f for each value in iter; it would break if f() had no argument (here ...) to receive that value
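If you would rather keep the settings in the config list than spell them out in the mclapply call, two variants should work. This is only a sketch based on the toy f above; the names n.cores, res.wrapper and res.docall are made up for illustration.

```r
library(parallel)

config <- list(a = 2, b = 2)
f <- function(a = 1, b = 1, ...) a + b
iter <- 1:3

# mclapply relies on forking, which is not available on Windows
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Variant 1: wrap do.call in a function, so that f is called once per
# iteration inside the worker instead of once before mclapply starts
res.wrapper <- mclapply(iter, function(i) do.call(f, config), mc.cores = n.cores)

# Variant 2: build the whole mclapply call with do.call; the entries of
# config are passed on to f through the ... argument of mclapply
res.docall <- do.call(mclapply,
                      c(list(X = iter, FUN = f), config, list(mc.cores = n.cores)))
```

In variant 1 the iteration value i is simply not handed to f; in variant 2 it is passed as an unnamed argument, so f still needs ... (or another free argument) to receive it.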
So this needs to change:
a) mclapply(iters, tii.calc, treelist=miltrees, taxnames=nom, full.trees=fulltree, outgroup=outgroup, burnin=burnin, mc.cores=ncores)
b) tii.calc <- function(treelist, taxnames, full.trees, outgroup=NULL, burnin=NULL, ...) {...}
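To see where the iteration value ends up, here is a minimal sketch with a hypothetical stand-in, toy.calc, that has the same argument layout as the tii.calc signature in b). The body, the file names and the taxon names are made up; it only reports what it received. Because every other argument is supplied by name, the unnamed value from iters falls into ... . The last line is one way to spot failed workers, since mclapply returns try-error objects instead of stopping.

```r
library(parallel)

# Hypothetical stand-in with the same argument layout as tii.calc;
# the body is made up and only reports what it received.
toy.calc <- function(treelist, taxnames, full.trees, outgroup = NULL, burnin = NULL, ...) {
  extra <- list(...)  # the unnamed value from iters lands here
  list(iter = extra[[1]], n.trees = length(treelist), burnin = burnin)
}

iters <- 1:2
# mclapply relies on forking, which is not available on Windows
n.cores <- if (.Platform$OS.type == "windows") 1L else 2L

res <- mclapply(iters, toy.calc,
                treelist = c("minusA.t", "minusB.t"),  # placeholder file names
                taxnames = c("A", "B"),                # placeholder taxon names
                full.trees = NULL,
                outgroup = "Hemichordata",
                burnin = 0.5,
                mc.cores = n.cores)

# Elements that failed in a worker come back as "try-error" objects
failed <- vapply(res, inherits, logical(1), what = "try-error")
```

If any element of failed is TRUE, printing that element of res shows the underlying error message, which is usually more informative than the "all scheduled cores encountered errors" warning.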
Hope this helps,
Sven
> On 3. May 2017, at 22:39, John Denton <jdenton at amnh.org> wrote:
>
> Hi, all.
>
> I’m trying to assemble a set of functions to do a resampling of a value that is calculated by a combination of subsampling and sums. Each iteration involves reading in a number of large files, sampling from the entries in the files, and doing calculations based on these subsamples. The way I have it set up right now is through a combination of lapply and for. The outer resampling is then done with mclapply.
>
> I have two problems. First, the code used to run, but now doesn’t. I get the error
>
> In parallel:::mclapply(iters, do.call(tii.calc, config), mc.cores = ncores) :
> all scheduled cores encountered errors in user code
> >
>
> Any ideas on how to fix this problem? I am running R 3.3.1 in the terminal on a personal OS X Yosemite machine.
>
>
> Second, I’m hoping to speed up the function. It is currently very slow, and I need to do a number of outer resamplings in the 100s or 1000s.
>
> Here is the outer resampling code that calls the function itself (attached):
>
> source("tree_collapser.R")
> source("~/taxon_influence/tii_calc.r")
> library(phangorn)
> library(SnowballC)
> library(ape)
> library(stringr)
>
> outgroup <- "Hemichordata"
> burnin <- 0.5
> n.iters <- 2
> ncores <- parallel:::detectCores()/4
>
> d <- read.nexus.data(as.character("tully.nex"))
> nom <- sort(names(d)[-which(names(d) == outgroup)])
>
> miltrees <- list.files(pattern="^minus[A-Z].*\\.t$")
> fulltree <- read.nexus("tully.nex.t_alltaxa")
> ##iter.list <- rep(list(miltrees), n.iters)
> iters <- 1:n.iters
>
> ##tii.rep <- parallel:::mclapply(iters, tii.calc(iter.list, taxnames=nom,
> ## full.trees=fulltree, outgroup=outgroup, burnin=burnin), mc.cores=ncores)
>
>
> config <- list(treelist=miltrees, taxnames=nom, full.trees=fulltree,
> outgroup=outgroup, burnin=burnin)
>
> tii.rep <- parallel:::mclapply(iters, do.call(tii.calc, config), mc.cores=ncores)
>
> Thanks!
>
> ~John
>
>
> <tii_calc.r>