Install from CRAN
install.packages("mcprogress")
Install from Github
devtools::install_github("myles-lewis/mcprogress")
This package adds a progress bar to mclapply()
using
echo
to output to the console in Rstudio or Linux
environments. Simply replace your original call to
mclapply()
with pmclapply()
.
library(mcprogress)
# toy example
res <- pmclapply(letters[1:20], function(i) {
Sys.sleep(0.2 + runif(1) * 0.1)
setNames(rnorm(5), paste0(i, 1:5))
}, mc.cores = 2, title = "Working")
Working |================================ | 60% eta 3.1 secs
pmclapply()
can be used in an identical manner to
mclapply()
. It is ideal for use if the length of
X
is comparably > cores. As processes are spawned in a
block and most code for each process completes at roughly the same time,
processes move along in blocks as determined by mc.cores
.
To track progress, pmclapply
only tracks the nth process,
where n=mc.cores
. For example, with 4 cores,
pmclapply
reports progress when the 4th, 8th, 12th, 16th
etc process has completed.
ETA is approximate. As part of minimising overhead, it is only updated with each change in progress (i.e. each time a block of processes completes). It is not updated by interrupt.
However, in some scenarios the length of X
is comparable
to the number of cores and each process may take a long time. For
example, machine learning applied to each of 8 cross-validation folds on
an 8-core machine will open 8 processes from the outset. Each process
will often complete at roughly the same time. In this case
pmclapply
is much less informative as it only shows
completion at the end of 1 round of processes, so it will go from 0%
straight to 100%.
For this scenario, we recommend users use
mcProgressBar()
which allows more fine-grained reporting of
subprogress from within a block of parallel processes. The diagram below
illustrates computation involving 10 processes to complete across 8
cores, with subprogress divided into 5 intervals.
Technically only 1 process can be tracked. If cores
is
set to 4 and subval
is invoked, then the 1st, 5th, 9th,
13th etc process is tracked. Subprogress of this process is computed as
part of the number of blocks of processes required.
In the next example, we build a custom function showing how to use
mcProgressBar()
including a call to mclapply
wrapped around another nested function which can report subprogress.
library(parallel)
my_fun <- function(x, cores) {
start <- Sys.time()
mcProgressBar(0, title = "my_fun") # initialise progress bar
res <- mclapply(seq_along(x), function(i) {
# inner loop of calculation
y <- 1:4
inner <- lapply(seq_along(y), function(j) {
Sys.sleep(0.2 + runif(1) * 0.1)
mcProgressBar(val = i, len = length(x), cores, subval = j / length(y),
title = "my_fun", start = start)
rnorm(4)
})
inner
}, mc.cores = cores)
closeProgress(start, title = "my_fun") # finalise the progress bar
res
}
output <- my_fun(letters[1:4], cores = 2)
Alternatively even if the function call inside mclapply
does not have a for loop or equivalent, then progress can still be
reported manually after chunks of computation.
## Example of long function
longfun <- function(x, cores) {
start <- Sys.time()
mcProgressBar(0, title = "longfun") # initialise progress bar
res <- mclapply(seq_along(x), function(i) {
# long sequential calculation in parallel with 3 major steps applied to x[i]
Sys.sleep(0.5)
mcProgressBar(val = i, len = length(x), cores, subval = 0.33,
title = "longfun", start = start) # 33% complete
Sys.sleep(0.5)
mcProgressBar(val = i, len = length(x), cores, subval = 0.66,
title = "longfun", start = start) # 66% complete
Sys.sleep(0.5)
mcProgressBar(val = i, len = length(x), cores, subval = 1,
title = "longfun", start = start) # 100% complete
return(rnorm(4))
}, mc.cores = cores)
closeProgress(start, title = "longfun") # finalise the progress bar
res
}
output <- longfun(letters[1:4], cores = 2)
The mcProgressBar
function can be used with the
foreach
package and the doMC
package multicore
backend to show a progress bar.
# Example from doMC vignette
library(doMC)
library(foreach)
registerDoMC(4)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
{
start <- Sys.time()
r <- foreach(i = seq_len(trials), .combine = cbind) %dopar% {
ind <- sample(100, 100, replace = TRUE)
result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
mcProgressBar(i, trials, cores = getDoParWorkers(), start = start)
coefficients(result1)
}
closeProgress(start)
}
# Equivalent using pmclapply
r <- pmclapply(seq_len(trials), function(i) {
ind <- sample(100, 100, replace = TRUE)
result1 <- glm(x[ind, 2] ~ x[ind, 1], family = binomial(logit))
coefficients(result1)
}, mc.cores = 4)
The package also includes functions to safely print messages (including error messages) from within parallelised code. These can be very useful for debugging parallel R code.
res <- mclapply(1:5, function(i) {
Sys.sleep(runif(1) /10)
message_parallel("Process ", i, " done")
rnorm(1)
})
## Process 1 done
## Process 3 done
## Process 2 done
## Process 5 done
## Process 4 done
If errors occur during parallel processing, mclapply
generates a nondescript warning “all scheduled cores encountered errors
in user code”. One option is to set mc.cores = 1
. This will
often reveal the error message, but can be slow if computation is long
and the error occurs only half way through.
out <- mclapply(1:5, function(i) {
rnorm(-1)
}, mc.cores = 2) # change mc.cores = 1 to reveal actual error message
## Warning in mclapply(1:5, function(i) {: all scheduled cores encountered errors
## in user code
The function catchError()
enables an expression to be
wrapped in try()
so that code is executed and if an error
message is produced it is printed to the console to be more visible. If
no error is generated the usual of the expression is returned. This
allows you to write your code as usual. It can more easily be utilised
using the pipe |>
. Additional arguments can be provided
to track values so that the programmer can more easily find out when the
error occurs.
out <- mclapply(1:5, function(i) {
j = 4 + i
rnorm(-1) |> catchError(i, j)
}, mc.cores = 2)
## Error in rnorm(-1) : invalid arguments
## i=1, j=5
## Error in rnorm(-1) : invalid arguments
## i=2, j=6
The function mcstop()
allows programmers to generate
visible error messages during parellel code.
res <- mclapply(1:5, function(i) {
Sys.sleep(runif(1) /10)
if (i == 5) mcstop("My error message")
rnorm(1)
})
## My error message