[Rd] segfault issue with parallel::mclapply and download.file() on Mac OS X

Seth Russell
Wed Sep 19 23:19:48 CEST 2018


I have an lapply function call that I want to parallelize. Below is a very
simplified version of the code:

url_base <- "https://cloud.r-project.org/src/contrib/"
files <- c("A3_1.0.0.tar.gz", "ABC.RAP_0.9.0.tar.gz")
res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s))

Instead of downloading a couple of files in parallel, I get a segfault per
process with a 'memory not mapped' message. I've been working with Henrik
Bengtsson on resolving this issue and he recommended I send a message to
the R-Devel mailing list.

Here's the output:

trying URL 'https://cloud.r-project.org/src/contrib/A3_1.0.0.tar.gz'
trying URL 'https://cloud.r-project.org/src/contrib/ABC.RAP_0.9.0.tar.gz'

 *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

 *** caught segfault ***
address 0x11575ba3a, cause 'memory not mapped'

Traceback:
 1: download.file(paste0(url_base, s), s)
 2: FUN(X[[i]], ...)
 3: lapply(X = S, FUN = FUN, ...)
 4: doTryCatch(return(expr), name, parentenv, handler)
 5: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 6: tryCatchList(expr, classes, parentenv, handlers)
 7: tryCatch(expr, error = function(e) {
        call <- conditionCall(e)
        if (!is.null(call)) {
            if (identical(call[[1L]], quote(doTryCatch)))
                call <- sys.call(-4L)
            dcall <- deparse(call)[1L]
            prefix <- paste("Error in", dcall, ": ")
            LONG <- 75L
            sm <- strsplit(conditionMessage(e), "\n")[[1L]]
            w <- 14L + nchar(dcall, type = "w") + nchar(sm[1L], type = "w")
            if (is.na(w))
                w <- 14L + nchar(dcall, type = "b") + nchar(sm[1L],
                    type = "b")
            if (w > LONG)
                prefix <- paste0(prefix, "\n  ")
        }
        else prefix <- "Error : "
        msg <- paste0(prefix, conditionMessage(e), "\n")
        .Internal(seterrmessage(msg[1L]))
        if (!silent && isTRUE(getOption("show.error.messages"))) {
            cat(msg, file = outFile)
            .Internal(printDeferredWarnings())
        }
        invisible(structure(msg, class = "try-error", condition = e))
    })
 8: try(lapply(X = S, FUN = FUN, ...), silent = TRUE)
 9: sendMaster(try(lapply(X = S, FUN = FUN, ...), silent = TRUE))
10: FUN(X[[i]], ...)
11: lapply(seq_len(cores), inner.do)
12: parallel::mclapply(files, function(s) download.file(paste0(url_base,
    s), s))

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace

Here's my sessionInfo()

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS/LAPACK: /usr/local/Cellar/openblas/0.3.3/lib/libopenblasp-r0.3.3.dylib

locale:
[1] en_US/en_US/en_US/C/en_US/en_US

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

loaded via a namespace (and not attached):
[1] compiler_3.5.1

The version of R I'm running was installed via Homebrew with "brew install r
--with-java --with-openblas"

The example code above works as expected on Linux. Moreover, if I pass a
non-default download method to the download.file() call, such as:

res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s, method="wget"))
res <- parallel::mclapply(files, function(s) download.file(paste0(url_base,
s), s, method="curl"))

Both of these work correctly, with no segfault. If I use method="libcurl",
it does segfault.
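
As a stopgap, the workaround could be wrapped in a small helper. This is
only a minimal sketch: pick_external_method() and download_parallel() are
hypothetical names of my own, and it assumes that a curl or wget binary is
on the PATH. It does nothing more than pass an external method to
download.file() inside the same mclapply() call as above.

```r
# Hypothetical helper: return the first external downloader found on the
# PATH, or NA_character_ if none of the candidates is available.
pick_external_method <- function(candidates = c("curl", "wget")) {
  found <- candidates[nzchar(Sys.which(candidates))]
  if (length(found) > 0L) found[[1L]] else NA_character_
}

# Hypothetical wrapper around the mclapply() call from the example,
# always using an external method rather than the default one.
download_parallel <- function(files, url_base,
                              method = pick_external_method()) {
  if (is.na(method)) stop("neither curl nor wget found on the PATH")
  parallel::mclapply(files, function(s) {
    download.file(paste0(url_base, s), s, method = method)
  })
}
```

With the variables from the example, the call would be
download_parallel(files, url_base).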

I'm not sure what steps to take to further narrow down the source of the
error.
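
One thing I could do is run the comparison between methods systematically.
The sketch below is an assumption about how that might look, not something
I have run in this form: check_methods() and its fetch argument are
hypothetical, and it relies on download.file() returning 0 on success and
on mclapply() returning something other than 0 (as far as I know, NULL)
for a worker that died.

```r
# Hypothetical helper: for each download method, run the files through
# mclapply() and report whether every worker returned status 0.
# `fetch` defaults to download.file and can be swapped out for testing.
check_methods <- function(files, url_base,
                          methods = c("libcurl", "curl", "wget"),
                          fetch = download.file) {
  vapply(methods, function(m) {
    res <- parallel::mclapply(files, function(s) {
      # Write into tempdir() so repeated runs do not clutter the
      # working directory.
      fetch(paste0(url_base, s), file.path(tempdir(), s), method = m)
    })
    # A crashed worker or an error does not yield a numeric 0.
    all(vapply(res, function(r) {
      is.numeric(r) && length(r) == 1L && r == 0
    }, logical(1)))
  }, logical(1))
}
```

Calling check_methods(files, url_base) would return a named logical
vector with one entry per method.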

Is this a known bug? If not, is this a new bug or an unexpected feature?

Thanks,
Seth
