[R] Parallel processing random 'save' error
William Dunlap
wdunlap at tibco.com
Mon Jul 1 16:57:30 CEST 2013
> Error in checkForRemoteErrors(val) :
> one node produced an error: (converted from warning)
> 'D:\_pgf\quantile_analysis2_f13\_save\dbz084_nump48\bins' already exists
That warning looks like it comes from dir.create(). Do you have
code that looks like:
    if (!file.exists(tempDir)) {
        dir.create(tempDir)
    }
If so, that could be the problem. The directory may not exist when file.exists()
is called, but another process may have created it by the time dir.create() is
called. Try replacing such code with
    suppressWarnings(dir.create(tempDir))
    if (!isTRUE(file.info(tempDir)$isdir)) {
        stop("Cannot create tempDir=", tempDir)
    }
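
As a rough, self-contained sketch of the same idea, the check can be wrapped in a small helper so that every worker calls it before saving. The names ensure_dir and demo_dir are made up for this illustration, and options(warn = 2) is only my guess at how the warning got converted to an error in your session.

```r
# Hypothetical helper (not from the original code): create a directory
# without failing when another process has created it first.
ensure_dir <- function(path) {
  # dir.create() returns FALSE with a warning if 'path' already exists;
  # muffle the warning and verify the outcome afterwards instead.
  suppressWarnings(dir.create(path))
  if (!isTRUE(file.info(path)$isdir)) {
    stop("Cannot create directory: ", path)
  }
  invisible(path)
}

# Assumption: warnings are being converted to errors, as the
# "(converted from warning)" text in the message suggests.
old_opts <- options(warn = 2)
demo_dir <- file.path(tempdir(), "bins_demo")  # scratch location for the demo

ensure_dir(demo_dir)  # creates the directory
ensure_dir(demo_dir)  # harmless even though it now exists

# A bare dir.create() on the existing directory is what fails.
res <- tryCatch(dir.create(demo_dir), error = function(e) "error")

options(old_opts)
unlink(demo_dir, recursive = TRUE)
```

In the parLapply case the helper would have to be defined on (or exported to) each worker, and each worker would call it just before its save().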
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Rguy
> Sent: Monday, July 01, 2013 1:07 AM
> To: r-help at r-project.org
> Subject: [R] Parallel processing random 'save' error
>
> Platform: Windows 7
> Package: parallel
> Function: parLapply
>
> I am running a lengthy program with 8 parallel processes running in main
> memory.
> The processes save data using the 'save' function, to distinct files so
> that no conflicts writing to the same file are possible.
> I have been getting errors like the one shown below on a random basis,
> i.e., sometimes at one point in the execution, sometimes at another,
> sometimes no error at all.
> I should note that the directory referred to in the error message
> ( 'D:\_pgf\quantile_analysis2_f13\_save\dbz084_nump48\bins') contains, as I
> write, 124 files saved to it by the program without any error; which
> underscores the point that most of the time the saves occur with no problem.
>
> Error in checkForRemoteErrors(val) :
> one node produced an error: (converted from warning)
> 'D:\_pgf\quantile_analysis2_f13\_save\dbz084_nump48\bins' already exists
>
> Enter a frame number, or 0 to exit
>
> 1: main_top(9)
> 2: main_top.r#26: eval(call_me)
> 3: eval(expr, envir, enclos)
> 4: quantile_analysis(2)
> 5: quantile_analysis.r#69: run_all(layr, prjp, np, rules_tb, pctiles_tb,
> parx, logdir, logg)
> 6: run_all.r#73: parLapply(cl, ctrl_all$vn, qa1, prjp, dfr1, "iu__bool",
> parx, logdir, tstamp)
> 7: do.call(c, clusterApply(cl, x = splitList(X, length(cl)), fun = lapply,
> fun, ...), quote = TRUE)
> 8: clusterApply(cl, x = splitList(X, length(cl)), fun = lapply, fun, ...)
> 9: staticClusterApply(cl, fun, length(x), argfun)
> 10: checkForRemoteErrors(val)
> 11: stop("one node produced an error: ", firstmsg, domain = NA)
> 12: (function ()
> {
> error()
> utils::recover()
> })()
>
> Following the latest error I checked the system's connections as follows:
>
> Browse[1]> showConnections()
>    description             class      mode  text     isopen   can read can write
> 3  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 4  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 5  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 6  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 7  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 8  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 9  "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> 10 "<-LAPTOP_32G_01:11741" "sockconn" "a+b" "binary" "opened" "yes"    "yes"
> Browse[1]>
>
> It seems that the parallel processes might be sharing the same
> connection--or is it that they are utilizing connections that have the same
> name but are actually distinct because they are running in parallel?
> If the connections are the problem, how can I force each parallel process
> to use a different connection?
> If the connections are not the problem, then can someone suggest a
> diagnostic I might apply to tease out what is going wrong? Or perhaps some
> program setting that I may have neglected to consider?
>
> Thanks in advance for your help.
>