[R] interaction between clusterMap(), read.csv() and try() - try does not catch error
Strunk, Jacob (DNR)
Jacob.Strunk at dnr.wa.gov
Mon Aug 8 18:07:52 CEST 2016
Hello,

I am attempting to process a list of csv files in parallel, some of which may be empty and therefore fail in read.csv. I tend to use clusterMap as my go-to parallel function, but I have run into an interesting behavior: inside clusterMap, try(read.csv(x)) does not appear to catch the read error raised by an empty csv file. I have not tested this with other functions (e.g. read.table, mean, etc.). parLapply, by contrast, does appear to catch the error correctly. Any suggestions on how I should write the clusterMap call so that try is guaranteed to catch the error?
I am working on Windows Server 2012.
I have the latest versions of R and the parallel package.
I am executing the code from within the RStudio IDE, version 0.99.896.
Here is a demonstration of the failure.
R code used in demonstration:
#load the parallel package (provides makeCluster, parLapply and clusterMap)
library(parallel)
#prepare csv files - an empty file and a file with data
close(file("c:/temp/badcsv.csv",open="w"))
write.table(data.frame(x=2),"c:/temp/goodcsv.csv")
#prepare a parallel cluster
clus0=makeCluster(1, rscript_args = "--no-site-file")
#read good / bad files in parallel with parLapply - which succeeds: try does catch the error
x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
print(x1)
#read good / bad files in parallel with clusterMap - which fails: try does not catch the error
x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
print(x0)
R output:
> #prepare csv files - an empty file and a file with data
> close(file("c:/temp/badcsv.csv",open="w"))
> write.table(data.frame(x=2),"c:/temp/goodcsv.csv")
>
> #prepare a parallel cluster
> clus0=makeCluster(1, rscript_args = "--no-site-file")
>
> #read good / bad files in parallel with parLapply - which succeeds: try Does catch err
> x1=parLapply(clus0,c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),function(...)try(read.csv(...)))
> print(x1)
[[1]]
[1] "Error in read.table(file = file, header = header, sep = sep, quote = quote, : \n no lines available in input\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in read.table(file = file, header = header, sep = sep, quote = quote, dec = dec, fill = fill, comment.char = comment.char, ...): no lines available in input>
[[2]]
    x
1 1 2
>
> #read good / bad files in parallel with clusterMap - which fails: try does Not catch error
> x0=clusterMap(clus0,function(...)try(read.csv(...)),c("c:/temp/badcsv.csv","c:/temp/goodcsv.csv"),SIMPLIFY=F)
Error in checkForRemoteErrors(val) :
one node produced an error: Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
> print(x0)
Error in print(x0) : object 'x0' not found
>
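One possible workaround, offered as a sketch rather than a confirmed fix: catch the error with tryCatch() and return the condition object itself, so that the worker never hands back an object of class "try-error". This rests on an assumption not verified in this post, namely that clusterMap reports any returned "try-error" object as a node failure. The helper name safe_read is hypothetical, and the sketch uses tempdir() and write.csv() in place of the c:/temp paths and write.table() call above so that it runs on any machine.

```r
library(parallel)

# Hypothetical helper: on failure, return the error condition itself
# instead of a "try-error" object (assumption: the cluster code treats
# returned "try-error" objects as node errors).
safe_read <- function(path) {
  tryCatch(read.csv(path), error = function(e) e)
}

# An empty file and a file with data, written to a temporary directory
bad  <- file.path(tempdir(), "badcsv.csv")
good <- file.path(tempdir(), "goodcsv.csv")
close(file(bad, open = "w"))
write.csv(data.frame(x = 2), good, row.names = FALSE)

clus <- makeCluster(1)
res <- clusterMap(clus, safe_read, c(bad, good), SIMPLIFY = FALSE)
stopCluster(clus)

# Flag the files that could not be read
failed <- vapply(res, inherits, logical(1), what = "error")
```

The entries of res that inherit from "error" mark the unreadable files; the remaining entries are ordinary data frames and can be selected with res[!failed].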
Thanks for any help,
Jacob