[R] Print All Warnings that Occur in All Parallel Nodes
TELLERIA RUIZ DE AGUIRRE, JUAN
JTELLERIA at external.gamesacorp.com
Thu Sep 14 09:48:04 CEST 2017
Dear R Users,
I have written the following code to import a series of zipped CSV files using parallel computing.
My problems are:
A) Some ZIP files (which contain the CSVs) are corrupted and cannot be opened.
B) After executing parRapply, I can only look at the last.warning variable to find out which CSVs failed in each node, but I cannot see all the warnings, only one at a time.
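Regarding B): on a PSOCK cluster such as the one created by makeCluster(), warnings raised inside the worker processes are not returned to the master session, so warnings() and last.warning on the master cannot list them. A minimal sketch of one workaround, assuming each task can be wrapped in a function: record every warning with withCallingHandlers() on the worker and return the messages together with the result. collect_warnings() and par_apply_with_warnings() are hypothetical names, and the sketch uses parLapply() over a vector of file paths instead of parRapply() over a tibble.

```r
library(parallel)

# Hypothetical helper: wraps `fun` so that every warning it raises is
# recorded and muffled, and the messages are returned with the result.
collect_warnings <- function(fun) {
  function(x) {
    msgs <- character(0)
    value <- withCallingHandlers(
      fun(x),
      warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))  # keep the message
        invokeRestart("muffleWarning")         # then let fun(x) continue
      }
    )
    list(value = value, warnings = msgs)
  }
}

# Hypothetical driver: runs the wrapped function over `inputs` on cluster
# `cl`; returns all results plus a named list with the warnings of every
# input that produced at least one.
par_apply_with_warnings <- function(cl, inputs, fun) {
  res <- parLapply(cl, inputs, collect_warnings(fun))
  warns <- lapply(res, `[[`, "warnings")
  names(warns) <- as.character(inputs)
  list(values   = lapply(res, `[[`, "value"),
       warnings = warns[lengths(warns) > 0])
}
```

With the objects from the script, something like out <- par_apply_with_warnings(c1, DISPOIN_DIR_REL[[1]], parRaplly_Function) should give the warnings per file in out$warnings and the tables in out$values, assuming the first column of DISPOIN_DIR_REL holds the ZIP paths.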
So:
* To show a list of all warnings from all nodes, I was thinking of using the following call in the code:
warnings(DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function))
Would this work?
* Also, how could I check that a CSV can be opened before applying the function, and create an empty data.frame for the CSVs that cannot?
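A minimal sketch for the second question, under two assumptions: that listing a ZIP's contents with utils::unzip(list = TRUE) is an acceptable pre-check (it only reads the archive's directory, so an archive whose entries are damaged may still pass it), and that wrapping the actual read in tryCatch() is the reliable fallback. zip_is_readable(), empty_dispoin(), safe_read() and its reader argument are hypothetical names; reader is any function that takes a path and returns a data frame.

```r
# Column names used by parRaplly_Function in the script below.
dispoin_cols <- c("SCADA", "TAG", "ID_del_AEG", "Descripcion", "Time_ON",
                  "Time_OFF", "Delta_Time", "Comentario", "Es_Alarma",
                  "Es_Ultima", "Comentarios")

# Hypothetical pre-check: TRUE if the ZIP's directory can be listed.
# unzip(list = TRUE) signals an error when the archive cannot be opened.
zip_is_readable <- function(path) {
  tryCatch({
    utils::unzip(path, list = TRUE)
    TRUE
  }, error = function(e) FALSE)
}

# Zero-row data frame with one character column per expected field.
empty_dispoin <- function() {
  cols <- replicate(length(dispoin_cols), character(0), simplify = FALSE)
  as.data.frame(setNames(cols, dispoin_cols), stringsAsFactors = FALSE)
}

# Hypothetical wrapper: returns reader(path), or the empty frame (with the
# error message attached as an attribute) if reading fails for any reason.
safe_read <- function(path, reader) {
  tryCatch(reader(path), error = function(e) {
    out <- empty_dispoin()
    attr(out, "read_error") <- conditionMessage(e)
    out
  })
}
```

In the script, the worker function could then be function(p) safe_read(p, parRaplly_Function), so that a corrupt ZIP yields an empty frame that do.call(rbind, ...) can still combine. safe_read, empty_dispoin and dispoin_cols would have to be sent to the workers first, e.g. with clusterExport().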
Thank you,
Juan
CODE
################################################################################
## DISPOIN Data Import Into MariaDB
################################################################################
## -----------------------------------------------------------------------------
## Packages
## -----------------------------------------------------------------------------
# update.packages("RODBC")
# update.packages("tidyverse")
## -----------------------------------------------------------------------------
## Libraries
## -----------------------------------------------------------------------------
suppressMessages(require(RODBC))
suppressMessages(require(tidyverse))
suppressMessages(require(parallel))
## -----------------------------------------------------------------------------
## CMD: Command for DISPOIN's Directory Acquisition
## -----------------------------------------------------------------------------
# shell(cmd = 'pushd "\\\\srvdiscsv\\data" && dir *AL*.zip /b /s > D:\\DISPOIN_Data_Directories.csv && popd')
## -----------------------------------------------------------------------------
## RODBC
## -----------------------------------------------------------------------------
## A) MariaDB Connection String
con <- odbcConnect("MariaDB_Tornado24")
invisible(sqlQuery(con, "USE dispoin;"))
# B) Import R Data Directories from MariaDB
DISPOIN_DIR_REL <- as_tibble(sqlFetch(con, "dispoin.t_DISPOIN_DIR_REL"))
odbcClose(con)
# C) Import zipped CSV data into a list of data frames, which are later combined
# into a single data frame with rbind
# C.1) parRapply Function Initialization:
parRaplly_Function <- function (DISPOIN_CSV_Row)
{
return(read_csv2(
file = DISPOIN_CSV_Row,
col_names = c(
"SCADA",
"TAG",
"ID_del_AEG",
"Descripcion",
"Time_ON",
"Time_OFF",
"Delta_Time",
"Comentario",
"Es_Alarma",
"Es_Ultima",
"Comentarios"),
col_types = cols(
"SCADA" = "c",
"TAG" = "c",
"ID_del_AEG" = "c",
"Descripcion" = "c",
"Time_ON" = "c",
"Time_OFF" = "c",
"Delta_Time" = "c",
"Comentario" = "c",
"Es_Alarma" = "c",
"Es_Ultima" = "c",
"Comentarios" = "c"),
locale = default_locale(),
na = c("", " "),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = 1000,  # same as readr's default min(1000, n_max) with n_max = Inf; n_max is not defined in this function
progress = FALSE))
}
# C.2) parallel Package: Environment Settings
no_cores <- detectCores()
c1 <- makeCluster(no_cores)
invisible(clusterEvalQ(c1, library(readr)))
setDefaultCluster(c1)
# C.3) parRapply Function Application:
DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function)
suppressWarnings(stopCluster(c1))
# D) List's Tibbles Compilation into a single Tibble:
DISPOIN_CSV <- do.call(rbind, DISPOIN_CSV_List)
# E) Write Compiled Table into CSV:
write_csv(
DISPOIN_CSV,
path = file.path("D:/MySQL/R", "DISPOIN_CSV.csv"),
na = "\\N",
append = FALSE,
col_names = TRUE)
# F) Data Cleaning: Environment Variable Removal
rm(list=ls())
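The commented-out shell() call in the CMD section depends on cmd.exe and needs every backslash doubled inside an R string. A minimal sketch of an R-only alternative, assuming the same root folder and the same *AL*.zip name filter; find_al_zips() is a hypothetical helper, and the server path in the comment is the one from the original command.

```r
# Hypothetical helper: R-only counterpart of `dir *AL*.zip /b /s`, returning
# the full path of every file below `root` whose name contains "AL" and ends
# in ".zip". ignore.case = TRUE mimics cmd.exe's case-insensitive matching.
find_al_zips <- function(root) {
  list.files(root, pattern = "AL.*\\.zip$", recursive = TRUE,
             full.names = TRUE, ignore.case = TRUE)
}

# e.g. zips <- find_al_zips("\\\\srvdiscsv\\data")
```

Because the result is a plain character vector of paths, it could be passed directly to parLapply() instead of first writing the list to CSV and loading it into MariaDB.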
More information about the R-help mailing list