[R] Print All Warnings that Occur in All Parallel Nodes

TELLERIA RUIZ DE AGUIRRE, JUAN JTELLERIA at external.gamesacorp.com
Thu Sep 14 09:48:04 CEST 2017


Dear R Users,

I have developed the following code to import a series of zipped CSV files in parallel.

My problems are:

A) Some ZIP files (which contain the CSVs) are corrupted and cannot be opened.
B) After executing parRapply, I can only inspect the last.warning variable to find out which CSVs failed on each node, so I see only one warning at a time instead of all of them.

So:

* To show a list of all warnings from all nodes, I was thinking of wrapping the call in warnings(), like this:

    warnings(DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function))

    Would this work?

* Also, how could I check that a CSV can be opened before applying the function, and create an empty data.frame for those CSVs that cannot?
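
To make the first point concrete, here is a minimal sketch of what I have in mind, assuming that warnings raised on a worker have to be captured there and sent back together with the result. The names with_warnings and noisy are hypothetical and only for illustration, and I use parLapply on a plain vector instead of parRapply to keep the example small:

```r
library(parallel)

# Hypothetical helper: wrap f so that each call returns its value together
# with every warning (and any error) message raised while it ran.
with_warnings <- function(f) {
  force(f)  # evaluate the argument now so the closure can be sent to workers
  function(x) {
    msgs <- character(0)
    value <- withCallingHandlers(
      tryCatch(f(x), error = function(e) {
        msgs <<- c(msgs, paste("ERROR:", conditionMessage(e)))
        NULL
      }),
      warning = function(w) {
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")  # continue after recording the warning
      }
    )
    list(input = x, value = value, warnings = msgs)
  }
}

# Hypothetical stand-in for the real import function.
noisy <- function(x) {
  if (x %% 2 == 0) warning("even input: ", x)
  x * 10
}

cl <- makeCluster(2)
results <- parLapply(cl, 1:4, with_warnings(noisy))
stopCluster(cl)

# Every input keeps its own messages, so nothing is overwritten.
all_warnings <- unlist(lapply(results, `[[`, "warnings"))
print(all_warnings)
```

Each element of results carries the input, the value, and the messages for that input, so the failing files could be identified afterwards on the master.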

Thank you,
Juan


CODE
################################################################################
## DISPOIN Data Import Into MariaDB
################################################################################

## -----------------------------------------------------------------------------
## Packages
## -----------------------------------------------------------------------------

# update.packages("RODBC")
# update.packages("tidyverse")

## -----------------------------------------------------------------------------
## Libraries
## -----------------------------------------------------------------------------

suppressMessages(require(RODBC))
suppressMessages(require(tidyverse))
suppressMessages(require(parallel))

## -----------------------------------------------------------------------------
## CMD: Command for DISPOIN's Directory Acquisition
## -----------------------------------------------------------------------------

# shell(cmd = 'pushd "\\srvdiscsv\data" && dir *AL*.zip /b /s > D:\DISPOIN_Data_Directories.csv && popd')

## -----------------------------------------------------------------------------
## RODBC
## -----------------------------------------------------------------------------

## A) MariaDB Connection String

con <- odbcConnect("MariaDB_Tornado24")

invisible(sqlQuery(con, "USE dispoin;"))

# B) Import R Data Directories from MariaDB

DISPOIN_DIR_REL <- as_tibble(sqlFetch(con, "dispoin.t_DISPOIN_DIR_REL"))

odbcClose(con)

# C) Import zipped CSV data into a list of data frames, which are later combined into a single data frame
#    by means of rbind

  # C.1) parRapply Function Initialization:

  parRaplly_Function <- function (DISPOIN_CSV_Row)
  {
    return(read_csv2(
      file = DISPOIN_CSV_Row,
      col_names = c(
        "SCADA",
        "TAG",
        "ID_del_AEG",
        "Descripcion",
        "Time_ON",
        "Time_OFF",
        "Delta_Time",
        "Comentario",
        "Es_Alarma",
        "Es_Ultima",
        "Comentarios"),
      col_types = cols(
        "SCADA" = "c",
        "TAG" = "c",
        "ID_del_AEG" = "c",
        "Descripcion" = "c",
        "Time_ON" = "c",
        "Time_OFF" = "c",
        "Delta_Time" = "c",
        "Comentario" = "c",
        "Es_Alarma" = "c",
        "Es_Ultima" = "c",
        "Comentarios" = "c"),
      locale = default_locale(),
      na = c("", " "),
      quoted_na = TRUE,
      quote = "\"",
      comment = "",
      trim_ws = TRUE,
      skip = 0,
      n_max = Inf,
      guess_max = 1000,
      progress = FALSE))
  }

  # C.2) parallel Package: Environment Settings

  no_cores <- detectCores()

  c1 <- makeCluster(no_cores)

  invisible(clusterEvalQ(c1, library(readr)))

  setDefaultCluster(c1)

  # C.3) parRapply Function Application:

  DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function)

  suppressWarnings(stopCluster(c1))

# D) Combine the list of tibbles into a single tibble:

  DISPOIN_CSV <- do.call(rbind, DISPOIN_CSV_List)

# E) Write Compiled Table into CSV:

  write_csv(
    DISPOIN_CSV,
    path = file.path("D:/MySQL/R", "DISPOIN_CSV.csv"),
    na = "\\N",
    append = FALSE,
    col_names = TRUE)

# F) Data Cleaning: Environment Variable Removal

  rm(list=ls())
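
For the second question (checking that a file can be opened first), this is a rough sketch of one possible approach, assuming that listing the archive contents with utils::unzip(list = TRUE) is a good enough test. The helpers zip_is_readable, empty_dispoin and read_or_empty are hypothetical names, and reader stands in for parRaplly_Function:

```r
# Column names copied from the col_names vector in the script above.
dispoin_cols <- c("SCADA", "TAG", "ID_del_AEG", "Descripcion", "Time_ON",
                  "Time_OFF", "Delta_Time", "Comentario", "Es_Alarma",
                  "Es_Ultima", "Comentarios")

# Hypothetical helper: TRUE if the archive's table of contents can be read.
zip_is_readable <- function(path) {
  tryCatch({
    listing <- utils::unzip(path, list = TRUE)
    nrow(listing) > 0
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

# Hypothetical helper: zero-row, all-character data frame with the
# expected columns, used as a placeholder for unreadable files.
empty_dispoin <- function() {
  as.data.frame(
    matrix(character(0), nrow = 0, ncol = length(dispoin_cols),
           dimnames = list(NULL, dispoin_cols)),
    stringsAsFactors = FALSE
  )
}

# Hypothetical wrapper: only call the reader when the archive opens.
read_or_empty <- function(path, reader) {
  if (zip_is_readable(path)) reader(path) else empty_dispoin()
}

# Usage: a file that is not a valid ZIP archive yields the empty frame.
bad_zip <- tempfile(fileext = ".zip")
writeLines("this is not a zip archive", bad_zip)
result <- read_or_empty(bad_zip,
                        reader = function(p) stop("should not be called"))
str(result)
```

Listing the archive only shows that its directory is readable. A damaged member inside it could still fail during the actual read, so wrapping the read itself in tryCatch would probably still be advisable.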




More information about the R-help mailing list