[R] Spark DataFrame: replace NULL cell by NA

Karim Mezhoud kmezhoud @ending from gm@il@com
Sun Dec 9 22:06:51 CET 2018

Dear All,
## function to replace empty cells with NA
empty_as_na <- function(x){
  ## convert factors to character first, since ifelse() won't work with factors
  if("factor" %in% class(x)) x <- as.character(x)
  ifelse(as.character(x)!="", x, NA)
}

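## connect to a local Spark instance
library(sparklyr)
library(dplyr)
sc <- spark_connect(master = "local")
## load an example data frame that has empty cells (needs the cgdsr package);
## `cgds` is assumed to be a connection object created beforehand with cgdsr::CGDS()
clinicalData <- cgdsr::getClinicalData(cgds, "gbm_tcga_pub_all")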
## copy to spark
clinicalData_tbl <- dplyr::copy_to(sc, clinicalData, overwrite = TRUE)

## This works (local data frame)
clinicalData %>% mutate_all(funs(empty_as_na))
## This does not work (Spark table)
clinicalData_tbl %>% mutate_all(funs(empty_as_na))
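
For reference, here is a minimal sketch of what the helper is expected to return on plain local R vectors. It does not involve Spark. The vectors `chars` and `fct` are made-up illustrations, not taken from the clinical data, and the function is repeated so the snippet runs on its own.

```r
## same helper as above, repeated so this snippet is self-contained
empty_as_na <- function(x){
  ## convert factors to character first, since ifelse() won't work with factors
  if("factor" %in% class(x)) x <- as.character(x)
  ifelse(as.character(x)!="", x, NA)
}

## hypothetical example vectors (not from the clinical data)
chars <- c("a", "", "b")
fct   <- factor(c("x", "", "y"))

res_chars <- empty_as_na(chars)  # character vector: "a" NA "b"
res_fct   <- empty_as_na(fct)    # factor is converted to character: "x" NA "y"

print(res_chars)
print(res_fct)
```

On a local data frame, mutate_all() applies this function to each column in R, which is the case that works above.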

