[R] Unable to read csv files with comma in values

Duncan Murdoch
Sun Apr 7 17:56:25 CEST 2019


On 06/04/2019 10:03 a.m., Amit Govil wrote:
> Hi,
> 
> I have a bunch of csv files to read in R. I'm unable to read them correctly
> because in some of the files, there is a column ("Role") which has comma in
> the values.
> 
> Sample data:
> 
> User, Role, Rule, GAPId
> Sam, [HadoopAnalyst, DBA, Developer], R46443
> 
> I'm trying to play with the below code but it doesn't work:

Since you didn't give a reproducible example, you should at least say 
what "doesn't work" means.
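
For example, a minimal sketch of a reproducible example might look like
the following. The file name and contents are made up from the sample
data in your message, so adjust them to match one of your real files:

```r
# Hypothetical file, modelled on the sample data in the question
f <- file.path(tempdir(), "REDUNDANT_example.csv")
writeLines(c("User, Role, Rule, GAPId",
             "Sam, [HadoopAnalyst, DBA, Developer], R46443"), f)

# The same two steps as in the posted code, kept separate
fixed <- gsub("\\[|\\]", "\"", readLines(f))  # turn [ and ] into double quotes
df <- read.csv(text = fixed, check.names = FALSE)

print(fixed)  # the text that read.csv() actually receives
str(df)       # what read.csv() made of it
```

Posting code like that, together with the output of str(df) and the
result you wanted, would show what "doesn't work" means. Note that the
sample row as posted has one field fewer than the header, so the
structure of df is worth checking.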

But here's some general advice:  if you want to debug code, don't write 
huge expressions like the chain of functions below.  Put things in 
temporary variables and make sure you get what you were expecting at 
each stage.

Instead of
> 
> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
> 
> tbl <- sapply(files, function(f) {
>    gsub('\\[|\\]', '"', readLines(f)) %>%
>      read.csv(text = ., check.names = FALSE)
> }) %>%
>    bind_rows(.id = "id") %>%
>    select(id, User, Rule) %>%
>    distinct()

try


library(dplyr)  # provides %>%, bind_rows(), select() and distinct()

files <- list.files(pattern='.*REDUNDANT(.*).csv$')

tmp1 <- sapply(files, function(f) {
   gsub('\\[|\\]', '"', readLines(f)) %>%
     read.csv(text = ., check.names = FALSE)
})

tmp2 <- tmp1 %>% bind_rows(.id = "id")

tmp3 <- tmp2 %>% select(id, User, Rule)

tbl <- tmp3 %>% distinct()

(You don't need pipes here, but keeping them will make it easier to put 
the giant expression back together at the end.)

Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went 
wrong.
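
As a sketch of what that inspection could look like, here is a small
example in base R.  It uses made-up inline data rather than your files,
and the names texts, tmp_s and tmp_l are invented for this illustration:

```r
# Hypothetical stand-ins for the contents of two csv files
texts <- list(a.csv = 'User,Role,Rule\nSam,"HadoopAnalyst, DBA",R46443',
              b.csv = 'User,Role,Rule\nAnn,"Developer",R10001')

# The same reading step done with sapply() and with lapply()
tmp_s <- sapply(texts, function(txt) read.csv(text = txt))
tmp_l <- lapply(texts, function(txt) read.csv(text = txt))

# Check what kind of object each stage produced before passing it on
class(tmp_s)
dim(tmp_s)
str(tmp_l, max.level = 1)
```

One thing worth checking at the tmp1 stage is whether sapply() has
simplified its result.  When every call returns a data frame with the
same number of columns, sapply() returns a matrix of mode list rather
than a list of data frames, whereas lapply() always returns a list.
Whether that is what is happening in your case, I can't tell without
the files.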

Duncan Murdoch


