[R] read.csv quotes within fields

Tim Howard tghoward at gw.dec.state.ny.us
Fri Jan 25 19:46:51 CET 2013


Drat, I forgot to tell you what system I am on:
 
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    
 
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     


>>> Tim Howard 1/25/2013 1:42 PM >>>
All,
 
I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files: with a backslash (\") rather than by doubling (""). The problem only seems to rear its ugly head when a field contains an odd number of internal quotes. I'll try to recreate the problem:
 
# set up a data frame, using escape-quote as the internal double quote mark.
 
x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))
 
> write.csv(x, "test.csv")
 
# NOTE that write.csv correctly wrote each internal quote ' " ' as a doubled quote ' "" '. 
# here's what got written
 
"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
 
# Importing test.csv works fine.
 
> read.csv("test.csv")
  X X1                                         X2             X3
1 1  1                                 string one another string
2 2  2 quotes escaped 10' 20" 5' 30" "test string   final string
3 3  3                                  third row     last " col
# this looks good. 
# now, please go and open "test.csv" with a text editor and replace each doubled internal quote ' "" ' with the 
# backslash-escaped quote ' \" ' (leave the empty "" header field alone), as is found in my data set. Like this:

"","X1","X2","X3"
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
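
The hand edit can also be scripted so the example is reproducible. This is only a sketch: `escaped_lines` and `csv_path` are illustrative names, and the file is written to a temporary directory rather than over the existing test.csv.

```r
# Write the backslash-escaped variant of the file directly, with no hand editing.
# escaped_lines and csv_path are illustrative names, not part of the original data set.
escaped_lines <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
)
csv_path <- file.path(tempdir(), "test.csv")
writeLines(escaped_lines, csv_path)

# Show the raw file contents to confirm the \" sequences were written as intended.
cat(readLines(csv_path), sep = "\n")
```

Pointing read.csv at `csv_path` should then reproduce the import described next.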
 
# this breaks read.csv:
 
> read.csv("test.csv")
  X X1                                                                                    X2             X3
1 1  1                                                                            string one another string
2 2  2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col      
 
# we now have only two rows, with all of the remaining data swallowed into row 2 of column X2
 
Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
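
A possible stopgap, sketched under assumptions and not tested against the real data set: read the file as plain lines, turn every backslash-quote pair into the doubled quote that read.csv already handles (as the write.csv round trip earlier shows), and parse the result. This assumes no field legitimately ends in a backslash right before its closing quote, since that pair would be rewritten too. `read_backslash_csv` and `csv_path` are hypothetical names.

```r
# Hypothetical helper: read a CSV whose internal quotes are written as \" instead of "".
read_backslash_csv <- function(path, ...) {
  raw_lines <- readLines(path)
  # fixed = TRUE matches the two literal characters: backslash, then double quote
  doubled <- gsub('\\"', '""', raw_lines, fixed = TRUE)
  con <- textConnection(doubled)
  on.exit(close(con))
  read.csv(con, ...)
}

# Recreate the problem file in a temporary location (csv_path is an illustrative name).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
), csv_path)

dat <- read_backslash_csv(csv_path, stringsAsFactors = FALSE)
print(dat)
```

Rewriting the lines in memory leaves the original files untouched, at the cost of holding each file in memory twice.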
 
Thank you in advance, 
Tim
 
 
 
 

