[R] read.csv quotes within fields
Tim Howard
tghoward at gw.dec.state.ny.us
Fri Jan 25 19:46:51 CET 2013
Drat, I forgot to tell you what system I am on:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: i386-pc-mingw32/i386 (32-bit)
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
>>> Tim Howard 1/25/2013 1:42 PM >>>
I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there is an odd number of internal quotes in a field. I'll try to recreate the problem:
# set up a data frame (built from a matrix), using escape-quote as the internal double quote mark.
x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))
> write.csv(x, "test.csv")
# NOTE that write.csv correctly wrote each of the three internal quotes ' " ' as a doubled quote ' "" '.
# here's what got written
"1","1","string one","another string"
"2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
"3","3","third row","last "" col"
# Importing test.csv works fine.
> read.csv("test.csv")
X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20" 5' 30" "test string final string
3 3 3 third row last " col
# this looks good.
# now, please open "test.csv" with a text editor and replace every doubled quote ' "" ' with the
# backslash-escaped quote ' \" ', as is found in my data set. Like this:
"1","1","string one","another string"
"2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
"3","3","third row","last \" col"
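For anyone who would rather not hand-edit the file, the backslash-escaped version can also be written directly from R. This is only a sketch: the header line is an assumption (it is what write.csv normally emits for this data frame, and it is not shown in the listing above), and the file is written to the working directory as test.csv, overwriting the earlier copy.

```r
# Contents of the hand-edited file, one element per line.
# Single-quoted R strings are used so the double quotes need no escaping;
# '\\' is one literal backslash and '\'' is one literal apostrophe.
bad_lines <- c(
  '"","X1","X2","X3"',  # assumed header line, as write.csv would emit it
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\' 20\\" 5\' 30\\" \\"test string","final string"',
  '"3","3","third row","last \\" col"'
)

# Overwrite test.csv with the backslash-escaped version.
writeLines(bad_lines, "test.csv")
```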
# this breaks read.csv:
> read.csv("test.csv")
X X1 X2 X3
1 1 1 string one another string
2 2 2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col
# we now have only two rows, with all the remaining data captured in column X2 of row 2
Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
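One possible workaround, sketched here and not a confirmed fix, is to preprocess the text before parsing: read the raw lines, turn each backslash-escaped quote into the doubled quote that read.csv does understand, and parse the result. The helper name read_backslash_csv is hypothetical (it is not part of base R), and the sketch assumes that a backslash directly before a double quote always means an escaped quote, so a field that genuinely ends in a backslash would be misread. It also relies on the text argument of read.csv.

```r
# Hypothetical helper: read a CSV whose embedded quotes are written as \"
# instead of the doubled "" that read.csv expects.
read_backslash_csv <- function(path, ...) {
  raw_lines <- readLines(path, warn = FALSE)

  # Replace every literal backslash-quote pair with a doubled quote.
  # fixed = TRUE makes gsub treat the pattern as plain text, not a regex.
  repaired <- gsub('\\"', '""', raw_lines, fixed = TRUE)

  # Parse the repaired text; extra arguments are passed on to read.csv.
  read.csv(text = repaired, ...)
}
```

With the edited file from above, this would be called as read_backslash_csv("test.csv"). The original file on disk is left untouched, since only the in-memory copy of the lines is rewritten.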
Thank you in advance,