[R] CSV format issues

jim holtman jholtman at gmail.com
Mon Jul 23 16:40:42 CEST 2012


Try this; it looks for strings of numbers that contain commas and puts quotes around them:


> x <- readLines(textConnection("Time,Value
+ 32,-7,183246E-02
+ 32,05,3,469364E-02"))
> # first pass: put quotes around numbers in scientific notation
> x.new1 <- gsub("(-?[0-9]+,[0-9]+E-?[0-9]+)", '"\\1"', x)
> x.new1
[1] "Time,Value"             "32,\"-7,183246E-02\""   "32,05,\"3,469364E-02\""
> # second pass: put quotes around the remaining plain decimal numbers
> x.new2 <- gsub("(-?[0-9]+,[0-9]+)(,|$)", '"\\1"\\2', x.new1)
> x.new2
[1] "Time,Value"                 "32,\"-7,183246E-02\""
[3] "\"32,05\",\"3,469364E-02\""
> temp <- tempfile()
> writeLines(x.new2, temp)
> x.input <- read.csv(temp)
> x.input
   Time         Value
1    32 -7,183246E-02
2 32,05  3,469364E-02
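
Note that the columns above still hold the decimal-comma text rather than numbers. As a minimal sketch of one way to get numeric columns, assuming the file really uses the comma as decimal mark throughout, the same two substitutions can be followed by read.csv() with dec = ",". The names x.q and x.num are only illustrative.

```r
# Same sample lines as above (French-locale output without quotes)
x <- c("Time,Value", "32,-7,183246E-02", "32,05,3,469364E-02")

# Quote scientific-notation numbers first, then the remaining decimal numbers
x.q <- gsub("(-?[0-9]+,[0-9]+E-?[0-9]+)", '"\\1"', x)
x.q <- gsub("(-?[0-9]+,[0-9]+)(,|$)", '"\\1"\\2', x.q)

# dec = "," asks read.csv to treat the comma inside the quoted fields
# as the decimal mark when converting the columns
x.num <- read.csv(text = x.q, dec = ",")
str(x.num)
```

This should only be applied to files known to come from the French setup: the second pattern would also quote an English-format line made of two integers, such as 17,5, and turn it into a single field.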


On Mon, Jul 23, 2012 at 9:06 AM, Guillaume Meurice
<guillaume.meurice at igr.fr> wrote:
> Dear all,
>
> I have an encoding problem that I'm not familiar with.
> Here is the case:
> I'm reading data files which may have been generated on a computer with global settings in either French or English.
>
> Here is an example of a data file:
>
> * English output
> Time,Value
> 17,-0.0753953
> 17.05,-6.352454E-02
>
> * French output.
> Time,Value
> 32,-7,183246E-02
> 32,05,3,469364E-02
>
> In the first case, I can retrieve both columns by splitting each line using the comma as a separator.
> In the second case, it's impossible, since the comma (in French) is also used as the decimal separator. Usually, the French CSV file format adds quotes to distinguish the comma used as column separator from the comma used as decimal mark, like the following:
>
> Time,Value
> 32,"-7,183246E-02"
> "32,05","3,469364E-02"
>
> Since I'm expecting two numbers, I can decide that if there are three commas, the first two parts are to be joined together, as are the two remaining ones.
> But with only two commas, which number is the decimal one? (I know that it is the second one, but this is a great source of bugs ...)
>
> The Unix tool "file" returns:
> ===
> $ file P23_RD06_High\ Sensitivity\ DNA\ Assay_DE04103664_2012-06-27_11-57-29_Sample1.csv
> P23_RD06_High Sensitivity DNA Assay_DE04103664_2012-06-27_11-57-29_Sample1.csv: ASCII text, with CRLF line terminators
> ===
>
>
> Unfortunately, the raw file doesn't contain the precious quotes. Sorry to bother you with this question, which is not totally related to R (which I'm using). Do you know if there are any facilities in R to get the data into the right format?
>
>
> Bests,
> --
> Guillaume Meurice - PhD
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


