[R] read.csv : double quoted numbers

Aval Sarri aval.sarri at gmail.com
Wed Aug 20 20:27:27 CEST 2008


Hello,

I am a new R user, so please pardon me.

I am reading a .txt file that has 50+ numeric columns, with '\t' as the
separator. I am using read.csv together with colClasses, but it fails to
recognize double-quoted numeric values. (My numeric values look like
"1,001.23" and "1,008,000.456".) Basically, read.csv fails with:
scan() expected 'a real', got '"1,044.059"'
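The failure can be reproduced without a file. A minimal sketch, using the values quoted above, showing that the thousands separators (not only the quotes) are what R's numeric conversion rejects:

```r
# Values as they appear in the file, with the surrounding quotes already removed.
vals <- c("1,001.23", "1,008,000.456", "1,044.059")

# Commas are not valid inside an R number, so every element becomes NA.
raw <- suppressWarnings(as.numeric(vals))
print(raw)

# Deleting the commas first makes the conversion succeed.
cleaned <- as.numeric(gsub(",", "", vals))
print(cleaned)   # 1001.23, 1008000.456, 1044.059
```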

What I have tried, and the problems with each:


1) I tried scan() with pipe(), but I get the error message below. In
other words, how do I replace all double quotes with nothing? I tried
enclosing the sed command in single quotes, but that does not help
(though the sed command works from the shell).

scan(pipe("sed -e s/\"//g DataAll.txt"), sep="\t")
sh: Syntax error: Unterminated quoted string
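The error comes from the quoting: inside the R string, \" becomes a bare " in the command text, so the shell receives sed -e s/"//g DataAll.txt and sees an unmatched double quote. A sketch of one fix, under stated assumptions: put the sed script in single quotes inside the R string, delete the commas as well as the quotes (scan() would still reject "1,001.23"), and use read.delim() instead of scan() so the header row and any text columns are handled. It assumes a Unix-like shell with sed on the PATH and that no text column legitimately contains commas; the sample file is made up to stand in for DataAll.txt.

```r
# Hypothetical sample standing in for DataAll.txt: tab-separated,
# numbers double-quoted and containing thousands separators.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tX1\tX2",
             "a\t\"1,001.23\"\t\"1,008,000.456\"",
             "b\t\"1,044.059\"\t\"2.5\""), tmp)

# The R string yields: sed -e 's/[",]//g' '<file>'
# Single quotes protect the sed script from the shell; [",] deletes
# both double quotes and commas (assumes no text field needs its commas).
cmd <- paste("sed -e 's/[\",]//g'", shQuote(tmp))

# read.delim() opens and closes the pipe connection itself.
dat <- read.delim(pipe(cmd))
str(dat)
```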

2) On the mailing list, one solution I found was setAs(), described here:
http://www.nabble.com/Re%3A--R--read.table()-and-scientific-notation-p6734890.html
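For reference, a sketch of the setAs() idiom applied to this file. The class name num.with.commas is hypothetical (any unused name works), and the sample file is made up. read.delim() strips the double quotes itself (its default quote is "), read.table() passes columns with a non-standard colClasses entry through methods::as(), and the coercion method removes the commas. The colClasses vector is built from the header with a regular expression, so it also covers "all columns starting with X".

```r
library(methods)  # setClass/setAs; attached by default in recent R versions

# Hypothetical class used only as a colClasses tag.
setClass("num.with.commas")
setAs("character", "num.with.commas",
      function(from) as.numeric(gsub(",", "", from)))

# Hypothetical sample file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tX1\tX2",
             "a\t\"1,001.23\"\t\"1,008,000.456\"",
             "b\t\"1,044.059\"\t\"2.5\""), tmp)

# Read one row just to get the column names, then tag every X* column.
hdr <- names(read.delim(tmp, nrows = 1))
cls <- ifelse(grepl("^X", hdr), "num.with.commas", "character")

dat <- read.delim(tmp, colClasses = cls)
str(dat)
```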

3) Other than using as.is=TRUE and then calling as.numeric on the numeric
columns, what is the solution? And if I do that, how do I efficiently
convert 50+ columns to numeric? All my numeric column names start with
the character 'X', so how do I use sapply and/or a regular expression to
convert every column whose name starts with X to numeric? Is there an
alternative method?
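A sketch of the as.is=TRUE route, assuming (as stated above) that every numeric column name starts with X; the sample file is made up. grep("^X", ...) selects the columns, and each character column has its commas removed before as.numeric(). Columns that read.delim() already converted (here X3, which has no commas) are left alone, which avoids a needless round trip through character.

```r
# Hypothetical sample file; X3 has no thousands separators.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tX1\tX2\tX3",
             "a\t\"1,001.23\"\t\"1,008,000.456\"\t\"7\"",
             "b\t\"1,044.059\"\t\"2.5\"\t\"8\""), tmp)

# as.is = TRUE keeps unconvertible columns as character, not factor.
dat <- read.delim(tmp, as.is = TRUE)

# Indices of all columns whose name starts with "X".
xcols <- grep("^X", names(dat))

# Strip commas and convert, but only where the column is still character.
dat[xcols] <- lapply(dat[xcols], function(col) {
  if (is.character(col)) as.numeric(gsub(",", "", col)) else col
})
str(dat)
```

lapply() is used rather than sapply() because sapply() may simplify the result into a matrix, while lapply() returns one list element per column, which is what dat[xcols] <- expects.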

Basically, 2 and 3 both work, but which is the more efficient and correct way to do this?

(Also, what is the most efficient way to apply field-level validation
and conversion while reading a file? Does one have to read the whole
file first, and only then can validation and conversion happen?)
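One possible approach, sketched under assumptions: the file does not have to be read in full before validating. Reading it through an open connection in chunks of lines lets each chunk be checked and converted before the next one is read. The validation rule valid_num(), the chunk size and the sample file are all hypothetical, and this is plain readLines()/strsplit() code rather than a validation feature of read.csv.

```r
# Hypothetical sample file; "oops" is a deliberately invalid value.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tX1", "a\t\"1,001.23\"", "b\t\"oops\"", "c\t\"2,000\""), tmp)

# Hypothetical rule: optional sign, comma-grouped or plain digits, optional decimals.
valid_num <- function(x) grepl("^-?([0-9]{1,3}(,[0-9]{3})*|[0-9]+)(\\.[0-9]+)?$", x)

con <- file(tmp, open = "r")
header <- strsplit(readLines(con, n = 1), "\t", fixed = TRUE)[[1]]
xcols <- grep("^X", header)
chunks <- list()
repeat {
  lines <- readLines(con, n = 2)   # tiny chunk size, for the demo only
  if (length(lines) == 0) break    # end of file
  # Drop the double quotes, split on tabs, one row per line.
  fields <- do.call(rbind, strsplit(gsub("\"", "", lines), "\t", fixed = TRUE))
  chunk <- as.data.frame(fields, stringsAsFactors = FALSE)
  names(chunk) <- header
  for (j in xcols) {
    x <- chunk[[j]]
    ok <- valid_num(x)
    if (any(!ok)) message("invalid in ", header[j], ": ", paste(x[!ok], collapse = ", "))
    out <- rep(NA_real_, length(x))                # invalid fields become NA
    out[ok] <- as.numeric(gsub(",", "", x[ok]))
    chunk[[j]] <- out
  }
  chunks[[length(chunks) + 1]] <- chunk
}
close(con)
dat <- do.call(rbind, chunks)
str(dat)
```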

Thanks for taking the time to read through this mail.

Thanks and Regards
-Aval



More information about the R-help mailing list