[R] reading csv files

Jim Lemon jim at bitwrit.com.au
Fri Feb 5 23:29:50 CET 2010

On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analyst41 at hotmail.com
> <analyst41 at hotmail.com>  wrote:
>> the csv files are downloaded from a database and it looks like some
>> character fields contain the CR-LF sequence within them.
>> This causes R to see a new record/row and the number of rows it sees
>> is different (usually higher) from the number of rows actually
>> extracted.
>   Hard to tell without an example, but I just tried this in a file:
> 1,2,"this
> is a test",99
> 2,3,"oneliner",45
> and:
>> read.table("test.csv",sep=",")
>    V1 V2              V3 V4
> 1  1  2 this\nis a test 99
> 2  2  3        oneliner 45
> seemed to work. But if your strings aren't "quoted" (hard to tell
> without an example) then you might have to find another way. Hard to
> tell without an example.
Maybe the database output looks like this, with no quotes around the text field:

1,2,this
is a test,99
2,3,oneliner,45

in which case read.table("test.csv",sep=",") gives:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
   line 1 did not have 4 elements

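However, if we try read.table("test.csv",sep=",",fill=TRUE), there is no error, but the broken record turns into two misaligned rows: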

          V1 V2       V3 V4
1         1  2     this NA
2 is a test 99          NA
3         2  3 oneliner 45
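A minimal sketch to reproduce the two results above on any machine. It assumes the three example lines are written to a temporary file (the file name is arbitrary, not from the original post). The last step uses count.fields(), which was not mentioned in the thread, to list the physical lines whose field count differs from the widest line. Those are the candidates for a record that was broken by an embedded EOL.

```r
# Recreate the unquoted example in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,this", "is a test,99", "2,3,oneliner,45"), tmp)

# Without fill=TRUE, read.table() stops: it expects 4 columns
# (taken from the first five lines) but line 1 has only 3 fields.
res <- tryCatch(read.table(tmp, sep = ","),
                error = function(e) conditionMessage(e))
print(res)

# With fill=TRUE, short lines are padded with blank fields, so the
# split record becomes two misaligned rows instead of an error.
d <- read.table(tmp, sep = ",", fill = TRUE)
print(d)

# count.fields() reports the number of fields on each physical line;
# lines with fewer fields than the widest one are suspects.
n <- count.fields(tmp, sep = ",")
print(which(n != max(n)))
```

Here count.fields() returns 3, 2 and 4 for the three lines, so lines 1 and 2 are flagged. Both halves of the broken record show up as suspects.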

If you can tell the embedded EOLs apart from the ones that end each record, you could do a global replace on the input file, changing the embedded EOLs to some character that isn't otherwise used in the file (e.g. ~ or |). I'll leave the syntax to the regexperts.
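One way to do that global replace is sketched below in R. It assumes something the original post does not state: records end in a bare LF, while the embedded breaks are CR-LF (the question only says the fields contain CR-LF). The file is read as raw bytes so that R does not translate the line endings before the replacement. fix_embedded_eols() is a hypothetical helper name, and ~ is just the example replacement character from the text.

```r
# Hypothetical helper (not from the original post). Assumes `pattern`
# matches the embedded line breaks but NOT the EOLs that end records.
fix_embedded_eols <- function(infile, outfile, pattern = "\r\n",
                              replacement = "~") {
  # Read the whole file as raw bytes so no line-ending translation happens.
  bytes <- readBin(infile, what = "raw", n = file.info(infile)$size)
  txt <- rawToChar(bytes)
  # Replace every embedded EOL with a character unused elsewhere in the file.
  txt <- gsub(pattern, replacement, txt, perl = TRUE, useBytes = TRUE)
  # Write the bytes back, unchanged apart from the replacements.
  writeBin(charToRaw(txt), outfile)
  invisible(outfile)
}

# Example: records end in LF; the text field of record 1 contains CR-LF.
infile <- tempfile(fileext = ".csv")
writeBin(charToRaw("1,2,this\r\nis a test,99\n2,3,oneliner,45\n"), infile)
fixed <- read.table(fix_embedded_eols(infile, tempfile(fileext = ".csv")),
                    sep = ",")
print(fixed)
```

If it is the other way round (records end in CR-LF and the embedded breaks are bare LF), a PCRE look-behind such as pattern = "(?<!\r)\n" replaces only the LFs that are not preceded by a CR. read.table() accepts LF, CR-LF or CR as the end of a line, so the repaired file can be read as usual.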


More information about the R-help mailing list