[R] reading csv files
Jim Lemon
jim at bitwrit.com.au
Fri Feb 5 23:29:50 CET 2010
On 02/06/2010 12:57 AM, Barry Rowlingson wrote:
> On Fri, Feb 5, 2010 at 10:23 AM, analyst41 at hotmail.com
> <analyst41 at hotmail.com> wrote:
>> The CSV files are downloaded from a database, and it looks like some
>> character fields contain the CR-LF sequence within them.
>>
>> This causes R to see a new record/row and the number of rows it sees
>> is different (usually higher) from the number of rows actually
>> extracted.
>
> Hard to tell without an example, but I just tried this in a file:
>
> 1,2,"this
> is a test",99
> 2,3,"oneliner",45
>
> and:
>
>> read.table("test.csv",sep=",")
>   V1 V2              V3 V4
> 1  1  2 this\nis a test 99
> 2  2  3        oneliner 45
>
> seemed to work. But if your strings aren't "quoted" (hard to tell
> without an example) then you might have to find another way. Hard to
> tell without an example.
>
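Barry's test can be reproduced as a self-contained sketch. It writes his example to a temporary file (the tempfile() name is arbitrary, not part of his setup) and reads it back with read.csv(), whose default quote = "\"" keeps a line break inside a double-quoted field as part of that field:

```r
# Barry's quoted test file, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c('1,2,"this',
             'is a test",99',
             '2,3,"oneliner",45'), tmp)

# read.csv() honours double quotes, so the line break stays inside V3
df <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE)
nrow(df)    # 2 rows, matching the 2 records written
df$V3[1]    # "this\nis a test"
```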
Maybe the database output looks like this:
1,2,this
is a test,99
2,3,oneliner,45
in which case Barry's read.table() call fails:
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 1 did not have 4 elements
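However, if we try read.csv(), which sets fill=TRUE and so pads short lines instead of stopping, the split record quietly turns into two rows: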
read.csv("test.csv",header=FALSE)
         V1 V2       V3 V4
1         1  2     this NA
2 is a test 99          NA
3         2  3 oneliner 45
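Before choosing a fix, it can help to see which physical lines are broken. Base R's count.fields() reports how many fields it finds on each line; a minimal sketch using the unquoted example above (written to a temporary file, and assuming 4 columns are expected) might look like this:

```r
# Unquoted example from above, written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2,this",
             "is a test,99",
             "2,3,oneliner,45"), tmp)

# Number of comma-separated fields on each physical line
counts <- count.fields(tmp, sep = ",")
counts              # 3 2 4

# Lines without the expected 4 fields point at split records
bad <- which(counts != 4)
bad                 # 1 2
```

With a real export, the expected count is the number of columns in the database table.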
If you can determine whether the embedded EOLs differ from those at the
end of a record, you could globally replace the embedded EOLs in the
input file with some character that isn't otherwise used in it (e.g. ~
or |). I'll leave the syntax to the regexperts.
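One possible shape for that replace, offered as a sketch rather than a tested recipe: it assumes the embedded breaks are CR-LF (as the original poster describes) while each record ends in a bare LF, so a literal fixed-string substitution is enough and no regular expression is needed. The function name replace_embedded_eol() and its arguments are hypothetical:

```r
# Hypothetical helper: replace embedded line breaks with a placeholder.
# ASSUMPTION: embedded breaks are CR-LF ("\r\n") and records end in a bare LF.
replace_embedded_eol <- function(infile, outfile,
                                 embedded = "\r\n", placeholder = "~") {
  # Read the raw bytes so no line-ending translation happens
  bytes <- readBin(infile, what = "raw", n = file.info(infile)$size)
  txt <- rawToChar(bytes)
  # fixed = TRUE matches the sequence literally (no regular expression)
  txt <- gsub(embedded, placeholder, txt, fixed = TRUE, useBytes = TRUE)
  # Write the bytes back unchanged apart from the substitution
  writeBin(charToRaw(txt), outfile)
  invisible(outfile)
}

# Demo: the unquoted example with a CR-LF inside the third field
infile <- tempfile(fileext = ".csv")
writeBin(charToRaw("1,2,this\r\nis a test,99\n2,3,oneliner,45\n"), infile)

outfile <- replace_embedded_eol(infile, tempfile(fileext = ".csv"))
fixed <- read.csv(outfile, header = FALSE, stringsAsFactors = FALSE)
nrow(fixed)    # 2
fixed$V3[1]    # "this~is a test"
```

If the line breaks are needed afterwards, gsub("~", "\n", fixed$V3, fixed = TRUE) puts them back. If both kinds of break are the same byte sequence, a plain substitution like this cannot tell them apart.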
Jim