[Rd] Problem in scan() (PR#4128)
Paul.Bayer at gleichsam.de
Thu Sep 11 23:07:44 MEST 2003
Full_Name: Paul Bayer
Version: 1.7.1
OS: Windows + Linux
Submission from: (NULL) (217.235.105.54)
I tried to read some large CSV files (30 - 100 MB) into R
with scan(), skipping the columns I do not need by giving NULL elements in
"what".
When these skipped elements are quoted strings with commas inside,
R interprets each such quoted comma as an element separator,
leading to wrong records in the rest of the line.
A little test will show what I mean. I have the following "test.csv":
"col.A","col.B","col.C","col.D"
1,"quoted string","again, again again",123
2,"nice quotes, isnt it","you got it",456
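For anyone who wants to reproduce this without typing the file by hand, the snippet below is a small sketch that writes the same three lines to "test.csv" in the current working directory. The variable name csv_lines is only illustrative and not part of the original report.

```r
# Recreate the three-line example file in the current working directory.
# Single-quoted R strings are used so the embedded double quotes are written as-is.
csv_lines <- c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
)
writeLines(csv_lines, "test.csv")
```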
First I read all elements:
> tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
Read 2 records
> tst
$a
[1] 1 2
$b
[1] "quoted string" "nice quotes, isnt it"
$c
[1] "again, again again" "you got it"
$d
[1] 123 456
Everything is fine. Then I try to skip the 2nd column by giving b=NULL:
> tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
Read 2 records
Warning message:
number of items read is not a multiple of the number of columns
> tst
$a
[1] 1 2
$b
NULL
$c
[1] "again, again again" " isnt it,you got it,456\n\n\n"
$d
[1] 123 NA
>
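I got garbage: the comma inside the skipped quoted field of the second
record is taken as a separator, column c swallows the rest of the line,
and d ends up as NA.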
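A possible workaround, sketched here on the assumption that there is enough memory to hold the unwanted column temporarily: read every column as in the first call above (which parses the quoted commas correctly) and drop the unneeded list element afterwards. The temporary file path is only there to keep the example self-contained.

```r
# Write the example data to a temporary file so the sketch is self-contained.
path <- tempfile(fileext = ".csv")
writeLines(c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
), path)

# Read all four columns, giving the unwanted column a character type
# instead of NULL so that its quoted commas are parsed as part of the field.
tst <- scan(path, what = list(a = 0, b = "", c = "", d = 0),
            sep = ",", skip = 1)

# Discard the unwanted column after reading.
tst$b <- NULL
str(tst)
```

This costs extra memory while the dropped column is held, which may matter for files in the 30 - 100 MB range.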