[Rd] Problem in scan() (PR#4128)
Prof Brian Ripley
ripley at stats.ox.ac.uk
Thu Sep 11 22:41:35 MEST 2003
Quotes are only interpreted in character columns (scan.c line 240), and
NULL is not character. So this was intentional.
If you would like this changed, please supply a patch (which looks to be
a good exercise).
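
Until such a patch exists, one possible workaround follows from the point above: since quotes are interpreted in character columns, the unwanted column can be declared as character and dropped after reading. The sketch below is illustrative only. It recreates the test file from the report in a temporary file (the name `csv` is introduced here, not taken from the report) and assumes the default `quote` setting of scan(). The skipped column is still held in memory while reading, which may matter for very large files.

```r
# Recreate the test file from the report in a temporary location
# (the variable name `csv` is an assumption for this sketch).
csv <- tempfile(fileext = ".csv")
writeLines(c(
  '"col.A","col.B","col.C","col.D"',
  '1,"quoted string","again, again again",123',
  '2,"nice quotes, isnt it","you got it",456'
), csv)

# Declare column b as character so that its quotes are interpreted,
# instead of skipping it with b = NULL.
tst <- scan(csv, what = list(a = 0, b = "", c = "", d = 0),
            sep = ",", skip = 1, quiet = TRUE)

# Discard the unwanted column after reading.
tst$b <- NULL
```

After this, `tst` holds only the components a, c and d, with the embedded commas in column c left intact.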
On Thu, 11 Sep 2003 Paul.Bayer at gleichsam.de wrote:
> Full_Name: Paul Bayer
> Version: 1.7.1
> OS: Windows + Linux
> Submission from: (NULL) (217.235.105.54)
>
>
> I tried to read some large CSV files (30 - 100 MB) into R
> with scan(), skipping unneeded columns by giving NULL elements in
> "what".
>
> When these skipped elements are quoted strings with commas inside,
> R interprets each such quoted comma as an element separator,
> leading to wrong records in the rest of the line.
>
> A little test will show what I mean. I have the following "test.csv":
>
> "col.A","col.B","col.C","col.D"
> 1,"quoted string","again, again again",123
> 2,"nice quotes, isnt it","you got it",456
>
> First I read all elements:
>
> > tst <- scan("test.csv", what=list(a=0,b="",c="",d=0), sep=",", skip=1)
> Read 2 records
> > tst
> $a
> [1] 1 2
>
> $b
> [1] "quoted string" "nice quotes, isnt it"
>
> $c
> [1] "again, again again" "you got it"
>
> $d
> [1] 123 456
>
> Everything is fine. Then I try to skip the 2nd column by giving b=NULL:
>
> > tst <- scan("test.csv", what=list(a=0,b=NULL,c="",d=0), sep=",", skip=1)
> Read 2 records
> Warning message:
> number of items read is not a multiple of the number of columns
> > tst
> $a
> [1] 1 2
>
> $b
> NULL
>
> $c
> [1] "again, again again" " isnt it,you got it,456\n\n\n"
>
> $d
> [1] 123 NA
>
> >
>
> I got garbage.
>
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595