(PR#4955) [Rd] read.table leaves out data when reading
multiple-line records
Prof Brian Ripley
ripley at stats.ox.ac.uk
Tue Nov 11 08:58:23 MET 2003
That isn't a file in table format, and so is not supported.
Use scan if you want multi-line records.
On Wed, 5 Nov 2003 joehl at gmx.de wrote:
>
>
> Dear all,
>
> I discovered that read.table (RW1.8.0) leaves out data when reading
> multiple-line records.
>
> Replication code at the end
>
> Best regards
>
>
> Jens Oehlschlägel
>
>
> > filename <- "c:/tmp/c2.csv"
> >
> > data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted
> simpleline"'))
> >
> > #look at the data
> > write.table(data, sep=",", row.names=FALSE)
> "a","b"
> "c","d"
> "e
> newline","\"quoted simpleline\""
> >
> > # write it out
> > write.table(data, sep=",", row.names=FALSE, file=filename)
> >
> > # reading it in a line is missing
> > read.csv(filename)
> a b
> 1 e\nnewline \\quoted simpleline\\
> >
> > fc <- file(filename, open="r")
> >
> > # the problem seems to be
> > # readTableHead erroneously counts 3 lines as 4
> > lines <- .Internal(readTableHead(fc, 4, "", TRUE))
> > lines
> [1] "\"a\",\"b\"" "\"c\",\"d\""
> "\"e"
> [4] "newline\",\"\\\"quoted simpleline\\\"\""
> >
> > # double pushback is fine
> > pushBack(c(lines,lines), fc)
> >
> > # but nlines tells us we had 4 lines, which in fact are only 3
> > nlines <- length(lines)
> > nlines
> [1] 4
> >
> > # and the first scan eats up more than the first pushback
> > scan(fc, what="string", sep=",", nlines=nlines)
> Read 8 items
> [1] "a" "b" "c"
> "d" "e\nnewline"
> [6] "\\quoted simpleline\\" "a" "b"
> >
> > # thus the real scan misses data
> > scan(fc, what="string", sep=",")
> Read 4 items
> [1] "c" "d" "e\nnewline"
> "\\quoted simpleline\\"
> >
> > close(fc)
> >
> > version
> _
> platform i386-pc-mingw32
> arch i386
> os mingw32
> system i386, mingw32
> status
> major 1
> minor 8.0
> year 2003
> month 10
> day 08
> language R
>
>
>
>
> filename <- "c:/tmp/c2.csv"
>
> data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted simpleline"'))
>
> #look at the data
> write.table(data, sep=",", row.names=FALSE)
>
> # write it out
> write.table(data, sep=",", row.names=FALSE, file=filename)
>
> # reading it in a line is missing
> read.csv(filename)
>
> fc <- file(filename, open="r")
>
> # the problem seems to be
> # readTableHead erroneously counts 3 lines as 4
> lines <- .Internal(readTableHead(fc, 4, "", TRUE))
> lines
>
> # double pushback is fine
> pushBack(c(lines,lines), fc)
>
> # but nlines tells us we had 4 lines, which in fact are only 3
> nlines <- length(lines)
> nlines
>
> # and the first scan eats up more than the first pushback
> scan(fc, what="string", sep=",", nlines=nlines)
>
> # thus the real scan misses data
> scan(fc, what="string", sep=",")
>
> close(fc)
>
> version
>
>
> --
>
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
>
>
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
More information about the R-devel
mailing list