(PR#4955) [Rd] read.table leaves out data when reading multiple-line records

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Nov 11 08:58:23 MET 2003


That isn't a file in table format, and so is not supported.

Use scan if you want multi-line records.
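For the example in the report, the scan route might look like the minimal sketch below. It assumes a header line and exactly two comma-separated, double-quoted columns. The sample text and the regrouping into a data frame are illustrative additions, not part of the advice above. With sep="," set, scan reads the quoted field that contains a newline as a single item (the transcript below shows the same behaviour), and the flat vector of fields is then folded into rows. The text= argument comes from later versions of R; with the 1.8.0 in the report, scan a file or connection instead.

```r
# Illustrative input (assumption): a header plus two records; the first field
# of the second record contains an embedded newline inside its quotes.
txt <- "\"a\",\"b\"\n\"c\",\"d\"\n\"e\nnewline\",\"f\""

# With sep = ",", scan treats the quoted "e<newline>newline" as one field.
fields <- scan(text = txt, what = "", sep = ",", quiet = TRUE)

# Assumption: exactly 2 columns, supplied by hand since scan returns a flat vector.
ncols  <- 2L
header <- fields[seq_len(ncols)]
body   <- matrix(fields[-seq_len(ncols)], ncol = ncols, byrow = TRUE)

tab <- as.data.frame(body, stringsAsFactors = FALSE)
names(tab) <- header
tab
```

Because scan returns the fields as one vector, the number of columns has to be known in advance; that is the structure read.table would otherwise work out from a well-formed table.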


On Wed, 5 Nov 2003 joehl at gmx.de wrote:

> 
> 
> Dear all,
> 
> I discovered that read.table (RW1.8.0) leaves out data when reading
> multiple-line records.
> 
> Replication code is at the end.
> 
> Best regards
> 
> 
> Jens Oehlschlägel
> 
> 
> > filename <- "c:/tmp/c2.csv"
> > 
> > data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted simpleline"'))
> > 
> > #look at the data
> > write.table(data, sep=",", row.names=FALSE)
> "a","b"
> "c","d"
> "e
> newline","\"quoted simpleline\""
> > 
> > # write it out
> > write.table(data, sep=",", row.names=FALSE, file=filename)
> > 
> > # reading it back in, a line is missing
> > read.csv(filename)
>            a                     b
> 1 e\nnewline \\quoted simpleline\\
> > 
> > fc <- file(filename, open="r")
> > 
> > # the problem seems to be
> > # readTableHead erroneously counts 3 lines as 4
> > lines <- .Internal(readTableHead(fc, 4, "", TRUE))
> > lines
> [1] "\"a\",\"b\""                             "\"c\",\"d\""                             "\"e"
> [4] "newline\",\"\\\"quoted simpleline\\\"\""
> > 
> > # double pushback is fine
> > pushBack(c(lines,lines), fc)
> > 
> > # but nlines tells us we had 4 lines, which in fact are only 3
> > nlines <- length(lines)
> > nlines
> [1] 4
> > 
> > # and the first scan eats up more than the first pushback
> > scan(fc, what="string", sep=",", nlines=nlines)
> Read 8 items
> [1] "a"                     "b"                     "c"                     "d"                     "e\nnewline"
> [6] "\\quoted simpleline\\" "a"                     "b"
> > 
> > # thus the real scan misses data
> > scan(fc, what="string", sep=",")
> Read 4 items
> [1] "c"                     "d"                     "e\nnewline"            "\\quoted simpleline\\"
> > 
> > close(fc)
> > 
> > version
>          _              
> platform i386-pc-mingw32
> arch     i386           
> os       mingw32        
> system   i386, mingw32  
> status                  
> major    1              
> minor    8.0            
> year     2003           
> month    10             
> day      08             
> language R
> 
> 
> 
> 
> filename <- "c:/tmp/c2.csv"
> 
> data <- data.frame(a=c("c", "e\nnewline"), b=c("d", '"quoted simpleline"'))
> 
> #look at the data
> write.table(data, sep=",", row.names=FALSE)
> 
> # write it out
> write.table(data, sep=",", row.names=FALSE, file=filename)
> 
> # reading it back in, a line is missing
> read.csv(filename)
> 
> fc <- file(filename, open="r")
> 
> # the problem seems to be
> # readTableHead erroneously counts 3 lines as 4
> lines <- .Internal(readTableHead(fc, 4, "", TRUE))
> lines
> 
> # double pushback is fine
> pushBack(c(lines,lines), fc)
> 
> # but nlines tells us we had 4 lines, which in fact are only 3
> nlines <- length(lines)
> nlines
> 
> # and the first scan eats up more than the first pushback
> scan(fc, what="string", sep=",", nlines=nlines)
> 
> # thus the real scan misses data
> scan(fc, what="string", sep=",")
> 
> close(fc)
> 
> version
> 
> 
> --
> 
> ______________________________________________
> R-devel at stat.math.ethz.ch mailing list
> https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> 
> 
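The diagnosis in the report rests on the difference between physical lines and logical records: readTableHead returned 4 lines for what are only 3 records, so the first scan(nlines=4) read past the first pushback. The helper below is a hypothetical illustration of counting records rather than lines, not the code read.table actually uses. It treats a record as continuing while a double quote is open and, as a simplification, ignores escaped or doubled quotes.

```r
# Hypothetical helper (not part of R): count logical records, where a record
# continues onto the next physical line while a double quote is still open.
# Simplification: escaped (\") and doubled ("") quotes get no special handling.
count_records <- function(lines) {
  n <- 0L
  open <- FALSE
  for (ln in lines) {
    # A new record starts only if no quoted field carries over from before.
    if (!open) n <- n + 1L
    # An odd number of quote characters on this line flips the open state.
    nq <- nchar(gsub('[^"]', "", ln))
    if (nq %% 2L == 1L) open <- !open
  }
  n
}

# Three records over four physical lines: the first field of the last
# record contains a newline.
phys <- c('"a","b"', '"c","d"', '"e', 'newline","x"')
length(phys)         # physical lines: 4
count_records(phys)  # logical records: 3
```

A real reader would also have to cope with the backslash-escaped quotes that write.table produced in the report, which this sketch does not attempt.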

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


