[Rd] Slow 'read.table' in R 1.4.0 (PR#1232)
ripley@stats.ox.ac.uk
Sat, 29 Dec 2001 22:25:39 +0100 (MET)
As I told you privately several days ago,
*This has already been fixed in R-patched*.
On Sat, 29 Dec 2001 james.holtman@convergys.com wrote:
> The 'read.table' function appears to be up to 10X slower in R 1.4.0 than R
> 1.3.1 for some of the data sets I read in. I was comparing the source code
> for the 2 versions and see that it was rewritten in R 1.4.0.
>
> I think I found out what part of the problem might be. I was comparing
> R 1.3.1 and R 1.4.0 code and it appears that a statement is missing in some
> of the code for R 1.4. This is the section of code at the beginning of
> read.table. The loop starting with 'while (nlines < 5)' will read in the
> entire file, because there is no increment of 'nlines' in the loop. I
> traced through the code and this is what was happening. It then does a
> 'pushBack' of the entire file. In tracing through the code, this is where
> it appears to be taking the time. With the change noted below, the speed
> was similar to R 1.3.1 and the results were the same.
>
> Here is the current code with what I think is the additional statement
> needed:
>
> =================part of read.table========
>
> nlines <- 0
> lines <- NULL
> while (nlines < 5) {
>     line <- readLines(file, 1, ok = TRUE)
>     if (length(line) == 0)
>         break
>     if (blank.lines.skip && length(grep("^[ \\t]*$", line)))
>         next
>     if (length(comment.char) && nchar(comment.char)) {
>         pattern <- paste("^[ \\t]*", substring(comment.char,
>             1, 1), sep = "")
>         if (length(grep(pattern, line)))
>             next
>     }
>     lines <- c(lines, line)
>     #
>     # additional line required
>     #
>     nlines <- nlines+1
> }
> nlines <- length(lines)
> if (!nlines) {
>     if (missing(col.names))
>         stop("no lines available in input")
>     else {
>         tmp <- vector("list", length(col.names))
>         names(tmp) <- col.names
>         class(tmp) <- "data.frame"
>         return(tmp)
>     }
> }
> if (all(nchar(lines) == 0))
>     stop("empty beginning of file")
> pushBack(c(lines, lines), file)
>
> -.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
> r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
> Send "info", "help", or "[un]subscribe"
> (in the "body", not the subject !) To: r-devel-request@stat.math.ethz.ch
> _._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._
>
--
Brian D. Ripley, ripley@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272860 (secr)
Oxford OX1 3TG, UK Fax: +44 1865 272595