[Rd] Slow 'read.table' in R 1.4.0 (PR#1232)
james.holtman@convergys.com
Sat, 29 Dec 2001 21:29:28 +0100 (MET)
The 'read.table' function appears to be up to 10X slower in R 1.4.0 than in
R 1.3.1 for some of the data sets I read in. I compared the source code
for the two versions and see that it was rewritten in R 1.4.0.
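For anyone who wants to compare the two versions on their own machine, a rough timing sketch along these lines can be run under each R release. The file size, contents, and column names here are made up for illustration and are not the data sets mentioned above.

```r
# Illustrative data only: 2000 rows, three columns (all names are assumptions).
n <- 2000
df <- data.frame(id = 1:n, x = rnorm(n),
                 label = paste("item", 1:n, sep = ""))

# Write the data to a temporary whitespace-separated file with a header line.
tmp <- tempfile()
write.table(df, tmp, quote = FALSE, row.names = FALSE)

# Time a single read.table call; run the same script under each R version
# and compare the elapsed times.
timing <- system.time(result <- read.table(tmp, header = TRUE))
unlink(tmp)

print(timing)
```

The absolute numbers depend on the machine and the file, so only the ratio between the two versions is meaningful.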
I think I have found what part of the problem might be. Comparing the
R 1.3.1 and R 1.4.0 code, it appears that a statement is missing from the
R 1.4.0 version, in the section of code at the beginning of read.table.
The loop starting with 'while (nlines < 5)' will read in the entire file,
because there is no increment of 'nlines' in the loop. I traced through
the code and confirmed that this is what was happening. It then does a
'pushBack' of the entire file, and in tracing through the code, this is
where the time appears to be spent. With the change noted below, the speed
was similar to R 1.3.1 and the results were the same.
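The effect of the missing increment can be reproduced outside read.table with a small sketch. The helper 'peek_lines' and its 'increment' argument are hypothetical names used only for this illustration, and the loop is a cut-down version that leaves out the blank-line and comment handling.

```r
# Hypothetical helper: runs a simplified version of the "first five lines"
# loop and reports how many lines it actually consumed from the connection.
peek_lines <- function(con, increment) {
    nlines <- 0
    lines <- NULL
    while (nlines < 5) {
        line <- readLines(con, 1, ok = TRUE)
        if (length(line) == 0)      # end of input reached
            break
        lines <- c(lines, line)
        if (increment)              # the statement that is missing in 1.4.0
            nlines <- nlines + 1
    }
    length(lines)
}

input <- paste("row", 1:1000)

con <- textConnection(input)
without_increment <- peek_lines(con, increment = FALSE)
close(con)

con <- textConnection(input)
with_increment <- peek_lines(con, increment = TRUE)
close(con)

cat("lines read without increment:", without_increment, "\n")
cat("lines read with increment:   ", with_increment, "\n")
```

Without the increment the loop only stops at end of input, so all 1000 lines are read one at a time and appended to 'lines'; with it, the loop stops after 5.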
Here is the current code with what I think is the additional statement
needed:
=================part of read.table========
nlines <- 0
lines <- NULL
while (nlines < 5) {
    line <- readLines(file, 1, ok = TRUE)
    if (length(line) == 0)
        break
    if (blank.lines.skip && length(grep("^[ \\t]*$", line)))
        next
    if (length(comment.char) && nchar(comment.char)) {
        pattern <- paste("^[ \\t]*", substring(comment.char,
            1, 1), sep = "")
        if (length(grep(pattern, line)))
            next
    }
    lines <- c(lines, line)
    #
    # additional line required
    #
    nlines <- nlines+1
}
nlines <- length(lines)
if (!nlines) {
    if (missing(col.names))
        stop("no lines available in input")
    else {
        tmp <- vector("list", length(col.names))
        names(tmp) <- col.names
        class(tmp) <- "data.frame"
        return(tmp)
    }
}
if (all(nchar(lines) == 0))
    stop("empty beginning of file")
pushBack(c(lines, lines), file)
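For readers not familiar with 'pushBack', here is a minimal sketch of what it does on a connection. The three input lines are made up for illustration; the point is only that pushed-back lines are handed out again by the next read, so pushing back the whole file means the whole file is processed a second time.

```r
# Illustrative input: a header line and two data lines.
con <- textConnection(c("a b c", "1 2 3", "4 5 6"))

first <- readLines(con, 1)        # consumes the first line
pushBack(first, con)              # put that line back on the connection
pending <- pushBackLength(con)    # number of pushed-back lines waiting

again <- readLines(con, 1)        # returns the pushed-back line
rest <- readLines(con)            # then the remaining, unread lines
close(con)

print(pending)
print(again)
print(rest)
```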