[Rd] write.table with row.names=FALSE unnecessarily slow?
Martin Maechler
maechler at stat.math.ethz.ch
Tue Mar 11 17:21:13 CET 2008
MartinMo> write.table with large data frames takes quite a long time
MartinMo> system.time({
MartinMo> + write.table(df, '/tmp/dftest.txt', row.names=FALSE)
MartinMo> + }, gcFirst=TRUE)
MartinMo> user system elapsed
MartinMo> 97.302 1.532 98.837
MartinMo> One reason is that dimnames is always called, causing 'anonymous' row
MartinMo> names to be created as character vectors. Avoiding this in
MartinMo> src/library/utils, along the lines of
Thank you, Martin.
Note that we needed to fix your patch
(for the case where the data frame has a 'matrix column'),
and I'd like to further remark that I consider
'.... == TRUE'
to be quite ugly (or inefficient) in all circumstances.
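
To illustrate the remark about '== TRUE': a minimal sketch, assuming the
intent is "the argument is a single, non-NA logical TRUE". The helper name
wantNames is hypothetical and is not part of the patch or of R itself;
isTRUE() is a base R function.

```r
# Hypothetical helper (not in the patch): TRUE only when 'arg' is a
# single, non-NA logical TRUE; anything else, including a character
# vector of explicit names, gives FALSE.
wantNames <- function(arg) isTRUE(arg)

wantNames(TRUE)
wantNames(FALSE)
wantNames(NA)
wantNames(c("a", "b"))
```

Only the first call returns TRUE, so a test of this form could stand in for
the three-part condition without any comparison against TRUE.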
Martin Maechler, ETH Zurich
Index: write.table.R
===================================================================
--- write.table.R (revision 44717)
+++ write.table.R (working copy)
@@ -27,13 +27,18 @@
     if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
+    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
+        row.names==TRUE
+    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
+        col.names==TRUE
     if(is.matrix(x)) {
         ## fix up dimnames as as.data.frame would
         p <- ncol(x)
         d <- dimnames(x)
         if(is.null(d)) d <- list(NULL, NULL)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
-        if(is.null(d[[2]]) && p > 0) d[[2]] <- paste("V", 1:p, sep="")
+        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
+        if(is.null(d[[2]]) && p > 0 && makeColnames)
+            d[[2]] <- paste("V", 1:p, sep="")
         if(is.logical(quote) && quote)
             quote <- if(is.character(x)) seq_len(p) else numeric(0)
     } else {
@@ -53,8 +58,8 @@
                 quote <- ord[quote]; quote <- quote[quote > 0]
             }
         }
-        d <- dimnames(x)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
+        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
+                  if (makeColnames==TRUE) names(x) else NULL)
         p <- ncol(x)
     }
     nocols <- p==0
> improves performance at least in proportion to nrow(x):
> > system.time({
> + write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
> user system elapsed
> 8.132 0.608 8.899
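
The cost described above can be seen on a small scale. This is only a
sketch: the data frame and its size are made up for illustration and are
not the data used for the timings in this thread. .row_names_info() is a
base R function that reports how row names are stored internally.

```r
# Hypothetical data frame; the size is arbitrary.
df <- data.frame(x = runif(1e5), y = runif(1e5))

# Automatic row names are stored compactly. A negative value here
# means they have not been expanded into a full vector.
n_info <- .row_names_info(df)

# dimnames() expands them into a character vector with one element
# per row, which is the work the patch avoids when row.names=FALSE.
rn <- dimnames(df)[[1]]
is.character(rn)
length(rn) == nrow(df)
```

Since the character vector has nrow(df) elements, the saving from skipping
it grows with the number of rows.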
> Martin
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793