[Rd] write.table with row.names=FALSE unnecessarily slow?

Martin Maechler maechler at stat.math.ethz.ch
Tue Mar 11 17:21:13 CET 2008


    MartinMo> write.table with large data frames takes quite a long time
    MartinMo> system.time({
    MartinMo> +     write.table(df, '/tmp/dftest.txt', row.names=FALSE)
    MartinMo> + }, gcFirst=TRUE)
    MartinMo> user  system elapsed 
    MartinMo> 97.302   1.532  98.837 

    MartinMo> One reason is that dimnames is always called, causing 'anonymous' row
    MartinMo> names to be created as character vectors. Avoiding this in
    MartinMo> src/library/utils, along the lines of

Thank you, Martin.

Note that we needed to fix your patch for the case where the data frame
has a 'matrix column'. I'd also like to remark that I consider
 '... == TRUE'
to be quite ugly (and inefficient) in all circumstances.
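As a sketch, the three-part tests in the patch, is.logical(.) && !is.na(.) && . == TRUE, can be collapsed into base R's isTRUE(). In current R versions, isTRUE() returns TRUE only for a single, non-NA logical TRUE. The helper name flags() below is hypothetical and exists only to show the two values the patch computes. Once makeRownames is a plain logical, 'if (makeRownames)' needs no '== TRUE' either.

```r
# Hypothetical helper mirroring the flag computation in the patch,
# using isTRUE() instead of is.logical(.) && !is.na(.) && . == TRUE.
flags <- function(row.names = TRUE, col.names = TRUE) {
    c(makeRownames = isTRUE(row.names),
      makeColnames = isTRUE(col.names))
}

flags()                          # both TRUE
flags(row.names = FALSE)         # makeRownames is FALSE
flags(row.names = c("a", "b"))   # user-supplied names: not TRUE
flags(col.names = NA)            # NA: not TRUE

# A logical flag is already TRUE/FALSE, so 'if' needs no '== TRUE':
x <- data.frame(a = 1:3)
f <- flags(row.names = FALSE)
d <- list(if (f[["makeRownames"]]) row.names(x) else NULL,
          if (f[["makeColnames"]]) names(x) else NULL)
```

With row.names = FALSE, d[[1]] stays NULL, so no character row names are ever built.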

Martin Maechler, ETH Zurich



Index: write.table.R
===================================================================
--- write.table.R	(revision 44717)
+++ write.table.R	(working copy)
@@ -27,13 +27,18 @@
 
     if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
 
+    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
+                    row.names==TRUE
+    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
+                    col.names==TRUE
     if(is.matrix(x)) {
         ## fix up dimnames as as.data.frame would
         p <- ncol(x)
         d <- dimnames(x)
         if(is.null(d)) d <- list(NULL, NULL)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
-        if(is.null(d[[2]]) && p > 0) d[[2]] <-  paste("V", 1:p, sep="")
+        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
+        if(is.null(d[[2]]) && p > 0 && makeColnames)
+            d[[2]] <-  paste("V", 1:p, sep="")
         if(is.logical(quote) && quote)
             quote <- if(is.character(x)) seq_len(p) else numeric(0)
     } else {
@@ -53,8 +58,8 @@
                 quote <- ord[quote]; quote <- quote[quote > 0]
             }
         }
-        d <- dimnames(x)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
+        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
+                  if (makeColnames==TRUE) names(x) else NULL)
         p <- ncol(x)
     }
     nocols <- p==0

> improves performance at least in proportion to nrow(x):

> > system.time({
> +     write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed 
>   8.132   0.608   8.899 

> Martin
> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109

> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
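The cost described above can be reproduced with a minimal sketch. The size and column types of the data frame are assumptions, since the original df is not shown. The sketch relies on the fact that dimnames() on a data frame returns its row names via row.names(). That call yields a character vector even when the row names are 'automatic' and stored internally in compact integer form. Timings are machine-dependent, so none are shown.

```r
# Assumed test data: the original 'df' is not given in the thread.
n  <- 1e5
df <- data.frame(x = runif(n), y = sample(letters, n, replace = TRUE))

# dimnames() materializes the automatic row names as n character strings.
rn <- dimnames(df)[[1]]
stopifnot(is.character(rn), length(rn) == n)

# With row.names = FALSE those strings are never written, so building
# them is wasted work. Time the call and check the output file.
f <- tempfile(fileext = ".txt")
print(system.time(write.table(df, f, row.names = FALSE), gcFirst = TRUE))
lines <- readLines(f)
stopifnot(length(lines) == n + 1L)   # header line plus one line per row
```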



More information about the R-devel mailing list