[Rd] write.table with row.names=FALSE unnecessarily slow?

Martin Morgan mtmorgan at fhcrc.org
Mon Mar 10 19:01:25 CET 2008


write.table on a large data frame takes quite a long time, even with
row.names=FALSE:

> system.time({
+     write.table(df, '/tmp/dftest.txt', row.names=FALSE)
+ }, gcFirst=TRUE)
   user  system elapsed 
 97.302   1.532  98.837 

One reason is that dimnames() is always called, which causes the
'anonymous' (automatic) row names to be created as a character vector
even though they are never written. Avoiding this in
src/library/utils, along the lines of

Index: write.table.R
===================================================================
--- write.table.R	(revision 44717)
+++ write.table.R	(working copy)
@@ -27,13 +27,18 @@
 
     if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
 
+    makeRownames <- is.logical(row.names) && !is.na(row.names) &&
+                    row.names==TRUE
+    makeColnames <- is.logical(col.names) && !is.na(col.names) &&
+                    col.names==TRUE
     if(is.matrix(x)) {
         ## fix up dimnames as as.data.frame would
         p <- ncol(x)
         d <- dimnames(x)
         if(is.null(d)) d <- list(NULL, NULL)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
-        if(is.null(d[[2]]) && p > 0) d[[2]] <-  paste("V", 1:p, sep="")
+        if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
+        if(is.null(d[[2]]) && p > 0 && makeColnames)
+            d[[2]] <-  paste("V", 1:p, sep="")
         if(is.logical(quote) && quote)
             quote <- if(is.character(x)) seq_len(p) else numeric(0)
     } else {
@@ -53,8 +58,8 @@
                 quote <- ord[quote]; quote <- quote[quote > 0]
             }
         }
-        d <- dimnames(x)
-        if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
+        d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
+                  if (makeColnames==TRUE) names(x) else NULL)
         p <- ncol(x)
     }
     nocols <- p==0

improves performance at least in proportion to nrow(x):

> system.time({
+     write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
+ }, gcFirst=TRUE)
   user  system elapsed 
  8.132   0.608   8.899 
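
The cost of the dimnames() call can be seen in isolation with a small sketch. The data frame here is hypothetical (the one timed above is not shown), and the size n is an arbitrary choice for illustration; actual timings will vary by machine and R version.

```r
## Hypothetical stand-in for 'df'; the real one is not shown in this post.
n <- 1e5
df <- data.frame(x = seq_len(n), y = as.numeric(seq_len(n)))

## Automatic row names are stored compactly; .row_names_info() returns a
## negative count in that case.
auto <- .row_names_info(df) < 0

## dimnames() on a data frame goes through row.names(), which expands the
## automatic row names into a character vector of length nrow(df).
t_dimnames <- system.time(d <- dimnames(df))

## names() returns the column names without touching the row names.
t_names <- system.time(nm <- names(df))

print(rbind(dimnames = t_dimnames, names = t_names)[, 1:3])
```

With row.names=FALSE the character vector in d[[1]] is never used, which is why skipping its creation helps.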

Martin
-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793


