[Rd] write.table with row.names=FALSE unnecessarily slow?
Martin Morgan
mtmorgan at fhcrc.org
Mon Mar 10 19:07:54 CET 2008
I neglected to include my test case:
> df <- data.frame(x=1:(10^7))
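A scaled-down sketch of the same setup may help show where the cost comes from. It assumes a 10-row data frame purely for illustration: automatic row names are stored compactly as integers, but dimnames() returns them as a character vector, and for 10^7 rows that conversion is the expensive step described in the quoted message.

```r
# Small stand-in for the 10^7-row test case (size chosen only for illustration).
df_small <- data.frame(x = 1:10)

# A negative value from .row_names_info() marks the row names as "automatic".
n_info <- .row_names_info(df_small)

# The stored representation of automatic row names is integer ...
internal <- attr(df_small, "row.names")

# ... but dimnames() on a data frame returns the row names as character.
d <- dimnames(df_small)

class(internal)  # integer
class(d[[1]])    # character
```
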
Martin
Martin Morgan <mtmorgan at fhcrc.org> writes:
> write.table with large data frames takes quite a long time
>
>> system.time({
> + write.table(df, '/tmp/dftest.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed
>  97.302   1.532  98.837
>
> One reason is that dimnames() is always called, causing 'anonymous' row
> names to be created as character vectors. Avoiding this in
> src/library/utils, along the lines of
>
> Index: write.table.R
> ===================================================================
> --- write.table.R (revision 44717)
> +++ write.table.R (working copy)
> @@ -27,13 +27,18 @@
>
> if(!is.data.frame(x) && !is.matrix(x)) x <- data.frame(x)
>
> + makeRownames <- is.logical(row.names) && !is.na(row.names) &&
> + row.names==TRUE
> + makeColnames <- is.logical(col.names) && !is.na(col.names) &&
> + col.names==TRUE
> if(is.matrix(x)) {
> ## fix up dimnames as as.data.frame would
> p <- ncol(x)
> d <- dimnames(x)
> if(is.null(d)) d <- list(NULL, NULL)
> - if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
> - if(is.null(d[[2]]) && p > 0) d[[2]] <- paste("V", 1:p, sep="")
> + if (is.null(d[[1]]) && makeRownames) d[[1]] <- seq_len(nrow(x))
> + if(is.null(d[[2]]) && p > 0 && makeColnames)
> + d[[2]] <- paste("V", 1:p, sep="")
> if(is.logical(quote) && quote)
> quote <- if(is.character(x)) seq_len(p) else numeric(0)
> } else {
> @@ -53,8 +58,8 @@
> quote <- ord[quote]; quote <- quote[quote > 0]
> }
> }
> - d <- dimnames(x)
> - if(is.null(d[[1]])) d[[1]] <- seq_len(nrow(x))
> + d <- list(if (makeRownames==TRUE) row.names(x) else NULL,
> + if (makeColnames==TRUE) names(x) else NULL)
> p <- ncol(x)
> }
> nocols <- p==0
>
> improves performance at least in proportion to nrow(x):
>
>> system.time({
> + write.table(df, '/tmp/dftest1.txt', row.names=FALSE)
> + }, gcFirst=TRUE)
>    user  system elapsed
>   8.132   0.608   8.899
>
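The makeRownames / makeColnames flags in the patch can be read in isolation. The sketch below wraps the same test in a helper named wants_names, which is a hypothetical name used only for illustration and is not part of write.table. It shows that the flag is TRUE only for a literal TRUE, so names are built only when they will actually be written.

```r
# Hypothetical helper mirroring the flag expression used in the patch.
wants_names <- function(arg) {
  is.logical(arg) && !is.na(arg) && arg == TRUE
}

wants_names(TRUE)         # TRUE: names are requested
wants_names(FALSE)        # FALSE: names are suppressed
wants_names(NA)           # FALSE: stopped by the is.na() check
wants_names(c("a", "b"))  # FALSE: short-circuits at is.logical()
```

Because && short-circuits, a character vector of user-supplied names never reaches the later comparisons.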
> Martin
> --
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
>
> Location: Arnold Building M2 B169
> Phone: (206) 667-2793
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel