[Rd] Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))
Martin Maechler
maechler at stat.math.ethz.ch
Tue Jul 31 09:33:32 CEST 2018
>>>>> Juan Telleria Ruiz de Aguirre
>>>>> on Tue, 31 Jul 2018 08:19:33 +0200 writes:
> I polished a little bit more the function:
> * Used: getOption("max.print")
> * Added comment at the end: cat('[ reached getOption("max.print") --
> omitted ', omitted,' rows ]')
and before
> I would like to propose a simple optimization for print.data.frame
> base function:
>
> To add: x <- as.data.frame(head(x, n = options("max.print")))
>
> This would prevent that, if for example, we have a 10GB data.frame
> (e.g.: Instead of a data.table), and we accidentally print it, the R
> Session does not "collapse", forcing us to press ESC or kill the
> RSession.
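
As a small illustration of why the follow-up switched to getOption(): options("max.print") returns a one-element named list, whereas getOption("max.print") returns the value itself, which is what head() needs for 'n'. The data frame 'df' below is a made-up example, and the limit of 20 is chosen only to keep the output short. Note that this caps the number of rows, whereas "max.print" counts printed entries.

```r
## Temporarily lower the limit; 'old' keeps the previous setting for restoring.
old <- options(max.print = 20L)

opt_list <- options("max.print")    # a named list of length 1
opt_val  <- getOption("max.print")  # the bare value

is.list(opt_list)    # TRUE
is.numeric(opt_val)  # TRUE

## Hypothetical data frame, used only for this sketch.
df <- data.frame(a = 1:100, b = rnorm(100))
shown <- head(df, n = getOption("max.print"))
nrow(shown)          # 20

options(old)         # restore the previous setting
```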
Thank you, Juan.
You are right: the whole idea of introducing the 'max.print'
option (and the corresponding 'max' argument in print.default(),
and currently also print.Date())
was that print()ing should not use too many resources.
You are also right to use 'max.print', but R should be as
functional a language as is sensible, and hence print(<data.frame>)
should get an argument 'max' which by default is equal to
the "max.print" option.
Also, any good-citizen print() method *must* return its argument invisibly
==> you are not supposed to change 'x' here.
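
To make the two points above concrete, here is a minimal sketch of a print() method that takes a 'max' argument defaulting to the "max.print" option and returns its argument invisibly. The S3 class 'ledger' and its 'amounts' component are hypothetical, invented only for this illustration; this is not the data.frame method.

```r
## Hypothetical S3 class 'ledger', used only for illustration.
print.ledger <- function(x, ..., max = NULL) {
    if (is.null(max)) max <- getOption("max.print", 99999L)
    n <- length(x$amounts)
    ## Truncate a local copy; 'x' itself is left untouched.
    shown <- x$amounts[seq_len(min(n, max))]
    print(shown, ...)
    if (n > max)
        cat(" [ omitted", n - max, "entries ]\n")
    invisible(x)
}

led <- structure(list(amounts = c(10, 20, 30, 40, 50)), class = "ledger")
res <- print(led, max = 3)  # prints three amounts plus the omission note
identical(res, led)         # TRUE: the argument comes back unchanged
```

Because the truncation happens on a local copy, code such as y <- print(x) still receives the complete object.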
But I entirely agree with your basic intuition for the problem
resolution. Very good, thank you, indeed!
I'm currently running 'make check-all' with the following change
to the source code (aka "patch") :
===================================================================
--- src/library/base/R/dataframe.R (revision 75016)
+++ src/library/base/R/dataframe.R (working copy)
@@ -1477,7 +1477,7 @@
 print.data.frame <-
     function(x, ..., digits = NULL, quote = FALSE, right = TRUE,
-             row.names = TRUE)
+             row.names = TRUE, max = NULL)
 {
     n <- length(row.names(x))
     if(length(x) == 0L) {
@@ -1489,12 +1489,19 @@
         print.default(names(x), quote = FALSE)
         cat(gettext("<0 rows> (or 0-length row.names)\n"))
     } else {
+        if(is.null(max)) max <- getOption("max.print", 99999L)
         ## format.<*>() : avoiding picking up e.g. format.AsIs
-        m <- as.matrix(format.data.frame(x, digits = digits, na.encode = FALSE))
+        omit <- (n0 <- max %/% length(x)) < n
+        m <- as.matrix(
+            format.data.frame(if(omit) x[seq_len(n0), , drop=FALSE] else x,
+                              digits = digits, na.encode = FALSE))
         if(!isTRUE(row.names))
             dimnames(m)[[1L]] <-
                 if(isFALSE(row.names)) rep.int("", n) else row.names
         print(m, ..., quote = quote, right = right)
+        if(omit)
+            cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
+                n - n0, "rows ]\n")
     }
     invisible(x)
 }
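
The row arithmetic in the patch can be tried in isolation. The sketch below uses a hypothetical helper, rows_to_print(), which is not part of R or of the patch. It only mirrors the computation n0 <- max %/% length(x): 'max' is a budget of printed entries, so dividing by the number of columns gives the number of whole rows that fit. It assumes at least one column, since the zero-column case is handled by an earlier branch of print.data.frame().

```r
## Hypothetical helper mirroring the patch's arithmetic; assumes length(x) >= 1.
rows_to_print <- function(x, max = getOption("max.print", 99999L)) {
    n  <- length(row.names(x))  # number of rows, as in the patch
    n0 <- max %/% length(x)     # whole rows that fit into 'max' entries
    omit <- n0 < n
    list(n0 = if (omit) n0 else n,
         omit = omit,
         omitted = if (omit) n - n0 else 0L)
}

## Made-up data frame: 10 rows, 3 columns.
df <- data.frame(a = 1:10, b = letters[1:10], c = runif(10))

small <- rows_to_print(df, max = 12)   # 12 %/% 3 = 4 rows fit
small$n0        # 4
small$omitted   # 6

large <- rows_to_print(df, max = 100)  # budget exceeds 10 rows * 3 columns
large$omit      # FALSE
```

One thing to watch when adapting this logic: in the patch as posted, the row.names = FALSE branch still builds rep.int("", n), which has the full length n, while the truncated matrix has only n0 rows, so that combination would presumably need n0 there.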