[Rd] Code Optimization: print.data.frame + as.data.frame(head(x, n = options("max.print")))

Martin Maechler maechler at stat.math.ethz.ch
Tue Jul 31 09:33:32 CEST 2018


>>>>> Juan Telleria Ruiz de Aguirre 
>>>>>     on Tue, 31 Jul 2018 08:19:33 +0200 writes:

    > I polished a little bit more the function:
    > * Used:  getOption("max.print")
    > * Added comment at the end:  cat('[ reached getOption("max.print") --
    > omitted ', omitted,' rows ]')

and before

     > I would like to propose a simple optimization for print.data.frame
     > base function:
     > 
     > To add: x <- as.data.frame(head(x, n = options("max.print")))
     > 
     > This would prevent that, if for example, we have a 10GB data.frame
     > (e.g.: Instead of a data.table), and we accidentally print it, the R
     > Session does not "collapse", forcing us to press ESC or kill the
     > RSession.

Thank you, Juan.
You are right: the whole idea of introducing the 'max.print'
option (and the corresponding 'max' argument in print.default()
       {and currently also in print.Date()})
was that print()ing should not use too many resources.
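
For illustration only (a minimal sketch, not part of the patch): the 'max' argument of print.default() mentioned above can be tried on a plain vector. The variable names 'x' and 'out' are made up for this example.

```r
## Sketch: print.default() already honours 'max' for atomic vectors.
x <- seq_len(100)

## print() dispatches to print.default() for a plain integer vector;
## only the first 'max' entries are shown, followed by a note about
## how many entries were omitted.
out <- capture.output(print(x, max = 10))
cat(out, sep = "\n")
```

The same limit applies without an explicit 'max' when options(max.print = ...) is set to a small value.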

And you are also right to use 'max.print'.  But R should be as
functional a language as is sensible, and hence print(<data.frame>)
should get an argument 'max' which by default is equal to
the "max.print" option.

Also, any good citizen print() method *must* return its argument invisibly.
==> you are not supposed to change 'x' here.
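
The two points above (a 'max' argument that defaults to the option, and returning the argument invisibly and unchanged) can be sketched for a toy S3 class. The class name "tinytab" and everything in this block are hypothetical and only serve as an illustration; this is not code from base R.

```r
## Hypothetical S3 class "tinytab", used only for illustration.
print.tinytab <- function(x, ..., max = NULL) {
    ## 'max' falls back to the global option when not given explicitly
    if (is.null(max)) max <- getOption("max.print", 99999L)
    vals <- unclass(x)
    ## truncate a local copy only; 'x' itself is never modified
    shown <- vals[seq_len(min(length(vals), max))]
    print.default(shown, ...)
    invisible(x)  # good-citizen print method: return the argument invisibly
}

obj <- structure(1:20, class = "tinytab")
res <- withVisible(print(obj, max = 5))
```

Here res$value is the original, untruncated object and res$visible is FALSE, so something like y <- print(obj) keeps working as users expect.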

But I entirely agree with your basic intuition for the problem
resolution.  Very good, thank you, indeed!

I'm currently running 'make check-all'  with the following change
to the source code (aka "patch") :

===================================================================
--- src/library/base/R/dataframe.R	(revision 75016)
+++ src/library/base/R/dataframe.R	(working copy)
@@ -1477,7 +1477,7 @@
 
 print.data.frame <-
     function(x, ..., digits = NULL, quote = FALSE, right = TRUE,
-	     row.names = TRUE)
+	     row.names = TRUE, max = NULL)
 {
     n <- length(row.names(x))
     if(length(x) == 0L) {
@@ -1489,12 +1489,19 @@
 	print.default(names(x), quote = FALSE)
 	cat(gettext("<0 rows> (or 0-length row.names)\n"))
     } else {
+	if(is.null(max)) max <- getOption("max.print", 99999L)
 	## format.<*>() : avoiding picking up e.g. format.AsIs
-	m <- as.matrix(format.data.frame(x, digits = digits, na.encode = FALSE))
+	omit <- (n0 <- max %/% length(x)) < n
+	m <- as.matrix(
+	    format.data.frame(if(omit) x[seq_len(n0), , drop=FALSE] else x,
+			      digits = digits, na.encode = FALSE))
 	if(!isTRUE(row.names))
 	    dimnames(m)[[1L]] <-
 		if(isFALSE(row.names)) rep.int("", n) else row.names
 	print(m, ..., quote = quote, right = right)
+	if(omit)
+	    cat(" [ reached 'max' / getOption(\"max.print\") -- omitted",
+		n - n0, "rows ]\n")
     }
     invisible(x)
 }
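
To try the row-capping idea of the patch without rebuilding R, the core arithmetic can be pulled into a standalone helper. This is only a sketch under assumptions: the name print_df_capped() is hypothetical, the helper always keeps the row names (the row.names = FALSE branch of print.data.frame is not modelled), and the wording of the note is shortened.

```r
## Hypothetical helper, NOT the patched print.data.frame() itself.
print_df_capped <- function(x, max = getOption("max.print", 99999L), ...) {
    stopifnot(is.data.frame(x), length(x) > 0L)
    n  <- nrow(x)
    n0 <- max %/% length(x)   # number of full rows that fit into 'max' entries
    omit <- n0 < n
    ## format only the rows that will actually be shown
    shown <- if (omit) x[seq_len(n0), , drop = FALSE] else x
    m <- as.matrix(format.data.frame(shown, na.encode = FALSE))
    print(m, ..., quote = FALSE, right = TRUE)
    if (omit)
        cat(" [ reached 'max' -- omitted", n - n0, "rows ]\n")
    invisible(x)              # return the original, untruncated argument
}

df  <- data.frame(a = 1:10, b = letters[1:10])
out <- capture.output(res <- print_df_capped(df, max = 6))
cat(out, sep = "\n")
```

With two columns and max = 6 the helper shows 6 %/% 2 = 3 rows and reports the remaining 7 as omitted, while 'res' is still the full data frame.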


