[Rd] Huge performance difference between implicit and explicit print

Sean O'Riordain seanpor at acm.org
Thu Oct 31 08:46:52 CET 2013


A minor point, and probably not relevant to the speed issue: df() is
the density function of the F distribution, so I have (recently)
stopped using df as a name for data frames.
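
A minimal sketch of the name clash, assuming a session in which nothing
else has been assigned to df. The name my_df is only a hypothetical
alternative, not something from the original example.

```r
# In package stats, `df` is the density function of the F distribution.
is_function_before <- is.function(stats::df)

# Density of F(2, 10) at x = 1, called through the namespace to avoid ambiguity.
dens <- stats::df(1, df1 = 2, df2 = 10)

# Assigning a data frame to `df` masks the function for value lookups ...
df <- data.frame(a = 1:3)
masked <- is.data.frame(df)

# ... although a call such as df(1, 2, 10) still finds the function, because
# R skips non-function objects when resolving a name in call position.
still_callable <- is.numeric(df(1, 2, 10))

# A less ambiguous name (hypothetical) avoids the clash entirely.
my_df <- data.frame(a = 1:3)
```

So the masking is harmless to R itself; the cost is mostly to the human
reader, who has to work out which df is meant.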

Sean


On 30 October 2013 23:32, Gabriel Becker <gmbecker at ucdavis.edu> wrote:
> Hadley,
>
> As far as I can tell from a quick look, it is because implicit printing
> uses a different mechanism which does a fair bit more work.
>
> From comments in print.c in the R sources:
>
> *  print.default()  ->     do_printdefault (with call tree below)
>  *
>  *  auto-printing   ->  PrintValueEnv
>  *                      -> PrintValueRec
>  *                      -> call print() for objects
>  *  Note that auto-printing does not call print.default.
>  *  PrintValue, R_PV are similar to auto-printing.
>
> PrintValueEnv includes, among other things, checks for functions, S4
> objects, and S3 objects before constructing (in C code) an R call to print
> for S3 objects or show for S4 objects, and evaluating it using Rf_eval. So
> there is an extra trip to the R evaluator.
>
> I imagine that extra work is where the hangup is but that is a
> slightly-informed guess as I haven't done any detailed timings or checks.
>
> Basically my understanding of the processes is as follows:
>
> print(df)
> print call is evaluated, S3 dispatch happens (to print.myobj in your
> example), the method prints the result to the terminal, print call returns
>
> df
> expression "df" evaluated, auto-print initiated, type of object returned by
> expression is determined, print call is constructed in C code, print call
> is evaluated in C code, THEN all the stuff above happens.
>
> I dunno if that helps or not as I can't speak to how to change/fix it atm.
>
> ~G
>
>
>
> On Wed, Oct 30, 2013 at 3:22 PM, Hadley Wickham <h.wickham at gmail.com> wrote:
>
>> Hi all,
>>
>> Can anyone help me understand why an implicit print (i.e. just typing
>> df at the console) is so much slower than an explicit print (i.e.
>> print(df)) in the example below?  I see the difference in both RStudio
>> and in a terminal.
>>
>> # Construct large df as quickly as possible
>> dummy <- 1:18e6
>> df <- lapply(1:10, function(x) dummy)
>> names(df) <- letters[1:10]
>> class(df) <- c("myobj", "data.frame")
>> attr(df, "row.names") <- .set_row_names(18e6)
>>
>> print.myobj <- function(x, ...) {
>>   print.data.frame(head(x, 2))
>> }
>>
>> start <- proc.time(); df; flush.console(); proc.time() - start
>> #  user  system elapsed
>> # 0.408   0.557   0.965
>> start <- proc.time(); print(df); flush.console(); proc.time() - start
>> #  user  system elapsed
>> # 0.019   0.002   0.020
>>
>> sessionInfo()
>> # R version 3.0.2 (2013-09-25)
>> # Platform: x86_64-apple-darwin10.8.0 (64-bit)
>> #
>> # locale:
>> # [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>> #
>> # attached base packages:
>> # [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> Thanks!
>>
>> Hadley
>>
>> --
>> Chief Scientist, RStudio
>> http://had.co.nz/
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis

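As a rough illustration of the two paths Gabriel describes in the quoted
message, here is an R-level sketch. The helper auto_print_like() is
hypothetical: the real auto-print logic is C code in print.c, and this only
mirrors the order of the steps (choose show or print, construct the call,
evaluate it). It uses a five-row stand-in for the large object, so it says
nothing about the timings.

```r
# Hypothetical helper: an R-level analogue of the auto-print path for
# classed objects. For plain (non-object) values the real path prints
# directly without calling print(); that branch is omitted here.
auto_print_like <- function(value) {
  # Step 1: decide which generic applies (show for S4 objects, print otherwise).
  generic <- if (isS4(value)) quote(show) else quote(print)
  # Step 2: construct the call, with the value itself as the argument.
  call <- as.call(list(generic, value))
  # Step 3: evaluate the constructed call; method dispatch happens here.
  eval(call)
  invisible(value)
}

# Small stand-in for the large object in the original example.
small <- structure(list(a = 1:5, b = letters[1:5]),
                   class = c("myobj", "data.frame"),
                   row.names = c(NA_integer_, -5L))

# Same method as in the original post, returning its argument invisibly.
print.myobj <- function(x, ...) {
  print.data.frame(head(x, 2))
  invisible(x)
}

# Explicit path: print() is called directly and dispatches to print.myobj.
out_explicit <- capture.output(print(small))

# Auto-print-like path: the call is built and evaluated first, then dispatches.
out_auto <- capture.output(auto_print_like(small))
```

Both paths end up in print.myobj and produce the same text; the difference
is only in the extra work done before the method is reached.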

