[Rd] serialize() via temporary file is heaps faster than doing it directly (on Windows)
    Henrik Bengtsson 
    hb at stat.berkeley.edu
       
    Fri Jul 25 05:10:35 CEST 2008
    
    
  
Hi,
FYI, I just noticed that on Windows (but not Linux) it is orders of
magnitude (about 50x in the example below) faster to serialize() an
object to a temporary file and then read it back than to serialize it
to a raw vector directly (connection=NULL).  This affects, for instance,
how fast digest::digest() can provide a checksum.
Example:
x <- 1:1e7;
t1 <- system.time(raw1 <- serialize(x, connection=NULL));
print(t1);
#    user  system elapsed
#   174.23  129.35  304.70  ## 5 minutes
t2 <- system.time(raw2 <- serialize2(x, connection=NULL));
print(t2);
#     user  system elapsed
#     2.19    0.18    5.72      ## 5 seconds
print(t1/t2);
#      user    system   elapsed
#   79.55708 718.61111  53.26923
stopifnot(identical(raw1, raw2));
where serialize2() serializes to a temporary file and reads the result back:
serialize2 <- function(object, connection, ...) {
  if (is.null(connection)) {
    # It is faster to serialize to a temporary file and read it back
    pathname <- tempfile();
    con <- file(pathname, open="wb");
    on.exit({
      if (!is.null(con))
        close(con);
      if (file.exists(pathname))
        file.remove(pathname);
    });
    base::serialize(object, connection=con, ...);
    close(con);
    con <- NULL;
    fileSize <- file.info(pathname)$size;
    readBin(pathname, what="raw", n=fileSize);
  } else {
    base::serialize(object, connection=connection, ...);
  }
} # serialize2()
The above benchmarking was done in a fresh R v2.7.1 session on WinXP Pro:
> sessionInfo()
R version 2.7.1 Patched (2008-06-27 r46012)
i386-pc-mingw32
locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
When I do the same on a Linux machine there is no difference:
> sessionInfo()
R version 2.7.1 (2008-06-23)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
Is there an obvious reason (and an obvious fix) for this?
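One way to narrow this down might be to check how the elapsed time of each approach scales with the object size. The following is only a rough sketch: serializeViaFile() is a hypothetical, compact variant of the file-based approach above, and the sizes are arbitrary and kept small so that it finishes quickly. If the direct timings grow much faster than linearly with n while the file-based ones do not, that would point at how the in-memory result is built up rather than at the serialization itself.

```r
# Hypothetical helper (the name is illustrative): serialize to a
# temporary file and read the bytes back as a raw vector.
serializeViaFile <- function(object, ...) {
  pathname <- tempfile()
  con <- file(pathname, open = "wb")
  on.exit({
    if (!is.null(con)) close(con)
    unlink(pathname)
  })
  serialize(object, connection = con, ...)
  close(con)
  con <- NULL
  readBin(pathname, what = "raw", n = file.info(pathname)$size)
}

# Arbitrary sizes, chosen only to keep the run short.
sizes <- c(1e4, 1e5, 1e6)
timings <- data.frame(n = sizes, direct = NA_real_, viaFile = NA_real_)

for (i in seq_along(sizes)) {
  # sample.int() returns an ordinary integer vector of length n.
  x <- sample.int(sizes[i])
  timings$direct[i] <-
    system.time(raw1 <- serialize(x, connection = NULL))[["elapsed"]]
  timings$viaFile[i] <-
    system.time(raw2 <- serializeViaFile(x))[["elapsed"]]
  # Both approaches should produce exactly the same bytes.
  stopifnot(identical(raw1, raw2))
}

print(timings)
```

The absolute numbers will of course depend on the machine and R version; only the trend across sizes is of interest here.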
Cheers
Henrik
    
    