[R] time of serialization

William Dunlap wdunlap at tibco.com
Sun Aug 15 20:23:39 CEST 2010


> -----Original Message-----
> From: r-help-bounces at r-project.org 
> [mailto:r-help-bounces at r-project.org] On Behalf Of Saptarshi Guha
> Sent: Sunday, August 15, 2010 9:23 AM
> To: R-help at r-project.org
> Subject: [R] time of serialization
> 
> Hello,
> I have a question about the overhead in lapply.
> x is a list of 3000 lists. Each list element i (1<=i<=3000) is a
> pair of two elements: a string vector and a data frame.
> 
> x is roughly 235MB.
> 
> > gc()
> ##
> 
> > z <- system.time(y <- lapply(x,function(r){
>   system.time(serialize(r,NULL))['elapsed']
> }))
> > sum(unlist(y))
> 18.812
> > z
>    user  system elapsed
> 494.144   0.041 494.247
> 
> So, the entire lapply takes ~26 times longer than the sum of the
> individual operations.

Your test involves calling serialize(), system.time(), and `[`(),
and the anonymous function of 'r' 3000 times from lapply, so
why pick on lapply() as the culprit?  I made a 3000 long list 'x'
according to your description and tried the following experiments
(I didn't bother to put [ into the mix):

> system.time(lapply(x, function(xi)serialize(xi, NULL)))
   user  system elapsed 
   0.21    0.02    0.22 
> system.time(lapply(x, serialize, NULL))
   user  system elapsed 
   0.18    0.00    0.20 
> system.time(lapply(x, serialize, NULL))
   user  system elapsed 
   0.20    0.00    0.22 
> system.time(lapply(x, function(xi)serialize(xi, NULL)))
   user  system elapsed 
   0.19    0.00    0.20 
> system.time(lapply(x, function(xi)system.time(serialize(xi, NULL))))
   user  system elapsed 
 103.17    0.03  101.47 
> system.time(lapply(x, function(xi)system.time(1.0)))
   user  system elapsed 
 102.88    0.11  100.89 
> system.time(for(i in 1:3000)system.time(1.0))
   user  system elapsed 
  48.82    0.33   48.50 
> system.time(for(xi in x)system.time(1.0))
   user  system elapsed 
  97.06    0.35   97.70 

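It looks like system.time() eats the time: wrapping even a trivial
expression in it costs about 100 seconds over 3000 calls, or roughly
0.03 seconds per call. The following experiment indicates that its
call to gc() (made by default, since gcFirst=TRUE) eats most of
that time: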

> system.time(lapply(x, function(xi)system.time(serialize(xi, NULL),
+                                               gcFirst=FALSE)))
   user  system elapsed 
   0.79    0.02    0.78 
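
For anyone who wants to repeat the comparison without the original data, here is a minimal sketch. The list 'x' below is a made-up stand-in (50 pairs of a character vector and a small data frame, built by a hypothetical helper make_pair), so the absolute timings will not match the ones above; only the gap between the two runs is of interest.

```r
# Hypothetical stand-in for the poster's data: a list of pairs,
# each holding a character vector and a data frame (sizes are made up).
make_pair <- function(i, rows = 100) {
  list(labels = paste0("id", i, "_", seq_len(5)),
       df = data.frame(a = rnorm(rows), b = runif(rows)))
}
x <- lapply(seq_len(50), make_pair)

# Per-element timing with the default gcFirst = TRUE:
# gc() runs before every inner system.time() call.
with_gc <- system.time(
  t1 <- lapply(x, function(xi) system.time(serialize(xi, NULL)))
)

# Same loop, skipping the garbage collection before each timing.
without_gc <- system.time(
  t2 <- lapply(x, function(xi) system.time(serialize(xi, NULL),
                                           gcFirst = FALSE))
)

# Outer timings side by side (user, system, elapsed).
print(rbind(with_gc = with_gc[1:3], without_gc = without_gc[1:3]))
```

The inner timings in t1 and t2 do not include the gc() call itself, since system.time() collects garbage before it starts its clock. That is why the sum of the per-element times can be far smaller than the time for the whole lapply().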

You can use a profiled version of R to get this information
but some quick experimentation works pretty well.
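
As a rough sketch of the profiling route, assuming an R build with profiling support (Rprof() is not usable otherwise) and again using a made-up list 'x' rather than the original data:

```r
# Made-up test data: 100 pairs of a character vector and a data frame.
x <- replicate(100, list(letters, data.frame(a = rnorm(100))),
               simplify = FALSE)

prof_file <- tempfile(fileext = ".out")

# Sample the call stack every 10 ms while the timed loop runs.
Rprof(prof_file, interval = 0.01)
invisible(lapply(x, function(xi) system.time(serialize(xi, NULL))))
Rprof(NULL)

# summaryRprof() stops with an error if no samples were recorded,
# so guard against a run that finished too quickly.
prof <- tryCatch(summaryRprof(prof_file), error = function(e) NULL)
if (is.null(prof)) {
  message("no profiling samples recorded; try a longer run")
} else {
  print(head(prof$by.total))
}
unlink(prof_file)
```

If gc() is where the time goes, it should show up near the top of the by.total table underneath system.time.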

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com 

> 
> Have I missed something?
> 
> Regards
> Saptarshi
> 