[R] time of serialization
William Dunlap
wdunlap at tibco.com
Sun Aug 15 20:23:39 CEST 2010
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Saptarshi Guha
> Sent: Sunday, August 15, 2010 9:23 AM
> To: R-help at r-project.org
> Subject: [R] time of serialization
>
> Hello,
> I have a question about the overhead in lapply.
> x is a list of 3000 lists. Each list element i (1<=i<=3000) is
> a pair of two elements: a string vector and a data frame
>
> x is roughly 235MB.
>
> > gc()
> ##
>
> > z <- system.time(y <- lapply(x,function(r){
> system.time(serialize(r,NULL))['elapsed']
> }))
> > sum(unlist(y))
> 18.812
> > z
> user system elapsed
> 494.144 0.041 494.247
>
> So, the entire lapply takes ~26 times longer than the sum of the
> individual operations.
Your test calls serialize(), system.time(), `[`(), and the
anonymous function of 'r' 3000 times each from lapply(), so
why pick on lapply() as the culprit? I made a list 'x' of length 3000
according to your description and tried the following experiments
(I didn't bother to put `[` into the mix):
> system.time(lapply(x, function(xi)serialize(xi, NULL)))
user system elapsed
0.21 0.02 0.22
> system.time(lapply(x, serialize, NULL))
user system elapsed
0.18 0.00 0.20
> system.time(lapply(x, serialize, NULL))
user system elapsed
0.20 0.00 0.22
> system.time(lapply(x, function(xi)serialize(xi, NULL)))
user system elapsed
0.19 0.00 0.20
> system.time(lapply(x, function(xi)system.time(serialize(xi, NULL))))
user system elapsed
103.17 0.03 101.47
> system.time(lapply(x, function(xi)system.time(1.0)))
user system elapsed
102.88 0.11 100.89
> system.time(for(i in 1:3000)system.time(1.0))
user system elapsed
48.82 0.33 48.50
> system.time(for(xi in x)system.time(1.0))
user system elapsed
97.06 0.35 97.70
It looks like system.time() itself eats the time, and the following
experiment indicates that its call to gc() (made by default, since
gcFirst=TRUE) accounts for most of it:
> system.time(lapply(x, function(xi)system.time(serialize(xi, NULL), gcFirst=FALSE)))
user system elapsed
0.79 0.02 0.78
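For anyone wanting to try the comparison without the original data, here is a rough sketch. The list 'x_small' is a hypothetical stand-in (its size and contents are my assumption, not the poster's data), so the absolute timings will differ from the ones shown above.

```r
# Hypothetical stand-in for 'x': each element pairs a character vector
# with a data frame. Sizes are arbitrary, chosen to keep the run short.
set.seed(1)
x_small <- lapply(1:100, function(i) {
  list(letters, data.frame(a = runif(50), b = seq_len(50)))
})

# Default behaviour: every inner system.time() call runs gc() first.
t_gc <- system.time(
  lapply(x_small, function(xi) system.time(serialize(xi, NULL)))
)

# gcFirst = FALSE skips the garbage collection before each timing.
t_nogc <- system.time(
  lapply(x_small, function(xi) system.time(serialize(xi, NULL), gcFirst = FALSE))
)

# Elapsed seconds for the two variants, side by side.
timings <- c(with_gc = t_gc[["elapsed"]], without_gc = t_nogc[["elapsed"]])
print(timings)
```

The cost of each gc() call grows with the amount of memory in use, so the gap should be wider in a session holding a large object like the 235MB list.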
You can use a profiled version of R to get this information
but some quick experimentation works pretty well.
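As a sketch of the profiling route, assuming your R build has profiling support enabled, Rprof() and summaryRprof() from the utils package can show where the time goes. The list 'x_small' and the temporary file name below are made up for illustration.

```r
# Hypothetical stand-in data, as the original list is not available.
x_small <- lapply(1:200, function(i) {
  list(letters, data.frame(a = runif(50)))
})

# Write profiling samples to a temporary file (sampling every 5 ms).
prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)
invisible(lapply(x_small, function(xi) system.time(serialize(xi, NULL))))
Rprof(NULL)  # stop profiling

# Summarise the samples; by.total lists time spent in each function,
# including time spent in the functions it calls.
prof <- summaryRprof(prof_file)
print(head(prof$by.total))
```

If gc() is the bottleneck, it should appear near the top of the by.total table.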
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
>
> Have I missed something?
>
> Regards
> Saptarshi