[Rd] serialize/unserialize vector improvement

Prof Brian Ripley ripley at stats.ox.ac.uk
Sun Jan 22 14:56:01 CET 2012


This has languished for a long time, and we should make a decision 
before feature freeze (FF) for 2.15.0.

It seems to me that, in so far as there is a problem, it is that we 
serialize via XDR, and that since XDR was invented little-endian CPUs 
have taken over the world.  So for the only cases in which I can imagine 
this really being a problem (passing objects in 'parallel'/snow ... 
contexts), a better answer might be to pass without byte-reordering: go 
back to the RDB format, which was exposed for save() but AFAIK never for 
serialize().

I would say Sparc is the only big-endian platform left (some PPC Mac 
users may disagree), so little-endian really does rule.
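
To make the byte-reordering point concrete, here is a minimal sketch in C 
(not R's actual serialization code; every function name in it is 
hypothetical).  It assumes 32-bit integers and contrasts an XDR-style 
encoder, which always writes the most significant byte first, with a 
native copy that leaves the bytes as they sit in memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* XDR-style encoding: most significant byte first, whatever the host. */
static void encode_int_xdr(int32_t value, unsigned char *out)
{
    uint32_t u = (uint32_t) value;
    assert(out != NULL);
    out[0] = (unsigned char) (u >> 24);
    out[1] = (unsigned char) (u >> 16);
    out[2] = (unsigned char) (u >> 8);
    out[3] = (unsigned char) u;
}

/* Native encoding: copy the in-memory bytes, with no reordering. */
static void encode_int_native(int32_t value, unsigned char *out)
{
    assert(out != NULL);
    memcpy(out, &value, sizeof value);
}
```

On a little-endian host the two encoders produce the bytes in opposite 
order, so the XDR path has to reorder every element on both the sending 
and the receiving side; on a big-endian host they agree.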

Brian

On 03/10/2011 14:28, luke-tierney at uiowa.edu wrote:
> It's on my list to look at but I may not get to it for a couple of
> weeks. Someone else may get there earlier.
>
> Best,
>
> luke
>
> On Mon, 3 Oct 2011, Michael Spiegel wrote:
>
>> Any thoughts? I haven't heard any feedback on this patch.
>>
>> Thanks!
>> --Michael
>>
>> On Wed, Sep 28, 2011 at 3:10 PM, Michael Spiegel
>> <michael.m.spiegel at gmail.com> wrote:
>>> Hi folks,
>>>
>>> I've attached a patch to the svn trunk that improves the performance
>>> of the serialize/unserialize interface for vector types. The current
>>> implementation: a) invokes the R_XDREncode operation for each element
>>> of the vector type, and b) uses a switch statement to determine the
>>> stream type for each element of the vector type. I've added
>>> R_XDREncodeVector/R_XDRDecodeVector functions that accept N elements
>>> at a time, and I've reorganized the implementation so that the stream
>>> type is not queried once per element.
>>>
>>> In the microbenchmark below, I've observed a speedup of about
>>> 2.4x.  In a real benchmark that uses the serialization interface
>>> to make MPI calls, I see about a 10% improvement in
>>> performance.
>>>
>>> Cheers,
>>> --Michael
>>>
>>> microbenchmark:
>>>
>>> input <- matrix(1:100000000, 10000, 10000)
>>> output <- serialize(input, NULL)
>>> for(i in 1:10) { print(system.time(serialize(input, NULL))) }
>>> for(i in 1:10) { print(system.time(unserialize(output))) }
>>>
>>
>> ______________________________________________
>> R-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>
>
>
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


