[R-pkg-devel] Fast Matrix Serialization in R?

Simon Urbanek <simon.urbanek using R-project.org>
Fri May 10 00:45:44 CEST 2024



> On 9/05/2024, at 11:58 PM, Vladimir Dergachev <volodya using mindspring.com> wrote:
> 
> 
> 
> On Thu, 9 May 2024, Sameh Abdulah wrote:
> 
>> Hi,
>> 
>> I need to serialize and save a 20K x 20K matrix as a binary file. This process is significantly slower in R compared to Python (4X slower).
>> 
>> I'm not sure about the best approach to optimize the below code. Is it possible to parallelize the serialization function to enhance performance?
> 
> Parallelization should not help - a single CPU thread should be able to saturate your disk or your network, assuming you have a typical computer.
> 
> The problem is possibly the conversion to text, writing it as binary should be much faster.
> 


FWIW serialize() is binary so there is no conversion to text:

> serialize(1:10+0L, NULL)
 [1] 58 0a 00 00 00 03 00 04 02 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0d 00 00 00 0a 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 04 00 00 00
[51] 05 00 00 00 06 00 00 00 07 00 00 00 08 00 00 00 09 00 00 00 0a

It uses the native representation so it is actually not as bad as it sounds.

One aspect I forgot to mention in the earlier thread is that if you don't need to exchange the serialized objects between machines with different endianness, then avoiding the byte swap makes it faster. E.g., on Intel (which is little-endian and thus needs swapping to the big-endian XDR format that serialize() uses by default):

> a=1:1e8/2
> system.time(serialize(a, NULL))
   user  system elapsed 
  2.123   0.468   2.661 
> system.time(serialize(a, NULL, xdr=FALSE))
   user  system elapsed 
  0.393   0.348   0.742 
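
For the original use case (saving a matrix to a binary file), the same option can be passed when serializing to a file connection. The following is only a minimal sketch under stated assumptions: the 1000 x 1000 matrix is a small stand-in for the 20K x 20K one, and the temporary file path is hypothetical. The resulting file is in native byte order, so it should only be read back on a machine with the same endianness.

```r
# Small stand-in for the 20K x 20K matrix (size chosen for illustration only).
m <- matrix(runif(1000 * 1000), nrow = 1000)

# Hypothetical output location; replace with a real path as needed.
path <- tempfile(fileext = ".bin")

# Write the binary serialization in native byte order (no XDR swap).
con <- file(path, "wb")
serialize(m, con, xdr = FALSE)
close(con)

# Read the object back from the binary file.
con <- file(path, "rb")
m2 <- unserialize(con)
close(con)

# The round trip should reproduce the matrix exactly.
stopifnot(identical(m, m2))
```

Timing both variants with system.time() on the real data would show whether the swap is actually the bottleneck in a given setup.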

Cheers,
Simon


