[R-pkg-devel] Fast Matrix Serialization in R?

Vladimir Dergachev
Thu May 9 13:58:43 CEST 2024



On Thu, 9 May 2024, Sameh Abdulah wrote:

> Hi,
>
> I need to serialize and save a 20K x 20K matrix as a binary file. This process is significantly slower in R compared to Python (4X slower).
>
> I'm not sure about the best approach to optimize the below code. Is it possible to parallelize the serialization function to enhance performance?

Parallelization should not help - a single CPU thread should be able to 
saturate your disk or your network, assuming you have a typical computer.

The problem is possibly the conversion to text; writing the data as binary 
should be much faster.
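
To illustrate the binary route with base R only, here is a minimal sketch. The helper names write_matrix_bin and read_matrix_bin and the chunk_cols parameter are made up for this example; the file layout (two integers for the dimensions, then the doubles in column-major order) is just an assumption, not a standard format. Writing in column blocks keeps each writeBin() call well below its per-call size limit.

```r
# Hypothetical helpers: dump a numeric matrix as raw doubles, not text.
write_matrix_bin <- function(A, path, chunk_cols = 1000L) {
  con <- file(path, open = "wb")
  on.exit(close(con))
  writeBin(dim(A), con)                      # header: nrow, ncol as integers
  # Write blocks of columns so no single writeBin() call gets too large.
  for (start in seq(1L, ncol(A), by = chunk_cols)) {
    end <- min(start + chunk_cols - 1L, ncol(A))
    writeBin(as.vector(A[, start:end]), con) # column-major doubles
  }
  invisible(path)
}

# Read the file back; intended for matrices of modest size.
read_matrix_bin <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  d <- readBin(con, what = "integer", n = 2L)
  x <- readBin(con, what = "double", n = prod(d))
  matrix(x, nrow = d[1], ncol = d[2])
}

# Small round-trip check.
A <- matrix(runif(12), ncol = 4)
f <- tempfile(fileext = ".bin")
write_matrix_bin(A, f)
B <- read_matrix_bin(f)
```

No text conversion happens here, so the values come back bit for bit. For the full 20K x 20K case the reader would probably also need to work in blocks.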

To add to the other suggestions, you might want to try my package "RMVL". 
Aside from fast writes, it also gives you the ability to share data between 
the end users of the package.

best

Vladimir Dergachev

PS Example:

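library("RMVL")

# Open (or create) the MVL file for writing
M<-mvl_open("test1.mvl", append=TRUE, create=TRUE)

m <- 20000   # matrix is m x m; about 3.2 GB of doubles in memory
n <- m^2
cat("Generating matrices ... ")
INI.TIME <- proc.time()
A <- matrix(runif(n), ncol = m)
END_GEN.TIME <- proc.time()

# Write the matrix in binary form under the name "A"
mvl_write(M, A, name="A")

mvl_close(M)

END_SER.TIME <- proc.time()

# Elapsed time for generation and for writing
print(END_GEN.TIME - INI.TIME)
print(END_SER.TIME - END_GEN.TIME)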


# Use in another script:

library("RMVL")

M2<-mvl_open("test1.mvl")

print(M2$A[1:10, 1:10])


