[R-pkg-devel] Fast Matrix Serialization in R?
Vladimir Dergachev
vo|ody@ @end|ng |rom m|nd@pr|ng@com
Thu May 9 13:58:43 CEST 2024
On Thu, 9 May 2024, Sameh Abdulah wrote:
> Hi,
>
> I need to serialize and save a 20K x 20K matrix as a binary file. This process is significantly slower in R compared to Python (4X slower).
>
> I'm not sure about the best approach to optimize the below code. Is it possible to parallelize the serialization function to enhance performance?
Parallelization should not help - a single CPU thread should be able to
saturate your disk or your network, assuming you have a typical computer.
The bottleneck is most likely the conversion to text; writing the matrix
in binary form should be much faster.
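As a rough illustration of the binary route, here is a minimal sketch using only base R's writeBin() and readBin(). The helper names write_matrix_bin and read_matrix_bin, the chunk size, and the file layout (two integer dimensions followed by column-major doubles) are assumptions made up for this example, not a standard format. It assumes a non-empty double matrix and writes in column chunks, since a single writeBin() call has historically been limited to 2^31 - 1 bytes.

```r
# Sketch: store a double matrix as raw bytes instead of text.
# Layout (assumed for this example only): nrow, ncol as 4-byte integers,
# then the data as 8-byte doubles in column-major order.
write_matrix_bin <- function(A, path, cols_per_chunk = 1000L) {
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(dim(A), con)
  # Write a block of columns at a time to keep each call small.
  for (start in seq(1L, ncol(A), by = cols_per_chunk)) {
    end <- min(start + cols_per_chunk - 1L, ncol(A))
    writeBin(as.double(A[, start:end]), con)
  }
  invisible(path)
}

read_matrix_bin <- function(path) {
  con <- file(path, "rb")
  on.exit(close(con))
  d <- readBin(con, what = "integer", n = 2L)
  # Read everything in one call; fine for sizes below 2^31 elements.
  x <- readBin(con, what = "double", n = prod(d))
  matrix(x, nrow = d[1], ncol = d[2])
}

m <- 1000                       # small size so the example runs quickly
A <- matrix(runif(m^2), ncol = m)
path <- tempfile(fileext = ".bin")
print(system.time(write_matrix_bin(A, path)))
B <- read_matrix_bin(path)
stopifnot(identical(A, B))      # doubles round-trip exactly
```

With m set to 20000 the file would hold 8 * 20000^2 bytes, about 3.2 GB, so the write time should be dominated by disk throughput rather than CPU.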
To add to the other suggestions, you might want to try my package "RMVL" -
aside from fast writes, it also gives you the ability to share data between
the ultimate users of the package.
best
Vladimir Dergachev
PS Example:
library("RMVL")
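# Open (or create) the MVL file for writing
M <- mvl_open("test1.mvl", append=TRUE, create=TRUE)
m <- 20000   # matrix dimension
n <- m^2     # total number of elements
cat("Generating matrices ... ")
INI.TIME <- proc.time()
A <- matrix(runif(n), ncol = m)
END_GEN.TIME <- proc.time()
# Write the matrix in binary form under the name "A"
mvl_write(M, A, name="A")
mvl_close(M)
END_SER.TIME <- proc.time()
cat("write time:", (END_SER.TIME - END_GEN.TIME)[["elapsed"]], "s\n")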
# Use in another script:
library("RMVL")
M2<-mvl_open("test1.mvl")
print(M2$A[1:10, 1:10])