[Rd] dgTMatrix Segmentation Fault
Sokol Serguei
Mon Jun 7 10:00:13 CEST 2021
On 07/06/2021 at 09:00, Dario Strbenac wrote:
> Good day,
>
> I notice that summing rows of a large dgTMatrix fails.
>
> library(Matrix)
> aMatrix <- new("dgTMatrix",
>   i = as.integer(sample(200000, 10000) - 1),
>   j = as.integer(sample(50000, 10000) - 1),
>   x = rnorm(10000),
>   Dim = c(200000L, 50000L)
> )
> totals <- rowSums(aMatrix == 0) # Segmentation fault.
With R 4.1 on Ubuntu 18, I don't get a segfault, but I do get an error
message:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'x' in selecting a method for
function 'rowSums': cannot allocate vector of size 372.5 Gb
And the reason for this is quite clear: the intermediate logical matrix
'aMatrix == 0' is TRUE wherever aMatrix is zero, so it is almost dense,
with 200000L*50000L - 10000L nonzero entries. That is a little bit too
much ;) for my modest laptop. Counting the nonzero entries instead keeps
the intermediate matrix sparse, so I can propose a workaround:
totals <- 50000 - rowSums(aMatrix != 0)
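
As a quick sanity check, here is a minimal sketch on a much smaller matrix, where the dense comparison still fits in memory. The dimensions, the seed and the names nr, nc, nnz, pos, small, direct and sparse are my own choices for illustration, not taken from the original report:

```r
library(Matrix)

set.seed(1)
nr <- 2000L; nc <- 500L; nnz <- 100L

# Draw distinct cell positions so that no (i, j) pair is duplicated.
pos <- sample(nr * nc, nnz) - 1L
small <- new("dgTMatrix",
             i = as.integer(pos %% nr),   # 0-based row indices
             j = as.integer(pos %/% nr),  # 0-based column indices
             x = rnorm(nnz),
             Dim = c(nr, nc))

# Direct count of zeros per row; feasible only because the matrix is small.
direct <- rowSums(as.matrix(small) == 0)

# Workaround: count stored nonzeros per row and subtract from the column count.
sparse <- ncol(small) - rowSums(small != 0)

stopifnot(all(direct == sparse))
```

Using ncol(small) instead of a hard-coded 50000 makes the same expression work for any matrix size.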
Hope this helps.
Best,
Serguei.
>
> The server has 768 GB of RAM and it was never close to being consumed by this. Converting it to an ordinary matrix works fine.
>
> big <- as.matrix(aMatrix)
> totals <- rowSums(big == 0) # Uses more RAM but there is no segmentation fault and result is returned.
>
> May it be made more robust for dgTMatrix?
>
> --------------------------------------
> Dario Strbenac
> University of Sydney
> Camperdown NSW 2050
> Australia
>
> ______________________________________________
> R-devel using r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel