[Rd] dgTMatrix Segmentation Fault

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Thu Jun 10 09:13:09 CEST 2021


>>>>> Ben Bolker 
>>>>>     on Wed, 9 Jun 2021 21:11:18 -0400 writes:

    > Nice!

Indeed -- and thanks a lot, Dario (and Martin Morgan!) for
getting down to the root problem.

So, indeed, a bug in Matrix (though "far away" from 'dgTMatrix').

Thank you once more!

Martin Maechler

    > On 6/9/21 9:00 PM, Dario Strbenac via R-devel wrote:
    >> Good day,
    >> 
    >> Thanks to handy hints from Martin Morgan, I ran R under gdb and checked for any numeric overflow. We pinpointed the cause:
    >> 
    >> (gdb) info locals
    >> i = 0
    >> j = 10738
    >> m = 200000
    >> n = 50000
    >> ans = 0x55555b332790
    >> aa = 0x55555b3327c0
    >> 
    >> There is a line of C code in dgeMatrix.c:
    >> 
    >>     for (i = 0; i < m; i++) aa[i] += xx[i + j * m];
    >> 
    >> i, j and m are all int, so i + j * m overflows:
    >> (lldb) print 0 + 10738 * 200000
    >> (int) $5 = -2147367296
    >> 
    >> So the code should either check that this doesn't occur or be adjusted to allow for large indices.
    >> 
    >> If anyone is interested, this is in the context of single-cell ATAC-seq data, which typically has about 200000 genomic regions (rows) and perhaps 100000 biological cells (columns).
    >> 
    >> --------------------------------------
    >> Dario Strbenac
    >> University of Sydney
    >> Camperdown NSW 2050
    >> Australia
    >> ______________________________________________
    >> R-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel
    >> 



