[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices

Fri Mar 20 11:07:32 CET 2015

>>>>> "MH" == Michael Hahsler <mhahsler at lyle.smu.edu>
>>>>>     on Thu, 19 Mar 2015 20:15:37 -0500 writes:

    MH> Hi Martin,
    MH> package arules heavily relies on ngCMatrix and uses multiplication and 
    MH> addition for logical operations. I think it makes sense that in a mixed 
    MH> operation with one dgCMatrix and one ngCMatrix the ngCMatrix should be 
    MH> "promoted" to a dgCMatrix.

    MH> The current behavior of %*% and friends is in deed confusing:

    >> m <- matrix(sample(c(0,1), 5*5, replace=TRUE), nrow=5)
    >> x <- as(m, "dgCMatrix")
    >> y <- as(m, "ngCMatrix")
    >> x %*% y
    MH> 5 x 5 sparse Matrix of class "ngCMatrix"

    MH> [1,] | | | . |
    MH> [2,] | | | . |
    MH> [3,] . . | | .
    MH> [4,] . . . | .
    MH> [5,] | | | | |

    >> x %*% x
    MH> 5 x 5 sparse Matrix of class "dgCMatrix"

    MH> [1,] 1 2 1 . 2
    MH> [2,] 1 3 1 . 3
    MH> [3,] . . 1 2 .
    MH> [4,] . . . 1 .
    MH> [5,] 1 2 2 1 2

Indeed, that is not what one should expect.

    MH> We even explicitly coerce in our code ngCMatrix to dgCMatrix to avoid 
    MH> this behavior. I think all these operations probably should result 
    MH> consistently in a dgCMatrix.

Eventually.   As I said, it *is* useful to work with boolean
arithmetic in some cases here, so I do want to provide that
.. hopefully entirely consistently as well in the future, but
longer term not via '%*%'

    MH> I would love to see | and & for position-wise AND and OR for ngCMatrix.

Well, why don't you look? ;-)

These have worked for a long time already! (I checked a version
from 2008)

Thanks a lot, Michael, for your valuable feedback.
Martin

    MH> Thanks,
    MH> -Michael

    MH> On 03/19/2015 05:02 PM, Martin Maechler wrote:
    >> This is a Request For Comment, also BCCed to 390 package maintainers
    >> of reverse dependencies of the Matrix package.
    >> 
    >> Most users and package authors working with our 'Matrix' package will
    >> be using it for numerical computations, and so will be using
    >> "dMatrix" (d : double precision) matrix objects  M,   and indirectly, e.g., for
    >> M >= c  will also use "lMatrix" (l: logical i.e.  TRUE/FALSE/NA).
    >> All the following is  **not** affecting those numerical / logical
    >> computations.
    >> 
    >> A few others will know that we also have "pattern" matrices (purely
    >> binary: TRUE/FALSE, no NA) notably sparse ones, those "ngCMatrix" etc,
    >> all starting with "n" (from ``patter[n]``) which do play a prominent
    >> role in the internal sparse matrix algorithms, notably of the
    >> (underlying C code) CHOLMOD library in the so-called "symbolic"
    >> cholesky decomposition and other such operations. Another reason you
    >> may use them because they are equivalent to incidence matrices of
    >> unweighted (directed or undirected) graphs.
    >> 
    >> Now, as the subject says, I'm bringing up the topic of what should
    >> happen when these matrices appear in matrix multiplications.
    >> Somewhat by design, but also partly by coincidence,  the *sparse*
    >> pattern matrices multiplication in the Matrix package mostly builds on
    >> the CHOLMOD library `cholmod_ssmult()` function which implements
    >> "Boolean arithmetic" for them, instead of regular arithmetic:
    >> "+" is logical "or"
    >> "*" is  logical "and".
    >> Once we map  TRUE <-> 1  and  FALSE <-> 0, the only difference between
    >> boolean and regular arithmetic is that "1+1 = 1" in the (mapped)
    >> boolean arithmetic, because  "TRUE | TRUE" is TRUE in original logic.
    >> 
    >> The drawback of using the boolean arithmetic here is the "clash" with
    >> the usual numeric arithmetic, and arithmetic in R where logical is
    >> coerced to integer (and that to "double") when certain numerical
    >> functions/operations are used.
    >> 
    >> A more severe problem --- which I had not been aware of until
    >> relatively recently -- is the fact that  the CHOLMD function
    >> cholmod_ssdmult(A, B)
    >> treats *both* A and B as "pattern" as soon as one of them is a
    >> (sparse) pattern matrix.
    >> And this is - I say - in clear contrast to what R users would expect:
    >> If you multiply a numeric with a "kind of logical" matrix (a pattern
    >> one), you will expect that the
    >> TRUE/FALSE matrix will be treated as a 1/0 matrix because it is
    >> combined with a numeric matrix.
    >> So we could say that in this case, the Matrix package behavior is
    >> clearly bugous .... but still it has been the behavior for the last 10
    >> years or so.
    >> 
    >> RFC 1: "Change 1":
    >> I currently propose to change this behavior for the upcoming release
    >> of Matrix (version 1.2-0),  though I have no idea if dependent
    >> packages would partly fail their checks or otherwise have changed
    >> behavior subsequently.
    >> The change seems sensible, since I think if your package relied on
    >> this behavior, it was inadvertent and accidental.
    >> Still you may differ in your opinion about this change nr.1
    >> 
    >> RFC 2: "Change 2":
    >> This change would be more radical, and something I would not plan for
    >> the upcoming release of Matrix, but possibly for an update say one or
    >> two months later or so:  It concerns the matrix products when *both*
    >> matrices are pattern.  A situation where the boolean arithmetic may
    >> really make sense and where indeed packages may have depended on the
    >> current behavior  ("T + T  |--> T"). ... although that is currently
    >> only used for *sparse* pattern matrices, not for dense ones.
    >> 
    >> Further, it may still seem surprising that matrix multiplication does
    >> not behave numerically for a pair of such matrices, and by the
    >> principle of "least surprise" we should provide the boolean arithmetic
    >> matrix products in another way than  by the   standard  %*%,
    >> crossprod()  and  tcrossprod() functions.
    >> So one possibility could be to change the standard functions to behave
    >> numerically,
    >> and e.g., use   %&%  (replace the numeric "*" by a logical "&")  and
    >> crossprod(A,B, boolean=TRUE),  tcrossprod(A,B, boolean=TRUE)
    >> for the three  boolean arithmetic  version of matrix multiplications.
    >> 
    >> What do you think about this?   I'm particularly interested to hear
    >> from authors and users of  packages such as 'arules'  which IIRC
    >> explicitly work with sparse pattern matrices.
    >> 
    >> Thank you for your thoughts and creative ideas,
    >> Martin Maechler, ETH Zurich
    >> 

    MH> -- 
    MH> Michael Hahsler, Assistant Professor
    MH> Department of Engineering Management, Information, and Systems
    MH> Department of Computer Science and Engineering (by courtesy)
    MH> Bobby B. Lyle School of Engineering
    MH> Southern Methodist University, Dallas, Texas

    MH> office: Caruth Hall, suite 337, room 311
    MH> email:  mhahsler at lyle.smu.edu
    MH> web:    http://lyle.smu.edu/~mhahsler