[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices
Martin Maechler
maechler at lynne.stat.math.ethz.ch
Fri Mar 20 11:07:32 CET 2015
>>>>> "MH" == Michael Hahsler <mhahsler at lyle.smu.edu>
>>>>> on Thu, 19 Mar 2015 20:15:37 -0500 writes:
MH> Hi Martin,
MH> package arules heavily relies on ngCMatrix and uses multiplication and
MH> addition for logical operations. I think it makes sense that in a mixed
MH> operation with one dgCMatrix and one ngCMatrix the ngCMatrix should be
MH> "promoted" to a dgCMatrix.
MH> The current behavior of %*% and friends is in deed confusing:
>> m <- matrix(sample(c(0,1), 5*5, replace=TRUE), nrow=5)
>> x <- as(m, "dgCMatrix")
>> y <- as(m, "ngCMatrix")
>> x %*% y
MH> 5 x 5 sparse Matrix of class "ngCMatrix"
MH> [1,] | | | . |
MH> [2,] | | | . |
MH> [3,] . . | | .
MH> [4,] . . . | .
MH> [5,] | | | | |
>> x %*% x
MH> 5 x 5 sparse Matrix of class "dgCMatrix"
MH> [1,] 1 2 1 . 2
MH> [2,] 1 3 1 . 3
MH> [3,] . . 1 2 .
MH> [4,] . . . 1 .
MH> [5,] 1 2 2 1 2
Indeed, that is not what one should expect.
MH> We even explicitly coerce in our code ngCMatrix to dgCMatrix to avoid
MH> this behavior. I think all these operations probably should result
MH> consistently in a dgCMatrix.
Eventually. As I said, it *is* useful to work with boolean
arithmetic in some cases here, so I do want to provide that
.. hopefully entirely consistently as well in the future, but
longer term not via '%*%'
MH> I would love to see | and & for position-wise AND and OR for ngCMatrix.
Well, why don't you look? ;-)
These have worked for a long time already! (I checked a version
from 2008)
Thanks a lot, Michael, for your valuable feedback.
Martin
MH> Thanks,
MH> -Michael
MH> On 03/19/2015 05:02 PM, Martin Maechler wrote:
>> This is a Request For Comment, also BCCed to 390 package maintainers
>> of reverse dependencies of the Matrix package.
>>
>> Most users and package authors working with our 'Matrix' package will
>> be using it for numerical computations, and so will be using
>> "dMatrix" (d : double precision) matrix objects M, and indirectly, e.g., for
>> M >= c will also use "lMatrix" (l: logical i.e. TRUE/FALSE/NA).
>> All the following is **not** affecting those numerical / logical
>> computations.
>>
>> A few others will know that we also have "pattern" matrices (purely
>> binary: TRUE/FALSE, no NA) notably sparse ones, those "ngCMatrix" etc,
>> all starting with "n" (from ``patter[n]``) which do play a prominent
>> role in the internal sparse matrix algorithms, notably of the
>> (underlying C code) CHOLMOD library in the so-called "symbolic"
>> cholesky decomposition and other such operations. Another reason you
>> may use them because they are equivalent to incidence matrices of
>> unweighted (directed or undirected) graphs.
>>
>> Now, as the subject says, I'm bringing up the topic of what should
>> happen when these matrices appear in matrix multiplications.
>> Somewhat by design, but also partly by coincidence, the *sparse*
>> pattern matrices multiplication in the Matrix package mostly builds on
>> the CHOLMOD library `cholmod_ssmult()` function which implements
>> "Boolean arithmetic" for them, instead of regular arithmetic:
>> "+" is logical "or"
>> "*" is logical "and".
>> Once we map TRUE <-> 1 and FALSE <-> 0, the only difference between
>> boolean and regular arithmetic is that "1+1 = 1" in the (mapped)
>> boolean arithmetic, because "TRUE | TRUE" is TRUE in original logic.
>>
>> The drawback of using the boolean arithmetic here is the "clash" with
>> the usual numeric arithmetic, and arithmetic in R where logical is
>> coerced to integer (and that to "double") when certain numerical
>> functions/operations are used.
>>
>> A more severe problem --- which I had not been aware of until
>> relatively recently -- is the fact that the CHOLMD function
>> cholmod_ssdmult(A, B)
>> treats *both* A and B as "pattern" as soon as one of them is a
>> (sparse) pattern matrix.
>> And this is - I say - in clear contrast to what R users would expect:
>> If you multiply a numeric with a "kind of logical" matrix (a pattern
>> one), you will expect that the
>> TRUE/FALSE matrix will be treated as a 1/0 matrix because it is
>> combined with a numeric matrix.
>> So we could say that in this case, the Matrix package behavior is
>> clearly bugous .... but still it has been the behavior for the last 10
>> years or so.
>>
>> RFC 1: "Change 1":
>> I currently propose to change this behavior for the upcoming release
>> of Matrix (version 1.2-0), though I have no idea if dependent
>> packages would partly fail their checks or otherwise have changed
>> behavior subsequently.
>> The change seems sensible, since I think if your package relied on
>> this behavior, it was inadvertent and accidental.
>> Still you may differ in your opinion about this change nr.1
>>
>> RFC 2: "Change 2":
>> This change would be more radical, and something I would not plan for
>> the upcoming release of Matrix, but possibly for an update say one or
>> two months later or so: It concerns the matrix products when *both*
>> matrices are pattern. A situation where the boolean arithmetic may
>> really make sense and where indeed packages may have depended on the
>> current behavior ("T + T |--> T"). ... although that is currently
>> only used for *sparse* pattern matrices, not for dense ones.
>>
>> Further, it may still seem surprising that matrix multiplication does
>> not behave numerically for a pair of such matrices, and by the
>> principle of "least surprise" we should provide the boolean arithmetic
>> matrix products in another way than by the standard %*%,
>> crossprod() and tcrossprod() functions.
>> So one possibility could be to change the standard functions to behave
>> numerically,
>> and e.g., use %&% (replace the numeric "*" by a logical "&") and
>> crossprod(A,B, boolean=TRUE), tcrossprod(A,B, boolean=TRUE)
>> for the three boolean arithmetic version of matrix multiplications.
>>
>> What do you think about this? I'm particularly interested to hear
>> from authors and users of packages such as 'arules' which IIRC
>> explicitly work with sparse pattern matrices.
>>
>> Thank you for your thoughts and creative ideas,
>> Martin Maechler, ETH Zurich
>>
MH> --
MH> Michael Hahsler, Assistant Professor
MH> Department of Engineering Management, Information, and Systems
MH> Department of Computer Science and Engineering (by courtesy)
MH> Bobby B. Lyle School of Engineering
MH> Southern Methodist University, Dallas, Texas
MH> office: Caruth Hall, suite 337, room 311
MH> email: mhahsler at lyle.smu.edu
MH> web: http://lyle.smu.edu/~mhahsler
More information about the R-devel
mailing list