[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices
Martin Maechler
maechler at stat.math.ethz.ch
Thu Mar 19 23:02:15 CET 2015
This is a Request For Comment, also BCCed to 390 package maintainers
of reverse dependencies of the Matrix package.
Most users and package authors working with our 'Matrix' package will
be using it for numerical computations, and so will be using
"dMatrix" (d : double precision) matrix objects M, and indirectly, e.g., for
M >= c will also use "lMatrix" (l: logical i.e. TRUE/FALSE/NA).
All the following is **not** affecting those numerical / logical
computations.
A few others will know that we also have "pattern" matrices (purely
binary: TRUE/FALSE, no NA) notably sparse ones, those "ngCMatrix" etc,
all starting with "n" (from ``patter[n]``) which do play a prominent
role in the internal sparse matrix algorithms, notably of the
(underlying C code) CHOLMOD library in the so-called "symbolic"
cholesky decomposition and other such operations. Another reason you
may use them because they are equivalent to incidence matrices of
unweighted (directed or undirected) graphs.
Now, as the subject says, I'm bringing up the topic of what should
happen when these matrices appear in matrix multiplications.
Somewhat by design, but also partly by coincidence, the *sparse*
pattern matrices multiplication in the Matrix package mostly builds on
the CHOLMOD library `cholmod_ssmult()` function which implements
"Boolean arithmetic" for them, instead of regular arithmetic:
"+" is logical "or"
"*" is logical "and".
Once we map TRUE <-> 1 and FALSE <-> 0, the only difference between
boolean and regular arithmetic is that "1+1 = 1" in the (mapped)
boolean arithmetic, because "TRUE | TRUE" is TRUE in original logic.
The drawback of using the boolean arithmetic here is the "clash" with
the usual numeric arithmetic, and arithmetic in R where logical is
coerced to integer (and that to "double") when certain numerical
functions/operations are used.
A more severe problem --- which I had not been aware of until
relatively recently -- is the fact that the CHOLMD function
cholmod_ssdmult(A, B)
treats *both* A and B as "pattern" as soon as one of them is a
(sparse) pattern matrix.
And this is - I say - in clear contrast to what R users would expect:
If you multiply a numeric with a "kind of logical" matrix (a pattern
one), you will expect that the
TRUE/FALSE matrix will be treated as a 1/0 matrix because it is
combined with a numeric matrix.
So we could say that in this case, the Matrix package behavior is
clearly bugous .... but still it has been the behavior for the last 10
years or so.
RFC 1: "Change 1":
I currently propose to change this behavior for the upcoming release
of Matrix (version 1.2-0), though I have no idea if dependent
packages would partly fail their checks or otherwise have changed
behavior subsequently.
The change seems sensible, since I think if your package relied on
this behavior, it was inadvertent and accidental.
Still you may differ in your opinion about this change nr.1
RFC 2: "Change 2":
This change would be more radical, and something I would not plan for
the upcoming release of Matrix, but possibly for an update say one or
two months later or so: It concerns the matrix products when *both*
matrices are pattern. A situation where the boolean arithmetic may
really make sense and where indeed packages may have depended on the
current behavior ("T + T |--> T"). ... although that is currently
only used for *sparse* pattern matrices, not for dense ones.
Further, it may still seem surprising that matrix multiplication does
not behave numerically for a pair of such matrices, and by the
principle of "least surprise" we should provide the boolean arithmetic
matrix products in another way than by the standard %*%,
crossprod() and tcrossprod() functions.
So one possibility could be to change the standard functions to behave
numerically,
and e.g., use %&% (replace the numeric "*" by a logical "&") and
crossprod(A,B, boolean=TRUE), tcrossprod(A,B, boolean=TRUE)
for the three boolean arithmetic version of matrix multiplications.
What do you think about this? I'm particularly interested to hear
from authors and users of packages such as 'arules' which IIRC
explicitly work with sparse pattern matrices.
Thank you for your thoughts and creative ideas,
Martin Maechler, ETH Zurich
More information about the R-devel
mailing list