[Rd] RFC: Matrix package: Matrix products (%*%, crossprod, tcrossprod) involving "nsparseMatrix" aka sparse pattern matrices
Trevor Hastie
hastie at stanford.edu
Fri Mar 20 00:03:38 CET 2015
Hi Martin
I got stung by this last week.
glmnet produces a coefficient matrix of class “dgCMatrix”
If a predictor matrix was created using sparseMatrix as follows,
one gets unexpected results, as this simple example shows.
My fix was easy (I always convert the predictor matrix to class “dgCMatrix” now)
Trevor
> y=Matrix(diag(4))
> y
4 x 4 diagonal matrix of class "ddiMatrix"
[,1] [,2] [,3] [,4]
[1,] 1 . . .
[2,] . 1 . .
[3,] . . 1 .
[4,] . . . 1
> z=sparseMatrix(1:4,1:4)
> z
4 x 4 sparse Matrix of class "ngCMatrix"
[1,] | . . .
[2,] . | . .
[3,] . . | .
[4,] . . . |
> beta=as(Matrix(1:4),"dgCMatrix")
> y%*%beta
4 x 1 sparse Matrix of class "dgCMatrix"
[1,] 1
[2,] 2
[3,] 3
[4,] 4
> z%*%beta
4 x 1 sparse Matrix of class "ngCMatrix"
[1,] |
[2,] |
[3,] |
[4,] |
>
> On Mar 19, 2015, at 3:02 PM, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
>
> This is a Request For Comment, also BCCed to 390 package maintainers
> of reverse dependencies of the Matrix package.
>
> Most users and package authors working with our 'Matrix' package will
> be using it for numerical computations, and so will be using
> "dMatrix" (d : double precision) matrix objects M, and indirectly, e.g., for
> M >= c will also use "lMatrix" (l: logical i.e. TRUE/FALSE/NA).
> All the following is **not** affecting those numerical / logical
> computations.
>
> A few others will know that we also have "pattern" matrices (purely
> binary: TRUE/FALSE, no NA) notably sparse ones, those "ngCMatrix" etc,
> all starting with "n" (from ``patter[n]``) which do play a prominent
> role in the internal sparse matrix algorithms, notably of the
> (underlying C code) CHOLMOD library in the so-called "symbolic"
> cholesky decomposition and other such operations. Another reason you
> may use them because they are equivalent to incidence matrices of
> unweighted (directed or undirected) graphs.
>
> Now, as the subject says, I'm bringing up the topic of what should
> happen when these matrices appear in matrix multiplications.
> Somewhat by design, but also partly by coincidence, the *sparse*
> pattern matrices multiplication in the Matrix package mostly builds on
> the CHOLMOD library `cholmod_ssmult()` function which implements
> "Boolean arithmetic" for them, instead of regular arithmetic:
> "+" is logical "or"
> "*" is logical "and".
> Once we map TRUE <-> 1 and FALSE <-> 0, the only difference between
> boolean and regular arithmetic is that "1+1 = 1" in the (mapped)
> boolean arithmetic, because "TRUE | TRUE" is TRUE in original logic.
>
> The drawback of using the boolean arithmetic here is the "clash" with
> the usual numeric arithmetic, and arithmetic in R where logical is
> coerced to integer (and that to "double") when certain numerical
> functions/operations are used.
>
> A more severe problem --- which I had not been aware of until
> relatively recently -- is the fact that the CHOLMD function
> cholmod_ssdmult(A, B)
> treats *both* A and B as "pattern" as soon as one of them is a
> (sparse) pattern matrix.
> And this is - I say - in clear contrast to what R users would expect:
> If you multiply a numeric with a "kind of logical" matrix (a pattern
> one), you will expect that the
> TRUE/FALSE matrix will be treated as a 1/0 matrix because it is
> combined with a numeric matrix.
> So we could say that in this case, the Matrix package behavior is
> clearly bugous .... but still it has been the behavior for the last 10
> years or so.
>
> RFC 1: "Change 1":
> I currently propose to change this behavior for the upcoming release
> of Matrix (version 1.2-0), though I have no idea if dependent
> packages would partly fail their checks or otherwise have changed
> behavior subsequently.
> The change seems sensible, since I think if your package relied on
> this behavior, it was inadvertent and accidental.
> Still you may differ in your opinion about this change nr.1
>
> RFC 2: "Change 2":
> This change would be more radical, and something I would not plan for
> the upcoming release of Matrix, but possibly for an update say one or
> two months later or so: It concerns the matrix products when *both*
> matrices are pattern. A situation where the boolean arithmetic may
> really make sense and where indeed packages may have depended on the
> current behavior ("T + T |--> T"). ... although that is currently
> only used for *sparse* pattern matrices, not for dense ones.
>
> Further, it may still seem surprising that matrix multiplication does
> not behave numerically for a pair of such matrices, and by the
> principle of "least surprise" we should provide the boolean arithmetic
> matrix products in another way than by the standard %*%,
> crossprod() and tcrossprod() functions.
> So one possibility could be to change the standard functions to behave
> numerically,
> and e.g., use %&% (replace the numeric "*" by a logical "&") and
> crossprod(A,B, boolean=TRUE), tcrossprod(A,B, boolean=TRUE)
> for the three boolean arithmetic version of matrix multiplications.
>
> What do you think about this? I'm particularly interested to hear
> from authors and users of packages such as 'arules' which IIRC
> explicitly work with sparse pattern matrices.
>
> Thank you for your thoughts and creative ideas,
> Martin Maechler, ETH Zurich
----------------------------------------------------------------------------------------
Trevor Hastie hastie at stanford.edu <mailto:hastie at stanford.edu>
Professor, Department of Statistics, Stanford University
Phone: (650) 725-2231 Fax: (650) 725-8977
URL: http://www.stanford.edu/~hastie <http://www-stat.stanford.edu/~hastie>
address: room 104, Department of Statistics, Sequoia Hall
390 Serra Mall, Stanford University, CA 94305-4065
--------------------------------------------------------------------------------------
[[alternative HTML version deleted]]
More information about the R-devel
mailing list