[Rd] surprised matrix (1:256, 8, 8) doesn't cause error/warning

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Tue Feb 9 11:58:43 CET 2021


>>>>> Wolfgang Huber 
>>>>>     on Sat, 6 Feb 2021 19:49:11 +0100 writes:

    > FWIW, I paste below a possible change to the warnings generating part of the do_matrix function in R/src/main/array.c that adds the kind of warning that Abby is asking for, and that IMHO would more often help users find bugs in their code than interfere with intended behaviour.

Thank you, Wolfgang.
Honestly,  I had originally not wanted to get into this.

Functions that have been in use for longer than R exists (namely
in S / S+) without the need for changes  is not something we'd
typically easily consider for a change.
{well, in the very old times,  0-extent matrices did not exist,
 so there were *some* changes during matrix()'s history ..}

But I think Abby and you have a point here.... so I have been
looking at your patch .. and from code reading had wondered
about another  behavior which *was* not quite consistent,
and you eliminated completely with your patch:

op <- options(warn=1)
for(n in 0:2) { cat(n,":\n") ; print(matrix(seq_len(n), 0, 0)) }
options(op)

shows (in released R)

0 :
<0 x 0 matrix>
1 :
<0 x 0 matrix>
2 :
Warning in matrix(seq_len(n), 0, 0) :
  data length exceeds size of matrix
<0 x 0 matrix>
> 

and it is really seems not logical that in matrix(x, 0,0),
 'x' of  size 1 and  size (>=) 2  are treated differently.
For consistency,  size 1  "should"  also warn (but read on!)

After your patch, theres' no warning in any case .. which is
consistent, within the  matrix(x, 0,0) situation  but not consistent
with your proposal to warn more often when  matrix(x, n,k)  is
"obviously" using inconsistent dimensions.

When trying to let   matrix(1, 0,0)  also warn,
I quickly found that this produces many new warnings in our (R
base packages) examples and tests, notably from construction
such as

	 matrix(NA, 0, 3)
or
	 matrix(NA_character_, 0, 4)

which would all have to be re-written as

      matrix(logical()  , 0, 3)
or    matrix(character(), 0, 4)

The latter *is* more strictly self-consistent, but do we really
want to impose such strictness ?

My conclusion:  Not at this moment

OTOH, I'd re-add the warning for  length(x) > 1  which has
been there in the current code (but not your patch).
{{No need to send another patch, I've changed too many small
  things already, for me to be useful}}

This still needs adaption of one of the regression tests of R
itself, and needs (at least) one of the tests of the Matrix package
(warning turned into error, from options(warn = 2)).

I'm willing to go that route, but I'm sure this will entail some
work by other package authors, too (and hence CRAN maintainers etc).

Opinions?


    >> matrix (1:6, nrow = 2, ncol = 3)

    >> matrix (1:12, nrow = 2, ncol = 3)
    > [,1] [,2] [,3]
    > [1,]    1    3    5
    > [2,]    2    4    6
    > Warning message:
    > In matrix(1:12, nrow = 2, ncol = 3) :
    > data length incompatible with size of matrix

    >> matrix (1:7, nrow = 2, ncol = 3)
    > Warning messages:
    > 1: In matrix(1:7, nrow = 2, ncol = 3) :
    > data length [7] is not a sub-multiple or multiple of the number of rows [2]
    > 2: In matrix(1:7, nrow = 2, ncol = 3) :
    > data length incompatible with size of matrix

    >> matrix (1:8, nrow = 2, ncol = 3)
    > Warning messages:
    > 1: In matrix(1:8, nrow = 2, ncol = 3) :
    > data length [8] is not a sub-multiple or multiple of the number of columns [3]
    > 2: In matrix(1:8, nrow = 2, ncol = 3) :
    > data length incompatible with size of matrix

    >> matrix (1:6, nrow = 0, ncol = 0)
    > <0 x 0 matrix>
    >> matrix (numeric(0), nrow = 2, ncol = 3)
    > [,1] [,2] [,3]
    > [1,]   NA   NA   NA
    > [2,]   NA   NA   NA

    >> matrix(1:2, ncol = 8)
    > [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
    > [1,]    1    2    1    2    1    2    1    2

    > It would be nice to combine the new warning with that about “...not a sub-multiple or multiple…” into a single warning, if appropriate (as in two of the examples above), but that would require bigger surgery way above my payscale.

I agree that in those cases it should only show one warning, and
to keep things simple I'd say it should just be the 2nd of those
above  or more precisely (I have to check if that's correct)

  "data length larger than size of matrix"

Martin

    > Kind regards
    > Wolfgang Huber


    > Index: array.c
    > ===================================================================
    > --- array.c	(revision 79951)
    > +++ array.c	(working copy)
    > @@ -133,18 +133,19 @@
    > nc = (int) ceil((double) lendat / (double) nr);
    > }

    > -    if(lendat > 0) {
    > +    if (lendat > 1) {
    > R_xlen_t nrc = (R_xlen_t) nr * nc;
    > -	if (lendat > 1 && nrc % lendat != 0) {
    > +	if ((nrc % lendat) != 0) {
    > if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
    > ((lendat < nr) && (nr / lendat) * lendat != nr))
    > warning(_("data length [%d] is not a sub-multiple or multiple of the number of rows [%d]"), lendat, nr);
    > else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
    > ((lendat < nc) && (nc / lendat) * lendat != nc))
    > -		warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
    > -	}
    > -	else if ((lendat > 1) && (nrc == 0)){
    > +		    warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
    > +	if (nrc == 0)
    > warning(_("data length exceeds size of matrix"));
    > +        if (nrc != lendat)
    > +            warning(_("data length incompatible with size of matrix"));
    > }
    > }


    > ------
    > // And here, for easy checking that part of the code in the new form:
    > if (lendat > 1) {
    > R_xlen_t nrc = (R_xlen_t) nr * nc;
    > if ((nrc % lendat) != 0) {
    > if (((lendat > nr) && (lendat / nr) * nr != lendat) ||
    > ((lendat < nr) && (nr / lendat) * lendat != nr))
    > warning(_("data length [%d] is not a sub-multiple or multiple of the number of rows [%d]"), lendat, nr);
    > else if (((lendat > nc) && (lendat / nc) * nc != lendat) ||
    > ((lendat < nc) && (nc / lendat) * lendat != nc))
    > warning(_("data length [%d] is not a sub-multiple or multiple of the number of columns [%d]"), lendat, nc);
    > if (nrc == 0)
    > warning(_("data length exceeds size of matrix"));
    > if (nrc != lendat)  
    > warning(_("data length incompatible with size of matrix"));
    > }
    > }

    >> Il giorno 2feb2021, alle ore 00:27, Abby Spurdle (/əˈbi/) <spurdle.a using gmail.com> ha scritto:
    >> 
    >> So, does that mean that a clean result is contingent on the length of
    >> the data being a multiple of both the number of rows and columns?
    >> 
    >> However, this rule is not straightforward.
    >> 
    >>> #EXAMPLE 1
    >>> #what I would expect
    >>> matrix (1:12, 0, 0)
    >> <0 x 0 matrix>
    >> Warning message:
    >> In matrix(1:12, 0, 0) : data length exceeds size of matrix
    >> 
    >>> #EXAMPLE 2
    >>> #don't like this
    >>> matrix (numeric (), 2, 3)
    >> [,1] [,2] [,3]
    >> [1,]   NA   NA   NA
    >> [2,]   NA   NA   NA
    >> 
    >> The first example is what I would expect, but is inconsistent with the
    >> previous examples.
    >> (Because zero is a valid multiple of twelve).
    >> 
    >> I dislike the second example with recycling of a zero-length vector.
    >> This *is* covered in the help file, but also seems inconsistent with
    >> the previous examples.
    >> (Because two and three are not valid multiples of zero).
    >> 
    >> Also, I can't think of any reason why someone would want to construct
    >> a matrix with extra data, and then discard part of it.
    >> And even if there was, then why not allow an arbitrarily longer length?
    >> 
    >> 
    >> On Mon, Feb 1, 2021 at 10:08 PM Martin Maechler
    >> <maechler using stat.math.ethz.ch> wrote:
    >>> 
    >>>>>>>> Abby Spurdle (/əˈbi/)
    >>>>>>>> on Mon, 1 Feb 2021 19:50:32 +1300 writes:
    >>> 
    >>>> I'm a little surprised that the following doesn't trigger an error or a warning.
    >>>> matrix (1:256, 8, 8)
    >>> 
    >>>> The help file says that the main argument is recycled, if it's too short.
    >>>> But doesn't say what happens if it's too long.
    >>> 
    >>> It's somewhat subtler than one may assume :
    >>> 
    >>>> matrix(1:9, 2,3)
    >>> [,1] [,2] [,3]
    >>> [1,]    1    3    5
    >>> [2,]    2    4    6
    >>> Warning message:
    >>> In matrix(1:9, 2, 3) :
    >>> data length [9] is not a sub-multiple or multiple of the number of rows [2]
    >>> 
    >>>> matrix(1:8, 2,3)
    >>> [,1] [,2] [,3]
    >>> [1,]    1    3    5
    >>> [2,]    2    4    6
    >>> Warning message:
    >>> In matrix(1:8, 2, 3) :
    >>> data length [8] is not a sub-multiple or multiple of the number of columns [3]
    >>> 
    >>>> matrix(1:12, 2,3)
    >>> [,1] [,2] [,3]
    >>> [1,]    1    3    5
    >>> [2,]    2    4    6
    >>>> 
    >>> 
    >>> So it looks to me the current behavior is quite on purpose.
    >>> Are you sure it's not documented at all when reading the docs
    >>> carefully?  (I did *not*, just now).
    >> 
    >> ______________________________________________
    >> R-devel using r-project.org mailing list
    >> https://stat.ethz.ch/mailman/listinfo/r-devel



More information about the R-devel mailing list