[Rd] handling of zero and negative indices in
src/main/subscript.c:mat2indsub() (PR#7824)
tplate at acm.org
tplate at acm.org
Fri Apr 29 18:59:46 CEST 2005
This message contains a description of what looks like a bug, examples
of the suspect behavior, a proposed change to the C code to change this
behavior, example of behavior with the fix, and suggestions for 3 places
to update the documentation to reflect the proposed behavior. It is
submitted for consideration for inclusion in R. Comments are requested.
Currently, the code for subscripting by matrices checks that values in
the matrix are not greater than the dimensions of the array being
indexed. However, it does not check for zero or negative indices, and
blindly does index computation with them as though they were positive
indices (including for negative indices whose absolute value exceeds the
dimensions of the array being indexed). Here are examples of indexing
by matrices that do not return unequivocally sensible results (in most
cases):
> x <- matrix(1:6,ncol=2)
> dim(x)
[1] 3 2
> x
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x[rbind(c(1,1), c(2,2))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,1))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,0))]
Error: only 0's may be mixed with negative subscripts
> x[rbind(c(1,1), c(2,2), c(0,2))]
[1] 1 5 3
> x[rbind(c(1,1), c(2,2), c(0,3))]
Error: subscript out of bounds
> x[rbind(c(1,1), c(2,2), c(1,0))]
Error: only 0's may be mixed with negative subscripts
> x[rbind(c(1,1), c(2,2), c(2,0))]
Error: only 0's may be mixed with negative subscripts
> x[rbind(c(1,1), c(2,2), c(3,0))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(1,2))]
[1] 1 5 4
> x[rbind(c(1,1), c(2,2), c(-1,2))]
[1] 1 5 2
> x[rbind(c(1,1), c(2,2), c(-2,2))]
[1] 1 5 1
> x[rbind(c(1,1), c(2,2), c(-3,2))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(-4,2))]
Error: only 0's may be mixed with negative subscripts
> x[rbind(c(1,1), c(2,2), c(-1,-1))]
Error: subscript out of bounds
>
> # range checks are not applied to negative indices
> x <- matrix(1:6, ncol=3)
> dim(x)
[1] 2 3
> x[rbind(c(1,1), c(2,2), c(-3,3))]
[1] 1 4 1
> x[rbind(c(1,1), c(2,2), c(-4,3))]
[1] 1 4
>
> version
_
platform i386-pc-mingw32
arch i386
os mingw32
system i386, mingw32
status
major 2
minor 1.0
year 2005
month 04
day 18
language R
>
The followed version of mat2indsub() (to replace the one in
src/main/subscript.c) allows zero indices and explicitly disallows all
negative indices:
/* Special Matrix Subscripting: Handles the case x[i] where */
/* x is an n-way array and i is a matrix with n columns. */
/* This code returns a vector containing the integer subscripts */
/* to be extracted when x is regarded as unravelled. */
/* Negative indices are not allowed. */
/* A zero anywhere in a row will cause a zero in the same */
/* position in the result. */
SEXP mat2indsub(SEXP dims, SEXP s)
{
int tdim, j, i, k, nrs = nrows(s);
SEXP rvec;
PROTECT(rvec = allocVector(INTSXP, nrs));
s = coerceVector(s, INTSXP);
setIVector(INTEGER(rvec), nrs, 0);
for (i = 0; i < nrs; i++) {
tdim = 1;
/* compute 0-based subscripts for a row (0 in the input */
/* gets -1 in the output here) */
for (j = 0; j < LENGTH(dims); j++) {
k = INTEGER(s)[i + j * nrs];
if(k == NA_INTEGER) {
INTEGER(rvec)[i] = NA_INTEGER;
break;
}
if(k < 0)
error(_("cannot have negative values in matrices used as subscripts"));
if(k == 0) {
INTEGER(rvec)[i] = -1;
break;
}
if (k > INTEGER(dims)[j])
error(_("subscript out of bounds"));
INTEGER(rvec)[i] += (k - 1) * tdim;
tdim *= INTEGER(dims)[j];
}
/* transform to 1 based subscripting (0 in the input */
/* gets 0 in the output here) */
if(INTEGER(rvec)[i] != NA_INTEGER)
INTEGER(rvec)[i]++;
}
UNPROTECT(1);
return (rvec);
}
With this version, the above commands (+ a couple more) produce the
following output (with commands that fail suitably wrapped in try() so
that they can be included in a test files.)
> x <- matrix(1:6,ncol=2)
> dim(x)
[1] 3 2
> x
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 3 6
> x[rbind(c(1,1), c(2,2))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,1))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,0))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,2))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(0,3))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(1,0))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(2,0))]
[1] 1 5
> x[rbind(c(1,1), c(2,2), c(3,0))]
[1] 1 5
> x[rbind(c(1,0), c(0,2), c(3,0))]
numeric(0)
> x[rbind(c(1,0), c(0,0), c(3,0))]
numeric(0)
> x[rbind(c(1,1), c(2,2), c(1,2))]
[1] 1 5 4
> x[rbind(c(1,1), c(2,NA), c(1,2))]
[1] 1 NA 4
> x[rbind(c(1,0), c(2,NA), c(1,2))]
[1] NA 4
> try(x[rbind(c(1,1), c(2,2), c(-1,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, 2))]) :
cannot have negative values in matrices used as subscripts
> try(x[rbind(c(1,1), c(2,2), c(-2,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-2, 2))]) :
cannot have negative values in matrices used as subscripts
> try(x[rbind(c(1,1), c(2,2), c(-3,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 2))]) :
cannot have negative values in matrices used as subscripts
> try(x[rbind(c(1,1), c(2,2), c(-4,2))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 2))]) :
cannot have negative values in matrices used as subscripts
> try(x[rbind(c(1,1), c(2,2), c(-1,-1))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-1, -1))]) :
cannot have negative values in matrices used as subscripts
>
> # verify that range checks are applied to negative indices
> x <- matrix(1:6, ncol=3)
> dim(x)
[1] 2 3
> try(x[rbind(c(1,1), c(2,2), c(-3,3))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-3, 3))]) :
cannot have negative values in matrices used as subscripts
> try(x[rbind(c(1,1), c(2,2), c(-4,3))])
Error in try(x[rbind(c(1, 1), c(2, 2), c(-4, 3))]) :
cannot have negative values in matrices used as subscripts
>
The documentation page for ?Extract could have the following sentences
added to the end of the 2nd para in the description of arguments "i, j,
...":
Negative indices are not allowed in matrices used as indices.
NA and zero values are allowed: rows containing a zero are
omitted from the result, and rows containing an NA produce an
NA in the result.
In "Introduction to R", the same material could be added to the end of
the section "Index Arrays" (in the chapter "Arrays and matrices").
In "R Language Definition", the third paragraph of the section "Indexing
matrices and arrays" (Section 3.4.2 in my copy) currently reads:
> It is possible to use a matrix of integers as an index. In
> this case, the number of columns of the matrix should match
> the number of dimensions of the structure, and the result
> will be a vector with length as the number of rows of the
> matrix. The following example shows how to extract the
> elements m[1, 1] and m[2, 2] in one operation.
It could be changed to the following:
It is possible to use a matrix of integers as an index.
Negative indices are not allowed in matrices used as
indices. NA and zero values are allowed: rows containing a
zero are omitted from the result, and rows containing an NA
produce an NA in the result. To use a matrix as an index,
the number of columns of the matrix should match the number
of dimensions of the structure, and the result will be a
vector with length as the number of rows of the matrix
(except when the matrix contains zeros). The following
example shows how to extract the elements m[1, 1] and m[2,
2] in one operation.
Additionally, indexing by matrix just goes ahead and uses matrix
elements as vector indices in cases where the number of columns in the
matrix does not match the number of dimensions in the array. E.g.:
> x <- matrix(1:6,ncol=2)
> x[rbind(c(1,1,1), c(2,2,2))]
[1] 1 2 1 2 1 2
>
Would it not be preferable behavior to stop with an error in such cases?
More information about the R-devel
mailing list