[R] Different Index behaviors of Array and Matrix
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Sep 3 17:40:24 CEST 2004
[I will copy a version of this to R-bugs: please be careful when you reply
to only copy to R-bugs a version with a PR number in the subject.]
On Fri, 3 Sep 2004, Yao, Minghua wrote:
> I found a difference between the indexing of an array and that of a
> matrix when there are NA's in the index array. The screen copy is as
> follows.
>
> > A <- array(NA, dim=6)
> > A
> [1] NA NA NA NA NA NA
> > idx <- c(1,NA,NA,4,5,6)
> > B <- c(10,20,30,40,50,60)
> > A[idx] <- B
> > A
> [1] 10 NA NA 40 50 60
> > AA <- matrix(NA,6,1)
> > AA
>      [,1]
> [1,]   NA
> [2,]   NA
> [3,]   NA
> [4,]   NA
> [5,]   NA
> [6,]   NA
> > AA[idx,1] <- B
> > AA
>      [,1]
> [1,]   10
> [2,]   NA
> [3,]   NA
> [4,]   20
> [5,]   30
> [6,]   40
> >
> In the case of an array, we miss the elements (20 and 30) in B
> corresponding to the NA's in the index array. In the case of a matrix,
> 20 and 30 are assigned to the elements indexed by the indexes following
> the NA's. Is this reasonable behavior? Thanks in advance for an
> explanation.
A is a 1D array but it behaves just like a vector.
Weirder things happen with multi-dimensional arrays:
> A <- array(NA, dim=c(6,1,1))
> A[idx,1,1] <- B
> A
, , 1

     [,1]
[1,]   10
[2,]   NA
[3,]   NA
[4,]   NA
[5,]   NA
[6,]   NA
One problem with what happens for matrices is that
> idx <- c(1,4,5,6)
> AA <- matrix(NA,6,1)
> AA[idx,1] <- B
Error in "[<-"(`*tmp*`, idx, 1, value = B) :
number of items to replace is not a multiple of replacement length
is an error, so it is not counting the values consistently.
The only discussion I could find (Blue Book p.103, which is also
discussing LHS subscripting) just says
If a subscript is NA, an NA is returned.
S normally does not use up values when encountering an NA in an index set,
although it does for logical matrix indexing of data frames.
I can see two possible interpretations.
1) The NA indicates the value was lost after assignment. We don't know
what index the first NA was, so 20 got assigned somewhere. And as we
don't know where, all the elements had better be NA. However, that holds
unless the NA was 0, in which case no assignment took place and no value
was used.
2) The NA indicates the value was lost before assignment, so no assignment
took place and no value was used.
R does neither of those. I suspect the correct course of action is to ban
NAs in subscripted assignments.
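
Until the behaviour is settled, user code can make the policy explicit
rather than rely on what "[<-" does with NA subscripts. The following is
only a sketch: assign_skip_na() and assign_no_na() are hypothetical helper
names, not functions in R, and the requirement that the index and the
replacement have the same length is an assumption made to keep the
counting unambiguous. The first helper follows interpretation 2 above, the
second refuses NA subscripts altogether.

```r
# Hypothetical helpers (not part of R); each makes one policy explicit.

# Interpretation 2: an NA index means the value was lost before assignment,
# so that position is skipped and its replacement value is not used.
assign_skip_na <- function(x, idx, value) {
  # Assumption: one replacement value per index, no recycling.
  stopifnot(length(idx) == length(value))
  ok <- !is.na(idx)
  x[idx[ok]] <- value[ok]
  x
}

# The "ban" policy: refuse any NA in the index set.
assign_no_na <- function(x, idx, value) {
  if (any(is.na(idx))) stop("NA in subscripted assignment")
  x[idx] <- value
  x
}

idx <- c(1, NA, NA, 4, 5, 6)
B <- c(10, 20, 30, 40, 50, 60)

res <- assign_skip_na(rep(NA_real_, 6), idx, B)
print(res)
```

With these helpers the vector and matrix cases can be made to agree by
working on one column at a time, for example
AA[, 1] <- assign_skip_na(AA[, 1], idx, B).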
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595