which() does not handle NAs in named vectors. (PR#226)

Martin Maechler <maechler@stat.math.ethz.ch>
Thu, 15 Jul 1999 14:57:22 +0200


>>>>> On Thu, 15 Jul 1999 09:14, ripley@stats.ox.ac.uk (Brian D. Ripley) said:

Thank you for the bug report.

    BDR> -- It is unclear to me that the handling of NAs is desirable, and
    BDR> it has problems with names:

{function which in its present form very much evolved out of user wishes...}

    BDR> z <- c(T,T,NA,F,T)
    BDR> names(z) <- letters[1:5]
    BDR> which(z)
    BDR> Error: names attribute must be the same length as the vector

Fixed in release-patches [available in a day or two from CRAN src/devel/],
and hence in every new release.
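
As a quick check of the intended behaviour, here is a minimal sketch of the reported case. It assumes a version of which() that already contains the fix; the variable name `res` is only for illustration.

```r
# The case from the bug report: a named logical vector containing an NA.
z <- c(TRUE, TRUE, NA, FALSE, TRUE)
names(z) <- letters[1:5]

# With the fix, the NA is treated as FALSE and the names are subset
# with the same logical index as the positions.
res <- which(z)
print(res)   # positions 1, 2, 5 carrying the names a, b, e
```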

    BDR> (Why do the vector and its names have different subscripts?  And
    BDR> while you are correcting this,

    BDR> Arguments:

    BDR>        x: a logical vector or array.  `NA's are allowed an
    BDR>	   omitted.

is now

       x: a `logical' vector or array.  `NA's are allowed
          and omitted (treated as if `FALSE').

  
    BDR> has a typo, and the logic can be simplified: see below.)

    BDR> On Thu, 15 Jul 1999, Martin Maechler wrote:

    >> >>>>> "BDR" == Prof Brian D Ripley <ripley@stats.ox.ac.uk> writes:
    >> 
    BDR> On Wed, 14 Jul 1999, Friedrich Leisch wrote:
    >> >> >>>>> On Wed, 14 Jul 1999 04:09:21, Peter B Mandeville (PBM) wrote:
    >> >> 
    PBM> I have a vector Pes with 600 elements some of which are NA's. How
    PBM> can I form a vector of the indices of the NA's.
    >> >>
    PBM> for(i in 1:600) if(is.na(Pes[i])) print(i)
    >> >>
    PBM> prints the indices of the NA's but I can't figure out how to put
    PBM> the results in a vector.
    >> >> try this:
    >> >> 
    >> >> x <- (1:length(Pes))[is.na(Pes)]
    >> 
    BDR> Tip: that sort of thing often fails for a length 0 vector. The
    BDR> `approved' spell is
    >>
    BDR> seq(along=Pes)[is.na(Pes)]

BTW, currently  seq(along = x)  returns "numeric" ("double"),
whereas         1:length(x)     returns "integer".
I'm about to fix this...

    BDR> In this case it does not matter as the subscript is of length 0,
    BDR> but it has floored enough library/package writers to be worth
    BDR> thinking about.
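
A small sketch of the length-0 pitfall described above. The names `bad`, `good`, `n_bad` and `n_good` are made up for the illustration; the loops only count iterations.

```r
x <- numeric(0)

bad  <- 1:length(x)      # 1:0 counts *down*, giving the two values 1 and 0
good <- seq(along = x)   # zero-length result, as intended

# A loop over `bad` runs twice on an empty vector; a loop over `good` never runs.
n_bad <- 0
for (i in bad) n_bad <- n_bad + 1
n_good <- 0
for (i in good) n_good <- n_good + 1

print(c(n_bad = n_bad, n_good = n_good))
```

Subscripting happens to be harmless here because the logical subscript itself has length 0; it is the loop form that usually bites.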
    >>  Good teaching about seq() vs.  1:n
    >> 
    >> However, the solution I gave
    >> 
    >> which(is.na(Pes))
    >> 
    >> is the one I still really recommend; it does deal with 0-length
    >> objects, and it keeps names when there are some, and it has an
    >> `arr.ind = FALSE' argument to return array indices instead of vector
    >> indices when so desired.
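
To illustrate the three points just made, here is a short sketch with made-up data (`Pes` below is a toy stand-in for the original poster's vector, and `M` is an arbitrary matrix). Note that array indices are returned when arr.ind = TRUE is given; the default, FALSE, gives vector indices.

```r
# Toy stand-in for the poster's vector, with names.
Pes <- c(a = 1.2, b = NA, c = 3.4, d = NA)
na_pos <- which(is.na(Pes))          # names are kept: b and d
print(na_pos)

# 0-length input gives a 0-length result, with no special casing needed.
empty <- which(is.na(numeric(0)))
print(length(empty))

# For an array, arr.ind = TRUE returns one row of indices per hit.
M <- matrix(c(1, NA, 3, 4, NA, 6), nrow = 2)
idx <- which(is.na(M), arr.ind = TRUE)
print(idx)                           # NAs sit at [2,1] and [1,3]
```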

    BDR> Yes, but

    BDR> -- It is not in S (so causing difficulty in porting from R to S)

Well, I know what you mean, and your point is well taken in the above case...
but anyway:
Our group here has been using this ("which" function) in S for quite a while.
Eventually, someone will have to collect a library of things that exist in R,
are missing in S-plus, and are easily implementable there; locally, our
collection of S-plus add-ons already contains quite a few of them.

And then, for quite a few R users, S-plus backward compatibility is not the
big issue.
And in other ways, R is so much nicer:
    - math annotation in graphics
    - color, line types  {  plot(x,y, col="light blue", col.main = "blue") }
    - filled.contour
    - persp() with shading..

If you want to live in both worlds, I recommend using

    if(is.R()) {

       ...R specific...

    } else { ## S-plus ---

       ...S-plus specific...

    }

anyway, even within user-written functions, and making sure (via .First or
S_FIRST or ...) that is.R() |--> FALSE in S-plus.


    BDR> -- It looks a relatively expensive operation.

I don't think it is expensive (for arr.ind=FALSE !) if you want to deal
with missings (NA) at all.  (Peter's example above is one of the few places 
where you are absolutely sure there are no missings...)
Assume x has some NAs, e.g.
    x <- rnorm(1000); x[1000*runif(rpois(1,lam=50))] <- NA
Then
    which( x < -2 )  

works how one would want;

    seq(along = x)[x < -2]

gives silly NA's (which make sense for the logical vector but not for the
extraction).
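
The same contrast on a small fixed vector, so the result does not depend on random numbers. This is only a sketch; `w1` and `w2` are illustrative names.

```r
x <- c(-3, NA, 0.5, -2.5, NA)

# which() drops the positions where the comparison is NA.
w1 <- which(x < -2)
print(w1)    # 1 4

# Plain logical subscripting propagates each NA into the result.
w2 <- seq(along = x)[x < -2]
print(w2)    # 1 NA 4 NA
```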

    BDR> -- Internally which could be simplified by using seq(along=) as it
    BDR> is a wrapper for this construct, but actually the separate handling
    BDR> of n == 0 is unnecessary (as logic & !is.na(logic) will have length
    BDR> zero.)

You are right, and that's part of the fix for `which' which is currently

which <- function(logic, arr.ind = FALSE)
{
    if(!is.logical(logic))
	stop("argument to \"which\" is not logical")
    wh <- seq(along=logic)[ll <- logic & !is.na(logic)]
    if ((m <- length(wh)) > 0) {
	dl <- dim(logic)
	if (is.null(dl) || !arr.ind) {
	    names(wh) <- names(logic)[ll]
	}
	else { ##-- return a matrix  length(wh) x rank
	    rank <- length(dl)
	    wh1 <- wh - 1
	    wh <- 1 + wh1 %% dl[1]
	    wh <- matrix(wh, nrow = m, ncol = rank,
			 dimnames =
			 list(dimnames(logic)[[1]][wh],
			      if(rank == 2) c("row", "col")# for matrices
			      else paste("dim", 1:rank, sep="")))
	    if(rank >= 2) {
		denom <- 1
		for (i in 2:rank) {
		    denom <- denom * dl[i-1]
		    nextd1 <- wh1 %/% denom# (next dim of elements) - 1
		    wh[,i] <- 1 + nextd1 %% dl[i]
		}
	    }
	    storage.mode(wh) <- "integer"
	}
    }
    wh
}
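
The index arithmetic in the arr.ind branch may be easier to follow on a concrete case. Below is a minimal sketch for a 2 x 3 logical matrix; the names `m`, `r_idx` and `c_idx` are made up here, and the two formulas are just the rank-2 special case of the loop above.

```r
# A 2 x 3 logical matrix with one NA; storage is column-major.
m  <- matrix(c(FALSE, TRUE, NA, TRUE, FALSE, TRUE), nrow = 2)
dl <- dim(m)

# Vector (linear) indices of the TRUE entries, with NA treated as FALSE.
wh  <- seq(along = m)[m & !is.na(m)]
wh1 <- wh - 1                              # switch to 0-based for the arithmetic

r_idx <- 1 + wh1 %% dl[1]                  # first dimension: remainder
c_idx <- 1 + (wh1 %/% dl[1]) %% dl[2]      # second dimension: quotient, then remainder

print(cbind(row = r_idx, col = c_idx))
```

For this matrix the TRUE entries all lie in row 2, in columns 1, 2 and 3, which matches what which(m, arr.ind = TRUE) reports.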
-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-devel mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html