[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Martin Maechler maechler at stat.math.ethz.ch
Tue Nov 15 12:58:36 CET 2016


Finally getting back to this :

>>>>> Hadley Wickham <h.wickham at gmail.com>
>>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:

    > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
    > <h.wickham at gmail.com> wrote:
    >>> >> One possibility would also be to consider a
    >>> >> "numbers-only" or rather "same type"-only {e.g.,
    >>> >> would also work for characters} version.
    >>>
    >>> > I don't know what you mean by these.
    >>>
    >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
    >>> which is very relevant, thank you Bob!
    >>>
    >>> As I have found, that actually works in such a "same
    >>> type"-only way: It does not try to coerce, but gives an
    >>> error when the classes differ, even in this somewhat
    >>> debatable case :
    >>>
    >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
    >>> Error: `false` has type 'double' not 'integer'
    >>> >
    >>>
    >>> As documented, if_else() is clearly stricter than
    >>> ifelse() and e.g., also does no recycling (except for
    >>> arguments of length() 1).
    >>
    >> I agree that if_else() is currently too strict - it's
    >> particularly annoying if you want to replace some values
    >> with a missing:
    >>
    >> x <- sample(10)
    >> if_else(x > 5, NA, x)
    >> # Error: `false` has type 'integer' not 'logical'
    >>
    >> But I would like to make sure that this remains an error:
    >>
    >> if_else(x > 5, x, "BLAH")
    >>
    >> Because that seems more likely to be a user error (but
    >> reasonable people might certainly believe that it should
    >> just work)
    >>
    >> dplyr is more accommodating in other places (i.e. in
    >> bind_rows(), collapse() and the joins) but it's
    >> surprisingly hard to get all the details right. For
    >> example, what should the result of this call be?
    >>
    >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
    >>         factor(c("c", "b")))
    >>
    >> Strictly speaking I think you could argue it's an error,
    >> but that's not very user-friendly. Should it be a factor
    >> with the union of the levels? Should it be a character
    >> vector + warning? Should the behaviour change if one set
    >> of levels is a subset of the other set?
    >>
    >> There are similar issues for POSIXct (if the time zones
    >> are different, which should win?), and difftimes
    >> (similarly for units).  Ideally you'd like the behaviour
    >> to be extensible for new S3 classes, which suggests it
    >> should be a generic (and for the most general case, it
    >> would need to dispatch on both arguments).

    > One possible principle would be to use c() -
    > i.e. construct out as

    > out <- c(yes[0], no[0])
    > length(out) <- max(length(yes), length(no))

yes; this would require that a  `length<-` method works for the
class of the result.
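
A minimal sketch of the c()-based construction quoted above, for plain
atomic vectors only. The variable names mirror the thread; the final
fill-in step is an assumption about how the result would then be completed,
not part of the quoted proposal:

```r
test <- c(TRUE, FALSE)
yes  <- 2:3          # integer
no   <- 0 + 10:11    # double

## c() on the zero-length pieces picks the common type (here: double)
out <- c(yes[0], no[0])
## `length<-` pads with NA; for classed objects this needs a working method
length(out) <- max(length(yes), length(no))

## assumed fill-in step, not part of the quoted proposal
out[test]  <- yes[test]
out[!test] <- no[!test]
out
```

Unlike dplyr::if_else(), this silently promotes the integer 'yes' to double
rather than signalling an error.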

Duncan Murdoch mentioned a version of this, in his very
first reply:

    ans <- c(yes, no)[seq_along(test)]
    ans <- ans[seq_along(test)]

which is less efficient for atomic vectors, but requires
less from the class: it "only" needs `c` and `[` to work

and a mixture of your two proposals would be possible too:

    ans <- c(yes[0], no[0])
    ans <- ans[seq_along(test)]
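
As an illustration of why the class matters, here is a small sketch of the
mixed variant applied to "Date" objects, next to what base ifelse() returns.
The example data and the fill-in step are assumptions made for illustration:

```r
test <- c(TRUE, FALSE)
yes  <- as.Date("2016-11-15") + 0:1
no   <- as.Date("2016-08-12") + 0:1

## base ifelse() drops the class and returns the underlying numbers
class(ifelse(test, yes, no))

## the mixed variant: zero-length prototype from c(), then extend via `[`
ans <- c(yes[0], no[0])
ans <- ans[seq_along(test)]   # all NA, but still of class "Date"

## assumed fill-in step
ans[test]  <- yes[test]
ans[!test] <- no[!test]
ans
```

This only relies on `c`, `[` and `[<-` methods for the class, not on
`length<-`.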

which does *not* work for my "mpfr" numbers (CRAN package 'Rmpfr'),
but that's a buglet in the  c.mpfr() implementation of my Rmpfr
package... (which has already been fixed in the development version on R-forge,
	    https://r-forge.r-project.org/R/?group_id=386)

    > But of course that wouldn't help with factor responses.

Yes.  However, a version of Duncan's suggestion -- of treating 'yes' first
-- does help in that case.
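
A hedged sketch of such a "yes first" construction for factors, using example
data chosen so that both factors share the same levels (with differing
levels, the assignment would produce NA plus a warning):

```r
test <- c(TRUE, FALSE)
yes  <- factor(c("a", "b"))
no   <- factor(c("b", "a"), levels = c("a", "b"))

## start from 'yes', so the result inherits its class and levels ...
r <- yes
## ... and overwrite only the positions where 'test' is FALSE
r[!test] <- no[!test]
r
```

The result stays a factor with the levels of 'yes', which is the asymmetric,
yes-leaning behaviour.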

For once, mainly as a "feasibility experiment",
I have created a GitHub gist to make my current ifelse2() proposal available
for commenting, cloning, pull requests, etc:

It consists of 2 files:
- ifelse-def.R :  Function definitions only, basically all the current
        proposals, called  ifelse*()
- ifelse-checks.R : A simplistic checking function
        and examples calling it, notably demonstrating that my
        ifelse2()  does work with
        "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
        and "mpfr" (the arbitrary-precision numbers in my package "Rmpfr")

Also if you are not on github, you can quickly get to the ifelse2()
definition :

https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d87894#file-ifelse-def-r-L168

    > Also, if you're considering an improved ifelse(), I'd
    > strongly urge you to consider adding an `na` argument,

I now did (called it 'NA.').

    > so that you can use ifelse() to transform all three
    > possible values in a logical vector.

    > Hadley
    > -- http://hadley.nz
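
To make the three-outcome case concrete, here is a small sketch of what has to
be done by hand with base ifelse() today; the replacement strings are
arbitrary example values:

```r
test <- c(TRUE, FALSE, NA)

## ifelse() propagates NA in 'test' to the result ...
out <- ifelse(test, "yes", "no")
## ... so the third outcome has to be patched in afterwards
out[is.na(test)] <- "unknown"
out
```

An extra argument for the NA case would fold the second step into the call.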

For those who really hate GH (and don't want or cannot easily follow the
above URL), here's my current definition: 


##' Martin Maechler, 14. Nov 2016 --- taking into account Duncan M. and Hadley's
##' ideas in the R-devel thread starting at (my mom's 86th birthday):
##' https://stat.ethz.ch/pipermail/r-devel/2016-August/072970.html
ifelse2 <- function (test, yes, no, NA. = NA) {
    if(!is.logical(test)) {
        if(is.atomic(test))
            storage.mode(test) <- "logical"
        else ## typically a "class"; storage.mode<-() typically fails
            test <- if(isS4(test)) methods::as(test, "logical") else as.logical(test)
    }

    ## No longer optimize the  "if (a) x else y"  cases:
    ## Only "non-good" R users use ifelse(.) instead of if(.) in these cases.

    ans <-
        tryCatch(rep(if(is.object(yes) && identical(class(yes), class(no)))
                         ## as c(o) or o[0] may not work for the class
                         yes else c(yes[0], no[0]), length.out = length(test)),
                 error = function(e) { ## try asymmetric, yes-leaning
                     r <- yes
                     r[!test] <- no[!test]
                     r
                 })
    ok <- !(nas <- is.na(test))
    if (any(test[ok]))
        ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
    if (any(!test[ok]))
        ans[!test & ok] <- rep(no, length.out = length(ans))[!test & ok]
    ans[nas] <- NA. # possibly coerced to class(ans)
    ans
}


