[Rd] ifelse() woes ... can we agree on a ifelse2() ?
Gabriel Becker
gmbecker at ucdavis.edu
Mon Nov 28 16:58:57 CET 2016
Well, that's embarrassing. Sorry for the noise on that front, everyone. I
misunderstood something from the aforementioned unrelated conversation I
was having, but not double checking is on me (I rarely use if else and when
I do I avoid that situation in my own code, which is why I didn't already
know this)
I'd still argue that situation should at least warn, possibly error, as it
seems indicative of a bug in the user's code.
On Mon, Nov 28, 2016 at 7:00 AM, Martin Maechler <maechler at stat.math.ethz.ch
> wrote:
> >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
> >>>>> on Sat, 26 Nov 2016 17:14:01 +0000 writes:
>
> > Just stating, in 'ifelse', 'test' is not recycled. As I said in
> "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/
> pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.
>
> yes and ?ifelse (the help page) also does not say that test is
> recycled, rather
>
> >> If \code{yes} or \code{no} are too short, their elements are
> recycled.
>
> (*and* the problem you wrote the above has been corrected in the
> R-intro manual shortly after).
>
>
> > On current 'ifelse' code in R:
>
> > * The part
> > ans[nas] <- NA
> > could be omitted because NA's are already in place.
> > If the part is removed, variable 'nas' is no longer used.
>
> I agree that this seems logical. If I apply the change, R's own
> full checks do not seem affected, and I may try to commit that
> change and "wait and see".
>
>
> > * The any(*) part actually checks the thing that is used as the
> index vector. The index vector could be stored and then repeatedly used,
> like the following.
>
> > if (any(sel <- test & ok))
> > ans[sel] <- rep(yes, length.out = length(ans))[sel]
>
> yes, I know, and have had similar thoughts in the past.
> However note (I know you that) the current code is
>
> if (any(test[ok]))
> ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>
> and any(test[ok]) may be considerably faster than
> any(sel <- test & ok)
>
> OTOH I think the current code would only be faster (for the
> above) when any(.) returned FALSE ...
> I think it may depend on the typical use cases which of the two
> versions is more efficient.
>
>
> > * If 'test' is a factor, doing
> > storage.mode(test) <- "logical"
> > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
> > if(!is.object(test))
> > instead of
> > if(is.atomic(test)) .
>
> This would be a considerable change I think...
> Note that I'm currently really proposing to introduce an *additional*
> ifelse function with different "more reasonable" semantic, and
> your last change would do that too.
>
> My alternative should really work
> - for factors
> - for "array"s including "matrix" (as the current ifelse() does!)
> - for "Date", "POSIXct", "ts"(timeseries), "zoo",
> "sparseVector", "sparseMatrix" (*), or "mpfr",
> without any special code, but rather by design.
>
> *) Currently needs the R-forge version of Matrix, version 1.2-8.
>
> A bit less than an hour ago, I have updated the gist with an updated
> proposal ifelse2() {and the current alternatives that I know},
> modified so it *does* keep more, e.g. dim() attributes in
> reasonable cases.
>
> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
> 94#file-ifelse-def-r-L168
>
> Hence my ifelse2() became even a bit longer (but not slower)
> working for even more classes of "yes" and "no".
>
>
> > On ifelse-checks.R:
> > * In function 'chkIfelse', if the fourth function argument names is
> not "NA.", the argument name is changed, but the function body still uses
> the old name. That makes error in chkIfelse(ifelseHW) .
> > A fix:
> > if(names(formals(FUN))[[4]] != "NA.") {
> > body(FUN) <- do.call(substitute, list(body(FUN),
> > setNames(list(quote(NA.)),
> names(formals(FUN))[[4]])))
> > names(formals(FUN))[[4]] <- "NA."
> > }
>
> yes, thank you! (a bit embarrassing for me ..)
>
> > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
> as.POSIXlt(ifct)) .
> > 'iflt' has NA as 'tzone' and 'isdst' components.
> > * Because function 'chkIfelse' continues checking after failure,
> > as.POSIXlt(ifct)
> > may give error. The error happens, for example, in
> chkIfelse(ifelseR) . Maybe place it inside try(...).
> > * If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
> > So,
> > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
> > is an example of mixed class.
>
> good; thank you for the hint.
>
> > * The part of function 'chkIfelse' in
> > for(i in seq_len(nFact))
> > uses 'NA.' function argument. That makes error when 'chkIfelse' is
> applied to function without fourth argument.
> > The part should be wrapped in
> > if(has.4th) .
> yes of course
>
> > * Function 'ifelseJH' has fourth argument, but the argument is not
> for value if NA. So, instead of
> > chkIfelse(ifelseJH) ,
> > maybe call
> > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
> You are right;
> I've decided to solve this differently.
>
> I'm looking at these suggestions now, notably also your proposals below;
> thank you, Suharto!
>
> (I wanted to put my improved 'ifelse2' out first, quickly).
> Martin
>
>
> > A concrete version of 'ifelse2' that starts the result from 'yes':
> > function(test, yes, no, NA. = NA) {
> > if(!is.logical(test))
> > test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
> > n <- length(test)
> > ans <- rep(yes, length.out = n)
> > ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !
> is.na(test)]
> > ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
> > ans
> > }
>
> > It requires 'rep' method that is compatible with subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an
> appropriate result if time zones are the same.
> > For coercion of 'test', there is no need of keeping attributes. So,
> it doesn't do
> > storage.mode(test) <- "logical"
> > and goes directly to 'as.logical'.
> > It relies on subassignment for silent coercions of
> > logical < integer < double < complex .
> > Unlike 'ifelse', it never skips any subassignment. So, phenomenon as
> in "example of different return modes" in ?ifelse doesn't happen.
>
> > Another version, for keeping attributes as pointed out by Duncan
> Murdoch:
> > function(test, yes, no, NA. = NA) {
> > if(!is.logical(test))
> > test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
> > n <- length(test)
> > n.yes <- length(yes); n.no <- length(no)
> > if (n.yes != n) {
> > if (n.no == n) { # swap yes <-> no
> > test <- !test
> > ans <- yes; yes <- no; no <- ans
> > n.no <- n.yes
> > } else yes <- yes[rep_len(seq_len(n.yes), n)]
> > }
> > ans <- yes
> > if (n.no == 1L)
> > ans[!test] <- no
> > else
> > ans[!test & !is.na(test)] <- no[
> > if (n.no == n) !test & !is.na(test)
> > else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
> > stopifnot(length(NA.) == 1L)
> > ans[is.na(test)] <- NA.
> > ans
> > }
>
> > Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
> > First, it chooses the first of 'yes' and 'no' that has the same
> length as the result. If none of 'yes' and 'no' matches the length of the
> result, it chooses recycled (or truncated) 'yes'.
> > It uses 'rep' on the index and subsetting as a substitute for 'rep'
> on the value.
> > It requires 'length' method that is compatible with subsetting.
> > Additionally, it uses the same idea as dplyr::if_else, or more
> precisely the helper function 'replace_with'. It doesn't use 'rep' if the
> length of 'no' is 1 or is the same as the length of the result. For
> subassignment with value of length 1, recycling happens by itself and NA in
> index is OK.
> > It limits 'NA.' to be of length 1, considering 'NA.' just as a label
> for NA.
>
> > Cases where the last version above or 'ifelse2 or 'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
> > - 'yes' and 'no' are "difftime" objects with different "units"
> attribute
> > - 'yes' and 'no' are "POSIXlt" objects with different time zone
> > Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC" gives error.
>
> > For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
> gives a right answer for "POSIXlt" case.
> > ---------------------
> > Martin et al.,
>
>
>
>
> > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
> stat.math.ethz.ch
> >> wrote:
>
> >>
> >> Note that my premise was really to get *away* from inheriting
> >> too much from 'test'.
> >> Hence, I have *not* been talking about replacing ifelse() but
> >> rather of providing a new ifelse2()
> >>
> >> [ or if_else() if Hadley was willing to ditch the dplyr one
> >> in favor of a base one]
> >>
> >> > Specifically, based on an unrelated discussion with Henrik
> Bengtsson
> >> on
> >> > Twitter, I wonder if preserving the recycling behavior test
> is
> >> longer than
> >> > yes, no, but making the case where
> >>
> >> > length( test ) < max(length( yes ), length( no ))
> >>
> >> > would simplify usage for userRs in a useful way.
> >>
>
> > That was a copyediting bug on my part, it seems I hit send with my
> message
> > only half-edited/proofread. Apologies.
>
> > That should have said that making the case where test is the one
> that will
> > be recycled (because it is shorter than either yes or no) an error.
> My
> > claim is that the fact that test itself can be recycled, rather than
> just
> > yes or no, is confusing to many R users. If we are writing an
> ifelse2 we
> > might want to drop that feature and just throw an error in that case.
> > (Users could still use the original ifelse if they understand and
> > specifically want that behavior).
>
> > Does that make more sense?
>
>
>
> >>
> >> > Also, If we combine a stricter contract that the output will
> always
> >> be of
> >> > length with the suggestion of a specified output class
> >>
> >>
> > Here, again, I was talking about the restriction that the output be
> > guaranteed to be the length of test, regardless of the length of yes
> and
> > no. That, combined with a specific, guaranteed output class would
> make a
> > much narrower/more restricted but also (I argue) much easier to
> understand
> > function. Particularly for beginning and intermediate users.
>
> > I do hear what you're saying about silent conversion, though, so
> what I'm
> > describing might be a third function (ifelse3 for lack of a better
> name for
> > now), as you pointed out.
>
>
> >> that was not my intent here.... but would be another interesting
> >> extension. However, I would like to keep R-semantic silent
> coercions
> >> such as
> >> logical < integer < double < complex
> >>
> >> and your pseudo code below would not work so easily I think.
> >>
> >> > the pseudo code could be
> >>
> >> (I'm changing assignment '=' to '<-' ... [please!] )
> >>
> >> > ifelse2 <- function(test, yes, no, outclass) {
> >> > lenout <- length(test)
> >> > out <- as( rep(yes, length.out <- lenout), outclass)
> >> > out[!test] <- as(rep(no, length.out = lenout)[!test],
> outclass)
> >> > # handle NA stuff
> >> > out
> >> > }
> >>
> >>
> >> > NAs could be tricky if outclass were allowed to be completely
> >> general, but
> >> > doable, I think? Another approach if we ARE fast-passing
> while
> >> leaving
> >> > ifelse intact is that maybe NA's in test just aren't allowed
> in
> >> ifelse2.
> >> > I'm not saying we should definitely do that, but it's
> possible and
> >> would
> >> > make things faster.
> >>
> >> > Finally, In terms of efficiency, with the stuff that Luke
> and I are
> >> working
> >> > on, the NA detection could be virtually free in certain
> cases, which
> >> could
> >> > give a nice boost for long vectors that don't have any NAs
> (and
> >> 'know'
> >> > that they don't).
> >>
> >> That *is* indeed a very promising prospect!
> >> Thank you in advance!
> >>
> >> > Best,
> >> > ~G
> >>
> >> I still am bit disappointed by the fact that it seems nobody has
> >> taken a good look at my ifelse2() proposal.
> >>
>
> > I plan to look at it soon. Thanks again for all your work.
>
> > ~G
>
>
> >>
> >> I really would like an alternative to ifelse() in *addition* to
> >> the current ifelse(), but hopefully in the future being used in
> >> quite a few places instead of ifelse()
> >> efficiency but for changed semantics, namely working for
> considerably
> >> more "vector like" classes of 'yes' and 'no' than the current
> >> ifelse().
> >>
> >> As I said, the current proposal works for objects of class
> >> "Date", "POSIXct", "POSIXlt", "factor", "mpfr" (pkg 'Rmpfr')
> >> and hopefully for "sparseVector" (in a next version of the 'Matrix'
> pkg).
> >>
> >> Martin
> >>
> >> > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
> >> maechler at stat.math.ethz.ch
> >> >> wrote:
> >>
> >> >> Finally getting back to this :
> >> >>
> >> >> >>>>> Hadley Wickham <h.wickham at gmail.com>
> >> >> >>>>> on Mon, 15 Aug 2016 07:51:35 -0500 writes:
> >> >>
> >> >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
> >> >> > <h.wickham at gmail.com> wrote:
> >> >> >>> >> One possibility would also be to consider a
> >> >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
> >> >> >>> would also work for characters} >> version.
> >> >> >>>
> >> >> >>> > I don't know what you mean by these.
> >> >> >>>
> >> >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
> >> >> >>> which is very relevant, thank you Bob!
> >> >> >>>
> >> >> >>> As I have found, that actually works in such a "same
> >> >> >>> type"-only way: It does not try to coerce, but gives an
> >> >> >>> error when the classes differ, even in this somewhat
> >> >> >>> debatable case :
> >> >> >>>
> >> >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
> >> >> >>> `false` has type 'double' not 'integer'
> >> >> >>> >
> >> >> >>>
> >> >> >>> As documented, if_else() is clearly stricter than
> >> >> >>> ifelse() and e.g., also does no recycling (but of
> >> >> >>> length() 1).
> >> >> >>
> >> >> >> I agree that if_else() is currently too strict - it's
> >> >> >> particularly annoying if you want to replace some values
> >> >> >> with a missing:
> >> >> >>
> >> >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
> >> >> >> has type 'integer' not 'logical'
> >> >> >>
> >> >> >> But I would like to make sure that this remains an error:
> >> >> >>
> >> >> >> if_else(x > 5, x, "BLAH")
> >> >> >>
> >> >> >> Because that seems more likely to be a user error (but
> >> >> >> reasonable people might certainly believe that it should
> >> >> >> just work)
> >> >> >>
> >> >> >> dplyr is more accommodating in other places (i.e. in
> >> >> >> bind_rows(), collapse() and the joins) but it's
> >> >> >> surprisingly hard to get all the details right. For
> >> >> >> example, what should the result of this call be?
> >> >> >>
> >> >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
> >> >> >> factor(c("c", "b"))
> >> >> >>
> >> >> >> Strictly speaking I think you could argue it's an error,
> >> >> >> but that's not very user-friendly. Should it be a factor
> >> >> >> with the union of the levels? Should it be a character
> >> >> >> vector + warning? Should the behaviour change if one set
> >> >> >> of levels is a subset of the other set?
> >> >> >>
> >> >> >> There are similar issues for POSIXct (if the time zones
> >> >> >> are different, which should win?), and difftimes
> >> >> >> (similarly for units). Ideally you'd like the behaviour
> >> >> >> to be extensible for new S3 classes, which suggests it
> >> >> >> should be a generic (and for the most general case, it
> >> >> >> would need to dispatch on both arguments).
> >> >>
> >> >> > One possible principle would be to use c() -
> >> >> > i.e. construct out as
> >> >>
> >> >> > out <- c(yes[0], no[0]
> >> >> > length(out) <- max(length(yes), length(no))
> >> >>
> >> >> yes; this would require that a `length<-` method works for
> the
> >> >> class of the result.
> >> >>
> >> >> Duncan Murdoch mentioned a version of this, in his very
> >> >> first reply:
> >> >>
> >> >> ans <- c(yes, no)[seq_along(test)]
> >> >> ans <- ans[seq_along(test)]
> >> >>
> >> >> which is less efficient for atomic vectors, but requires
> >> >> less from the class: it "only" needs `c` and `[` to work
> >> >>
> >> >> and a mixture of your two proposals would be possible too:
> >> >>
> >> >> ans <- c(yes[0], no[0])
> >> >> ans <- ans[seq_along(test)]
> >> >>
> >> >> which does *not* work for my "mpfr" numbers (CRAN package
> 'Rmpfr'),
> >> >> but that's a buglet in the c.mpfr() implementation of my
> Rmpfr
> >> >> package... (which has already been fixed in the development
> version
> >> on
> >> >> R-forge,
> >> >> https://r-forge.r-project.org/R/?group_id=386)
> >> >>
> >> >> > But of course that wouldn't help with factor responses.
> >> >>
> >> >> Yes. However, a version of Duncan's suggestion -- of
> treating
> >> 'yes' first
> >> >> -- does help in that case.
> >> >>
> >> >> For once, mainly as "feasability experiment",
> >> >> I have created a github gist to make my current ifelse2()
> proposal
> >> >> available
> >> >> for commenting, cloning, pullrequesting, etc:
> >> >>
> >> >> Consisting of 2 files
> >> >> - ifelse-def.R : Functions definitions only, basically all
> the
> >> current
> >> >> proposals, called ifelse*()
> >> >> - ifelse-checks.R : A simplistic checking function
> >> >> and examples calling it, notably demonstrating that my
> >> >> ifelse2() does work with
> >> >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
> >> >> and "mpfr" (the arbitrary-precision numbers in my package
> "Rmpfr")
> >> >>
> >> >> Also if you are not on github, you can quickly get to the
> ifelse2()
> >> >> definition :
> >> >>
> >> >> https://gist.github.com/mmaechler/
> 9cfc3219c4b89649313bfe6853d878
> >> >> 94#file-ifelse-def-r-L168
> >> >>
> >> >> > Also, if you're considering an improved ifelse(), I'd
> >> >> > strongly urge you to consider adding an `na` argument,
> >> >>
> >> >> I now did (called it 'NA.').
> >> >>
> >> >> > so that you can use ifelse() to transform all three
> >> >> > possible values in a logical vector.
> >> >>
> >> >> > Hadley
> >> >> > -- http://hadley.nz
> >> >>
> >> >> For those who really hate GH (and don't want or cannot
> easily
> >> follow the
> >> >> above URL), here's my current definition:
> >> >>
> >> >>
> >> >> ##' Martin Maechler, 14. Nov 2016 --- taking into account
> Duncan M.
> >> and
> >> >> Hadley's
> >> >> ##' ideas in the R-devel thread starting at (my mom's 86th
> >> birthday):
> >> >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/
> 072970.html
> >> >> ifelse2 <- function (test, yes, no, NA. = NA) {
> >> >> if(!is.logical(test)) {
> >> >> if(is.atomic(test))
> >> >> storage.mode(test) <- "logical"
> >> >> else ## typically a "class"; storage.mode<-() typically
> fails
> >> >> test <- if(isS4(test)) methods::as(test, "logical") else
> >> >> as.logical(test)
> >> >> }
> >> >>
> >> >> ## No longer optimize the "if (a) x else y" cases:
> >> >> ## Only "non-good" R users use ifelse(.) instead of if(.)
> in these
> >> >> cases.
> >> >>
> >> >> ans <-
> >> >> tryCatch(rep(if(is.object(yes) && identical(class(yes),
> class(no)))
> >> >> ## as c(o) or o[0] may not work for the class
> >> >> yes else c(yes[0], no[0]), length.out =
> >> >> length(test)),
> >> >> error = function(e) { ## try asymmetric, yes-leaning
> >> >> r <- yes
> >> >> r[!test] <- no[!test]
> >> >> r
> >> >> })
> >> >> ok <- !(nas <- is.na(test))
> >> >> if (any(test[ok]))
> >> >> ans[test & ok] <- rep(yes, length.out = length(ans))[test &
> ok]
> >> >> if (any(!test[ok]))
> >> >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test
> & ok]
> >> >> ans[nas] <- NA. # possibly coerced to class(ans)
> >> >> ans
> >> >> }
> >> >>
> >> >> ______________________________________________
> >> >> R-devel at r-project.org mailing list
> >> >> https://stat.ethz.ch/mailman/listinfo/r-devel
> >> >>
> >>
> >>
> >>
> >> > --
> >> > Gabriel Becker, PhD
> >> > Associate Scientist (Bioinformatics)
> >> > Genentech Research
> >>
> >> > [[alternative HTML version deleted]]
> >>
> >>
>
>
> > --
> > Gabriel Becker, PhD
> > Associate Scientist (Bioinformatics)
> > Genentech Research
>
> > [[alternative HTML version deleted]]
>
> > ______________________________________________
> > R-devel at r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
--
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research
[[alternative HTML version deleted]]
More information about the R-devel
mailing list