[Rd] ifelse() woes ... can we agree on a ifelse2() ?

Mon Nov 28 16:58:57 CET 2016

Well, that's embarrassing. Sorry for the noise on that front, everyone. I
misunderstood something from the aforementioned unrelated conversation I
was having, but not double checking is on me (I rarely use if else and when
I do I avoid that situation in my own code, which is why I didn't already
know this)

I'd still argue that situation should at least warn, possibly error, as it
seems indicative of a bug in the user's code.

On Mon, Nov 28, 2016 at 7:00 AM, Martin Maechler <maechler at stat.math.ethz.ch
> wrote:

> >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
> >>>>>     on Sat, 26 Nov 2016 17:14:01 +0000 writes:
>
>     > Just stating, in 'ifelse', 'test' is not recycled. As I said in
> "R-intro: length of 'ifelse' result" (https://stat.ethz.ch/
> pipermail/r-devel/2016-September/073136.html), ifelse(condition, a, b)
> returns a vector of the length of 'condition', even if 'a' or 'b' is longer.
>
> yes and  ?ifelse (the help page) also does not say that test is
> recycled, rather
>
>    >>  If \code{yes} or \code{no} are too short, their elements are
> recycled.
>
> (*and* the problem you wrote the above has been corrected in the
>  R-intro manual shortly after).
>
>
>     > On current 'ifelse' code in R:
>
>     > * The part
>     > ans[nas] <- NA
>     > could be omitted because NA's are already in place.
>     > If the part is removed, variable 'nas' is no longer used.
>
> I agree that this seems logical.  If I apply the change, R's own
> full checks do not seem affected, and I may try to commit that
> change and "wait and see".
>
>
>     > * The any(*) part actually checks the thing that is used as the
> index vector. The index vector could be stored and then repeatedly used,
> like the following.
>
>     >     if (any(sel <- test & ok))
>     >     ans[sel] <- rep(yes, length.out = length(ans))[sel]
>
> yes, I know, and have had similar thoughts in the past.
> However note  (I know you that) the current code is
>
>     if (any(test[ok]))
>         ans[test & ok] <- rep(yes, length.out = length(ans))[test & ok]
>
> and   any(test[ok])          may be considerably faster than
>       any(sel <- test & ok)
>
> OTOH I think the current code would only be faster (for the
> above) when any(.) returned FALSE ...
> I think it may depend on the typical use cases which of the two
> versions is more efficient.
>
>
>     > * If 'test' is a factor, doing
>     > storage.mode(test) <- "logical"
>     > is not appropriate, but is.atomic(test) returns TRUE. Maybe use
>     > if(!is.object(test))
>     >    instead of
>     > if(is.atomic(test)) .
>
> This would be a considerable change I think...
> Note that I'm currently really proposing to introduce an *additional*
> ifelse function with different "more reasonable" semantic,  and
> your last change would do that too.
>
> My alternative should really work
> - for factors
> - for "array"s including "matrix" (as the current ifelse() does!)
> - for "Date", "POSIXct", "ts"(timeseries), "zoo",
>       "sparseVector", "sparseMatrix" (*), or "mpfr",
>   without any special code, but rather by design.
>
>  *) Currently needs the R-forge version of  Matrix, version 1.2-8.
>
> A bit less than an hour ago, I have updated the gist with an updated
> proposal ifelse2() {and the current alternatives that I know},
> modified so it *does* keep more, e.g.  dim() attributes in
> reasonable cases.
>
> https://gist.github.com/mmaechler/9cfc3219c4b89649313bfe6853d878
> 94#file-ifelse-def-r-L168
>
> Hence my ifelse2() became even a bit longer (but not slower)
> working for even more classes of  "yes" and "no".
>
>
>     > On ifelse-checks.R:
>     > * In function 'chkIfelse', if the fourth function argument names is
> not "NA.", the argument name is changed, but the function body still uses
> the old name. That makes error in chkIfelse(ifelseHW) .
>     > A fix:
>     >         if(names(formals(FUN))[[4]] != "NA.") {
>     >             body(FUN) <- do.call(substitute, list(body(FUN),
>     >                 setNames(list(quote(NA.)),
> names(formals(FUN))[[4]])))
>     >             names(formals(FUN))[[4]] <- "NA."
>     >         }
>
> yes, thank you!  (a bit embarrassing for me ..)
>
>     > After fixing, chkIfelse(ifelseHW) just fails at identical(iflt,
> as.POSIXlt(ifct)) .
>     > 'iflt' has NA as 'tzone' and 'isdst' components.
>     > * Because function 'chkIfelse' continues checking after failure,
>     > as.POSIXlt(ifct)
>     > may give error. The error happens, for example, in
> chkIfelse(ifelseR) . Maybe place it inside try(...).
>     > * If 'lt' is a "POSIXlt" object, (lt-100) is a "POSIXct" object.
>     > So,
>     > FUN(c(TRUE, FALSE, NA, TRUE), lt, lt-100)
>     > is an example of mixed class.
>
> good; thank you for the hint.
>
>     > * The part of function 'chkIfelse' in
>     > for(i in seq_len(nFact))
>     > uses 'NA.' function argument. That makes error when 'chkIfelse' is
> applied to function without fourth argument.
>     > The part should be wrapped in
>     > if(has.4th) .
> yes of course
>
>     > * Function 'ifelseJH' has fourth argument, but the argument is not
> for value if NA. So, instead of
>     > chkIfelse(ifelseJH) ,
>     > maybe call
>     > chkIfelse(function(test, yes, no) ifelseJH(test, yes, no)) .
> You are right;
> I've decided to solve this differently.
>
> I'm looking at these suggestions now, notably also your proposals below;
> thank you, Suharto!
>
> (I wanted to put my improved 'ifelse2' out first, quickly).
> Martin
>
>
>     > A concrete version of 'ifelse2' that starts the result from 'yes':
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     ans <- rep(yes, length.out = n)
>     >     ans[!test & !is.na(test)] <- rep(no, length.out = n)[!test & !
> is.na(test)]
>     >     ans[is.na(test)] <- rep(NA., length.out = n)[is.na(test)]
>     >     ans
>     > }
>
>     > It requires 'rep' method that is compatible with subsetting. It also
> works with "POSIXlt" in R 2.7.2, when 'length' gives 9, and gives an
> appropriate result if time zones are the same.
>     > For coercion of 'test', there is no need of keeping attributes. So,
> it doesn't do
>     > storage.mode(test) <- "logical"
>     > and goes directly to 'as.logical'.
>     > It relies on subassignment for silent coercions of
>     > logical < integer < double < complex .
>     > Unlike 'ifelse', it never skips any subassignment. So, phenomenon as
> in "example of different return modes" in ?ifelse doesn't happen.
>
>     > Another version, for keeping attributes as pointed out by Duncan
> Murdoch:
>     > function(test, yes, no, NA. = NA) {
>     >     if(!is.logical(test))
>     >         test <- if(isS4(test)) methods::as(test, "logical") else
> as.logical(test)
>     >     n <- length(test)
>     >     n.yes <- length(yes); n.no <- length(no)
>     >     if (n.yes != n) {
>     >         if (n.no == n) {  # swap yes <-> no
>     >             test <- !test
>     >             ans <- yes; yes <- no; no <- ans
>     >             n.no <- n.yes
>     >         } else yes <- yes[rep_len(seq_len(n.yes), n)]
>     >     }
>     >     ans <- yes
>     >     if (n.no == 1L)
>     >         ans[!test] <- no
>     >     else
>     >         ans[!test & !is.na(test)] <- no[
>     >             if (n.no == n) !test & !is.na(test)
>     >             else rep_len(seq_len(n.no), n)[!test & !is.na(test)]]
>     >     stopifnot(length(NA.) == 1L)
>     >     ans[is.na(test)] <- NA.
>     >     ans
>     > }
>
>     > Note argument evaluation order: 'test', 'yes', 'no', 'NA.'.
>     > First, it chooses the first of 'yes' and 'no' that has the same
> length as the result. If none of 'yes' and 'no' matches the length of the
> result, it chooses recycled (or truncated) 'yes'.
>     > It uses 'rep' on the index and subsetting as a substitute for 'rep'
> on the value.
>     > It requires 'length' method that is compatible with subsetting.
>     > Additionally, it uses the same idea as dplyr::if_else, or more
> precisely the helper function 'replace_with'. It doesn't use 'rep' if the
> length of 'no' is 1 or is the same as the length of the result. For
> subassignment with value of length 1, recycling happens by itself and NA in
> index is OK.
>     > It limits 'NA.' to be of length 1, considering 'NA.' just as a label
> for NA.
>
>     > Cases where the last version above or 'ifelse2 or 'ifelseHW' in
> ifelse-def.R gives inappropriate answers:
>     > - 'yes' and 'no' are "difftime" objects with different "units"
> attribute
>     > - 'yes' and 'no' are "POSIXlt" objects with different time zone
>     > Example: 'yes' in "UTC" and 'no' in "EST5EDT". The reverse, 'yes' in
> "EST5EDT" and 'no' in "UTC" gives error.
>
>     > For the cases, c(yes, no) helps. Function 'ifelseJH' in ifelse-def.R
> gives a right answer for "POSIXlt" case.
>     > ---------------------
>     > Martin et al.,
>
>
>
>
>     > On Tue, Nov 22, 2016 at 2:12 AM, Martin Maechler <maechler at
> stat.math.ethz.ch
>     >> wrote:
>
>     >>
>     >> Note that my premise was really to get *away* from inheriting
>     >> too much from 'test'.
>     >> Hence, I have *not* been talking about replacing ifelse() but
>     >> rather of providing a new  ifelse2()
>     >>
>     >>         [ or if_else()  if Hadley was willing to ditch the dplyr one
>     >>                         in favor of a base one]
>     >>
>     >>      > Specifically, based on an unrelated discussion with Henrik
> Bengtsson
>     >> on
>     >>      > Twitter, I wonder if preserving the recycling behavior test
> is
>     >> longer than
>     >>      > yes, no, but making the case where
>     >>
>     >>      > length( test ) < max(length( yes ), length( no ))
>     >>
>     >>      > would simplify usage for userRs in a useful way.
>     >>
>
>     > That was a copyediting bug on my part, it seems I hit send with my
> message
>     > only half-edited/proofread. Apologies.
>
>     > That should have said that making the case where test is the one
> that will
>     > be recycled (because it is shorter than either yes or no) an error.
> My
>     > claim is that the fact that test itself can be recycled, rather than
> just
>     > yes or no, is confusing to many R users. If we are writing an
> ifelse2 we
>     > might want to drop that feature and just throw an error in that case.
>     > (Users could still use the original ifelse if they understand and
>     > specifically want that behavior).
>
>     > Does that make more sense?
>
>
>
>     >>
>     >>      > Also, If we combine a stricter contract that the output will
> always
>     >> be of
>     >>      > length with the suggestion of a specified output class
>     >>
>     >>
>     > Here, again, I was talking about the restriction that the output be
>     > guaranteed to be the length of test, regardless of the length of yes
> and
>     > no. That, combined with a specific, guaranteed output class would
> make a
>     > much narrower/more restricted but also (I argue) much easier to
> understand
>     > function. Particularly for beginning and intermediate users.
>
>     > I do hear what you're saying about silent conversion, though, so
> what I'm
>     > describing might be a third function (ifelse3 for lack of a better
> name for
>     > now), as you pointed out.
>
>
>     >> that was not my intent here.... but would be another interesting
>     >> extension. However, I would like to keep  R-semantic silent
> coercions
>     >> such as
>     >>            logical < integer < double < complex
>     >>
>     >> and your pseudo code below would not work so easily I think.
>     >>
>     >>      > the pseudo code could be
>     >>
>     >> (I'm changing assignment '=' to  '<-' ...  [please!] )
>     >>
>     >>      > ifelse2 <- function(test, yes, no, outclass) {
>     >>      >   lenout  <- length(test)
>     >>      >   out <- as( rep(yes, length.out <- lenout), outclass)
>     >>      >   out[!test] <- as(rep(no, length.out = lenout)[!test],
> outclass)
>     >>      >   # handle NA stuff
>     >>      >   out
>     >>      > }
>     >>
>     >>
>     >>      > NAs could be tricky if outclass were allowed to be completely
>     >> general, but
>     >>      > doable, I think? Another approach  if we ARE fast-passing
> while
>     >> leaving
>     >>      > ifelse intact is that maybe NA's in test just aren't allowed
> in
>     >> ifelse2.
>     >>      > I'm not saying we should definitely do that, but it's
> possible and
>     >> would
>     >>      > make things faster.
>     >>
>     >>      > Finally, In terms of efficiency, with the stuff that Luke
> and I are
>     >> working
>     >>      > on, the NA detection could be virtually free in certain
> cases, which
>     >> could
>     >>      > give a nice boost for long vectors  that don't have any NAs
> (and
>     >> 'know'
>     >>      > that they don't).
>     >>
>     >> That *is* indeed a very promising prospect!
>     >> Thank you in advance!
>     >>
>     >>      > Best,
>     >>      > ~G
>     >>
>     >> I still am bit disappointed by the fact that it seems nobody has
>     >> taken a good look at my ifelse2() proposal.
>     >>
>
>     > I plan to look at it soon. Thanks again for all your work.
>
>     > ~G
>
>
>     >>
>     >> I really would like an alternative to ifelse() in *addition* to
>     >> the current ifelse(), but hopefully in the future being used in
>     >> quite a few places instead of ifelse()
>     >> efficiency but for changed semantics, namely working for
> considerably
>     >> more "vector like" classes of  'yes' and 'no'  than the current
>     >> ifelse().
>     >>
>     >> As I said, the current proposal works for objects of class
>     >>     "Date", "POSIXct", "POSIXlt", "factor",  "mpfr" (pkg 'Rmpfr')
>     >> and hopefully for "sparseVector" (in a next version of the 'Matrix'
> pkg).
>     >>
>     >> Martin
>     >>
>     >>      > On Tue, Nov 15, 2016 at 3:58 AM, Martin Maechler <
>     >> maechler at stat.math.ethz.ch
>     >>      >> wrote:
>     >>
>     >>      >> Finally getting back to this :
>     >>      >>
>     >>      >> >>>>> Hadley Wickham <h.wickham at gmail.com>
>     >>      >> >>>>>     on Mon, 15 Aug 2016 07:51:35 -0500 writes:
>     >>      >>
>     >>      >> > On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham
>     >>      >> > <h.wickham at gmail.com> wrote:
>     >>      >> >>> >> One possibility would also be to consider a
>     >>      >> >>> "numbers-only" or >> rather "same type"-only {e.g.,
>     >>      >> >>> would also work for characters} >> version.
>     >>      >> >>>
>     >>      >> >>> > I don't know what you mean by these.
>     >>      >> >>>
>     >>      >> >>> In the mean time, Bob Rudis mentioned dplyr::if_else(),
>     >>      >> >>> which is very relevant, thank you Bob!
>     >>      >> >>>
>     >>      >> >>> As I have found, that actually works in such a "same
>     >>      >> >>> type"-only way: It does not try to coerce, but gives an
>     >>      >> >>> error when the classes differ, even in this somewhat
>     >>      >> >>> debatable case :
>     >>      >> >>>
>     >>      >> >>> > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11) Error:
>     >>      >> >>> `false` has type 'double' not 'integer'
>     >>      >> >>> >
>     >>      >> >>>
>     >>      >> >>> As documented, if_else() is clearly stricter than
>     >>      >> >>> ifelse() and e.g., also does no recycling (but of
>     >>      >> >>> length() 1).
>     >>      >> >>
>     >>      >> >> I agree that if_else() is currently too strict - it's
>     >>      >> >> particularly annoying if you want to replace some values
>     >>      >> >> with a missing:
>     >>      >> >>
>     >>      >> >> x <- sample(10) if_else(x > 5, NA, x) # Error: `false`
>     >>      >> >> has type 'integer' not 'logical'
>     >>      >> >>
>     >>      >> >> But I would like to make sure that this remains an error:
>     >>      >> >>
>     >>      >> >> if_else(x > 5, x, "BLAH")
>     >>      >> >>
>     >>      >> >> Because that seems more likely to be a user error (but
>     >>      >> >> reasonable people might certainly believe that it should
>     >>      >> >> just work)
>     >>      >> >>
>     >>      >> >> dplyr is more accommodating in other places (i.e. in
>     >>      >> >> bind_rows(), collapse() and the joins) but it's
>     >>      >> >> surprisingly hard to get all the details right. For
>     >>      >> >> example, what should the result of this call be?
>     >>      >> >>
>     >>      >> >> if_else(c(TRUE, FALSE), factor(c("a", "b")),
>     >>      >> >> factor(c("c", "b"))
>     >>      >> >>
>     >>      >> >> Strictly speaking I think you could argue it's an error,
>     >>      >> >> but that's not very user-friendly. Should it be a factor
>     >>      >> >> with the union of the levels? Should it be a character
>     >>      >> >> vector + warning? Should the behaviour change if one set
>     >>      >> >> of levels is a subset of the other set?
>     >>      >> >>
>     >>      >> >> There are similar issues for POSIXct (if the time zones
>     >>      >> >> are different, which should win?), and difftimes
>     >>      >> >> (similarly for units).  Ideally you'd like the behaviour
>     >>      >> >> to be extensible for new S3 classes, which suggests it
>     >>      >> >> should be a generic (and for the most general case, it
>     >>      >> >> would need to dispatch on both arguments).
>     >>      >>
>     >>      >> > One possible principle would be to use c() -
>     >>      >> > i.e. construct out as
>     >>      >>
>     >>      >> > out <- c(yes[0], no[0]
>     >>      >> > length(out) <- max(length(yes), length(no))
>     >>      >>
>     >>      >> yes; this would require that a  `length<-` method works for
> the
>     >>      >> class of the result.
>     >>      >>
>     >>      >> Duncan Murdoch mentioned a version of this, in his very
>     >>      >> first reply:
>     >>      >>
>     >>      >> ans <- c(yes, no)[seq_along(test)]
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which is less efficient for atomic vectors, but requires
>     >>      >> less from the class: it "only" needs `c` and `[` to work
>     >>      >>
>     >>      >> and a mixture of your two proposals would be possible too:
>     >>      >>
>     >>      >> ans <- c(yes[0], no[0])
>     >>      >> ans <- ans[seq_along(test)]
>     >>      >>
>     >>      >> which does *not* work for my "mpfr" numbers (CRAN package
> 'Rmpfr'),
>     >>      >> but that's a buglet in the  c.mpfr() implementation of my
> Rmpfr
>     >>      >> package... (which has already been fixed in the development
> version
>     >> on
>     >>      >> R-forge,
>     >>      >> https://r-forge.r-project.org/R/?group_id=386)
>     >>      >>
>     >>      >> > But of course that wouldn't help with factor responses.
>     >>      >>
>     >>      >> Yes.  However, a version of Duncan's suggestion -- of
> treating
>     >> 'yes' first
>     >>      >> -- does help in that case.
>     >>      >>
>     >>      >> For once, mainly as "feasability experiment",
>     >>      >> I have created a github gist to make my current ifelse2()
> proposal
>     >>      >> available
>     >>      >> for commenting, cloning, pullrequesting, etc:
>     >>      >>
>     >>      >> Consisting of 2 files
>     >>      >> - ifelse-def.R :  Functions definitions only, basically all
> the
>     >> current
>     >>      >> proposals, called  ifelse*()
>     >>      >> - ifelse-checks.R : A simplistic checking function
>     >>      >> and examples calling it, notably demonstrating that my
>     >>      >> ifelse2()  does work with
>     >>      >> "Date", <dateTime> (i.e. "POSIXct" and "POSIXlt"), factors,
>     >>      >> and "mpfr" (the arbitrary-precision numbers in my package
> "Rmpfr")
>     >>      >>
>     >>      >> Also if you are not on github, you can quickly get to the
> ifelse2()
>     >>      >> definition :
>     >>      >>
>     >>      >> https://gist.github.com/mmaechler/
> 9cfc3219c4b89649313bfe6853d878
>     >>      >> 94#file-ifelse-def-r-L168
>     >>      >>
>     >>      >> > Also, if you're considering an improved ifelse(), I'd
>     >>      >> > strongly urge you to consider adding an `na` argument,
>     >>      >>
>     >>      >> I now did (called it 'NA.').
>     >>      >>
>     >>      >> > so that you can use ifelse() to transform all three
>     >>      >> > possible values in a logical vector.
>     >>      >>
>     >>      >> > Hadley
>     >>      >> > -- http://hadley.nz
>     >>      >>
>     >>      >> For those who really hate GH (and don't want or cannot
> easily
>     >> follow the
>     >>      >> above URL), here's my current definition:
>     >>      >>
>     >>      >>
>     >>      >> ##' Martin Maechler, 14. Nov 2016 --- taking into account
> Duncan M.
>     >> and
>     >>      >> Hadley's
>     >>      >> ##' ideas in the R-devel thread starting at (my mom's 86th
>     >> birthday):
>     >>      >> ##' https://stat.ethz.ch/pipermail/r-devel/2016-August/
> 072970.html
>     >>      >> ifelse2 <- function (test, yes, no, NA. = NA) {
>     >>      >> if(!is.logical(test)) {
>     >>      >> if(is.atomic(test))
>     >>      >> storage.mode(test) <- "logical"
>     >>      >> else ## typically a "class"; storage.mode<-() typically
> fails
>     >>      >> test <- if(isS4(test)) methods::as(test, "logical") else
>     >>      >> as.logical(test)
>     >>      >> }
>     >>      >>
>     >>      >> ## No longer optimize the  "if (a) x else y"  cases:
>     >>      >> ## Only "non-good" R users use ifelse(.) instead of if(.)
> in these
>     >>      >> cases.
>     >>      >>
>     >>      >> ans <-
>     >>      >> tryCatch(rep(if(is.object(yes) && identical(class(yes),
> class(no)))
>     >>      >> ## as c(o) or o[0] may not work for the class
>     >>      >> yes else c(yes[0], no[0]), length.out =
>     >>      >> length(test)),
>     >>      >> error = function(e) { ## try asymmetric, yes-leaning
>     >>      >> r <- yes
>     >>      >> r[!test] <- no[!test]
>     >>      >> r
>     >>      >> })
>     >>      >> ok <- !(nas <- is.na(test))
>     >>      >> if (any(test[ok]))
>     >>      >> ans[test & ok] <- rep(yes, length.out = length(ans))[test &
> ok]
>     >>      >> if (any(!test[ok]))
>     >>      >> ans[!test & ok] <- rep(no, length.out = length(ans))[!test
> & ok]
>     >>      >> ans[nas] <- NA. # possibly coerced to class(ans)
>     >>      >> ans
>     >>      >> }
>     >>      >>
>     >>      >> ______________________________________________
>     >>      >> R-devel at r-project.org mailing list
>     >>      >> https://stat.ethz.ch/mailman/listinfo/r-devel
>     >>      >>
>     >>
>     >>
>     >>
>     >>      > --
>     >>      > Gabriel Becker, PhD
>     >>      > Associate Scientist (Bioinformatics)
>     >>      > Genentech Research
>     >>
>     >>      > [[alternative HTML version deleted]]
>     >>
>     >>
>
>
>     > --
>     > Gabriel Becker, PhD
>     > Associate Scientist (Bioinformatics)
>     > Genentech Research
>
>     >     [[alternative HTML version deleted]]
>
>     > ______________________________________________
>     > R-devel at r-project.org mailing list
>     > https://stat.ethz.ch/mailman/listinfo/r-devel
>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

-- 
Gabriel Becker, PhD
Associate Scientist (Bioinformatics)
Genentech Research

	[[alternative HTML version deleted]]