[Rd] ifelse() woes ... can we agree on an ifelse2()?

Hadley Wickham h.wickham at gmail.com
Mon Aug 15 14:51:35 CEST 2016


On Fri, Aug 12, 2016 at 11:31 AM, Hadley Wickham <h.wickham at gmail.com> wrote:
>>     >> One possibility would also be to consider a "numbers-only" or
>>     >> rather "same type"-only {e.g., would also work for characters}
>>     >> version.
>>
>>     > I don't know what you mean by these.
>>
>> In the meantime, Bob Rudis mentioned dplyr::if_else(),
>> which is very relevant, thank you Bob!
>>
>> As I have found, it actually works in such a "same type"-only way:
>> it does not try to coerce but gives an error when the classes
>> differ, even in this somewhat debatable case:
>>
>>    > dplyr::if_else(c(TRUE, FALSE), 2:3, 0+10:11)
>>    Error: `false` has type 'double' not 'integer'
>>    >
>>
>> As documented, if_else() is clearly stricter than ifelse()
>> and, e.g., does no recycling (except for inputs of length 1).
>
> I agree that if_else() is currently too strict - it's particularly
> annoying if you want to replace some values with a missing:
>
> x <- sample(10)
> if_else(x > 5, NA, x)
> #  Error: `false` has type 'integer' not 'logical'
>
> But I would like to make sure that this remains an error:
>
> if_else(x > 5, x, "BLAH")
>
> Because that seems more likely to be a user error (though reasonable
> people might well believe that it should just work).
>
> dplyr is more accommodating in other places (i.e. in bind_rows(),
> collapse() and the joins) but it's surprisingly hard to get all the
> details right. For example, what should the result of this call be?
>
> if_else(c(TRUE, FALSE), factor(c("a", "b")), factor(c("c", "b")))
>
> Strictly speaking I think you could argue it's an error, but that's
> not very user-friendly. Should it be a factor with the union of the
> levels? Should it be a character vector + warning? Should the
> behaviour change if one set of levels is a subset of the other set?
>
> There are similar issues for POSIXct (if the time zones are different,
> which should win?) and difftimes (similarly for units). Ideally
> you'd like the behaviour to be extensible for new S3 classes, which
> suggests it should be a generic (and for the most general case, it
> would need to dispatch on both arguments).
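The compromise described in the quoted message (let a bare NA through, but keep if_else(x > 5, x, "BLAH") an error) could be sketched roughly as below. if_else_strict() is a hypothetical name, and this is not dplyr's actual implementation. It compares only typeof(), so a factor and a plain integer vector would pass the check; classes such as factor, POSIXct and difftime are exactly where the open questions above begin.

```r
# Hypothetical sketch, not dplyr's implementation: branches must share a type,
# except that a branch consisting only of logical NAs adopts the other's type.
if_else_strict <- function(condition, true, false) {
  stopifnot(is.logical(condition))
  n <- length(condition)
  # No general recycling: each branch must have length 1 or length(condition).
  stopifnot(length(true) %in% c(1L, n), length(false) %in% c(1L, n))
  # A branch made only of logical NAs is treated as having no type of its own.
  all_na <- function(x) is.logical(x) && all(is.na(x))
  if (!all_na(true) && !all_na(false) && !identical(typeof(true), typeof(false))) {
    stop(sprintf("`false` has type '%s' not '%s'", typeof(false), typeof(true)))
  }
  template <- if (all_na(true)) false else true  # branch that fixes the type
  out <- template[0]
  length(out) <- n                               # NA of the template's type
  true <- rep_len(true, n)
  false <- rep_len(false, n)
  out[which(condition)] <- true[which(condition)]     # which() skips NA
  out[which(!condition)] <- false[which(!condition)]
  out                                            # NA where condition is NA
}

x <- c(3L, 8L, 1L)
if_else_strict(x > 5, NA, x)   # c(3L, NA, 1L)
```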

One possible principle would be to let c() pick the result type, i.e. construct out as

out <- c(yes[0], no[0])                       # zero-length vector of the common type
length(out) <- max(length(yes), length(no))   # pad to full length with NA

But of course that wouldn't help with factor responses.
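A fuller version of that c()-based principle might look like the following minimal sketch. ifelse2() here is a hypothetical function written for illustration, not an agreed or existing API; it assumes test is a logical vector and that yes and no have length 1 or length(test). Note that under this rule ifelse2(x > 5, x, "BLAH") silently returns a character vector, which the quoted message argues should stay an error.

```r
# Hypothetical ifelse2(): the result type is whatever c() produces for
# zero-length slices of the two branches.
ifelse2 <- function(test, yes, no) {
  stopifnot(is.logical(test))
  n <- length(test)
  # Accept only length-1 or full-length branches (no general recycling).
  for (arg in list(yes, no)) {
    if (!length(arg) %in% c(1L, n)) {
      stop("`yes` and `no` must have length 1 or length(test)")
    }
  }
  out <- c(yes[0], no[0])   # zero-length vector of the common type
  length(out) <- n          # pad to full length with NA
  yes <- rep_len(yes, n)
  no <- rep_len(no, n)
  out[which(test)] <- yes[which(test)]   # which() skips NA in test
  out[which(!test)] <- no[which(!test)]
  out                       # positions where test is NA stay NA
}

x <- c(3L, 8L, 1L)
ifelse2(x > 5, NA, x)                     # c(3L, NA, 1L), stays integer
ifelse2(c(TRUE, FALSE), 2:3, 0 + 10:11)   # c(2, 11), coerced to double
```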

Also, if you're considering an improved ifelse(), I'd strongly urge
you to consider adding an `na` argument, so that you can use ifelse()
to transform all three possible values of a logical vector (TRUE,
FALSE and NA).
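An `na` argument of this kind might work roughly as in the sketch below. ifelse_na() is a hypothetical name chosen for illustration; it reuses the c()-based typing idea, now across all three branches, and assumes each branch has length 1 or length(test).

```r
# Hypothetical ifelse_na(): maps TRUE, FALSE and NA in `test` to `yes`, `no`
# and `na` respectively. The default na = NA reproduces ifelse()'s handling.
ifelse_na <- function(test, yes, no, na = NA) {
  stopifnot(is.logical(test))
  n <- length(test)
  out <- c(yes[0], no[0], na[0])   # common type of all three branches
  length(out) <- n
  yes <- rep_len(yes, n)           # recycle each branch to length(test)
  no <- rep_len(no, n)
  na <- rep_len(na, n)
  t <- which(test)                 # TRUE positions
  f <- which(!test)                # FALSE positions
  m <- which(is.na(test))          # NA positions
  out[t] <- yes[t]
  out[f] <- no[f]
  out[m] <- na[m]
  out
}

ifelse_na(c(TRUE, FALSE, NA), "yes", "no", "missing")   # "yes" "no" "missing"
```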

Hadley

-- 
http://hadley.nz
