[Rd] Time to revisit ifelse ?

Fri Aug 1 19:29:55 CEST 2025

Andre,

There is a new thread (of length one, sadly), which you should read:

     https://stat.ethz.ch/pipermail/r-devel/2025-July/084113.html

The function of mine that you have been testing was just a fast prototype,
and much work has been done in the mean time.  Can you give the current
proposal (ifelse::ifelse1) a try and let us know if anything stands out?

Mikael

On 2025-08-01 1:13 pm, GILLIBERT, Andre wrote:
> Martin Maechler <maechler using stat.math.ethz.ch> wrote:
>> I don't mind putting together a minimal package with some prototypes, tests,
>> comparisons, etc.  But perhaps we should aim for consensus on a few issues
>> beforehand.  (Sorry if these have been discussed to death already elsewhere.
>> In that case, links to relevant threads would be helpful ...)
>>
>>       1. Should the type and class attribute of the return value be exactly the
>>          type and class attribute of c(yes[0L], no[0L]), independent of 'test'?
>>          Or something else?
>>
>>       2. What should be the attributes of the return value (other than 'class')?
>>
>>          base::ifelse keeps attributes(test) if 'test' is atomic, which seems
>>          like desirable behaviour, though dplyr and data.table seem to think
>>          otherwise:
> 
> In my experience, base::ifelse keeping attributes of 'test' is useful for names.
> It may also be useful for dimensions, but for other attributes, it may be a dangerous feature.
> Otherwise, attributes of c(yes, no) should be mostly preserved in my opinion.
> 
>> 3. Should the new function be stricter and/or more verbose?  E.g., should
>>          it signal a condition if length(yes) or length(no) is not equal to 1
>>          nor length(test)?
> 
> To be consistent with base R, it should warn if length(yes), length(no) and length(test) are not divisors of the longest, otherwise silently repeat the three vectors to get the same sizes.
> This would work consistently with mathematical operators such as test+yes+no.
> 
> In my personal experience, the truncation of 'yes' and 'no' to length(test) if the most dangerous feature of ifelse().
> 
>>       4. Should the most common case, in which neither 'yes' nor 'no' has a
>>          'class' attribute, be handled in C?  The remaining cases might rely on
>>         method dispatch and thus require a separate "generic" implementation in
>>       R.  How much faster/more efficient would the C implementation have to
>>         be to justify the cost (more maintenance for R-core, more obfuscation
>>        for the average user)?
> 
> If the function is not much slower than today ifelse(), it is not worth rewriting in C in my opinion.
> 
> Thank you for an implementation!
> A few examples of misbehaviors (in my opinion):
> 
>> ifelse2(c(a=TRUE), factor("a"), factor("b"))
> Error in as.character.factor(x) : malformed factor
> 
>> ifelse2(TRUE, factor(c("a","b")), factor(c("b","a")))
> [1] a
> Levels: a b
> 
> I would expect this one to output
> [1] a b
> Levels: a b
> 
> I tried to develop a function that behaves like mathematical operators (e.g. test+yes+no) for length & dimensions coercion rules.
> Please, find the function and a few tests below:
> 
> ifelse2 <- function (test, yes, no) {
> 	# forces evaluation of arguments in order
> 	test
> 	yes
> 	no
> 
> 	if (is.atomic(test)) {
> 		if (!is.logical(test))
> 			storage.mode(test) <- "logical"
> 	}
> 	else test <- if (isS4(test)) methods::as(test, "logical") else as.logical(test)
> 
> 	ntest <- length(test)
> 	nyes <- length(yes)
> 	nno <- length(no)
> 
> 	nn <- c(ntest, nyes, nno)
>        nans <- max(nn)
> 
> 	ans <- rep(c(yes[0L], no[0L]), length.out=nans)
> 
> 	# check dimension consistency for arrays
> 	has.dim <- FALSE
> 	if (length(dim(test)) | length(dim(yes)) | length(dim(no))) {
> 		lparams <- list(test, yes, no)
> 		ldims <- lapply(lparams, dim)
> 		ldims <- ldims[!sapply(ldims, is.null)]
> 		ldimnames <- lapply(lparams, dimnames)
> 		ldimnames <- ldimnames[!sapply(ldimnames, is.null)]
> 
> 		rdim <- ldims[[1]]
> 		rdimnames <- ldimnames[[1]]
> 		for(d in ldims) {
> 			if (!identical(d, rdim)) {
> 				stop(gettext("non-conformable arrays"))
> 			}
> 		}
> 		has.dim <- TRUE
> 	}
> 
> 	if (any(nans %% nn)) {
> 		warning(gettext("longer object length is not a multiple of shorter object length"))
> 	}
> 
> 	if (ntest != nans) {test <- rep(test, length.out=nans)}
> 	if (nyes != nans) {yes <- rep(yes, length.out=nans)}
> 	if (nno != nans) {no <- rep(no, length.out=nans)}
> 
> 	idx <- which( test)
> 	ans[idx] <- yes[idx]
> 
> 	idx <- which(!test)
> 	ans[idx] <- no[idx]
> 
> 	if (has.dim) {
> 		dim(ans) <- rdim
> 		dimnames(ans) <- rdimnames
> 	}
> 
> 	if (!is.null(names(test))) {
> 		names(ans) <- names(test)
> 	}
> 
> 	ans
> }
> 
> 
> ifelse2(c(alpha=TRUE,beta=TRUE,gamma=FALSE),factor(c("A","B","C","X")),factor(c("A","B","C","D")))
> ifelse2(c(TRUE,FALSE), as.Date("2025-04-01"), c("2020-07-05", "2022-07-05"))
> ifelse2(c(a=TRUE, b=FALSE,c=TRUE,d=TRUE), list(42), list(40,45))
> ifelse2(rbind(alpha=c(a=TRUE, b=FALSE),beta=c(c=TRUE,d=FALSE)), list(1:10), list(2:20,3:30))
> a=rbind(alpha=c(a=TRUE, b=FALSE),beta=c(TRUE,TRUE))
> b=rbind(ALPHA=c(A=TRUE, B=FALSE),BETA=c(C=TRUE,D=TRUE))
> c=rbind(ALPHA2=c(A2=TRUE, B2=FALSE),BETA2=c(C2=TRUE,D2=TRUE))
> ifelse2(a,b,c)
> dimnames(a) <- NULL
> ifelse2(a,b,c)
> dimnames(b) <- NULL
> ifelse2(a,b,c)
>