[Rd] complex NA's match(), etc: not back-compatible change proposal
Martin Maechler
maechler at stat.math.ethz.ch
Wed May 11 10:00:44 CEST 2016
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Tue, 10 May 2016 16:08:39 +0200 writes:
> This is an RFC / announcement related to the 2nd part of PR#16885
> https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16885
> about complex NA's.
> The (somewhat rare) incompatibility in R 3.3.0's match() behavior for the
> case of complex numbers with NA & NaN's {which has been fixed for R 3.3.0
> patched in the meantime} triggered some more comprehensive "research".
> I found that we have had a long-standing inconsistency, at least between the
> documented and the real behavior. I am claiming that the documented
> behavior is desirable and hence R's current "real" behavior is buggy, and
> I am proposing to change it, in R-devel (to be 3.4.0) for now.
After the "roaring unanimous" assent (one private msg
encouraging me to go forward, no dissenting voice, hence an
"odds ratio" of +Inf in favor ;-)
I have now committed my proposal to R-devel (svn rev. 70597) and
some of us will be seeing the effect in package space within a
day or so, in the CRAN checks against R-devel (not for
Bioconductor AFAIK; their checks use R-devel only when it is less
than ca. 6 months from release).
It's still worthwhile to discuss the issue if you come late
to it, notably because (paraphrasing Dirk on the R-package-devel list)
the release of 3.4.0 is almost a year away, so now is the
best time to tinker with the API, in other words, to consider breaking
rarely used legacy APIs.
Martin
> In help(match) we have been saying
> | Exactly what matches what is to some extent a matter of definition.
> | For all types, \code{NA} matches \code{NA} and no other value.
> | For real and complex values, \code{NaN} values are regarded
> | as matching any other \code{NaN} value, but not matching \code{NA}.
> for at least 10 years. But we don't do that at all in the
> complex case (and AFAIK never got a bug report about it).
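For double (real) vectors, current R does follow these documented rules. A quick check using base R only: each NA finds the NA in the table, each NaN finds the NaN, and neither matches the other.

```r
## Real (double) vectors: the documented rules from help(match) hold.
tab <- c(NaN, NA, 0)

match(c(NA, NaN, 0), tab)  # 2 1 3 : NA -> the NA, NaN -> the NaN, 0 -> 0
match(NaN, c(NA, 1))       # NA    : a NaN does not match an NA
match(NA, c(NaN, 1))       # NA    : an NA does not match a NaN
```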
> Also, print(.) and format(.), e.g., simply use "NA" for all
> the different complex NA-containing numbers, whereas
> non-NA NaN's { <=> !is.nan(z) & is.na(z) }
> do show the NaN in the real and/or imaginary parts in format() or print();
> for an example, look at the "format" column of the matrix
> below, after 'print(cbind' ...
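To make the printing distinction concrete, here is a small sketch. hasNApart() is a hypothetical helper (not in base R) that flags complex numbers with at least one part that is NA rather than NaN; judging from the "format" column of the table in the session, exactly these are printed as plain "NA", while the others show their NaN part(s).

```r
## Four complex numbers: NA+0i, NaN+0i, 0+NaNi and NA+NaNi
z <- complex(real = c(NA, NaN, 0, NA), imaginary = c(0, 0, NaN, NaN))

## Hypothetical helper (not in base R): TRUE if the real or the imaginary
## part is NA (as opposed to NaN).
hasNApart <- function(z) {
  isNA <- function(v) is.na(v) & !is.nan(v)  # a real NA, but not NaN
  isNA(Re(z)) | isNA(Im(z))
}

is.na(z)      # TRUE TRUE  TRUE  TRUE : every one is NA or NaN
hasNApart(z)  # TRUE FALSE FALSE TRUE : these are the ones printed as "NA"
```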
> The current match() (and duplicated() and unique(), which are based on the same
> C code) *do* distinguish almost all complex NA / NaN's, which is
> NOT according to the documentation. I have found that this is just because
> our hashing function for the complex case, chash() in R/src/main/unique.c,
> is bogus in the sense that it is compatible neither with the above documentation
> nor with the cequal() function (in the same file unique.c) for checking
> equality of complex numbers.
> As I have found, a *simplified* version of the chash() function,
> made compatible with cequal(), does solve all the problems I've
> indicated, and the current plan is to commit that change to the code
> base after some discussion time, here on R-devel.
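The principle behind the fix: a hash function must give the same value to any two elements that the equality test (here cequal()) considers equal; otherwise equal elements land in different hash buckets and are never compared. The following is only an R-level sketch of that principle, not the C code of chash(). It assumes the equivalence classes of the proposal: any NA part gives one class, otherwise any NaN part gives another. The hypothetical helper cplx_key() maps each complex number to a canonical string key, and match() on those keys reproduces the proposed result (1 2 1 2 1 1 1 1 2 2 1 2).

```r
## Hypothetical helper (NOT part of R): a canonical key for complex numbers,
## constant on each equivalence class assumed here:
##   at least one NA part          -> key "NA"
##   else at least one NaN part    -> key "NaN"
##   else the exact (Re, Im) pair, with -0 folded into 0 (as 0 == -0)
cplx_key <- function(z) {
  re <- Re(z); im <- Im(z)
  isNA <- function(v) is.na(v) & !is.nan(v)   # NA, but not NaN
  hasNA  <- isNA(re) | isNA(im)
  hasNaN <- is.nan(re) | is.nan(im)
  exact  <- sprintf("%.17g,%.17g", re + 0, im + 0)  # '+ 0' turns -0 into 0
  ifelse(hasNA, "NA", ifelse(hasNaN, "NaN", exact))
}

## The 12 NA/NaN-containing values of the session, built directly
re <- c(NA, NaN, NA, NaN, 0, 1, NA, NaN, 0, 1, NA, NaN)
im <- c(0, 0, 1, 1, NA, NA, NA, NA, NaN, NaN, NaN, NaN)
z  <- complex(real = re, imaginary = im)

k <- cplx_key(z)
match(k, k)   # 1 2 1 2 1 1 1 1 2 2 1 2
```

Using "%.17g" makes the key exact for ordinary values, since 17 significant digits identify a double uniquely.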
> My change passes 'make check-all' fine, but I'm 100% sure that there will
> be effects in package space, which is one reason for this posting.
> As mentioned above, note that the chash() function has been in
> use for all three functions
> match()
> duplicated()
> unique()
> and the change will affect all three, but only for the case of complex
> vectors with NA or NaN's.
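Because all three functions use the same hashing code, they agree on which elements count as "the same". A minimal sketch with a double vector, where the documented NA/NaN rules already hold in current R, shows how the three results are tied together.

```r
x <- c(NA, NaN, 1, NA, NaN, 1)   # a double vector

match(x, x)     # 1 2 3 1 2 3
duplicated(x)   # FALSE FALSE FALSE TRUE TRUE TRUE
unique(x)       # NA NaN 1

## An element is a first occurrence exactly when it matches itself,
## and unique() keeps exactly the first occurrences.
stopifnot(identical(unique(x), x[!duplicated(x)]),
          identical(match(x, x) == seq_along(x), !duplicated(x)))
```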
> To show more, here is a small R session, using my version of R-devel
> (i.e., the proposal).
> The R script ('complex-NA-short.R') for (a bit more than) the
> session is attached {{you can attach text/plain easily}}:
>> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
>> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
>> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
>> (z <- z[is.na(z)])
> [1] NA NaN+ 0i NA NaN+ 1i NA NA NA NA
> [9] 0+NaNi 1+NaNi NA NaN+NaNi
>> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
> + r <- matrix( , length(x), length(y))
> + for(i in seq(along=x))
> + for(j in seq(along=y))
> + r[i,j] <- identical(x[i], y[j], ...)
> + r
> + }
>> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
>> ## a version that works in older versions of R, where identical() had fewer arguments!
>> outerID.picky <- function(x,y) {
> + nF <- length(formals(identical)) - 2
> + do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> + }
>> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
>> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
> [1,] | . . . . . . . . . . .
> [2,] . | . . . . . . . . . .
> [3,] . . | . . . . . . . . .
> [4,] . . . | . . . . . . . .
> [5,] . . . . | . . . . . . .
> [6,] . . . . . | . . . . . .
> [7,] . . . . . . | . . . . .
> [8,] . . . . . . . | . . . .
> [9,] . . . . . . . . | . . .
> [10,] . . . . . . . . . | . .
> [11,] . . . . . . . . . . | .
> [12,] . . . . . . . . . . . |
>> try(# for older R versions
> + stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> + )
>> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
> [1] 1 2 1 2 1 1 1 1 2 2 1 2
>> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
>> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
>       format   Re   Im   mz
>  [1,] NA       <NA> 0    1
>  [2,] NaN+ 0i  NaN  0    2
>  [3,] NA       <NA> 1    1
>  [4,] NaN+ 1i  NaN  1    2
>  [5,] NA       0    <NA> 1
>  [6,] NA       1    <NA> 1
>  [7,] NA       <NA> <NA> 1
>  [8,] NA       NaN  <NA> 1
>  [9,] 0+NaNi   0    NaN  2
> [10,] 1+NaNi   1    NaN  2
> [11,] NA       <NA> NaN  1
> [12,] NaN+NaNi NaN  NaN  2
>>
> -------------------------------
> Note that 'mz <- match(z, z)' and hence the last column of the matrix above
> are very different in current R, which distinguishes most kinds of
> NA / NaN, contrary to the documentation (and to the
> real/numeric case).
> Martin Maechler
> R Core Team
> ### Basically a shortened version of the PR#16885 -- complex part b)
> ### of R/tests/reg-tests-1c.R
> ## b) complex 'x' with different kinds of NaN
> x0 <- c(0,1, NA, NaN); z <- outer(x0,x0, complex, length.out=1); rm(x0)
> ## --- = NA_real_ but that does not exist e.g., in R 2.3.1
> ## similarly, '1L', '2L', .. do not exist e.g., in R 2.3.1
> (z <- z[is.na(z)])
> outerID <- function(x,y, ...) { ## ugly; can we get outer() to work ?
> r <- matrix( , length(x), length(y))
> for(i in seq(along=x))
> for(j in seq(along=y))
> r[i,j] <- identical(x[i], y[j], ...)
> r
> }
> ## Very strictly - in the sense of identical() -- these 12 complex numbers all differ:
> ## a version that works in older versions of R, where identical() had fewer arguments!
> outerID.picky <- function(x,y) {
> nF <- length(formals(identical)) - 2
> do.call("outerID", c(list(x, y), as.list(rep(FALSE, nF))))
> }
> oldR <- !exists("getRversion") || getRversion() < "3.0.0" ## << FIXME: 3.0.0 is a wild guess
> symnum(id.z <- outerID.picky(z,z)) ## == Diagonal matrix [newer versions of R]
> try(# for older R versions
> stopifnot(identical(id.z, outerID(z,z)), oldR || identical(id.z, diag(12) == 1))
> )
> (mz <- match(z, z)) # currently different {NA,NaN} patterns differ - not in print()/format() _FIXME_
> zRI <- rbind(Re=Re(z), Im=Im(z)) # and see the pattern :
> print(cbind(format = format(z), t(zRI), mz), quote=FALSE)
> ## compute match(z[i], z) , for i = 1,2,..,12 :
> (m1z <- sapply(z, match, table = z))
> ## 1 2 1 2 2 2 1 2 2 2 1 2 # R 1.2.3 (2001-04-26)
> ## 1 2 3 4 1 3 7 8 2 4 8 7 # R 1.4.1 (2002-01-30)
> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.5.1 (2002-06-17)
> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 1.8.1 (2003-11-21)
> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.0.1 (2004-11-15)
> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.1.1 (2005-06-20)
> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.3.1 (2006-06-01)
> ## 1 2 3 4 1 3 7 8 2 4 8 12 # R 2.5.1 (2007-06-27)
> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 2.10.1 (2009-12-14)
> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.1.1 (2014-07-10)
> ## 1 2 3 4 1 3 7 4 2 4 4 12 # R 3.2.5 -- and 3.3.0 patched
> ## 1 2 1 2 1 1 1 1 2 2 1 2 # <<-- Martin's R-devel and proposed future R
> if(!exists("anyNA", mode="function")) anyNA <- function(x) any(is.na(x))
> stopifnot(apply(zRI, 2, anyNA)) # *all* are NA *or* NaN (or both)
> is.NA <- function(.) is.na(.) & !is.nan(.)
> (iNaN <- apply(zRI, 2, function(.) any(is.nan(.))))
> (iNA <- apply(zRI, 2, function(.) any(is.NA (.)))) # has non-NaN NA's
> ## In Martin's version of R-devel :
> stopifnot(identical(m1z == 1, iNA),
> identical(m1z == 2, !iNA))
> ## m1z uses match(x, *) with length(x) == 1 and failed in R 3.3.0
> stopifnot(identical(m1z, mz))
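One possible answer to the script's question "can we get outer() to work ?", offered as a sketch rather than as part of the original proposal: outer() calls FUN only once, on two long vectors, so FUN must be vectorised, and identical() is not. Wrapping an index-based comparison in Vectorize() works around this. The name outerID2 is hypothetical; extra arguments (such as the FALSEs that outerID.picky supplies) are passed on to identical().

```r
## The 12 NA/NaN-containing complex numbers, built as in the script
x0 <- c(0, 1, NA, NaN)
z <- outer(x0, x0, complex, length.out = 1)
z <- z[is.na(z)]

## Hypothetical vectorised variant of outerID(): outer() hands FUN two index
## vectors of length length(x) * length(y); Vectorize() applies the scalar
## comparison elementwise, and outer() then reshapes the result to a matrix.
outerID2 <- function(x, y, ...) {
  args <- list(...)   # optional extra arguments for identical()
  same <- function(i, j) do.call(identical, c(list(x[i], y[j]), args))
  outer(seq_along(x), seq_along(y), Vectorize(same))
}

id2 <- outerID2(z, z)   # 12 x 12 logical matrix, TRUE only on the diagonal
```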
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel