[Rd] Recent changes to as.complex(NA_real_)

Martin Maechler m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Sep 25 16:05:33 CEST 2023


>>>>> Hervé Pagès 
>>>>>     on Sat, 23 Sep 2023 16:52:21 -0700 writes:

    > Hi Martin,
    > On 9/23/23 06:43, Martin Maechler wrote:
    >>>>>>> Hervé Pagès
    >>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes:
    >> > The problem is that you have things that are
    >> > **semantically** different but look exactly the same:
    >> 
    >> > They look the same:
    >> 
    >> >> x
    >> > [1] NA
    >> >> y
    >> > [1] NA
    >> >> z
    >> > [1] NA
    >> 
    >> >> is.na(x)
    >> > [1] TRUE
    >> >> is.na(y)
    >> > [1] TRUE
    >> >> is.na(z)
    >> > [1] TRUE
    >> 
    >> >> str(x)
    >> >   cplx NA
    >> >> str(y)
    >> >   num NA
    >> >> str(z)
    >> >   cplx NA
    >> 
    >> > but they are semantically different e.g.
    >> 
    >> >> Re(x)
    >> > [1] NA
    >> >> Re(y)
    >> > [1] -0.5  # surprise!
    >> 
    >> >> Im(x)  # surprise!
    >> > [1] 2
    >> >> Im(z)
    >> > [1] NA
    >> 
    >> > so any expression involving Re() or Im() will produce
    >> > different results on input that look the same on the
    >> > surface.
    >> 
    >> > You can address this either by normalizing the internal
    >> > representation of complex NA to always be complex(r=NaN,
    >> > i=NA_real_), like for NA_complex_, or by allowing the
    >> > infinite variations that are currently allowed and at the
    >> > same time making sure that both Re() and Im()  always
    >> > return NA_real_ on a complex NA.
    >> 
    >> > My point is that the behavior of complex NA should be
    >> > predictable. Right now it's not. Once it's predictable
    >> > (with Re() and Im() both returning NA_real_ regardless of
    >> > internal representation), then it no longer matters what
    >> > kind of complex NA is returned by as.complex(NA_real_),
    >> > because they are no onger distinguishable.
    >> 
    >> > H.
    >> 
    >> > On 9/22/23 13:43, Duncan Murdoch wrote:
    >> >> Since the result of is.na(x) is the same on each of
    >> >> those, I don't see a problem.  As long as that is
    >> >> consistent, I don't see a problem. You shouldn't be using
    >> >> any other test for NA-ness.  You should never be
    >> >> expecting identical() to treat different types as the
    >> >> same (e.g.  identical(NA, NA_real_) is FALSE, as it
    >> >> should be).  If you are using a different test, that's
    >> >> user error.
    >> >>
    >> >> Duncan Murdoch
    >> >>
    >> >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote:
    >> >>> We could also question the value of having an infinite
    >> >>> number of NA representations in the complex space. For
    >> >>> example all these complex values are displayed the same
    >> >>> way (as NA), are considered NAs by is.na(), but are not
    >> >>> identical or semantically equivalent (from an Re() or
    >> >>> Im() point of view):
    >> >>>
    >> >>>       NA_real_ + 0i
    >> >>>
    >> >>>       complex(r=NA_real_, i=Inf)
    >> >>>
    >> >>>       complex(r=2, i=NA_real_)
    >> >>>
    >> >>>       complex(r=NaN, i=NA_real_)
    >> >>>
    >> >>> In other words, using a single representation for
    >> >>> complex NA (i.e.  complex(r=NA_real_, i=NA_real_)) would
    >> >>> avoid a lot of unnecessary complications and surprises.
    >> >>>
    >> >>> Once you do that, whether as.complex(NA_real_) should
    >> >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_,
    >> >>> i=NA_real_) becomes a moot point.
    >> >>>
    >> >>> Best,
    >> >>>
    >> >>> H.
    >> 
    >> Thank you, Hervé.
    >> Your proposition is yet another one,
    >> to declare that all complex NA's should be treated as identical
    >> (almost/fully?) everywhere.
    >> 
    >> This would be a possibility, but I think a drastic one.
    >> 
    >> I think there are too many cases, where I want to keep the
    >> information of the real part independent of the values of the
    >> imaginary part (e.g. think of the Riemann hypothesis), and
    >> typically vice versa.
    > Use NaN for that, not NA.

Aa..h, *that* is your point.

Well, I was on exactly this line till a few years ago.

However, very *sadly* to me, note how    example(complex)
nowadays ends :

##----------------------------------------------------------------------------
 showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z)))
 
 ## The exact result of this *depends* on the platform, compiler, math-library:
 (NpNA <- NaN + NA_complex_) ; str(NpNA) # *behaves* as 'cplx NA' ..
 stopifnot(is.na(NpNA), is.na(NA_complex_), is.na(Re(NA_complex_)), is.na(Im(NA_complex_)))
 showC(NpNA)# but does not always show '(R = NaN, I = NA)'
 ## and this is not TRUE everywhere:
 identical(NpNA, NA_complex_)
 showC(NA_complex_) # always == (R = NA, I = NA)
##----------------------------------------------------------------------------

Unfortunately --- notably by the appearance of the new (M1, M1 pro, M2, ...)
processors, but not only ---
I (and others, but the real experts) have wrongly assumed  that
NA {which on the C-level is *one* of the many possible internal NaN's}
would be preserved in computations, as they are on the R level
-- well, typically, and as long as we've used intel-compatible
chips and gcc-compilers.

But modern speed optimizations (also seen in accelerated
Blas/Lapack ..) have noticed that no official C standard
requires such preservations (i.e., in our case of NA, *the* special NaN), 
and -- for speed reasons -- now on these accelerated platforms,
R-level NA's "suddenly" turn into R-level NaN's  (all are NaN on
the C level but "with different payload") from quite "trivial" computations.

Consequently, the strict distinction between  NA  and  NaN
even when they are so important for us statisticians / careful data analysts,
nowadays will tend to have to be dismissed eventually.

... and as I have mentioned also mentioned earlier in this thread,
I believe we should also print the complex values of z
fulfilling is.na(z)  by their  Re & Im, i.e., e.g.

 NA+iNA  (or NaN+iNA or NA+iNaN or NaN+iNaN
 NA+0i,  NaN+1i,  3+iNaN, 4+iNA  etc

but note that the exact printing itself should *not* become the topic of this
thread unless by mentioning that I strongly believe the print()ing
of complex vectors in R should change anway *and* for that reason,
the printing / "looks the same as" / ...  should not be strong
reasons in my view for deciding how  *coercion*,
notably  as.complex(.)   should work.

Martin


    >> With your proposal, for a (potentially large) vector of complex numbers,
    >> after
    >> Re(z)  <-  1/2
    >> 
    >> I could no longer rely on   Re(z) == 1/2,
    >> because it would be wrong for those z where (the imaginary part/ the number)
    >> was NA/NaN.

    > My proposal is to do this only if the Re and/or Im parts are NAs, not if 
    > they are NaNs.

    > BTW the difference between how NAs and NaNs are treated in complex 
    > vectors is another issue that adds to the confusion:

    >   > complex(r=NA, i=2)
    >   [1] NA
    >   > complex(r=NaN, i=2)
    >   [1] NaN+2i

    > Not displaying the real + imaginary parts in the NA case kind of 
    > suggests that somehow they are gone i.e. that Re(z) and Im(z) are both NA.

    > Note that my proposal is not to change the display but to change Re() 
    > and Im() to make them consistent with the display.

    > In your Re(z) <- 1/2 example (which seems to be theoretical only because 
    > I don't see `Re<-` in base R), any NA in 'z' would be replaced with 
    > complex(r=NaN, i=1/2), so you could rely on Re(z) == 1/2.

    >> Also, in a similar case, a
    >> 
    >> Im(z) <- NA
    >> 
    >> would have to "destroy" all real parts  Re(z);
    >> not really typically in memory, but effectively for the user,  Re(z)
    >> would be all NA/NaN.
    > Yes, setting a value to NA destroys it beyond repair in the sense that 
    > there's no way you can retrieve any original parts of it. I'm fine with 
    > that. I'm not fine with an NA being used to store hidden information.
    >> 
    >> And I think there are quite a few other situations
    >> where looking at Re() and Im() separately makes a lot of sense.
    > Still doable if the Re or Im parts contain NaNs.
    >> 
    >> Spencer also made a remark in this direction.
    >> 
    >> All in all I'd be very reluctant to move in this direction;
    >> but yes, I'm just one person ... let's continue musing and
    >> considering !

    > I understand the reluctance since this would not be a light move, but 
    > thanks for considering.

    > Best,

    > H.

    >> 
    >> Martin
    >> 
    >> >>> On 9/22/23 03:38, Martin Maechler wrote:
    >> >>>>>>>>> Mikael Jagan       on Thu, 21 Sep 2023 00:47:39
    >> >>>>>>>>> -0400 writes:
    >> >>>>       > Revisiting this thread from April:
    >> >>>>
    >> >>>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html
    >> >>>>
    >> >>>>       > where the decision (not yet backported) was
    >> >>>> made for       > as.complex(NA_real_) to give
    >> >>>> NA_complex_ instead of       > complex(r=NA_real_,
    >> >>>> i=0), to be consistent with       > help("as.complex")
    >> >>>> and as.complex(NA) and as.complex(NA_integer_).
    >> >>>>
    >> >>>>       > Was any consideration given to the alternative?
    >> >>>>       > That is, to changing as.complex(NA) and
    >> >>>> as.complex(NA_integer_) to       > give
    >> >>>> complex(r=NA_real_, i=0), consistent with       >
    >> >>>> as.complex(NA_real_), then amending help("as.complex")
    >> >>>>       > accordingly?
    >> >>>>
    >> >>>> Hmm, as, from R-core, mostly I was involved, I admit to
    >> >>>> say "no", to my knowledge the (above) alternative
    >> >>>> wasn't considered.
    >> >>>>
    >> >>>>     > The principle that     >
    >> >>>> Im(as.complex(<real=(double|integer|logical)>)) should
    >> >>>> be zero     > is quite fundamental, in my view, hence
    >> >>>> the "new" behaviour     > seems to really violate the
    >> >>>> principle of least surprise ...
    >> >>>>
    >> >>>> of course "least surprise"  is somewhat subjective.
    >> >>>> Still, I clearly agree that the above would be one
    >> >>>> desirable property.
    >> >>>>
    >> >>>> I think that any solution will lead to *some* surprise
    >> >>>> for some cases, I think primarily because there are
    >> >>>> *many* different values z  for which  is.na(z)  is
    >> >>>> true,  and in any case NA_complex_  is only of the
    >> >>>> many.
    >> >>>>
    >> >>>> I also agree with Mikael that we should reconsider the
    >> >>>> issue that was raised by Davis Vaughan here ("on
    >> >>>> R-devel") last April.
    >> >>>>
    >> >>>>       > Another (but maybe weaker) argument is that
    >> >>>>       > double->complex coercions happen more often
    >> >>>> than       > logical->complex and integer->complex
    >> >>>> ones. Changing the       > behaviour of the more
    >> >>>> frequently performed coercion is       > more likely to
    >> >>>> affect code "out there".
    >> >>>>
    >> >>>>       > Yet another argument is that one expects
    >> >>>>
    >> >>>>       >      identical(as.complex(NA_real_), NA_real_ +
    >> >>>> (0+0i))
    >> >>>>
    >> >>>>       > to be TRUE, i.e., that coercing from double to
    >> >>>> complex is       > equivalent to adding a complex
    >> >>>> zero.  The new behaviour       > makes the above FALSE,
    >> >>>> since NA_real_ + (0+0i) gives       >
    >> >>>> complex(r=NA_real_, i=0).
    >> >>>>
    >> >>>> No!  --- To my own surprise (!) --- in current R-devel
    >> >>>> the above is TRUE, and         NA_real_ + (0+0i)  , the
    >> >>>> same as         NA_real_ + 0i      , really gives
    >> >>>> complex(r=NA, i=NA) :
    >> >>>>
    >> >>>> Using showC() from ?complex
    >> >>>>
    >> >>>>     showC <- function(z) noquote(sprintf("(R = %g, I =
    >> >>>> %g)", Re(z), Im(z)))
    >> >>>>
    >> >>>> we see (in R-devel) quite consistently
    >> >>>>
    >> >>>>> showC(NA_real_ + 0i)
    >> >>>> [1] (R = NA, I = NA)
    >> >>>>> showC(NA       + 0i)  # NA is 'logical'
    >> >>>> [1] (R = NA, I = NA) where as in R 4.3.1 and
    >> >>>> "R-patched" -- *in*consistently
    >> >>>>
    >> >>>>> showC(NA_real_ + 0i)
    >> >>>> [1] (R = NA, I = 0)
    >> >>>>> showC(NA + 0i)
    >> >>>> [1] (R = NA, I = NA) .... and honestly, I do not see
    >> >>>> *where* (and when) we changed the underlying code (in
    >> >>>> arithmetic.c !?)  in R-devel to *also* produce
    >> >>>> NA_complex_  in such complex *arithmetic*
    >> >>>>
    >> >>>>
    >> >>>>       > Having said that, one might also (but more
    >> >>>> naively) expect
    >> >>>>
    >> >>>>       >
    >> >>>> identical(as.complex(as.double(NA_complex_)),
    >> >>>> NA_complex_)
    >> >>>>
    >> >>>>       > to be TRUE.
    >> >>>>
    >> >>>> as in current R-devel
    >> >>>>
    >> >>>>       > Under my proposal it continues to be FALSE.
    >> >>>>
    >> >>>> as in "R-release"
    >> >>>>
    >> >>>>       > Well, I'd prefer if it gave FALSE with a
    >> >>>> warning       > "imaginary parts discarded in
    >> >>>> coercion", but it seems that       >
    >> >>>> as.double(complex(r=a, i=b)) never warns when either of
    >> >>>>       > 'a' and 'b' is NA_real_ or NaN, even where
    >> >>>> "information"       > {nonzero 'b'} is clearly lost ...
    >> >>>>
    >> >>>> The question of *warning* here is related indeed, but I
    >> >>>> think we should try to look at it only *secondary* to
    >> >>>> your first proposal.
    >> >>>>
    >> >>>>       > Whatever decision is made about
    >> >>>> as.complex(NA_real_),       > maybe these points should
    >> >>>> be weighed before it becomes part of       > R-release
    >> >>>> ...
    >> >>>>
    >> >>>>       > Mikael
    >> >>>>
    >> >>>> Indeed.
    >> >>>>
    >> >>>> Can we please get other opinions / ideas here?
    >> >>>>
    >> >>>> Thank you in advance for your thoughts!  Martin
    >> >>>>
    >> >>>> ---
    >> >>>>
    >> >>>> PS:
    >> >>>>
    >> >>>>    Our *print()*ing  of complex NA's ("NA" here meaning
    >> >>>> NA or NaN)    is also unsatisfactory, e.g. in the case
    >> >>>> where all entries of a    vector are NA in the sense of
    >> >>>> is.na(.), but their    Re() and Im() are not all NA:
    >> >>>>     showC <- function(z) noquote(sprintf("(R = %g, I =
    >> >>>> %g)", Re(z), Im(z)))     z <- complex(, c(11, NA, NA),
    >> >>>> c(NA, 99, NA))     z     showC(z)
    >> >>>>
    >> >>>> gives
    >> >>>>
    >> >>>>     > z     [1] NA NA NA     > showC(z)     [1] (R =
    >> >>>> 11, I = NA) (R = NA, I = 99) (R = NA, I = NA)
    >> >>>>
    >> >>>> but that (printing of complex) *is* another issue, in
    >> >>>> which we have the re-opened bugzilla PR#16752
    >> >>>> ==>https://bugs.r-project.org/show_bug.cgi?id=16752
    >> >>>>
    >> >>>> on which we also worked during the R Sprint in Warwick
    >> >>>> three weeks ago, and where I want to commit changes in
    >> >>>> any case {but think we should change even a bit more
    >> >>>> than we got to during the Sprint}.
    >> >>>>
    >> >>>> ______________________________________________
    >> >>>>R-devel using r-project.org   mailing list
    >> >>>>https://stat.ethz.ch/mailman/listinfo/r-devel
    >> >>>
    >> >>
    >> > --
    >> > Hervé Pagès
    >> 
    >> > Bioconductor Core Teamhpages.on.github using gmail.com
    >> 
    >> 
    >> 
    > -- 
    > Hervé Pagès

    > Bioconductor Core Team
    > hpages.on.github using gmail.com

    > [[alternative HTML version deleted]]



More information about the R-devel mailing list