[Rd] Recent changes to as.complex(NA_real_)

Hervé Pagès hp@ge@@on@g|thub @end|ng |rom gm@||@com
Mon Sep 25 19:08:17 CEST 2023


On 9/25/23 07:05, Martin Maechler wrote:
>>>>>> Hervé Pagès
>>>>>>      on Sat, 23 Sep 2023 16:52:21 -0700 writes:
>      > Hi Martin,
>      > On 9/23/23 06:43, Martin Maechler wrote:
>      >>>>>>> Hervé Pagès
>      >>>>>>> on Fri, 22 Sep 2023 16:55:05 -0700 writes:
>      >> > The problem is that you have things that are
>      >> > **semantically** different but look exactly the same:
>      >>
>      >> > They look the same:
>      >>
>      >> >> x
>      >> > [1] NA
>      >> >> y
>      >> > [1] NA
>      >> >> z
>      >> > [1] NA
>      >>
>      >> >> is.na(x)
>      >> > [1] TRUE
>      >> >> is.na(y)
>      >> > [1] TRUE
>      >> >> is.na(z)
>      >> > [1] TRUE
>      >>
>      >> >> str(x)
>      >> >   cplx NA
>      >> >> str(y)
>      >> >   num NA
>      >> >> str(z)
>      >> >   cplx NA
>      >>
>      >> > but they are semantically different e.g.
>      >>
>      >> >> Re(x)
>      >> > [1] NA
>      >> >> Re(y)
>      >> > [1] -0.5  # surprise!
>      >>
>      >> >> Im(x)  # surprise!
>      >> > [1] 2
>      >> >> Im(z)
>      >> > [1] NA
>      >>
>      >> > so any expression involving Re() or Im() will produce
>      >> > different results on input that look the same on the
>      >> > surface.
>      >>
>      >> > You can address this either by normalizing the internal
>      >> > representation of complex NA to always be complex(r=NaN,
>      >> > i=NA_real_), like for NA_complex_, or by allowing the
>      >> > infinite variations that are currently allowed and at the
>      >> > same time making sure that both Re() and Im()  always
>      >> > return NA_real_ on a complex NA.
>      >>
>      >> > My point is that the behavior of complex NA should be
>      >> > predictable. Right now it's not. Once it's predictable
>      >> > (with Re() and Im() both returning NA_real_ regardless of
>      >> > internal representation), then it no longer matters what
>      >> > kind of complex NA is returned by as.complex(NA_real_),
>      >> > because they are no onger distinguishable.
>      >>
>      >> > H.
>      >>
>      >> > On 9/22/23 13:43, Duncan Murdoch wrote:
>      >> >> Since the result of is.na(x) is the same on each of
>      >> >> those, I don't see a problem.  As long as that is
>      >> >> consistent, I don't see a problem. You shouldn't be using
>      >> >> any other test for NA-ness.  You should never be
>      >> >> expecting identical() to treat different types as the
>      >> >> same (e.g.  identical(NA, NA_real_) is FALSE, as it
>      >> >> should be).  If you are using a different test, that's
>      >> >> user error.
>      >> >>
>      >> >> Duncan Murdoch
>      >> >>
>      >> >> On 22/09/2023 2:41 p.m., Hervé Pagès wrote:
>      >> >>> We could also question the value of having an infinite
>      >> >>> number of NA representations in the complex space. For
>      >> >>> example all these complex values are displayed the same
>      >> >>> way (as NA), are considered NAs by is.na(), but are not
>      >> >>> identical or semantically equivalent (from an Re() or
>      >> >>> Im() point of view):
>      >> >>>
>      >> >>>       NA_real_ + 0i
>      >> >>>
>      >> >>>       complex(r=NA_real_, i=Inf)
>      >> >>>
>      >> >>>       complex(r=2, i=NA_real_)
>      >> >>>
>      >> >>>       complex(r=NaN, i=NA_real_)
>      >> >>>
>      >> >>> In other words, using a single representation for
>      >> >>> complex NA (i.e.  complex(r=NA_real_, i=NA_real_)) would
>      >> >>> avoid a lot of unnecessary complications and surprises.
>      >> >>>
>      >> >>> Once you do that, whether as.complex(NA_real_) should
>      >> >>> return complex(r=NA_real_, i=0) or complex(r=NA_real_,
>      >> >>> i=NA_real_) becomes a moot point.
>      >> >>>
>      >> >>> Best,
>      >> >>>
>      >> >>> H.
>      >>
>      >> Thank you, Hervé.
>      >> Your proposition is yet another one,
>      >> to declare that all complex NA's should be treated as identical
>      >> (almost/fully?) everywhere.
>      >>
>      >> This would be a possibility, but I think a drastic one.
>      >>
>      >> I think there are too many cases, where I want to keep the
>      >> information of the real part independent of the values of the
>      >> imaginary part (e.g. think of the Riemann hypothesis), and
>      >> typically vice versa.
>      > Use NaN for that, not NA.
>
> Aa..h, *that* is your point.
>
> Well, I was on exactly this line till a few years ago.
>
> However, very *sadly* to me, note how    example(complex)
> nowadays ends :
>
> ##----------------------------------------------------------------------------
>   showC <- function(z) noquote(sprintf("(R = %g, I = %g)", Re(z), Im(z)))
>   
>   ## The exact result of this *depends* on the platform, compiler, math-library:
>   (NpNA <- NaN + NA_complex_) ; str(NpNA) # *behaves* as 'cplx NA' ..
>   stopifnot(is.na(NpNA), is.na(NA_complex_), is.na(Re(NA_complex_)), is.na(Im(NA_complex_)))
>   showC(NpNA)# but does not always show '(R = NaN, I = NA)'
>   ## and this is not TRUE everywhere:
>   identical(NpNA, NA_complex_)
>   showC(NA_complex_) # always == (R = NA, I = NA)
> ##----------------------------------------------------------------------------
>
> Unfortunately --- notably by the appearance of the new (M1, M1 pro, M2, ...)
> processors, but not only ---
> I (and others, but the real experts) have wrongly assumed  that
> NA {which on the C-level is *one* of the many possible internal NaN's}
> would be preserved in computations, as they are on the R level
> -- well, typically, and as long as we've used intel-compatible
> chips and gcc-compilers.
>
> But modern speed optimizations (also seen in accelerated
> Blas/Lapack ..) have noticed that no official C standard
> requires such preservations (i.e., in our case of NA, *the* special NaN),
> and -- for speed reasons -- now on these accelerated platforms,
> R-level NA's "suddenly" turn into R-level NaN's  (all are NaN on
> the C level but "with different payload") from quite "trivial" computations.
>
> Consequently, the strict distinction between  NA  and  NaN
> even when they are so important for us statisticians / careful data analysts,
> nowadays will tend to have to be dismissed eventually.

I see. Thanks for pointing that out.

This would actually be an argument in favor of preserving the current 
as.complex(NA_real_) behavior. If the difference between NA and NaN is 
fading away then there's no reason to change as.complex(NA_real_) to 
make it behave radically differently from as.complex(NaN).

>
> ... and as I have mentioned also mentioned earlier in this thread,
> I believe we should also print the complex values of z
> fulfilling is.na(z)  by their  Re & Im, i.e., e.g.
>
>   NA+iNA  (or NaN+iNA or NA+iNaN or NaN+iNaN
>   NA+0i,  NaN+1i,  3+iNaN, 4+iNA  etc

oops, I should have paid more attention to your first post in this 
thread, sorry for that.

And yes, I totally agree with improving the printing of complexes when 
Re and/or Im is NA

Best,

H.

>
> but note that the exact printing itself should *not* become the topic of this
> thread unless by mentioning that I strongly believe the print()ing
> of complex vectors in R should change anway *and* for that reason,
> the printing / "looks the same as" / ...  should not be strong
> reasons in my view for deciding how  *coercion*,
> notably  as.complex(.)   should work.
>
> Martin
>
>
>      >> With your proposal, for a (potentially large) vector of complex numbers,
>      >> after
>      >> Re(z)  <-  1/2
>      >>
>      >> I could no longer rely on   Re(z) == 1/2,
>      >> because it would be wrong for those z where (the imaginary part/ the number)
>      >> was NA/NaN.
>
>      > My proposal is to do this only if the Re and/or Im parts are NAs, not if
>      > they are NaNs.
>
>      > BTW the difference between how NAs and NaNs are treated in complex
>      > vectors is another issue that adds to the confusion:
>
>      >   > complex(r=NA, i=2)
>      >   [1] NA
>      >   > complex(r=NaN, i=2)
>      >   [1] NaN+2i
>
>      > Not displaying the real + imaginary parts in the NA case kind of
>      > suggests that somehow they are gone i.e. that Re(z) and Im(z) are both NA.
>
>      > Note that my proposal is not to change the display but to change Re()
>      > and Im() to make them consistent with the display.
>
>      > In your Re(z) <- 1/2 example (which seems to be theoretical only because
>      > I don't see `Re<-` in base R), any NA in 'z' would be replaced with
>      > complex(r=NaN, i=1/2), so you could rely on Re(z) == 1/2.
>
>      >> Also, in a similar case, a
>      >>
>      >> Im(z) <- NA
>      >>
>      >> would have to "destroy" all real parts  Re(z);
>      >> not really typically in memory, but effectively for the user,  Re(z)
>      >> would be all NA/NaN.
>      > Yes, setting a value to NA destroys it beyond repair in the sense that
>      > there's no way you can retrieve any original parts of it. I'm fine with
>      > that. I'm not fine with an NA being used to store hidden information.
>      >>
>      >> And I think there are quite a few other situations
>      >> where looking at Re() and Im() separately makes a lot of sense.
>      > Still doable if the Re or Im parts contain NaNs.
>      >>
>      >> Spencer also made a remark in this direction.
>      >>
>      >> All in all I'd be very reluctant to move in this direction;
>      >> but yes, I'm just one person ... let's continue musing and
>      >> considering !
>
>      > I understand the reluctance since this would not be a light move, but
>      > thanks for considering.
>
>      > Best,
>
>      > H.
>
>      >>
>      >> Martin
>      >>
>      >> >>> On 9/22/23 03:38, Martin Maechler wrote:
>      >> >>>>>>>>> Mikael Jagan       on Thu, 21 Sep 2023 00:47:39
>      >> >>>>>>>>> -0400 writes:
>      >> >>>>       > Revisiting this thread from April:
>      >> >>>>
>      >> >>>> >https://stat.ethz.ch/pipermail/r-devel/2023-April/082545.html
>      >> >>>>
>      >> >>>>       > where the decision (not yet backported) was
>      >> >>>> made for       > as.complex(NA_real_) to give
>      >> >>>> NA_complex_ instead of       > complex(r=NA_real_,
>      >> >>>> i=0), to be consistent with       > help("as.complex")
>      >> >>>> and as.complex(NA) and as.complex(NA_integer_).
>      >> >>>>
>      >> >>>>       > Was any consideration given to the alternative?
>      >> >>>>       > That is, to changing as.complex(NA) and
>      >> >>>> as.complex(NA_integer_) to       > give
>      >> >>>> complex(r=NA_real_, i=0), consistent with       >
>      >> >>>> as.complex(NA_real_), then amending help("as.complex")
>      >> >>>>       > accordingly?
>      >> >>>>
>      >> >>>> Hmm, as, from R-core, mostly I was involved, I admit to
>      >> >>>> say "no", to my knowledge the (above) alternative
>      >> >>>> wasn't considered.
>      >> >>>>
>      >> >>>>     > The principle that     >
>      >> >>>> Im(as.complex(<real=(double|integer|logical)>)) should
>      >> >>>> be zero     > is quite fundamental, in my view, hence
>      >> >>>> the "new" behaviour     > seems to really violate the
>      >> >>>> principle of least surprise ...
>      >> >>>>
>      >> >>>> of course "least surprise"  is somewhat subjective.
>      >> >>>> Still, I clearly agree that the above would be one
>      >> >>>> desirable property.
>      >> >>>>
>      >> >>>> I think that any solution will lead to *some* surprise
>      >> >>>> for some cases, I think primarily because there are
>      >> >>>> *many* different values z  for which  is.na(z)  is
>      >> >>>> true,  and in any case NA_complex_  is only of the
>      >> >>>> many.
>      >> >>>>
>      >> >>>> I also agree with Mikael that we should reconsider the
>      >> >>>> issue that was raised by Davis Vaughan here ("on
>      >> >>>> R-devel") last April.
>      >> >>>>
>      >> >>>>       > Another (but maybe weaker) argument is that
>      >> >>>>       > double->complex coercions happen more often
>      >> >>>> than       > logical->complex and integer->complex
>      >> >>>> ones. Changing the       > behaviour of the more
>      >> >>>> frequently performed coercion is       > more likely to
>      >> >>>> affect code "out there".
>      >> >>>>
>      >> >>>>       > Yet another argument is that one expects
>      >> >>>>
>      >> >>>>       >      identical(as.complex(NA_real_), NA_real_ +
>      >> >>>> (0+0i))
>      >> >>>>
>      >> >>>>       > to be TRUE, i.e., that coercing from double to
>      >> >>>> complex is       > equivalent to adding a complex
>      >> >>>> zero.  The new behaviour       > makes the above FALSE,
>      >> >>>> since NA_real_ + (0+0i) gives       >
>      >> >>>> complex(r=NA_real_, i=0).
>      >> >>>>
>      >> >>>> No!  --- To my own surprise (!) --- in current R-devel
>      >> >>>> the above is TRUE, and         NA_real_ + (0+0i)  , the
>      >> >>>> same as         NA_real_ + 0i      , really gives
>      >> >>>> complex(r=NA, i=NA) :
>      >> >>>>
>      >> >>>> Using showC() from ?complex
>      >> >>>>
>      >> >>>>     showC <- function(z) noquote(sprintf("(R = %g, I =
>      >> >>>> %g)", Re(z), Im(z)))
>      >> >>>>
>      >> >>>> we see (in R-devel) quite consistently
>      >> >>>>
>      >> >>>>> showC(NA_real_ + 0i)
>      >> >>>> [1] (R = NA, I = NA)
>      >> >>>>> showC(NA       + 0i)  # NA is 'logical'
>      >> >>>> [1] (R = NA, I = NA) where as in R 4.3.1 and
>      >> >>>> "R-patched" -- *in*consistently
>      >> >>>>
>      >> >>>>> showC(NA_real_ + 0i)
>      >> >>>> [1] (R = NA, I = 0)
>      >> >>>>> showC(NA + 0i)
>      >> >>>> [1] (R = NA, I = NA) .... and honestly, I do not see
>      >> >>>> *where* (and when) we changed the underlying code (in
>      >> >>>> arithmetic.c !?)  in R-devel to *also* produce
>      >> >>>> NA_complex_  in such complex *arithmetic*
>      >> >>>>
>      >> >>>>
>      >> >>>>       > Having said that, one might also (but more
>      >> >>>> naively) expect
>      >> >>>>
>      >> >>>>       >
>      >> >>>> identical(as.complex(as.double(NA_complex_)),
>      >> >>>> NA_complex_)
>      >> >>>>
>      >> >>>>       > to be TRUE.
>      >> >>>>
>      >> >>>> as in current R-devel
>      >> >>>>
>      >> >>>>       > Under my proposal it continues to be FALSE.
>      >> >>>>
>      >> >>>> as in "R-release"
>      >> >>>>
>      >> >>>>       > Well, I'd prefer if it gave FALSE with a
>      >> >>>> warning       > "imaginary parts discarded in
>      >> >>>> coercion", but it seems that       >
>      >> >>>> as.double(complex(r=a, i=b)) never warns when either of
>      >> >>>>       > 'a' and 'b' is NA_real_ or NaN, even where
>      >> >>>> "information"       > {nonzero 'b'} is clearly lost ...
>      >> >>>>
>      >> >>>> The question of *warning* here is related indeed, but I
>      >> >>>> think we should try to look at it only *secondary* to
>      >> >>>> your first proposal.
>      >> >>>>
>      >> >>>>       > Whatever decision is made about
>      >> >>>> as.complex(NA_real_),       > maybe these points should
>      >> >>>> be weighed before it becomes part of       > R-release
>      >> >>>> ...
>      >> >>>>
>      >> >>>>       > Mikael
>      >> >>>>
>      >> >>>> Indeed.
>      >> >>>>
>      >> >>>> Can we please get other opinions / ideas here?
>      >> >>>>
>      >> >>>> Thank you in advance for your thoughts!  Martin
>      >> >>>>
>      >> >>>> ---
>      >> >>>>
>      >> >>>> PS:
>      >> >>>>
>      >> >>>>    Our *print()*ing  of complex NA's ("NA" here meaning
>      >> >>>> NA or NaN)    is also unsatisfactory, e.g. in the case
>      >> >>>> where all entries of a    vector are NA in the sense of
>      >> >>>> is.na(.), but their    Re() and Im() are not all NA:
>      >> >>>>     showC <- function(z) noquote(sprintf("(R = %g, I =
>      >> >>>> %g)", Re(z), Im(z)))     z <- complex(, c(11, NA, NA),
>      >> >>>> c(NA, 99, NA))     z     showC(z)
>      >> >>>>
>      >> >>>> gives
>      >> >>>>
>      >> >>>>     > z     [1] NA NA NA     > showC(z)     [1] (R =
>      >> >>>> 11, I = NA) (R = NA, I = 99) (R = NA, I = NA)
>      >> >>>>
>      >> >>>> but that (printing of complex) *is* another issue, in
>      >> >>>> which we have the re-opened bugzilla PR#16752
>      >> >>>> ==>https://bugs.r-project.org/show_bug.cgi?id=16752
>      >> >>>>
>      >> >>>> on which we also worked during the R Sprint in Warwick
>      >> >>>> three weeks ago, and where I want to commit changes in
>      >> >>>> any case {but think we should change even a bit more
>      >> >>>> than we got to during the Sprint}.
>      >> >>>>
>      >> >>>> ______________________________________________
>      >> >>>>R-devel using r-project.org    mailing list
>      >> >>>>https://stat.ethz.ch/mailman/listinfo/r-devel
>      >> >>>
>      >> >>
>      >> > --
>      >> > Hervé Pagès
>      >>
>      >> > Bioconductor CoreTeamhpages.on.github using gmail.com
>      >>
>      >>
>      >>
>      > --
>      > Hervé Pagès
>
>      > Bioconductor Core Team
>      >hpages.on.github using gmail.com
>
>      > [[alternative HTML version deleted]]
>
-- 
Hervé Pagès

Bioconductor Core Team
hpages.on.github using gmail.com

	[[alternative HTML version deleted]]



More information about the R-devel mailing list