[Rd] Do rowMeans and colMeans of complex vars need adjusting following r88444?
Martin Maechler
m@ech|er @end|ng |rom @t@t@m@th@ethz@ch
Mon Aug 25 11:54:51 CEST 2025
>>>>> Dirk Eddelbuettel
>>>>> on Sun, 24 Aug 2025 08:33:58 -0500 writes:
> In SVN commit r88444, Martin made a change following Mikael's PR #18918. The
> one-line synopsis is 'subassignment <complex>[i] <- NA should only touch the
> real part' and you can see it all at [1].
> Imaginary parts now get a zero.
Indeed, and this in itself is a bit dubious:
From the commit message you cite above one could/would expect that the
imaginary parts should *stay* unchanged, i.e., remain '2' in
your example below (and remain '0' when they are, as in PR #18918 ...)
> I am wondering if that causes rowMeans and colMeans to be off?
Well, I can argue they are not off ... {but I will eventually
agree with you that we *have* a problem!}
One clue is that the complex 'x' matrices now *differ* between
R-release (*and* R-patched) and R-devel
*but* they print identically.... and that is part of the confusion in this case.
(or *was* adding to the confusion at least).
format() and print() should and do go hand-in-hand, here as well,
and (probably for mostly historical reasons) R&R and then
R-core decided to format/print all complex NAs the same ...
the reasoning being that
'NA means "Not Available"', and for complex data (one complex
seen as "one complex number" rather than "two real numbers")
why should one be bothered about the Re/Im representation of a
complex? ...
[We have been on that topic before, notably in bugzilla, as well].
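A minimal sketch of that printing convention, building the complex NAs
explicitly with complex() so that it does not depend on the subassignment
behaviour of any particular R version (the names z_re_only and z_both are
only for illustration):

```r
# Two complex values that both count as NA, with different imaginary parts
z_re_only <- complex(real = NA_real_, imaginary = 0)         # only Re() is NA
z_both    <- complex(real = NA_real_, imaginary = NA_real_)  # Re() and Im() are NA

is.na(c(z_re_only, z_both))   # is.na() flags both of them
format(c(z_re_only, z_both))  # both are formatted the same way
Im(c(z_re_only, z_both))      # ... yet the imaginary parts differ
```

So two complex vectors can print identically and still differ in Im().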
So in your example below,
>> x <- matrix(1:9 + 2i, 3)
>> x[c(2,4,6,8)] <- NA
>>
>> x
>      [,1] [,2] [,3]
> [1,] 1+2i   NA 7+2i
> [2,]   NA 5+2i   NA
> [3,] 3+2i   NA 9+2i
>>
in R 4.5.1,
> Im(x)
     [,1] [,2] [,3]
[1,]    2   NA    2
[2,]   NA    2   NA
[3,]    2   NA    2
>
whereas in R-devel
> Im(x)
     [,1] [,2] [,3]
[1,]    2    0    2
[2,]    0    2    0
[3,]    2    0    2
>
.... and indeed, you *did* implicitly acknowledge this difference, above.
Consequently, of course, rowMeans(x) or colMeans(x) and many
other matrix functions/functionals of 'x' will differ, between
R-release (& -patched) and R-devel ...
as the 'x' differ .. in their imaginary parts.
... but hang on ...
rowMeans() and colMeans() work "separately" for the real and
imaginary parts, and (as seen above) in R-devel the imaginary part has no
NAs, so the number of obs per row/column in the imaginary part
is always 3, such that the Im() parts of the rowSums() / colSums() results
are divided by 3, here:
>> rowMeans(x, TRUE) # this now differs from R-release
> [1] 4+1.333333i 5+0.666667i 6+1.333333i
>>
> But in R 4.5.1 we get the (here constant) imaginary part as constant, just as
> we do when we do this 'by hand', as rowSums() appears fine:
>> rowSums(x, TRUE)
> [1] 8+4i 5+2i 12+4i
>> apply(x, 1, \(x) sum(is.finite(x))) # row count of finite elems
> [1] 2 1 2
>>
>> rowSums(x, TRUE) / apply(x, 1, \(x) sum(is.finite(x)))
> [1] 4+2i 5+2i 6+2i
>>
> I could be off my rocker here as I don't use complex variables much and am a
> little rustic but a rudimentary check suggests my reasoning applies: means of
> real and imaginary parts (taken across rows or columns) should be the sum
> divided by the number of non-NA elements. Right now they aren't.
Well, see above, they *are* __if__ you look at the "number of
non-NA elements" coordinate-wise, i.e., separately for Re() and Im().
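To make the coordinate-wise view concrete, here is a small sketch that
rebuilds the R-devel version of 'x' explicitly (Re() NA and Im() 0 at the
four modified positions, so it runs the same in any R version) and takes the
means separately for Re() and Im(). The names x_devel and coordwise are only
for illustration:

```r
# Rebuild the R-devel 'x' by hand: Re() is NA and Im() is 0 at 'idx'
x0  <- matrix(1:9 + 2i, 3)
idx <- c(2, 4, 6, 8)
re <- Re(x0); re[idx] <- NA
im <- Im(x0); im[idx] <- 0
x_devel <- matrix(complex(real = re, imaginary = im), 3)

# Coordinate-wise row means: NAs are dropped separately in Re() and in Im()
coordwise <- complex(real      = rowMeans(Re(x_devel), na.rm = TRUE),
                     imaginary = rowMeans(Im(x_devel), na.rm = TRUE))
coordwise  # real parts 4, 5, 6; imaginary parts 4/3, 2/3, 4/3
```

The real parts are averaged over 2, 1 and 2 values, the imaginary parts
always over 3 values, which gives the 1.333333 and 0.666667 quoted above.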
I still agree we should address this: we do have a discrepancy
with mean(), i.e., mean.default(), which does "exactly" what you
do "by hand" above, and hence uses is.na() for the full
complex vector, and not *separately* for the Re() and Im() parts;
... and I do tend to agree that colMeans(*, na.rm=TRUE) etc
probably should be adapted to *not* work coordinate-wise but
drop all "complex NAs" both for Re and Im.
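As a rough sketch only of what "drop all complex NAs" would mean for row
means (row_means_drop_cna is a hypothetical helper, not part of R and not a
proposed implementation), one can simply apply mean(*, na.rm = TRUE) per row,
again on an explicitly constructed R-devel-style 'x':

```r
# R-devel-style 'x', built by hand: Re() is NA and Im() is 0 at 'idx'
x0  <- matrix(1:9 + 2i, 3)
idx <- c(2, 4, 6, 8)
re <- Re(x0); re[idx] <- NA
im <- Im(x0); im[idx] <- 0
x_devel <- matrix(complex(real = re, imaginary = im), 3)

# Hypothetical helper: mean() removes x[is.na(x)] as whole complex numbers,
# so an entry with NA in Re() contributes to neither Re() nor Im()
row_means_drop_cna <- function(x) apply(x, 1, mean, na.rm = TRUE)

row_means_drop_cna(x_devel)  # agrees with the "by hand" 4+2i 5+2i 6+2i
```

This is slow compared to the C code behind rowMeans(), and is only meant to
pin down the intended semantics.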
In addition, back to the original
PR #18918 (--> https://bugs.r-project.org/show_bug.cgi?id=18918 ),
I will *also* take up my "is a bit dubious" from above.
Martin