[Rd] xyTable(x,y) versus table(x,y) with NAs

Viechtbauer, Wolfgang (NP)
Tue Apr 25 13:45:45 CEST 2023


Nice! Could this be considered either as a permanent fix to xyTable() (to me, the function currently behaves in a rather unexpected, if not to say buggy, manner) or as an option enabled via an argument (for backwards compatibility)?

Best,
Wolfgang

>-----Original Message-----
>From: Serguei Sokol [mailto:sokol using insa-toulouse.fr]
>Sent: Tuesday, 25 April, 2023 11:35
>To: Viechtbauer, Wolfgang (NP); r-devel using r-project.org
>Subject: Re: [Rd] xyTable(x,y) versus table(x,y) with NAs
>
>I correct myself. Obviously, the line
>
>first[is.na(first) | isFALSE(first)] <- FALSE
>
>should read
>
>first[is.na(first)] <- FALSE
>
>Serguei.
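A small sketch of why the shorter form is enough, assuming R >= 3.5.0 (where isFALSE() is available). isFALSE() tests whether its argument is a single FALSE, so for a logical vector of length greater than one it returns one FALSE and the extra term never changes the index. The vector 'first' below is made up for illustration.

```r
# Hypothetical 'first' vector containing one NA
first <- c(TRUE, TRUE, NA, FALSE)

isFALSE(first)   # FALSE: 'first' is not a length-one FALSE

idx_old <- is.na(first) | isFALSE(first)   # index used in the posted code
idx_new <- is.na(first)                    # index used in the correction

identical(idx_old, idx_new)   # TRUE: both select only the NA position
```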
>
>Le 25/04/2023 à 11:30, Serguei Sokol a écrit :
>> Le 25/04/2023 à 10:24, Viechtbauer, Wolfgang (NP) a écrit :
>>> Hi all,
>>>
>>> Posted this many years ago
>>> (https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html),
>>> but either this slipped under the radar or my feeble mind is unable
>>> to understand what xyTable() is doing here and nobody bothered to
>>> correct me. I now stumbled again across this issue.
>>>
>>> x <- c(1, 1, 2, 2,  2, 3)
>>> y <- c(1, 2, 1, 3, NA, 3)
>>> table(x, y, useNA="always")
>>> xyTable(x, y)
>>>
>>> Why does xyTable() report that there are NA instances of (2,3)? I
>>> could understand the logic that the NA could be anything, including a
>>> 3, so the $number value for (2,3) is therefore unknown, but then the
>>> same should apply to (2,1), but here $number is 1, so the logic is
>>> then inconsistent.
>>>
>>> I stared at the xyTable code for a while and I suspect this is coming
>>> from order() using na.last=TRUE by default, but in any case, to me
>>> the behavior above is surprising.
>> Not really. The variable 'first' in xyTable() is supposed to flag the
>> position of the first value in each run of repeated pairs. It is then
>> used to retain only those positions from the index vector 1:n. Finally,
>> taking diff() of the retained positions gives the number of repeats of
>> each pair. However, as 'first' contains one NA for your example, the
>> diff() call produces two NAs: one from the difference with the
>> preceding position and one from the difference with the following
>> position. Hence the result.
>>
>> Here is a slightly modified version of the xyTable() code that handles NAs too.
>>
>> xyTableNA <- function (x, y = NULL, digits)
>> {
>>     x <- xy.coords(x, y, setLab = FALSE)
>>     y <- signif(x$y, digits = digits)
>>     x <- signif(x$x, digits = digits)
>>     n <- length(x)
>>     number <- if (n > 0) {
>>         orderxy <- order(x, y)
>>         x <- x[orderxy]
>>         y <- y[orderxy]
>>         first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
>>         firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
>>                            xor(is.na(y[-1L]), is.na(y[-n])))
>>         first[firstNA] <- TRUE
>>         first[is.na(first) | isFALSE(first)] <- FALSE
>>         x <- x[first]
>>         y <- y[first]
>>         diff(c((1L:n)[first], n + 1L))
>>     }
>>     else integer()
>>     list(x = x, y = y, number = number)
>> }
>>
>> Best,
>> Serguei.
>>>
>>> Best,
>>> Wolfgang
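For reference, a sketch of the modified function with the correction from the follow-up message applied, run on the example data. The function is the one posted in this thread (not the version shipped in grDevices); the second data set (x2, y2) is made up here to check that a repeated pair containing NA is counted once per occurrence.

```r
# xyTableNA() as posted in the thread, with the corrected NA line
xyTableNA <- function(x, y = NULL, digits) {
    x <- grDevices::xy.coords(x, y, setLab = FALSE)
    y <- signif(x$y, digits = digits)
    x <- signif(x$x, digits = digits)
    n <- length(x)
    number <- if (n > 0) {
        orderxy <- order(x, y)
        x <- x[orderxy]
        y <- y[orderxy]
        first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
        # a pair also starts a new run when NA status changes in x or y
        firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
                           xor(is.na(y[-1L]), is.na(y[-n])))
        first[firstNA] <- TRUE
        # remaining NAs are NA-vs-NA comparisons, i.e. repeats of a pair
        first[is.na(first)] <- FALSE
        x <- x[first]
        y <- y[first]
        diff(c((1L:n)[first], n + 1L))
    }
    else integer()
    list(x = x, y = y, number = number)
}

# Example from the original post: every pair occurs exactly once
x <- c(1, 1, 2, 2,  2, 3)
y <- c(1, 2, 1, 3, NA, 3)
res <- xyTableNA(x, y)
res$number                     # 1 1 1 1 1 1
sum(res$number) == length(x)   # TRUE, as for table(x, y, useNA = "always")

# Made-up check: the pair (2, NA) occurs twice
x2 <- c(2, 2, 2)
y2 <- c(NA, NA, 3)
res2 <- xyTableNA(x2, y2)
res2$y        # 3 NA
res2$number   # 1 2
```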

