[Rd] xyTable(x,y) versus table(x,y) with NAs

Serguei Sokol sokol using insa-toulouse.fr
Tue Apr 25 11:35:03 CEST 2023


I correct myself. Obviously, the line

first[is.na(first) | isFALSE(first)] <- FALSE

should read

first[is.na(first)] <- FALSE
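
One way to see why the extra term can be dropped, as a small sketch: isFALSE() returns a single value and is FALSE for any vector longer than one element, so `| isFALSE(first)` never changes the index. The toy vector `first` below is hypothetical and only mimics the kind of flags computed inside xyTable().

```r
# Toy flag vector (hypothetical), with one NA as in the thread's example
first <- c(TRUE, TRUE, NA, FALSE)

# isFALSE() is not vectorised: for length > 1 it is simply FALSE
isFALSE(first)                         # FALSE

# Hence both index expressions select the same elements
cond_old <- is.na(first) | isFALSE(first)
cond_new <- is.na(first)
identical(cond_old, cond_new)          # TRUE

# The corrected line only turns the NA flags into FALSE
first[is.na(first)] <- FALSE
first                                  # TRUE TRUE FALSE FALSE
```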

Serguei.

On 25/04/2023 at 11:30, Serguei Sokol wrote:
> On 25/04/2023 at 10:24, Viechtbauer, Wolfgang (NP) wrote:
>> Hi all,
>>
>> Posted this many years ago 
>> (https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html), 
>> but either this slipped under the radar or my feeble mind is unable 
>> to understand what xyTable() is doing here and nobody bothered to 
>> correct me. I now stumbled again across this issue.
>>
>> x <- c(1, 1, 2, 2,  2, 3)
>> y <- c(1, 2, 1, 3, NA, 3)
>> table(x, y, useNA="always")
>> xyTable(x, y)
>>
>> Why does xyTable() report that there are NA instances of (2,3)? I 
>> could understand the logic that the NA could be anything, including a 
>> 3, so the $number value for (2,3) is therefore unknown, but then the 
>> same should apply to (2,1), but here $number is 1, so the logic is 
>> then inconsistent.
>>
>> I stared at the xyTable code for a while and I suspect this is coming 
>> from order() using na.last=TRUE by default, but in any case, to me 
>> the behavior above is surprising.
> Not really. The variable 'first' in xyTable() is supposed to mark the 
> position of the first value in each run of repeated pairs. It is then 
> used to retain only those indices from a vector of type 1:n. Finally, 
> taking diff() of the retained indices gives the number of repeats of 
> each pair. However, as 'first' will contain one NA for your example, 
> the diff() call will produce two NAs, by taking the difference with 
> the preceding and the following index. Hence the result.
>
> Here is a slightly modified version of the xyTable code that handles NA too.
>
> xyTableNA <- function (x, y = NULL, digits)
> {
>     x <- xy.coords(x, y, setLab = FALSE)
>     y <- signif(x$y, digits = digits)
>     x <- signif(x$x, digits = digits)
>     n <- length(x)
>     number <- if (n > 0) {
>         orderxy <- order(x, y)
>         x <- x[orderxy]
>         y <- y[orderxy]
>         first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
>         firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
>                            xor(is.na(y[-1L]), is.na(y[-n])))
>         first[firstNA] <- TRUE
>         first[is.na(first) | isFALSE(first)] <- FALSE
>         x <- x[first]
>         y <- y[first]
>         diff(c((1L:n)[first], n + 1L))
>     }
>     else integer()
>     list(x = x, y = y, number = number)
> }
>
> Best,
> Serguei.
>
>>
>> Best,
>> Wolfgang
>
>

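The mechanism described in the quoted reply can be traced step by step with the example data from the thread. This is only a sketch that copies the relevant lines of xyTable() outside the function (xy.coords() and signif() are left out, and the names `o`, `xs`, `ys` and `idx` are introduced here for illustration).

```r
x <- c(1, 1, 2, 2,  2, 3)
y <- c(1, 2, 1, 3, NA, 3)
n <- length(x)

# order() puts the NA last within the x == 2 group (na.last = TRUE)
o  <- order(x, y)
xs <- x[o]
ys <- y[o]

# Comparing 3 with NA gives NA; comparing NA with 3 also gives NA,
# but there x changes from 2 to 3, and TRUE | NA is TRUE
first <- c(TRUE, (xs[-1L] != xs[-n]) | (ys[-1L] != ys[-n]))
first                         # TRUE TRUE TRUE TRUE NA TRUE

# Indexing with NA keeps an NA among the start positions
idx <- (1L:n)[first]
idx                           # 1 2 3 4 NA 6

# diff() touches that NA twice: 4 -> NA and NA -> 6
number <- diff(c(idx, n + 1L))
number                        # 1 1 1 NA NA 1
```

The first NA in `number` lands on the pair (2,3) and the second on the NA pair itself, while (2,1) is unaffected because it is not adjacent to the NA after sorting.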

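As a quick check of the NA-aware variant with the corrected line, here is a reduced sketch of its counting core. `count_pairs` is a hypothetical helper, not part of R; it omits xy.coords() and signif() and assumes at least one input pair. The second data set, with a repeated (2, NA) pair, is made up for illustration and does not come from the thread.

```r
# Hypothetical helper: counting core of xyTableNA with the corrected NA line
count_pairs <- function(x, y) {
    n <- length(x)
    o <- order(x, y)                  # NAs sort last (na.last = TRUE)
    x <- x[o]
    y <- y[o]
    # value comparison: NA wherever an NA takes part and x is unchanged
    first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
    # a new run also starts where NA-ness differs between neighbours
    firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
                       xor(is.na(y[-1L]), is.na(y[-n])))
    first[firstNA] <- TRUE
    first[is.na(first)] <- FALSE      # NA next to NA: same pair, no new run
    list(x = x[first], y = y[first],
         number = diff(c((1L:n)[first], n + 1L)))
}

# Example from the thread: every pair, including (2, NA), occurs once
r1 <- count_pairs(c(1, 1, 2, 2, 2, 3), c(1, 2, 1, 3, NA, 3))
r1$number                             # 1 1 1 1 1 1

# Made-up data with (2, NA) twice: the two NA pairs are counted together
r2 <- count_pairs(c(1, 1, 2, 2, 2, 2, 3), c(1, 2, 1, 3, NA, NA, 3))
r2$number                             # 1 1 1 1 2 1
```

In both cases the counts sum to the number of input pairs, which is what table(x, y, useNA = "always") would also account for.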
-- 
Serguei Sokol
Ingenieur de recherche INRAE

Cellule Mathématiques
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04

tel: +33 5 61 55 98 49
email: sokol using insa-toulouse.fr
http://www.toulouse-biotechnology-institute.fr/en/technology_platforms/mathematics-cell.html


