[Rd] xyTable(x,y) versus table(x,y) with NAs
Serguei Sokol
sokol at insa-toulouse.fr
Tue Apr 25 11:35:03 CEST 2023
I correct myself. Obviously, the line
first[is.na(first) | isFALSE(first)] <- FALSE
should read
first[is.na(first)] <- FALSE
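
A small check of why the extra condition can go, as far as I can tell: isFALSE() returns a single TRUE only when its argument is exactly one non-NA FALSE, so for a logical vector of length greater than one it yields FALSE and the "|" leaves is.na(first) unchanged. The vector below is made up for illustration only.

```r
# Hypothetical 'first' vector, not taken from xyTable() itself
first <- c(TRUE, FALSE, NA, TRUE)

# isFALSE() tests for a single non-NA FALSE, so a length-4 vector gives FALSE
isFALSE(first)

idx_old <- is.na(first) | isFALSE(first)  # index used in the line as posted
idx_new <- is.na(first)                   # index used in the corrected line

# Both select the same positions, so the shorter form is sufficient
identical(idx_old, idx_new)
```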
Serguei.
On 25/04/2023 at 11:30, Serguei Sokol wrote:
> On 25/04/2023 at 10:24, Viechtbauer, Wolfgang (NP) wrote:
>> Hi all,
>>
>> Posted this many years ago
>> (https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html),
>> but either this slipped under the radar or my feeble mind is unable
>> to understand what xyTable() is doing here and nobody bothered to
>> correct me. I now stumbled again across this issue.
>>
>> x <- c(1, 1, 2, 2, 2, 3)
>> y <- c(1, 2, 1, 3, NA, 3)
>> table(x, y, useNA="always")
>> xyTable(x, y)
>>
>> Why does xyTable() report that there are NA instances of (2,3)? I
>> could understand the logic that the NA could be anything, including a
>> 3, so the $number value for (2,3) is therefore unknown, but then the
>> same should apply to (2,1), but here $number is 1, so the logic is
>> then inconsistent.
>>
>> I stared at the xyTable code for a while and I suspect this is coming
>> from order() using na.last=TRUE by default, but in any case, to me
>> the behavior above is surprising.
> Not really. The variable 'first' in xyTable() is supposed to mark the
> position of the first value in each run of repeated pairs. It is then
> used to retain only those indices from the vector 1:n. Finally,
> taking diff() of these indices gives the number of repetitions of
> each pair. However, as 'first' contains one NA for your example, the
> diff() call produces two NAs: the differences with the preceding and
> the following index. Hence the result.
>
> Here is a slightly modified version of xyTable that handles NA too.
>
> xyTableNA <- function (x, y = NULL, digits)
> {
>     x <- xy.coords(x, y, setLab = FALSE)
>     y <- signif(x$y, digits = digits)
>     x <- signif(x$x, digits = digits)
>     n <- length(x)
>     number <- if (n > 0) {
>         orderxy <- order(x, y)
>         x <- x[orderxy]
>         y <- y[orderxy]
>         first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
>         firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
>                            xor(is.na(y[-1L]), is.na(y[-n])))
>         first[firstNA] <- TRUE
>         first[is.na(first) | isFALSE(first)] <- FALSE
>         x <- x[first]
>         y <- y[first]
>         diff(c((1L:n)[first], n + 1L))
>     }
>     else integer()
>     list(x = x, y = y, number = number)
> }
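
For convenience, a self-contained sketch of the function with the corrected line from the top of this message applied, followed by two small calls. The second data set is made up to check that repeated (x, NA) pairs are counted together. This is only an illustration of the proposal in this thread, not the behaviour of xyTable() as shipped in R.

```r
# xyTableNA() as posted in this thread, with 'first[is.na(first)] <- FALSE'
# (xy.coords() comes from grDevices, which is attached by default)
xyTableNA <- function(x, y = NULL, digits) {
    x <- xy.coords(x, y, setLab = FALSE)
    y <- signif(x$y, digits = digits)
    x <- signif(x$x, digits = digits)
    n <- length(x)
    number <- if (n > 0) {
        orderxy <- order(x, y)
        x <- x[orderxy]
        y <- y[orderxy]
        # TRUE where a pair differs from the previous one (NA if undecidable)
        first <- c(TRUE, (x[-1L] != x[-n]) | (y[-1L] != y[-n]))
        # TRUE where exactly one of two neighbouring values is NA
        firstNA <- c(TRUE, xor(is.na(x[-1L]), is.na(x[-n])) |
                           xor(is.na(y[-1L]), is.na(y[-n])))
        first[firstNA] <- TRUE
        # Remaining NAs mean "same NA pattern as before": not a new pair
        first[is.na(first)] <- FALSE
        x <- x[first]
        y <- y[first]
        diff(c((1L:n)[first], n + 1L))
    } else integer()
    list(x = x, y = y, number = number)
}

# Example from the original question: every pair occurs once
res <- xyTableNA(c(1, 1, 2, 2, 2, 3), c(1, 2, 1, 3, NA, 3))
res$number    # 1 1 1 1 1 1

# Made-up data: (2,3) once, (2,NA) twice
res2 <- xyTableNA(c(2, 2, 2), c(NA, NA, 3))
res2$number   # 1 2
```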
>
> Best,
> Serguei.
>
>>
>> Best,
>> Wolfgang
--
Serguei Sokol
Ingenieur de recherche INRAE
Cellule Mathématiques
TBI, INSA/INRAE UMR 792, INSA/CNRS UMR 5504
135 Avenue de Rangueil
31077 Toulouse Cedex 04
tel: +33 5 61 55 98 49
email: sokol using insa-toulouse.fr
http://www.toulouse-biotechnology-institute.fr/en/technology_platforms/mathematics-cell.html