[Rd] xyTable(x,y) versus table(x,y) with NAs

Serguei Sokol
Tue Apr 25 18:03:47 CEST 2023


On 25/04/2023 at 17:39, Bill Dunlap wrote:
> x <- c(1, 1, 2, 2,  2, 3)
> y <- c(1, 2, 1, 3, NA, 3)
>> str(xyTable(x,y))
> List of 3
>   $ x     : num [1:6] 1 1 2 2 NA 3
>   $ y     : num [1:6] 1 2 1 3 NA 3
>   $ number: int [1:6] 1 1 1 NA NA 1
>
>
> How many (2,3)s do we have?  At least one, the third entry, but the fourth
> entry, (2,NA), is possibly a (2,3) so we don't know and make the count NA.
> I suspect this is not the intended logic, but a byproduct of finding value
> changes in a sorted vector with the idiom x[-1]!=x[-length(x)].  Also the
> following does follow that logic:
>
>> x <- c(1, 1, 2, 2,  5, 6)
>> y <- c(2, 2, 2, 4, NA, 3)
>> str(xyTable(x,y))
> List of 3
>   $ x     : num [1:5] 1 2 2 5 6
>   $ y     : num [1:5] 2 2 4 NA 3
>   $ number: int [1:5] 2 1 1 1 1
Not really. If we take

   x <- c(1, 1, 2, 2,  5, 6, 5, 5)
   y <- c(2, 2, 2, 4, NA, 3, 3, 4)

we get

   str(xyTable(x,y))

List of 3
  $ x     : num [1:7] 1 2 2 5 5 NA 6
  $ y     : num [1:7] 2 2 4 3 4 NA 3
  $ number: int [1:7] 2 1 1 1 NA NA 1

How many (5, 3) pairs do we have? At least 1, but (5, NA) is possibly (5, 3),
so we should get NA, yet we get 1.
How many (5, 4) pairs do we have? At least 1, but (5, NA) is possibly (5, 4),
and we do get NA. So the reconstructed logic is not consistent.
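
To see where the NA counts come from, here is a minimal sketch that traces
the idiom Bill mentions on this example. It assumes xyTable() sorts with
order(x, y) and flags value changes with x[-1] != x[-n]; it is not copied
from the xyTable() source and it skips the rounding step. The names o, xs,
ys and first are mine.

```r
x <- c(1, 1, 2, 2,  5, 6, 5, 5)
y <- c(2, 2, 2, 4, NA, 3, 3, 4)
n <- length(x)

o  <- order(x, y)   # default na.last = TRUE puts (5, NA) after (5, 4)
xs <- x[o]          # 1 1 2 2 5 5  5 6
ys <- y[o]          # 2 2 2 4 3 4 NA 3

# Flag the first element of each run of identical (x, y) pairs.
# Any comparison that involves NA yields NA, and FALSE | NA is NA,
# whereas TRUE | NA is TRUE.
first <- c(TRUE, (xs[-1L] != xs[-n]) | (ys[-1L] != ys[-n]))
print(first)        # TRUE FALSE TRUE TRUE TRUE TRUE NA TRUE

# Subsetting with an NA index returns NA, in x and in y alike.
px <- xs[first]     # 1 2 2 5 5 NA 6
py <- ys[first]     # 2 2 4 3 4 NA 3

# Run lengths are differences of the start positions; the NA start
# position spoils the two differences it takes part in.
number <- diff(c((1:n)[first], n + 1L))
print(number)       # 2 1 1 1 NA NA 1
```

Only the run that immediately precedes the NA row in sort order, here
(5, 4), loses its count. The (5, 3) run ends before the NA row is reached,
so its count stays 1.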
Not to mention that a pair (NA, NA) appears in the output while no (5, NA)
pair is produced.
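
For comparison, here is a rough sketch of a variant that treats NA as a
value of its own, the way table(useNA="always") does, so that (5, NA) is
counted as a distinct pair. xyTableNA() and its helper same() are
hypothetical names, not part of R; the sketch omits the xy.coords() and
digits handling of xyTable(), and it treats NaN like NA.

```r
# Hypothetical NA-aware variant; not part of base R.
xyTableNA <- function(x, y) {
  stopifnot(length(x) == length(y))
  n <- length(x)
  if (n == 0L) return(list(x = x, y = y, number = integer()))

  o <- order(x, y)            # NAs sort last
  x <- x[o]
  y <- y[o]

  # TRUE where a and b hold the same value; NA matches only NA.
  # The result never contains NA.
  same <- function(a, b) {
    (is.na(a) & is.na(b)) | (!is.na(a) & !is.na(b) & a == b)
  }

  # A new run starts wherever the pair differs from the previous one.
  first <- c(TRUE, !(same(x[-1L], x[-n]) & same(y[-1L], y[-n])))

  list(x = x[first], y = y[first],
       number = diff(c(which(first), n + 1L)))
}

res <- xyTableNA(c(1, 1, 2, 2,  5, 6, 5, 5),
                 c(2, 2, 2, 4, NA, 3, 3, 4))
str(res)
```

With this variant the example gives seven pairs, including (5, NA), with
counts 2 1 1 1 1 1 1 and no (NA, NA) pair.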

Best,
Serguei.


>
>
>
> table() does not use this logic, as one NA in a vector would make all the
> counts NA.  Should xyTable have a way to handle NAs the way table() does?
>
> -Bill
>
> On Tue, Apr 25, 2023 at 1:26 AM Viechtbauer, Wolfgang (NP) <
> wolfgang.viechtbauer using maastrichtuniversity.nl> wrote:
>
>> Hi all,
>>
>> Posted this many years ago (
>> https://stat.ethz.ch/pipermail/r-devel/2017-December/075224.html), but
>> either this slipped under the radar or my feeble mind is unable to
>> understand what xyTable() is doing here and nobody bothered to correct me.
>> I now stumbled again across this issue.
>>
>> x <- c(1, 1, 2, 2,  2, 3)
>> y <- c(1, 2, 1, 3, NA, 3)
>> table(x, y, useNA="always")
>> xyTable(x, y)
>>
>> Why does xyTable() report that there are NA instances of (2,3)? I could
>> understand the logic that the NA could be anything, including a 3, so the
>> $number value for (2,3) is therefore unknown, but then the same should
>> apply to (2,1), but here $number is 1, so the logic is then inconsistent.
>>
>> I stared at the xyTable code for a while and I suspect this is coming from
>> order() using na.last=TRUE by default, but in any case, to me the behavior
>> above is surprising.
>>
>> Best,
>> Wolfgang
>>
>> ______________________________________________
>> R-devel using r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>