[Rd] table(exclude = NULL) always includes NA

Martin Maechler maechler at stat.math.ethz.ch
Tue Aug 9 15:35:41 CEST 2016


>>>>> Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>>     on Sun, 7 Aug 2016 15:32:19 +0000 writes:

> This is an example from https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .

> With R 2.7.2:

> > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > table(a, b, exclude = NULL)
>       b
> a      1 2
>   1    1 1
>   2    2 0
>   3    1 0
>   <NA> 1 0

> With R 3.3.1:

> > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> > table(a, b, exclude = NULL)
>       b
> a      1 2 <NA>
>   1    1 1    0
>   2    2 0    0
>   3    1 0    0
>   <NA> 1 0    0
> > table(a, b, useNA = "ifany")
>       b
> a      1 2
>   1    1 1
>   2    2 0
>   3    1 0
>   <NA> 1 0
> > table(a, b, exclude = NULL, useNA = "ifany")
>       b
> a      1 2 <NA>
>   1    1 1    0
>   2    2 0    0
>   3    1 0    0
>   <NA> 1 0    0

> For the example, in R 3.3.1, the result of 'table' with
> exclude = NULL includes NA even if NA is not present. This
> differs from R 2.7.2, where the result comes from
> factor(exclude = NULL), which includes NA only if NA is present.

I agree that this (R 3.3.1 behavior) seems undesirable and looks
wrong, and the old (<= 2.7.2) behavior for table(a, b,
exclude = NULL) seems desirable to me.
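
To make the factor(exclude = NULL) behavior mentioned above concrete,
here is a minimal sketch using the vectors from the example. The names
lev_a and lev_b are only illustrative. An NA level should appear only
for the vector that actually contains NA:

```r
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)

# exclude = NULL means "exclude nothing", so NA becomes a level,
# but only when NA actually occurs in the data.
lev_a <- levels(factor(a, exclude = NULL))  # 'a' contains NA
lev_b <- levels(factor(b, exclude = NULL))  # 'b' does not

print(lev_a)
print(lev_b)
```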


> From R 3.3.1 help on 'table', in "Details" section:
> 'useNA' controls if the table includes counts of 'NA' values: the allowed values correspond to never, only if the count is positive and even for zero counts.  This is overridden by specifying 'exclude = NULL'.

> Specifying 'exclude = NULL' overrides 'useNA' to what value? The documentation doesn't say. Looking at the code of function 'table', the value is "always".

Yes, it should be documented what happens for this case,
(but read on ...)

> For the example, in R 3.3.1, the result like in R 2.7.2 can be obtained with useNA = "ifany" and 'exclude' unspecified.

Yes.  What should we do?
I currently think that we'd want to change the line

     useNA <- if (!missing(exclude) && is.null(exclude)) "always"

to

     useNA <- if (!missing(exclude) && is.null(exclude)) "ifany" # was "always"


which would not even contradict the documentation since, as you
mentioned above, the exact action here had not been documented.

The change above at least does not break any of the standard R
tests ('make check-all', i.e., including the recommended
packages), which for me confirms that it may be "what is
best"...
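
As a rough sketch of the rule this change would give, and not the
actual body of table(): the helper resolve_useNA below is hypothetical,
and its 'else' branch is an assumption, since only the first part of the
line is quoted above. The explicit useNA = "ifany" call afterwards is
the form from the example that does not rely on 'exclude' at all:

```r
# Hypothetical helper, not part of R. It mirrors only the quoted line,
# with the proposed "ifany" in place of "always". The 'else' branch
# (falling back to the user's 'useNA') is an assumption.
resolve_useNA <- function(exclude, useNA = c("no", "ifany", "always")) {
    if (!missing(exclude) && is.null(exclude)) "ifany" else match.arg(useNA)
}

resolve_useNA(exclude = NULL)     # explicit NULL selects "ifany"
resolve_useNA(useNA = "always")   # 'exclude' missing: 'useNA' is kept

# Asking for "ifany" directly, with 'exclude' left unspecified.
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)
tab <- table(a, b, useNA = "ifany")
print(tab)
```

With useNA = "ifany" given explicitly, only 'a' gets an <NA> row and
'b' gets no <NA> column, as in the output shown in the example.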

----

Thank you for mentioning the important consequence for summary(<logical>).
It can help give insight into what a "probably best" behavior should
be for these cases of table().

Martin Maechler,
ETH Zurich

> The result of 'summary' of a logical vector is affected. As mentioned in http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels , in the code of function 'summary.default', for logical, table(object, exclude = NULL) is used.

> With R 2.7.2:

> > log <- c(NA, logical(4), NA, !logical(2), NA)
> > summary(log)
>    Mode   FALSE    TRUE    NA's
> logical       4       2       3
> > summary(log[!is.na(log)])
>    Mode   FALSE    TRUE
> logical       4       2
> > summary(TRUE)
>    Mode    TRUE
> logical       1

> With R 3.3.1:

> > log <- c(NA, logical(4), NA, !logical(2), NA)
> > summary(log)
>    Mode   FALSE    TRUE    NA's
> logical       4       2       3
> > summary(log[!is.na(log)])
>    Mode   FALSE    TRUE    NA's
> logical       4       2       0
> > summary(TRUE)
>    Mode    TRUE    NA's
> logical       1       0

> In R 3.3.1, "NA's" is always in the result of 'summary' of a logical vector. It is unlike 'summary' of a numeric vector.
> On the other hand, in R 3.3.1, FALSE is not in the result of 'summary' of a logical vector that doesn't contain FALSE.

> I prefer the result of 'summary' of a logical vector like in R 2.7.2, or, alternatively, the result that always includes all possible values: FALSE, TRUE, NA.

I tend to agree, and strongly prefer the 'R(<=2.7.2)'-behavior
for table() {and hence summary(<logical>)}.
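
The two alternatives for logical vectors can be sketched as follows.
This is only an illustration and not the code of summary.default: the
first part asks table() for useNA = "ifany" explicitly, and count_all
is a hypothetical helper that always reports FALSE, TRUE and NA's.

```r
lgl <- c(NA, logical(4), NA, !logical(2), NA)

# Alternative 1: NA is counted only when it is present.
with_na    <- table(lgl, useNA = "ifany")
without_na <- table(lgl[!is.na(lgl)], useNA = "ifany")
print(with_na)
print(without_na)

# Alternative 2 (hypothetical helper): always report all three values,
# including zero counts.
count_all <- function(x) {
    c("FALSE" = sum(!x, na.rm = TRUE),
      "TRUE"  = sum(x, na.rm = TRUE),
      "NA's"  = sum(is.na(x)))
}
print(count_all(lgl))
print(count_all(TRUE))
```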


