[Rd] table(exclude = NULL) always includes NA

Suharto Anggono suharto_anggono at yahoo.com
Sun Aug 7 17:32:19 CEST 2016


The following example is from https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html

With R 2.7.2:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0

With R 3.3.1:

> a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
> table(a, b, exclude = NULL)
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0
> table(a, b, useNA = "ifany")
      b
a      1 2
  1    1 1
  2    2 0
  3    1 0
  <NA> 1 0
> table(a, b, exclude = NULL, useNA = "ifany")
      b
a      1 2 <NA>
  1    1 1    0
  2    2 0    0
  3    1 0    0
  <NA> 1 0    0

In this example, under R 3.3.1, the result of 'table' with exclude = NULL includes an NA level even for a variable that contains no NA (here, b). This differs from R 2.7.2, where the result comes from factor(exclude = NULL), which includes NA as a level only if NA is present.
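The behavior of factor(exclude = NULL) referred to above can be checked directly. This is only a small sketch using the vectors from the example; the names fa and fb are mine.

```r
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)

# With exclude = NULL, NA becomes a level only when the data contain NA.
fa <- factor(a, exclude = NULL)
fb <- factor(b, exclude = NULL)

anyNA(levels(fa))  # TRUE: 'a' contains NA, so NA is a level
anyNA(levels(fb))  # FALSE: 'b' has no NA, so no NA level is added
```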

From the R 3.3.1 help on 'table', in the "Details" section:
'useNA' controls if the table includes counts of 'NA' values: the allowed values correspond to never, only if the count is positive and even for zero counts.  This is overridden by specifying 'exclude = NULL'.

To what value does specifying 'exclude = NULL' override 'useNA'? The documentation doesn't say. According to the code of the function 'table', the value is "always".

In this example, under R 3.3.1, the same result as in R 2.7.2 can be obtained with useNA = "ifany" and 'exclude' left unspecified.
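As a rough sketch, the two 'useNA' settings can be compared side by side, and the table from exclude = NULL can then be matched against each of them. The object names t_ifany, t_always and t_null are mine. The last two comparisons are left without an expected result, because it depends on the R version in use.

```r
a <- c(1, 1, 2, 2, NA, 3)
b <- c(2, 1, 1, 1, 1, 1)

t_ifany  <- table(a, b, useNA = "ifany")   # NA level only where NA occurs
t_always <- table(a, b, useNA = "always")  # NA level for every variable

dimnames(t_ifany)$b    # "1" "2"
dimnames(t_always)$b   # "1" "2" NA

# Which setting does exclude = NULL correspond to in the running R version?
t_null <- table(a, b, exclude = NULL)
identical(dimnames(t_null), dimnames(t_always))
identical(dimnames(t_null), dimnames(t_ifany))
```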


The result of 'summary' of a logical vector is also affected. As mentioned in http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels, the code of function 'summary.default' uses table(object, exclude = NULL) for logical vectors.

With R 2.7.2:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's
logical       4       2       3
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE
logical       4       2
> summary(TRUE)
   Mode    TRUE
logical       1

With R 3.3.1:

> log <- c(NA, logical(4), NA, !logical(2), NA)
> summary(log)
   Mode   FALSE    TRUE    NA's
logical       4       2       3
> summary(log[!is.na(log)])
   Mode   FALSE    TRUE    NA's
logical       4       2       0
> summary(TRUE)
   Mode    TRUE    NA's
logical       1       0

In R 3.3.1, "NA's" is always in the result of 'summary' of a logical vector, unlike 'summary' of a numeric vector.
On the other hand, in R 3.3.1, FALSE is not in the result of 'summary' of a logical vector that doesn't contain FALSE.

I prefer the result of 'summary' of a logical vector to be like in R 2.7.2 or, alternatively, to always include all possible values: FALSE, TRUE, NA.
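To illustrate the two alternatives, here is a minimal sketch of a counting helper. The function count_logical and its argument drop_zero are hypothetical names of mine; this is not the code of 'summary.default', and it leaves out the "Mode" entry.

```r
# Hypothetical helper: counts of FALSE, TRUE and NA in a logical vector.
# drop_zero = FALSE always reports all three possible values;
# drop_zero = TRUE reports only the values that actually occur.
count_logical <- function(x, drop_zero = FALSE) {
  stopifnot(is.logical(x))
  counts <- c("FALSE" = sum(!x, na.rm = TRUE),
              "TRUE"  = sum(x, na.rm = TRUE),
              "NA's"  = sum(is.na(x)))
  if (drop_zero) counts[counts > 0] else counts
}

log <- c(NA, logical(4), NA, !logical(2), NA)
count_logical(log)                     # FALSE 4, TRUE 2, NA's 3
count_logical(TRUE)                    # FALSE 0, TRUE 1, NA's 0
count_logical(TRUE, drop_zero = TRUE)  # TRUE 1
```

With drop_zero = TRUE, the counts for these inputs match the R 2.7.2 output shown above.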


