[Rd] table(exclude = NULL) always includes NA
Suharto Anggono Suharto Anggono
suharto_anggono at yahoo.com
Sun Aug 14 05:42:08 CEST 2016
useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "ifany"
An example where it changes the 'table' result for non-factor input, from https://stat.ethz.ch/pipermail/r-help/2005-April/069053.html :
x <- c(1,2,3,3,NA)
table(as.integer(x), exclude=NaN)
I bring the example up in case the change in result is not intended.
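
For reference, a minimal sketch of the rule quoted at the top of this message, pulled out into a stand-alone function. The name 'resolve_useNA' is hypothetical and used only for illustration; it is not part of base R, and the actual code sits inside table(). The default for 'exclude' is assumed to be c(NA, NaN), as in table().

```r
# Hypothetical stand-alone version of the rule; not part of base R.
# The default of 'useNA' is overridden only when the caller gives 'exclude',
# does not give 'useNA', and 'exclude' contains no NA.
resolve_useNA <- function(exclude = c(NA, NaN),
                          useNA = c("no", "ifany", "always")) {
  if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) {
    "ifany"
  } else {
    match.arg(useNA)
  }
}

resolve_useNA()                                  # neither argument given
resolve_useNA(exclude = NULL)                    # NULL contains no NA
resolve_useNA(exclude = character(0))            # other zero-length 'exclude'
resolve_useNA(exclude = NaN)                     # NaN does not match NA
resolve_useNA(exclude = NA)                      # NA is excluded explicitly
resolve_useNA(exclude = NULL, useNA = "always")  # explicit 'useNA' is kept
```

Because NA %in% NaN is FALSE, exclude = NaN falls under the rule and the default becomes "ifany", which is how the example above comes to be affected.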
--------------------------------------------
On Sat, 13/8/16, Martin Maechler <maechler at stat.math.ethz.ch> wrote:
Subject: Re: [Rd] table(exclude = NULL) always includes NA
To: "Martin Maechler" <maechler at stat.math.ethz.ch>
@r-project.org
Date: Saturday, 13 August, 2016, 4:29 AM
>>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>>>>> on Fri, 12 Aug 2016 10:12:01 +0200 writes:
>>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>>>>> on Thu, 11 Aug 2016 16:19:49 +0000 writes:
>> I stand corrected. The part "If set to 'NULL', it implies
>> 'useNA="always"'." is even in the documentation in R
>> 2.8.0. It was my fault not to check carefully. I wonder,
>> why "always" was chosen for 'useNA' for exclude = NULL.
> me too. "ifany" would seem more logical, and I am
> considering changing to that as a 2nd step (if the 1st
> step, below) shows to be feasible.
>> Why exclude = NULL is so special? What about another
>> 'exclude' of length zero, like character(0) (not c(),
>> because c() is NULL)? I thought that, too. But then, I
>> have no opinion about making it general.
> As mentioned, I entirely agree with that {and you are
> right about c() !!}.
>> It fits my expectation to override 'useNA' only if the
>> user doesn't explicitly specify 'useNA'.
>> Thank you for looking into this.
> you are welcome. As first step, I plan to commit the
> change to (*)
> useNA <- if (missing(useNA) && !missing(exclude) && !(NA
> %in% exclude)) "always"
> as proposed yesterday, and I'll eventually see / be
> notified about the effect in CRAN space.
and as I'm finding now, 20 minutes too late, doing step 1
without doing step 2 is not feasible.
It gives many 0 counts for <NA> e.g. for exclude = "foo".
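
The zero counts are the usual effect of useNA = "always", whatever 'exclude' is. A small sketch with made-up data (the vector 'x' and the names 't_always' and 't_ifany' are only for illustration), setting 'useNA' explicitly so that it does not depend on how 'exclude' is handled:

```r
x <- c("a", "b", "b")                  # made-up data with no NA present

t_always <- table(x, useNA = "always") # reports an <NA> cell even when its count is 0
t_ifany  <- table(x, useNA = "ifany")  # reports <NA> only when NA occurs in 'x'

print(t_always)
print(t_ifany)
```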
> --
> (*) slightly more efficiently, I'll be using match()
> directly instead of %in%
>> My points: Could R 2.7.2 behavior of table(<non-factor>,
>> exclude = NULL) be brought back? But R 3.3.1 behavior is
>> in R since version 2.8.0, rather long.
> you are right... but then, the places / cases where the
> behavior would change back should be quite rare.
>> If not, I suggest changing summary(<logical>).
>> --------------------------------------------
> Thank you for your feedback, Suharto! Martin
>> On Thu, 11/8/16, Martin Maechler
>> <maechler at stat.math.ethz.ch> wrote:
>>
>> Subject: Re: [Rd] table(exclude = NULL) always includes NA
>> @r-project.org
>> Cc: "Martin Maechler" <maechler at stat.math.ethz.ch>
>> Date: Thursday, 11 August, 2016, 12:39 AM
>>
>> >>>>> Martin Maechler <maechler at stat.math.ethz.ch>
>> >>>>> on Tue, 9 Aug 2016 15:35:41 +0200 writes:
>>
>> >>>>> Suharto Anggono Suharto Anggono via R-devel <r-devel at r-project.org>
>> >>>>> on Sun, 7 Aug 2016 15:32:19 +0000 writes:
>>
>> > > This is an example from
>> > > https://stat.ethz.ch/pipermail/r-help/2007-May/132573.html .
>> >
>> > > With R 2.7.2:
>> >
>> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
>> > > > table(a, b, exclude = NULL)
>> > >       b
>> > > a      1 2
>> > >   1    1 1
>> > >   2    2 0
>> > >   3    1 0
>> > >   <NA> 1 0
>> >
>> > > With R 3.3.1:
>> >
>> > > > a <- c(1, 1, 2, 2, NA, 3); b <- c(2, 1, 1, 1, 1, 1)
>> > > > table(a, b, exclude = NULL)
>> > >       b
>> > > a      1 2 <NA>
>> > >   1    1 1    0
>> > >   2    2 0    0
>> > >   3    1 0    0
>> > >   <NA> 1 0    0
>> > > > table(a, b, useNA = "ifany")
>> > >       b
>> > > a      1 2
>> > >   1    1 1
>> > >   2    2 0
>> > >   3    1 0
>> > >   <NA> 1 0
>> > > > table(a, b, exclude = NULL, useNA = "ifany")
>> > >       b
>> > > a      1 2 <NA>
>> > >   1    1 1    0
>> > >   2    2 0    0
>> > >   3    1 0    0
>> > >   <NA> 1 0    0
>> >
>> > > For the example, in R 3.3.1, the result of 'table' with
>> > > exclude = NULL includes NA even if NA is not present. It is
>> > > different from R 2.7.2, that comes from factor(exclude = NULL),
>> > > that includes NA only if NA is present.
>> >
>> > I agree that this (R 3.3.1 behavior) seems undesirable and looks
>> > wrong, and the old (<= 2.7.2) behavior for table(a,b,
>> > exclude=NULL) seems desirable to me.
>> >
>> >
>> > > From R 3.3.1 help on 'table', in "Details" section:
>> > > 'useNA' controls if the table includes counts of 'NA'
>> > > values: the allowed values correspond to never, only if
>> > > the count is positive and even for zero counts. This is
>> > > overridden by specifying 'exclude = NULL'.
>> >
>> > > Specifying 'exclude = NULL' overrides 'useNA' to what
>> > > value? The documentation doesn't say. Looking at the code
>> > > of function 'table', the value is "always".
>> >
>> > Yes, it should be documented what happens for this case,
>> > (but read on ...)
>>
>> and it is *not* true that the documentation does not say,
>> since 2013, it has contained
>>
>> exclude: levels to remove for all factors in ‘...’. If
>> set to ‘NULL’, it implies ‘useNA = "always"’. See
>> ‘Details’ for its interpretation for non-factor
>> arguments.
>>
>>
>> > > For the example, in R 3.3.1, the result like in R
>> > > 2.7.2 can be obtained with useNA = "ifany" and 'exclude'
>> > > unspecified.
>> >
>> > Yes. What should we do?
>> > I currently think that we'd want to change the line
>> >
>> >    useNA <- if (!missing(exclude) && is.null(exclude)) "always"
>> >
>> > to
>> >
>> >    useNA <- if (!missing(exclude) && is.null(exclude)) "ifany" # was "always"
>> >
>> >
>> > which would not even contradict documentation, as indeed you
>> > mentioned above, the exact action here had not been documented.
>>
>> The last part ("which ..") above is wrong, as mentioned
>> earlier.
>>
>> The above change entails behaviour which looks better to
>> me; however, the change *is* "against the current
>> documentation", and after experimentation (a "complete
>> factorial design" of argument settings), I'm not entirely
>> happy with the result. One reason is that 'exclude
>> = NULL' and (e.g.) 'exclude = c()' are (still) handled
>> differently: From a usual interpretation, both should mean
>> "do not exclude any factor entries (and levels) from
>> tabulation" but one of the two changes the default of
>> 'useNA' and the other does not. If we want a change
>> anyway (and have to update the doc), it could be "more
>> logical" to replace the line above by
>>
>>    useNA <- if (missing(useNA) && !missing(exclude) && !(NA %in% exclude)) "always"
>>
>> notably, replacing 'useNA' *only* if it has not been
>> specified, which seems much closer to "typically
>> expected" behavior.
>>
>>
>>
>>
>> > The change above at least does not break any of the standard R
>> > tests ('make check-all', i.e., including the recommended
>> > packages), which for me confirms that it may be "what is
>> > best"...
>> >
>> > ----
>> >
>> > Thank you for mentioning the important consequence for
>> > summary(<logical>). That can help give insight into what a
>> > "probably best" behavior should be for these cases of
>> > table().
>> >
>> > Martin Maechler,
>> > ETH Zurich
>> >
>> > > The result of 'summary' of a logical vector is affected.
>> > > As mentioned in
>> > > http://stackoverflow.com/questions/26775501/r-dropping-nas-in-logical-column-levels
>> > > , in the code of function 'summary.default', for logical,
>> > > table(object, exclude = NULL) is used.
>> >
>> > > With R 2.7.2:
>> >
>> > > > log <- c(NA, logical(4), NA, !logical(2), NA)
>> > > > summary(log)
>> > >    Mode   FALSE    TRUE    NA's
>> > > logical       4       2       3
>> > > > summary(log[!is.na(log)])
>> > >    Mode   FALSE    TRUE
>> > > logical       4       2
>> > > > summary(TRUE)
>> > >    Mode    TRUE
>> > > logical       1
>> >
>> > > With R 3.3.1:
>> >
>> > > > log <- c(NA, logical(4), NA, !logical(2), NA)
>> > > > summary(log)
>> > >    Mode   FALSE    TRUE    NA's
>> > > logical       4       2       3
>> > > > summary(log[!is.na(log)])
>> > >    Mode   FALSE    TRUE    NA's
>> > > logical       4       2       0
>> > > > summary(TRUE)
>> > >    Mode    TRUE    NA's
>> > > logical       1       0
>> >
>> > > In R 3.3.1, "NA's" is always in the result of 'summary'
>> > > of a logical vector. It is unlike 'summary' of a numeric vector.
>> > > On the other hand, in R 3.3.1, FALSE is not in the result of
>> > > 'summary' of a logical vector that doesn't contain FALSE.
>> >
>> > > I prefer the result of 'summary' of a logical vector
>> > > like in R 2.7.2, or, alternatively, the result that
>> > > always includes all possible values: FALSE, TRUE, NA.
>> >
>> > I tend to agree, and strongly prefer the
>> > 'R(<=2.7.2)'-behavior for table() {and hence
>> > summary(<logical>)}.
>>
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel