[Rd] xtabs(), factors and NAs
Milan Bouchet-Valat
nalimilan at club.fr
Sat Jan 21 14:42:56 CET 2017
On Friday, 20 January 2017 at 18:59 +0100, Martin Maechler wrote:
> >>>>> Milan Bouchet-Valat <nalimilan at club.fr>
> >>>>>     on Thu, 19 Jan 2017 13:58:31 +0100 writes:
> > Hi all,
> > I know this issue has been discussed a few times in the past already,
> > but Martin Maechler suggested in a bug report [1] that I raise it here.
> >
> > Basically, there is no (easy) way of printing NAs for all variables
> > when calling xtabs() on factors. Passing 'exclude=NULL,
> > na.action=na.pass' works for character vectors, but not for factors.
> >
>
> [ yes, but your example below is *not* showing that ... so may be
> a bit confusing !] {Reason: stringsAsFactors etc}
Yes, sorry, that illustrates why one should never try to make an
example prettier at the last minute. For reference, here's the correct
example:
> test <- data.frame(x=c("a",NA), stringsAsFactors=FALSE)
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
   a <NA>
   1    1
> test <- data.frame(x=factor(c("a",NA)))
> xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
x
a
1
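
To make the workaround concrete, here is a minimal sketch of the addNA()
approach mentioned in my original message, applied to all crossed variables
at once. The two-column data frame and the lapply() step are only my
illustration (assumptions for this example), not something xtabs() offers
by itself:

```r
# Hypothetical two-variable example: both columns are factors containing NA
test <- data.frame(x = factor(c("a", NA)), y = factor(c(NA, "b")))

# Turn NA into an explicit factor level in every column.
# Assigning into test_na[] keeps the data.frame structure.
test_na <- test
test_na[] <- lapply(test, addNA)

# NA is now an ordinary level, so the default na.action no longer drops
# these rows and no 'exclude' argument is needed
tab <- xtabs(~ x + y, data = test_na)
print(tab)
```

This keeps both rows in the table, but it requires modifying the data (or
wrapping each variable in the formula) beforehand, which is exactly the
inconvenience I would like to avoid.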
> > > test <- data.frame(x=c("a",NA))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a
> > 1
> >
> > > test <- data.frame(x=factor(c("a",NA)))
> > > xtabs(~ x, exclude=NULL, na.action=na.pass, data=test)
> > x
> > a
> > 1
> >
> >
> > Even if it's documented, this inconsistency is annoying. When checking
> > data, it is often useful to print all NA values temporarily, without
> > calling addNA() individually on all crossed variables.
>
> {Note this is not (just) about print()ing; the issue is
> about the resulting *object*.}
> >
> > Would it make sense to add a new argument similar to table()'s useNA
> > which would behave the same for all input vector types?
>
> You have to be aware that table() has been changed since R
> 3.3.2, i.e., is different in R-devel and hence will be different
> in R 3.4.0.
> table()'s handling of NAs has become very involved /
> sophisticated(*), and currently I'd rather like to keep
> xtabs()'s behavior much simpler.
>
> Interestingly, after starting to play with data containing NA's and
> xtabs(*, na.action=na.pass)
> I have already detected bugs (for sparse=TRUE) and cases where
> the current xtabs() behavior seems dubious to me.
> So, the issue is --- as so often --- more involved than assumed initially.
>
> We (R core) will probably do something, but do need more time
> before we can promise anything more...
OK, thanks. Given how long this behavior has existed, there's
certainly no hurry...
Regards
> Thank you for raising the issue,
> Martin Maechler, ETH Zurich
>
>
> *) R-devel sources always current at
> https://svn.r-project.org/R/trunk/src/library/base/R/table.R
>
> >
> > Regards
> > [1] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=14630