[Rd] na.omit inconsistent with is.na on list

Gabriel Becker gabembecker using gmail.com
Sun Aug 15 02:15:13 CEST 2021


I understand what is.na does. The issue I have is that, in my opinion, its
task is not equivalent to the conceptual task na.omit is doing, as
illustrated by what the data.frame method does.

That is what I was getting at above: it is not clear that lst[!is.na(lst)]
is the correct thing for na.omit to do.
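
To make concrete why the answer is not obvious for lists, here is a minimal
sketch of two plausible readings of "omit the incomplete elements". The helper
names omit_scalar_na and omit_any_na are hypothetical; they are not functions
in base R or stats, and neither is a proposal for what na.omit should do.

```r
# Hypothetical helpers, for illustration only (not part of base R or stats).

# Reading 1: drop the elements that is.na() flags on a list,
# i.e. length-one atomic elements that are NA or NaN.
omit_scalar_na <- function(x) x[!is.na(x)]

# Reading 2: drop every element that contains at least one NA,
# closer in spirit to "remove incomplete cases".
omit_any_na <- function(x) x[!vapply(x, anyNA, logical(1))]

lst <- list(5, NA, c(NA, 5))

str(omit_scalar_na(lst))  # keeps 5 and c(NA, 5)
str(omit_any_na(lst))     # keeps only 5
```

The two readings disagree on the third element, c(NA, 5), which is exactly
the case where "complete" has no single obvious meaning.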

~G

On Sat, Aug 14, 2021, 1:49 PM Toby Hocking <tdhock5 using gmail.com> wrote:

> Some relevant information from ?is.na: the behavior for lists is documented,
>
>      For is.na, elementwise the result is false unless that element
>      is a length-one atomic vector and the single element of that
>      vector is regarded as NA or NaN (note that any is.na method
>      for the class of the element is ignored).
>
> Also there are other functions, anyNA and is.na<-, which are consistent with
> is.na. That is, anyNA only returns TRUE if the list has an element which is
> a scalar NA. And is.na<- sets list elements to logical NA to indicate
> missingness.
>
> On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage <hugh.parsonage using gmail.com>
> wrote:
>
> > The data.frame method deliberately skips non-atomic columns before
> > invoking is.na(x) so I think it is fair to assume this behaviour is
> > intentional and assumed.
> >
> > Not so clear to me that there is a sensible answer for list columns.
> > (List columns seem to collide with the expectation that in each
> > variable every observation will be of the same type)
> >
> > Consider your list L as
> >
> > L <- list(NULL, NA, c(NA, NA))
> >
> > Seems like every observation could have a claim to be 'missing' here.
> > Concretely, if a data.frame had a list column representing the lat-lon
> > of an observation, we might only be able to represent missing values
> > like c(NA, NA).
> >
> > On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar <iucar using fedoraproject.org> wrote:
> > >
> > > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker <gabembecker using gmail.com> wrote:
> > > >
> > > > Hi Toby,
> > > >
> > > > This definitely appears intentional; the first expression of
> > > > stats:::na.omit.default is
> > > >
> > > >    if (!is.atomic(object))
> > > >         return(object)
> > >
> > > I don't follow your point. This only means that the *default* method
> > > is not intended for non-atomic cases, but it doesn't mean there
> > > shouldn't be a method for lists.
> > >
> > > > So it is explicitly just returning the object in non-atomic cases, which
> > > > includes lists. I was not involved in this decision (obviously) but my
> > > > guess is that it is due to the fact that what constitutes an observation
> > > > "being complete" is unclear in the list case. What should
> > > >
> > > > na.omit(list(5, NA, c(NA, 5)))
> > > >
> > > > return? Just the first element, or the first and the last? It seems, at
> > > > least to me, unclear. A small change to the documentation to add "atomic
> > >
> > > > is.na(list(5, NA, c(NA, 5)))
> > > [1] FALSE  TRUE FALSE
> > >
> > > Following Toby's argument, it's clear to me: the first and the last.
> > >
> > > Iñaki
> > >
> > > > (in the sense of is.atomic returning \code{TRUE})" in front of "vectors"
> > > > or similar, where the supported types of objects are listed, seems
> > > > justified, though, imho, as the current documentation is either ambiguous
> > > > or technically incorrect, depending on what we take "vector" to mean.
> > > >
> > > > Best,
> > > > ~G
> > > >
> > > > On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking <tdhock5 using gmail.com> wrote:
> > > >
> > > > > > Also, the na.omit method for data.frame with a list column seems to be
> > > > > > inconsistent with is.na,
> > > > >
> > > > > > L <- list(NULL, NA, 0)
> > > > > > str(f <- data.frame(I(L)))
> > > > > 'data.frame': 3 obs. of  1 variable:
> > > > >  $ L:List of 3
> > > > >   ..$ : NULL
> > > > >   ..$ : logi NA
> > > > >   ..$ : num 0
> > > > >   ..- attr(*, "class")= chr "AsIs"
> > > > > > is.na(f)
> > > > >          L
> > > > > [1,] FALSE
> > > > > [2,]  TRUE
> > > > > [3,] FALSE
> > > > > > na.omit(f)
> > > > >    L
> > > > > 1
> > > > > 2 NA
> > > > > 3  0
> > > > >
> > > > > > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking <tdhock5 using gmail.com> wrote:
> > > > >
> > > > > > > na.omit is documented as "na.omit returns the object with incomplete
> > > > > > > cases removed." and "At present these will handle vectors," so I
> > > > > > > expected that when it is used on a list, it should return the same
> > > > > > > thing as if we subset via is.na; however I observed the following,
> > > > > >
> > > > > > > L <- list(NULL, NA, 0)
> > > > > > > str(L[!is.na(L)])
> > > > > > List of 2
> > > > > >  $ : NULL
> > > > > >  $ : num 0
> > > > > > > str(na.omit(L))
> > > > > > List of 3
> > > > > >  $ : NULL
> > > > > >  $ : logi NA
> > > > > >  $ : num 0
> > > > > >
> > > > > > > Should na.omit be fixed so that it returns a result that is consistent
> > > > > > > with is.na? I assume that is.na is the canonical definition of what
> > > > > > > should be considered a missing value in R.
> > > > > >
> > > > >
> > > > >         [[alternative HTML version deleted]]
> > > > >
> > > > > ______________________________________________
> > > > > R-devel using r-project.org mailing list
> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Iñaki Úcar
> > >
> >
>
>



