[Rd] na.omit inconsistent with is.na on list

Gabriel Becker gabembecker using gmail.com
Mon Aug 16 21:21:45 CEST 2021


Hi Toby,

Right, my point is that is.na being equivalent to "is an incomplete case"
is really only true for atomic vectors. I don't see it being the case for
lists, given what is.na does for lists. This is all just my opinion, but
that's my take: vec[!is.na(vec)] happens to be the same as na.omit(vec) for
atomics, but in general the operations are not equivalent and I wouldn't
expect them to be.
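
A minimal sketch of that difference, reusing the list from earlier in the thread (the variable names are only for illustration):

```r
# Atomic vector: subsetting by !is.na() and na.omit() keep the same values.
vec <- c(1, NA, 2)
by_subset <- vec[!is.na(vec)]
by_omit <- na.omit(vec)
# na.omit() attaches an "na.action" attribute, so compare the bare values.
same_values <- identical(by_subset, as.vector(by_omit))

# List: is.na() flags the scalar NA element, but the default na.omit()
# method returns non-atomic objects unchanged.
lst <- list(NULL, NA, 0)
lst_subset <- lst[!is.na(lst)]
lst_omit <- na.omit(lst)

str(lst_subset)
str(lst_omit)
```

Here by_omit differs from by_subset only in its attributes, whereas lst_omit still contains the NA element that lst_subset dropped.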

Best,
~G

On Mon, Aug 16, 2021 at 10:54 AM Toby Hocking <tdhock5 using gmail.com> wrote:

> To clarify, the ?na.omit docs say that 'na.omit' returns the object with
> incomplete cases removed.
> If we take is.na to be the definition of "incomplete cases" then a list
> element with scalar NA is incomplete.
> About the data.frame method, in my opinion it is highly
> confusing/inconsistent for na.omit to keep rows with incomplete cases in
> list columns, but not in columns which are atomic vectors,
>
> > (f.num <- data.frame(num=c(1,NA,2)))
>   num
> 1   1
> 2  NA
> 3   2
> > is.na(f.num)
>        num
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f.num)
>   num
> 1   1
> 3   2
>
> > (f.list <- data.frame(list=I(list(1,NA,2))))
>   list
> 1    1
> 2   NA
> 3    2
> > is.na(f.list)
>       list
> [1,] FALSE
> [2,]  TRUE
> [3,] FALSE
> > na.omit(f.list)
>   list
> 1    1
> 2   NA
> 3    2
>
> On Sat, Aug 14, 2021 at 5:15 PM Gabriel Becker <gabembecker using gmail.com>
> wrote:
>
> > I understand what is.na does. The issue I have is that its task is not
> > equivalent to the conceptual task na.omit is doing, in my opinion, as
> > illustrated by what the data.frame method does.
> >
> > That is what I was getting at above: it is not clear that
> > lst[!is.na(lst)] is the correct thing for na.omit to do.
> >
> > ~G
> >
> > On Sat, Aug 14, 2021, 1:49 PM Toby Hocking <tdhock5 using gmail.com> wrote:
> >
> >> Some relevant information from ?is.na: the behavior for lists is
> >> documented,
> >>
> >>      For is.na, elementwise the result is false unless that element
> >>      is a length-one atomic vector and the single element of that
> >>      vector is regarded as NA or NaN (note that any is.na method
> >>      for the class of the element is ignored).
> >>
> >> Also there are other functions anyNA and is.na<- which are consistent
> >> with is.na. That is, anyNA only returns TRUE if the list has an element
> >> which is a scalar NA. And is.na<- sets list elements to logical NA to
> >> indicate missingness.
> >>
> >> On Fri, Aug 13, 2021 at 1:10 AM Hugh Parsonage
> >> <hugh.parsonage using gmail.com> wrote:
> >>
> >> > The data.frame method deliberately skips non-atomic columns before
> >> > invoking is.na(x) so I think it is fair to assume this behaviour is
> >> > intentional and assumed.
> >> >
> >> > Not so clear to me that there is a sensible answer for list columns.
> >> > (List columns seem to collide with the expectation that in each
> >> > variable every observation will be of the same type)
> >> >
> >> > Consider your list L as
> >> >
> >> > L <- list(NULL, NA, c(NA, NA))
> >> >
> >> > Seems like every observation could have a claim to be 'missing' here.
> >> > Concretely, if a data.frame had a list column representing the lat-lon
> >> > of an observation, we might only be able to represent missing values
> >> > like c(NA, NA).
> >> >
> >> > On Fri, 13 Aug 2021 at 17:27, Iñaki Ucar <iucar using fedoraproject.org>
> >> > wrote:
> >> > >
> >> > > On Thu, 12 Aug 2021 at 22:20, Gabriel Becker
> >> > > <gabembecker using gmail.com> wrote:
> >> > > >
> >> > > > Hi Toby,
> >> > > >
> >> > > > This definitely appears intentional: the first expression of
> >> > > > stats:::na.omit.default is
> >> > > >
> >> > > >    if (!is.atomic(object))
> >> > > >         return(object)
> >> > >
> >> > > I don't follow your point. This only means that the *default* method
> >> > > is not intended for non-atomic cases, but it doesn't mean there
> >> > > shouldn't be a method for lists.
> >> > >
> >> > > > So it is explicitly just returning the object in non-atomic cases,
> >> > > > which includes lists. I was not involved in this decision
> >> > > > (obviously), but my guess is that it is due to the fact that what
> >> > > > constitutes an observation "being complete" is unclear in the list
> >> > > > case. What should
> >> > > >
> >> > > > na.omit(list(5, NA, c(NA, 5)))
> >> > > >
> >> > > > return? Just the first element, or the first and the last? It seems,
> >> > > > at least to me, unclear. A small change to the documentation to add
> >> > > > "atomic
> >> > >
> >> > > > is.na(list(5, NA, c(NA, 5)))
> >> > > [1] FALSE  TRUE FALSE
> >> > >
> >> > > Following Toby's argument, it's clear to me: the first and the last.
> >> > >
> >> > > Iñaki
> >> > >
> >> > > > (in the sense of is.atomic returning \code{TRUE})" in front of
> >> > > > "vectors", or similar, where the supported types of objects are
> >> > > > listed seems justified, though, imho, as the current documentation
> >> > > > is either ambiguous or technically incorrect, depending on what we
> >> > > > take "vector" to mean.
> >> > > >
> >> > > > Best,
> >> > > > ~G
> >> > > >
> >> > > > On Wed, Aug 11, 2021 at 10:16 PM Toby Hocking
> >> > > > <tdhock5 using gmail.com> wrote:
> >> > > >
> >> > > > > Also, the na.omit method for data.frame with list column seems to
> >> > > > > be inconsistent with is.na,
> >> > > > >
> >> > > > > > L <- list(NULL, NA, 0)
> >> > > > > > str(f <- data.frame(I(L)))
> >> > > > > 'data.frame': 3 obs. of  1 variable:
> >> > > > >  $ L:List of 3
> >> > > > >   ..$ : NULL
> >> > > > >   ..$ : logi NA
> >> > > > >   ..$ : num 0
> >> > > > >   ..- attr(*, "class")= chr "AsIs"
> >> > > > > > is.na(f)
> >> > > > >          L
> >> > > > > [1,] FALSE
> >> > > > > [2,]  TRUE
> >> > > > > [3,] FALSE
> >> > > > > > na.omit(f)
> >> > > > >    L
> >> > > > > 1
> >> > > > > 2 NA
> >> > > > > 3  0
> >> > > > >
> >> > > > > On Wed, Aug 11, 2021 at 9:58 PM Toby Hocking
> >> > > > > <tdhock5 using gmail.com> wrote:
> >> > > > >
> >> > > > > > na.omit is documented as "na.omit returns the object with
> >> > > > > > incomplete cases removed." and "At present these will handle
> >> > > > > > vectors," so I expected that when it is used on a list, it
> >> > > > > > should return the same thing as if we subset via is.na; however
> >> > > > > > I observed the following,
> >> > > > > >
> >> > > > > > > L <- list(NULL, NA, 0)
> >> > > > > > > str(L[!is.na(L)])
> >> > > > > > List of 2
> >> > > > > >  $ : NULL
> >> > > > > >  $ : num 0
> >> > > > > > > str(na.omit(L))
> >> > > > > > List of 3
> >> > > > > >  $ : NULL
> >> > > > > >  $ : logi NA
> >> > > > > >  $ : num 0
> >> > > > > >
> >> > > > > > Should na.omit be fixed so that it returns a result that is
> >> > > > > > consistent with is.na? I assume that is.na is the canonical
> >> > > > > > definition of what should be considered a missing value in R.
> >> > > > > >
> >> > > > >
> >> > > > >         [[alternative HTML version deleted]]
> >> > > > >
> >> > > > > ______________________________________________
> >> > > > > R-devel using r-project.org mailing list
> >> > > > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >> > > > >
> >> > > --
> >> > > Iñaki Úcar
