[R] unexpected behavior in apply

Jiefei Wang szwjf08 using gmail.com
Fri Oct 8 20:21:35 CEST 2021


Ok, it turns out that this is documented, even though it looks surprising.

First of all, the apply function coerces any classed object that has a
dim attribute, such as a data frame, to a matrix (my intuition agrees
with you that there should be no conversion), so the first step of the
apply function is

> as.matrix.data.frame(d)
     d1  d2  d3
[1,] "a" "1" NA
[2,] "b" "2" NA
[3,] "c" "3" " 6"
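
A minimal sketch to check this coercion yourself, assuming the same data
frame `d` as in the original question (the variable name `m` is only for
illustration):

```r
# Same data frame as in the original question.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# apply() operates on the result of as.matrix(), not on the data frame itself.
m <- as.matrix(d)

# One character column is enough to make the whole matrix character.
typeof(m)

# The missing values in d3 are still real NAs, not the string "NA".
is.na(m[, "d3"])

# The remaining value carries a leading space: " 6" has two characters.
nchar(m[3, "d3"])
```
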

Since the data frame `d` is a mixture of character and non-character
columns, the non-character columns are converted to character using
the function `format`. However, the problem is that the NA values are
also formatted as character strings

> format(c(NA, 6))
[1] "NA" " 6"
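
As a rough illustration of the padding and of why the comparison then
misbehaves, here is a small sketch. The variable names are mine, and the
result of the string comparison depends on the collation order of your
locale, so I do not state an expected value for it.

```r
x <- c(NA, 6)

# format() pads every element to a common width. "NA" is two characters
# wide, so 6 becomes " 6".
padded <- format(x)
padded

# Comparing a string with a number coerces the number to a string, so this
# is a string comparison of " 6" with "3", not a numeric one. Its result
# depends on the locale's collation order.
" 6" <= 3

# Converting back to numeric (leading whitespace is tolerated) gives the
# intended numeric test.
as.numeric(" 6") <= 3

# format() has a trim argument that suppresses the padding.
format(x, trim = TRUE)
```
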

That's where the space comes from: `format` pads every element to a
common width, and "NA" is two characters wide. It is purely for making
the result pretty... The character "NA" is turned back into a real NA
later, but the padding space is not stripped. I would say this is not
a good design, and it might be worth excluding the NA values from the
call to `format`. For now, I suggest using the function `lapply` to do
what you want.

> lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
$d1
[1] FALSE
$d2
[1] TRUE
$d3
[1] FALSE

Everything should work as you expect.
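
If you prefer a named logical vector like the one apply returns, a
possible variant is sketched below. It is only a sketch: the helper name
`all_at_most` and its `limit` argument are hypothetical, and I assume you
only care about the numeric columns.

```r
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# Hypothetical helper (not from the thread): TRUE if all non-missing
# values are <= limit.
all_at_most <- function(x, limit = 3) all(x <= limit, na.rm = TRUE)

# Keep only the numeric columns, then check each column directly,
# without going through a character matrix.
num_cols <- vapply(d, is.numeric, logical(1))
res <- vapply(d[num_cols], all_at_most, logical(1))
res
```

Here vapply works column by column on the data frame, so each column keeps
its own type and the character column is simply left out.
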

Best,
Jiefei

On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
>
> Hi,
>
> I guess this can tell you what happens behind the scenes
>
>
> > d<-data.frame(d1 = letters[1:3],
> +               d2 = c(1,2,3),
> +               d3 = c(NA,NA,6))
> > apply(d, 2, FUN=function(x)x)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> > "a"<=3
> [1] FALSE
> > "2"<=3
> [1] TRUE
> > "6"<=3
> [1] FALSE
>
> Note that there is an additional space in the character value " 6";
> that's why your comparison fails. I do not understand why, but this
> might be a bug in R.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> <r-help using r-project.org> wrote:
> >
> > Hello,
> >
> > I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement. Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I only pass the numeric columns to apply, it is correct for d3. If I use a for loop, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1 NA
> > 2  b  2 NA
> > 3  c  3  6
> > >
> > > # results are incorrect
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE  TRUE
> > >
> > > # results are correct
> > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d2    d3
> >  TRUE FALSE
> > >
> > > # results are correct
> > > for(i in names(d)){
> > +   print(all(d[!is.na(d[,i]),i] <= 3))
> > + }
> > [1] FALSE
> > [1] TRUE
> > [1] FALSE
> >
> >
> > Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(4,5,6))
> > >
> > > d
> >   d1 d2 d3
> > 1  a  1  4
> > 2  b  2  5
> > 3  c  3  6
> > >
> > > # results are correct
> > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> >    d1    d2    d3
> > FALSE  TRUE FALSE
> >
> >
> > Can someone help me understand what's happening?
> >


