[R] unexpected behavior in apply

PIKAL Petr petr@p|k@| @end|ng |rom prechez@@cz
Mon Oct 11 11:15:27 CEST 2021


Hi

it is not surprising at all.

from apply documentation

Arguments
X	
an array, including a matrix.

data.frame is not matrix or array (even if it rather resembles one)

So if you put a cake into oven you cannot expect getting fried potatoes from
it.

For data frames sapply or lapply is preferable as it is designed for lists
and data frame is (again from documentation)

A data frame is a list of variables of the same number of rows with unique
row names, given class "data.frame".

> sapply(d,function(x) all(x[!is.na(x)]<=3))
   d1    d2    d3 
FALSE  TRUE FALSE 

Cheers
Petr


> -----Original Message-----
> From: R-help <r-help-bounces using r-project.org> On Behalf Of Jiefei Wang
> Sent: Friday, October 8, 2021 8:22 PM
> To: Derickson, Ryan, VHA NCOD <Ryan.Derickson using va.gov>
> Cc: r-help using r-project.org
> Subject: Re: [R] unexpected behavior in apply
> 
> Ok, it turns out that this is documented, even though it looks surprising.
> 
> First of all, the apply function will try to convert any object with the
dim
> attribute to a matrix(my intuition agrees with you that there should be no
> conversion), so the first step of the apply function is
> 
> > as.matrix.data.frame(d)
>      d1  d2  d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
> 
> Since the data frame `d` is a mixture of character and non-character
values,
> the non-character value will be converted to the character using the
function
> `format`. However, the problem is that the NA value will also be formatted
to
> the character
> 
> > format(c(NA, 6))
> [1] "NA" " 6"
> 
> That's where the space comes from. It is purely for making the result
pretty...
> The character NA will be removed later, but the space is not stripped. I
would
> say this is not a good design, and it might be worth not including the NA
value
> in the format function. At the current stage, I will suggest using the
function
> `lapply` to do what you want.
> 
> > lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
> $d1
> [1] FALSE
> $d2
> [1] TRUE
> $d3
> [1] FALSE
> 
> Everything should work as you expect.
> 
> Best,
> Jiefei
> 
> On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
> >
> > Hi,
> >
> > I guess this can tell you what happens behind the scene
> >
> >
> > > d<-data.frame(d1 = letters[1:3],
> > +               d2 = c(1,2,3),
> > +               d3 = c(NA,NA,6))
> > > apply(d, 2, FUN=function(x)x)
> >      d1  d2  d3
> > [1,] "a" "1" NA
> > [2,] "b" "2" NA
> > [3,] "c" "3" " 6"
> > > "a"<=3
> > [1] FALSE
> > > "2"<=3
> > [1] TRUE
> > > "6"<=3
> > [1] FALSE
> >
> > Note that there is an additional space in the character value " 6",
> > that's why your comparison fails. I do not understand why but this
> > might be a bug in R
> >
> > Best,
> > Jiefei
> >
> > On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> > <r-help using r-project.org> wrote:
> > >
> > > Hello,
> > >
> > > I'm seeing unexpected behavior when using apply() compared to a for
> loop when a character vector is part of the data subjected to the apply
> statement. Below, I check whether all non-missing values are <= 3. If I
> include a character column, apply incorrectly returns TRUE for d3. If I
only
> pass the numeric columns to apply, it is correct for d3. If I use a for
loop, it is
> correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > +               d2 = c(1,2,3),
> > > +               d3 = c(NA,NA,6))
> > > >
> > > > d
> > >   d1 d2 d3
> > > 1  a  1 NA
> > > 2  b  2 NA
> > > 3  c  3  6
> > > >
> > > > # results are incorrect
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d1    d2    d3
> > > FALSE  TRUE  TRUE
> > > >
> > > > # results are correct
> > > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d2    d3
> > >  TRUE FALSE
> > > >
> > > > # results are correct
> > > > for(i in names(d)){
> > > +   print(all(d[!is.na(d[,i]),i] <= 3)) }
> > > [1] FALSE
> > > [1] TRUE
> > > [1] FALSE
> > >
> > >
> > > Finally, if I remove the NA values from d3 and include the character
> column in apply, it is correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > +               d2 = c(1,2,3),
> > > +               d3 = c(4,5,6))
> > > >
> > > > d
> > >   d1 d2 d3
> > > 1  a  1  4
> > > 2  b  2  5
> > > 3  c  3  6
> > > >
> > > > # results are correct
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > >    d1    d2    d3
> > > FALSE  TRUE FALSE
> > >
> > >
> > > Can someone help me understand what's happening?
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-
> guide.html
> and provide commented, minimal, self-contained, reproducible code.


More information about the R-help mailing list