[R] [EXTERNAL] Re: unexpected behavior in apply
Grzegorz Smoliński
g@@mo||n@k|1 @end|ng |rom gm@||@com
Fri Oct 8 20:51:57 CEST 2021
This will work as well:
d<-data.frame(d1 = letters[1:3],
d2 = c(1,2,3),
d3 = c(NA_character_,NA_character_,6))
apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
d1 d2 d3
FALSE TRUE FALSE
i.e. when NA is changed to NA_character_
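
For what it's worth, here is a minimal sketch of the mechanism discussed in the quoted messages below, assuming the same `d` as in the original post. The names `padded`, `m`, `check` and `by_column` are introduced here for illustration only, and `vapply` is swapped in as a type-checked variant of the `lapply` suggestion; it is not the only way to do this.

```r
# Same data frame as in the original post: one character column and two
# numeric columns, one of which contains NA.
d <- data.frame(d1 = letters[1:3],
                d2 = c(1, 2, 3),
                d3 = c(NA, NA, 6))

# format() pads every element to a common width. "NA" is two characters
# wide, so the 6 comes back as " 6" with a leading space.
padded <- format(c(NA, 6))

# apply() converts the data frame with as.matrix() first. Because a
# character column is present, the numeric columns are run through
# format(), so the matrix holds the padded " 6" in column d3.
m <- as.matrix(d)

# The check from the original post. On a character vector the comparison
# with 3 is a string comparison, and how " 6" sorts relative to "3"
# depends on the collation rules in use.
check <- function(x) all(x[!is.na(x)] <= 3)

# Iterating over the columns of the data frame itself avoids the matrix
# conversion, so each column keeps its own type.
by_column <- vapply(d, check, logical(1))
print(by_column)
```

Stripping the padding with `trimws(m[, "d3"])` before comparing would be another option, but working column by column seems simpler because d2 and d3 are then compared as numbers.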
On Fri, 8 Oct 2021 at 20:44, Derickson, Ryan, VHA NCOD via R-help
<r-help using r-project.org> wrote:
>
> This is interesting and does seem suboptimal, especially because it behaves as expected if I start with a matrix from the beginning.
>
> > d<-data.frame(d1 = letters[1:3],
> + d2 = c("1","2","3"),
> + d3 = c(NA,NA,"6"))
> >
> > str(d)
> 'data.frame': 3 obs. of 3 variables:
> $ d1: chr "a" "b" "c"
> $ d2: chr "1" "2" "3"
> $ d3: chr NA NA "6"
> >
> > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> d1 d2 d3
> FALSE TRUE FALSE
>
>
>
>
> -----Original Message-----
> From: Jiefei Wang <szwjf08 using gmail.com>
> Sent: Friday, October 8, 2021 2:22 PM
> To: Derickson, Ryan, VHA NCOD <Ryan.Derickson using va.gov>
> Cc: r-help using r-project.org
> Subject: [EXTERNAL] Re: [R] unexpected behavior in apply
>
> Ok, it turns out that this is documented, even though it looks surprising.
>
> First of all, the apply function will try to convert any object with the dim attribute to a matrix (my intuition agrees with you that there should be no conversion), so the first step of the apply function is
>
> > as.matrix.data.frame(d)
> d1 d2 d3
> [1,] "a" "1" NA
> [2,] "b" "2" NA
> [3,] "c" "3" " 6"
>
> Since the data frame `d` is a mixture of character and non-character values, the non-character values will be converted to character using the function `format`. However, the problem is that the NA values will also be formatted as character strings:
>
> > format(c(NA, 6))
> [1] "NA" " 6"
>
> That's where the space comes from. The padding is purely for making the result pretty. The character "NA" is converted back to a missing value later, but the space is not stripped. I would say this is not a good design, and it might be worth not including the NA value in the format function. For now, I suggest using the function `lapply` to do what you want.
>
> > lapply(d, FUN=function(x)all(x[!is.na(x)] <= 3))
> $d1
> [1] FALSE
> $d2
> [1] TRUE
> $d3
> [1] FALSE
>
> Everything should work as you expect.
>
> Best,
> Jiefei
>
> On Sat, Oct 9, 2021 at 2:03 AM Jiefei Wang <szwjf08 using gmail.com> wrote:
> >
> > Hi,
> >
> > I guess this can tell you what happens behind the scenes:
> >
> >
> > > d<-data.frame(d1 = letters[1:3],
> > + d2 = c(1,2,3),
> > + d3 = c(NA,NA,6))
> > > apply(d, 2, FUN=function(x)x)
> > d1 d2 d3
> > [1,] "a" "1" NA
> > [2,] "b" "2" NA
> > [3,] "c" "3" " 6"
> > > "a"<=3
> > [1] FALSE
> > > "2"<=3
> > [1] TRUE
> > > "6"<=3
> > [1] FALSE
> >
> > Note that there is an additional space in the character value " 6",
> > which is why your comparison fails. I do not understand why, but this
> > might be a bug in R.
> >
> > Best,
> > Jiefei
> >
> > On Sat, Oct 9, 2021 at 1:49 AM Derickson, Ryan, VHA NCOD via R-help
> > <r-help using r-project.org> wrote:
> > >
> > > Hello,
> > >
> > > I'm seeing unexpected behavior when using apply() compared to a for loop when a character vector is part of the data subjected to the apply statement. Below, I check whether all non-missing values are <= 3. If I include a character column, apply incorrectly returns TRUE for d3. If I only pass the numeric columns to apply, it is correct for d3. If I use a for loop, it is correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > + d2 = c(1,2,3),
> > > + d3 = c(NA,NA,6))
> > > >
> > > > d
> > > d1 d2 d3
> > > 1 a 1 NA
> > > 2 b 2 NA
> > > 3 c 3 6
> > > >
> > > > # results are incorrect
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > > d1 d2 d3
> > > FALSE TRUE TRUE
> > > >
> > > > # results are correct
> > > > apply(d[,2:3], 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > > d2 d3
> > > TRUE FALSE
> > > >
> > > > # results are correct
> > > > for(i in names(d)){
> > > + print(all(d[!is.na(d[,i]),i] <= 3)) }
> > > [1] FALSE
> > > [1] TRUE
> > > [1] FALSE
> > >
> > >
> > > Finally, if I remove the NA values from d3 and include the character column in apply, it is correct.
> > >
> > > > d<-data.frame(d1 = letters[1:3],
> > > + d2 = c(1,2,3),
> > > + d3 = c(4,5,6))
> > > >
> > > > d
> > > d1 d2 d3
> > > 1 a 1 4
> > > 2 b 2 5
> > > 3 c 3 6
> > > >
> > > > # results are correct
> > > > apply(d, 2, FUN=function(x)all(x[!is.na(x)] <= 3))
> > > d1 d2 d3
> > > FALSE TRUE FALSE
> > >
> > >
> > > Can someone help me understand what's happening?
> > >
> > > > ______________________________________________
> > > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide
> > > > http://www.r-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
>