[R] R code for if-then-do code blocks
Richard M. Heiberger
rmh @ending from temple@edu
Mon Dec 17 18:49:42 CET 2018
This can be done even faster, and I think more readably, using only base R.
d1 <- data.frame(workshop=rep(1:2,4),
                 gender=rep(c("f","m"),each=4))
## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)
## needed for the timing comparison itself
library(microbenchmark)
microbenchmark(
  vector = {d1 %>%
    rownames_to_column("ID") %>%
    mutate(
      test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
      test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
      test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
      test5 = test4
    ) },
  rowbased = {d1 %>%
    rownames_to_column("ID") %>%
    mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
    ddply("ID",
          within,
          if (gender == "f" & workshop == 1) {
            test1 <- 1
            test1 <- 6 + test1
            test2 <- 2 + test1
            test4 <- 1
            test5 <- 1
          } else {
            test1 <- test2 <- test4 <- test5 <- 0
          })},
  rmh={
    ## start every computed column at the "else" value
    d3 <- data.frame(ID=rownames(d1),
                     d1,
                     test1=0,
                     test2=0,
                     test4=0,
                     test5=0)
    ## rows where the "if" condition holds
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  }
)
Unit: microseconds
     expr      min       lq      mean   median        uq        max neval cld
   vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
 rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
      rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
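
For anyone who wants to check the base R approach outside the timing harness, here is a minimal standalone sketch. It assumes the result is stored in a data frame named d3 (the name used by the subsetting lines in the benchmark) and compares each column against the ifelse() formulation; the helper name cond is introduced here only for illustration.

```r
# Toy data from the thread.
d1 <- data.frame(workshop = rep(1:2, 4),
                 gender = rep(c("f", "m"), each = 4))

# Start every computed column at the "else" value (0).
d3 <- data.frame(ID = rownames(d1),
                 d1,
                 test1 = 0,
                 test2 = 0,
                 test4 = 0,
                 test5 = 0)

# Logical index for the rows where the "if" condition holds.
myRowSubset <- d3$gender == "f" & d3$workshop == 1

# Body of the "then do" block, applied only to the selected rows.
test1 <- 1
d3[myRowSubset, "test1"] <- test1 + 6
d3[myRowSubset, "test2"] <- test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] <- test1

# Cross-check against the ifelse() version, using base R only.
# 'cond' is a hypothetical helper name, not part of the original code.
cond <- d1$gender == "f" & d1$workshop == 1
stopifnot(all(d3$test1 == ifelse(cond, 7, 0)),
          all(d3$test2 == ifelse(cond, 9, 0)),
          all(d3$test4 == ifelse(cond, 1, 0)),
          all(d3$test5 == d3$test4))

print(d3)
```

The computed columns stay in the order test1, test2, test4, test5 because they are created in that order in the data.frame() call, so no reordering step is needed afterwards.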
On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
<r-help using r-project.org> wrote:
>
> Dear Paul,
>
> R's power is that it works vectorised, unlike SAS, which is row-based. Using
> R in a SAS way will lead to very slow code.
>
> Your examples can be written vectorised
>
> d1 %>%
> rownames_to_column("ID") %>%
> mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
> )
>
> Here is a speed comparison.
>
> library(microbenchmark)
> microbenchmark(
> vector = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
> ) },
> rowbased = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
> test1 <- 1
> test1 <- 6 + test1
> test2 <- 2 + test1
> test4 <- 1
> test5 <- 1
> } else {
> test1 <- test2 <- test4 <- test5 <- 0
> })}
> )
>
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
> r-help using r-project.org>:
>
> > Hello All,
> >
> > Season's greetings!
> >
> > Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > code blocks. I've been trying to do likewise in R as that seems to be the
> > most reliable way to get the same result.
> >
> > Below is some toy data and some code that does work. There are some things
> > I don't necessarily like about the code though. So I was hoping some people
> > could help make it better. One thing I don't like is that the within
> > function reverses the order of the computed columns such that test1:test5
> > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > not to have to do so.
> >
> > Another, perhaps very small thing, is the need to calculate an ID
> > variable that becomes the basis for a grouping.
> >
> > I did considerable Internet searching for R code that conditionally
> > computes blocks of code. I didn't find much though and so am wondering if
> > my search terms were not sufficient or if there is some other reason. It
> > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > are frowned upon and therefore not much implemented.
> >
> > I'd be interested in seeing more R-compatible approaches if this is the
> > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > better to let R be R. Trouble is I'm not always sure how to do that.
> >
> > Thanks,
> >
> > Paul
> >
> >
> > d1 <- data.frame(workshop=rep(1:2,4),
> > gender=rep(c("f","m"),each=4))
> >
> > library(tibble)
> > library(plyr)
> >
> > d2 <- d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > ddply("ID",
> > within,
> > if (gender == "f" & workshop == 1) {
> > test1 <- 1
> > test1 <- 6 + test1
> > test2 <- 2 + test1
> > test4 <- 1
> > test5 <- 1
> > } else {
> > test1 <- test2 <- test4 <- test5 <- 0
> > })
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>