[R] R code for if-then-do code blocks
Richard M. Heiberger
rmh at temple.edu
Tue Dec 18 01:02:07 CET 2018
I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).
library(microbenchmark)

## d1 is the toy data frame defined in my previous posting (quoted below)
microbenchmark(
  rmh={
    d3 <- data.frame(ID=rownames(d1),
                     d1,
                     test1=0,
                     test2=0,
                     test4=0,
                     test5=0)
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    ## three separate subscripted assignments
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  },
  rmh4={
    d4 <- data.frame(ID=rownames(d1),
                     d1,
                     test1=0,
                     test2=0,
                     test4=0,
                     test5=0)
    myRowSubset <- d4$gender=="f" & d4$workshop==1
    test1 <- 1
    ## one subscripted assignment covering all four columns
    d4[myRowSubset, c("test1", "test2", "test4", "test5")] <-
      matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
  }
)
Unit: microseconds
 expr     min       lq     mean   median       uq      max neval cld
  rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071   100   b
 rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545   100  a
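
As a sanity check that is not part of the timing above, here is a minimal sketch that rebuilds both versions outside microbenchmark and confirms that the single matrix assignment gives the same data frame as the three separate assignments. The helper make_base is a hypothetical name introduced only for this illustration; everything else follows the code in this thread.

```r
# Toy data from the original question in this thread.
d1 <- data.frame(workshop = rep(1:2, 4),
                 gender   = rep(c("f", "m"), each = 4))

# Hypothetical helper (not in the thread): the shared starting data frame
# with the four test columns initialised to 0.
make_base <- function(d) {
  data.frame(ID = rownames(d), d, test1 = 0, test2 = 0, test4 = 0, test5 = 0)
}

test1 <- 1

# Version "rmh": three separate subscripted assignments.
d3 <- make_base(d1)
rows <- d3$gender == "f" & d3$workshop == 1
d3[rows, "test1"] <- test1 + 6
d3[rows, "test2"] <- test1 + 6 + 2
d3[rows, c("test4", "test5")] <- test1

# Version "rmh4": one subscripted assignment. The matrix is filled by row,
# so every selected row receives the same four values.
d4 <- make_base(d1)
rows <- d4$gender == "f" & d4$workshop == 1
d4[rows, c("test1", "test2", "test4", "test5")] <-
  matrix(test1 + c(6, 6 + 2, 0, 0), nrow = sum(rows), ncol = 4, byrow = TRUE)

# Stops with an error if the two results differ.
stopifnot(isTRUE(all.equal(d3, d4)))
```

The byrow=TRUE argument is what makes the single assignment work: without it the four values would be recycled down the columns instead of across them.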
On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger <rmh at temple.edu> wrote:
>
> This can be done even faster, and I think more readably, using only base R:
>
> d1 <- data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> ## needed by vector and rowbased, not needed by rmh
> library(tibble)
> library(plyr)
> library(magrittr)
> library(microbenchmark)
>
> microbenchmark(
>   vector = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(
>       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>       test5 = test4
>     ) },
>   rowbased = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>     ddply("ID",
>           within,
>           if (gender == "f" & workshop == 1) {
>             test1 <- 1
>             test1 <- 6 + test1
>             test2 <- 2 + test1
>             test4 <- 1
>             test5 <- 1
>           } else {
>             test1 <- test2 <- test4 <- test5 <- 0
>           })},
>   rmh={
>     d3 <- data.frame(ID=rownames(d1),
>                      d1,
>                      test1=0,
>                      test2=0,
>                      test4=0,
>                      test5=0)
>     myRowSubset <- d3$gender=="f" & d3$workshop==1
>     test1 <- 1
>     d3[myRowSubset, "test1"] <- test1 + 6
>     d3[myRowSubset, "test2"] <- test1 + 6 + 2
>     d3[myRowSubset, c("test4", "test5")] <- test1
>   }
> )
>
> Unit: microseconds
>      expr      min       lq      mean   median        uq        max neval cld
>    vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
>  rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
>       rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
> On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
> <r-help at r-project.org> wrote:
> >
> > Dear Paul,
> >
> > R's power is that it works vectorised, unlike SAS, which is row-based. Using
> > R in a SAS way will lead to very slow code.
> >
> > Your examples can be written vectorised
> >
> > d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(
> >     test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >     test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >     test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >     test5 = test4
> >   )
> >
> > Here is a speed comparison.
> >
> > library(microbenchmark)
> > microbenchmark(
> >   vector = {d1 %>%
> >     rownames_to_column("ID") %>%
> >     mutate(
> >       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >       test5 = test4
> >     ) },
> >   rowbased = {d1 %>%
> >     rownames_to_column("ID") %>%
> >     mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >     ddply("ID",
> >           within,
> >           if (gender == "f" & workshop == 1) {
> >             test1 <- 1
> >             test1 <- 6 + test1
> >             test2 <- 2 + test1
> >             test4 <- 1
> >             test5 <- 1
> >           } else {
> >             test1 <- test2 <- test4 <- test5 <- 0
> >           })}
> > )
> >
> >
> > Best regards,
> >
> > Thierry
> >
> >
> > On Mon 17 Dec 2018 at 16:30, Paul Miller via R-help <
> > r-help at r-project.org> wrote:
> >
> > > Hello All,
> > >
> > > Season's greetings!
> > >
> > > Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > > code blocks. I've been trying to do likewise in R as that seems to be the
> > > most reliable way to get the same result.
> > >
> > > Below is some toy data and some code that does work. There are some things
> > > I don't necessarily like about the code though. So I was hoping some people
> > > could help make it better. One thing I don't like is that the within
> > > function reverses the order of the computed columns such that test1:test5
> > > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > > not to have to do so.
> > >
> > > Another, perhaps very small thing, is the need to calculate an ID
> > > variable that becomes the basis for a grouping.
> > >
> > > I did considerable Internet searching for R code that conditionally
> > > computes blocks of code. I didn't find much though and so am wondering if
> > > my search terms were not sufficient or if there is some other reason. It
> > > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > > are frowned upon and therefore not much implemented.
> > >
> > > I'd be interested in seeing more R-compatible approaches if this is the
> > > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > > better to let R be R. Trouble is I'm not always sure how to do that.
> > >
> > > Thanks,
> > >
> > > Paul
> > >
> > >
> > > d1 <- data.frame(workshop=rep(1:2,4),
> > >                  gender=rep(c("f","m"),each=4))
> > >
> > > library(tibble)
> > > library(plyr)
> > >
> > > d2 <- d1 %>%
> > >   rownames_to_column("ID") %>%
> > >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > >   ddply("ID",
> > >         within,
> > >         if (gender == "f" & workshop == 1) {
> > >           test1 <- 1
> > >           test1 <- 6 + test1
> > >           test2 <- 2 + test1
> > >           test4 <- 1
> > >           test5 <- 1
> > >         } else {
> > >           test1 <- test2 <- test4 <- test5 <- 0
> > >         })
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >