[R] R code for if-then-do code blocks

Tue Dec 18 01:02:07 CET 2018

I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).

microbenchmark(
 rmh={
    d3 <-data.frame(ID=rownames(d1),
               d1,
               test1=0,
               test2=0,
               test4=0,
               test5=0)
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  },
 rmh4={
   d4 <- data.frame(ID=rownames(d1),
                    d1,
                    test1=0,
                    test2=0,
                    test4=0,
                    test5=0)
   myRowSubset <- d4$gender=="f" & d4$workshop==1
   test1 <- 1
   d4[myRowSubset, c("test1", "test2", "test4", "test5")] <-
     matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
 }
)

Unit: microseconds
 expr     min       lq     mean   median       uq      max neval cld
  rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071   100   b
 rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545   100  a

On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger <rmh using temple.edu> wrote:
>
> this can be dome even faster, and I think more easily read, using only base R
>
> d1 <- data.frame(workshop=rep(1:2,4),
>                 gender=rep(c("f","m"),each=4))
>
> ## needed by vector and rowbased, not needed by rmh
> library(tibble)
> library(plyr)
> library(magrittr)
>
> microbenchmark(
>   vector = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(
>       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>       test5 = test4
>     ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })},
>   rmh={
>     data.frame(ID=rownames(d1),
>                d1,
>                test1=0,
>                test2=0,
>                test4=0,
>                test5=0)
>     myRowSubset <- d3$gender=="f" & d3$workshop==1
>     test1 <- 1
>     d3[myRowSubset, "test1"] <- test1 + 6
>     d3[myRowSubset, "test2"] <- test1 + 6 + 2
>     d3[myRowSubset, c("test4", "test5")] <- test1
>   }
> )
>
> Unit: microseconds
>      expr      min       lq      mean   median        uq        max neval cld
>    vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
>  rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
>       rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
> On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
> <r-help using r-project.org> wrote:
> >
> > Dear Paul,
> >
> > R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> > R in a SAS way will lead to very slow code.
> >
> > Your examples can be written vectorised
> >
> > d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(
> >     test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >     test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >     test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >     test5 = test4
> >   )
> >
> > Here is a speed comparison.
> >
> > library(microbenchmark)
> > microbenchmark(
> >   vector = {d1 %>%
> >     rownames_to_column("ID") %>%
> >     mutate(
> >       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >       test5 = test4
> >     ) },
> >   rowbased = {d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >   ddply("ID",
> >         within,
> >         if (gender == "f" & workshop == 1) {
> >           test1 <- 1
> >           test1 <- 6 + test1
> >           test2 <- 2 + test1
> >           test4 <- 1
> >           test5 <- 1
> >         } else {
> >           test1 <- test2 <- test4 <- test5 <- 0
> >         })}
> > )
> >
> >
> > Best regards,
> >
> > Thierry
> >
> > ir. Thierry Onkelinx
> > Statisticus / Statistician
> >
> > Vlaamse Overheid / Government of Flanders
> > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> > FOREST
> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > thierry.onkelinx using inbo.be
> > Havenlaan 88 bus 73, 1000 Brussel
> > www.inbo.be
> >
> > ///////////////////////////////////////////////////////////////////////////////////////////
> > To call in the statistician after the experiment is done may be no more
> > than asking him to perform a post-mortem examination: he may be able to say
> > what the experiment died of. ~ Sir Ronald Aylmer Fisher
> > The plural of anecdote is not data. ~ Roger Brinner
> > The combination of some data and an aching desire for an answer does not
> > ensure that a reasonable answer can be extracted from a given body of data.
> > ~ John Tukey
> > ///////////////////////////////////////////////////////////////////////////////////////////
> >
> > <https://www.inbo.be>
> >
> >
> > Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
> > r-help using r-project.org>:
> >
> > > Hello All,
> > >
> > > Season's greetings!
> > >
> > >  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > > code blocks. I've been trying to do likewise in R as that seems to be the
> > > most reliable way to get the same result.
> > >
> > > Below is some toy data and some code that does work. There are some things
> > > I don't necessarily like about the code though. So I was hoping some people
> > > could help make it better. One thing I don't like is that the within
> > > function reverses the order of the computed columns such that test1:test5
> > > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > > not to have to do so.
> > >
> > >  Another, perhaps very small thing, is the need to calculate an ID
> > > variable that becomes the basis for a grouping.
> > >
> > > I did considerable Internet searching for R code that conditionally
> > > computes blocks of code. I didn't find much though and so am wondering if
> > > my search terms were not sufficient or if there is some other reason. It
> > > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > > as are frowned upon and therefore not much implemented.
> > >
> > > I'd be interested in seeing more R-compatible approaches if this is the
> > > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > > better to let R be R. Trouble is I'm not always sure how to do that.
> > >
> > > Thanks,
> > >
> > > Paul
> > >
> > >
> > > d1 <- data.frame(workshop=rep(1:2,4),
> > >                 gender=rep(c("f","m"),each=4))
> > >
> > > library(tibble)
> > > library(plyr)
> > >
> > > d2 <- d1 %>%
> > >   rownames_to_column("ID") %>%
> > >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> > >   ddply("ID",
> > >         within,
> > >         if (gender == "f" & workshop == 1) {
> > >           test1 <- 1
> > >           test1 <- 6 + test1
> > >           test2 <- 2 + test1
> > >           test4 <- 1
> > >           test5 <- 1
> > >         } else {
> > >           test1 <- test2 <- test4 <- test5 <- 0
> > >         })
> > >
> > > ______________________________________________
> > > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.