[R] R code for if-then-do code blocks

Richard M. Heiberger rmh at temple.edu
Mon Dec 17 18:49:42 CET 2018


This can be done even faster, and I think more readably, using only base R.

d1 <- data.frame(workshop=rep(1:2,4),
                gender=rep(c("f","m"),each=4))

## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)
## needed for the timing itself
library(microbenchmark)

microbenchmark(
  vector = {d1 %>%
    rownames_to_column("ID") %>%
    mutate(
      test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
      test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
      test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
      test5 = test4
    ) },
  rowbased = {d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
        within,
        if (gender == "f" & workshop == 1) {
          test1 <- 1
          test1 <- 6 + test1
          test2 <- 2 + test1
          test4 <- 1
          test5 <- 1
        } else {
          test1 <- test2 <- test4 <- test5 <- 0
        })},
  rmh={
    ## "else" branch: every new column starts at 0
    d3 <- data.frame(ID=rownames(d1),
                     d1,
                     test1=0,
                     test2=0,
                     test4=0,
                     test5=0)
    ## "then do" branch: overwrite only the rows that meet the condition
    myRowSubset <- d3$gender=="f" & d3$workshop==1
    test1 <- 1
    d3[myRowSubset, "test1"] <- test1 + 6
    d3[myRowSubset, "test2"] <- test1 + 6 + 2
    d3[myRowSubset, c("test4", "test5")] <- test1
  }
)

Unit: microseconds
     expr      min       lq      mean   median        uq        max neval cld
   vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
 rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
      rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
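
The timings only matter if the approaches agree, so here is a minimal sketch, in base R only, that checks the subset-assignment result against the same columns computed with ifelse(). The names cond and check are mine, introduced for illustration; they are not in the benchmark above.

```r
## Toy data from the thread
d1 <- data.frame(workshop = rep(1:2, 4),
                 gender   = rep(c("f", "m"), each = 4))

## "else" branch: every new column starts at 0
d3 <- data.frame(ID = rownames(d1), d1,
                 test1 = 0, test2 = 0, test4 = 0, test5 = 0)

## "then do" branch: one logical index selects the rows to overwrite
cond <- d3$gender == "f" & d3$workshop == 1
d3[cond, "test1"] <- 1 + 6
d3[cond, "test2"] <- d3[cond, "test1"] + 2
d3[cond, c("test4", "test5")] <- 1

## The same four columns, computed the ifelse() way
check <- data.frame(test1 = ifelse(cond, 7, 0),
                    test2 = ifelse(cond, 9, 0),
                    test4 = ifelse(cond, 1, 0),
                    test5 = ifelse(cond, 1, 0))

## Stops with an error if the two approaches disagree
stopifnot(isTRUE(all.equal(d3[names(check)], check,
                           check.attributes = FALSE)))
```

The condition is evaluated once and reused, which is probably part of why this version reads closer to the SAS block than three repeated ifelse() calls do.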

On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
<r-help using r-project.org> wrote:
>
> Dear Paul,
>
> R's power is that it works vectorised, unlike SAS, which is row-based. Using
> R in a SAS way will lead to very slow code.
>
> Your examples can be written in vectorised form:
>
> d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(
>     test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>     test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>     test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>     test5 = test4
>   )
>
> Here is a speed comparison.
>
> library(microbenchmark)
> microbenchmark(
>   vector = {d1 %>%
>     rownames_to_column("ID") %>%
>     mutate(
>       test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>       test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>       test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>       test5 = test4
>     ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
>         within,
>         if (gender == "f" & workshop == 1) {
>           test1 <- 1
>           test1 <- 6 + test1
>           test2 <- 2 + test1
>           test4 <- 1
>           test5 <- 1
>         } else {
>           test1 <- test2 <- test4 <- test5 <- 0
>         })}
> )
>
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkelinx using inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///////////////////////////////////////////////////////////////////////////////////////////
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///////////////////////////////////////////////////////////////////////////////////////////
>
> <https://www.inbo.be>
>
>
> On Mon 17 Dec 2018 at 16:30, Paul Miller via R-help <
> r-help using r-project.org> wrote:
>
> > Hello All,
> >
> > Season's greetings!
> >
> > Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > code blocks. I've been trying to do likewise in R as that seems to be the
> > most reliable way to get the same result.
> >
> > Below is some toy data and some code that does work. There are some things
> > I don't necessarily like about the code though. So I was hoping some people
> > could help make it better. One thing I don't like is that the within
> > function reverses the order of the computed columns such that test1:test5
> > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > not to have to do so.
> >
> > Another, perhaps very small thing, is the need to calculate an ID
> > variable that becomes the basis for a grouping.
> >
> > I did considerable Internet searching for R code that conditionally
> > computes blocks of code. I didn't find much though and so am wondering if
> > my search terms were not sufficient or if there is some other reason. It
> > occurred to me that maybe if-then-do code blocks like we often see in SAS
> > are frowned upon and therefore not much implemented.
> >
> > I'd be interested in seeing more R-compatible approaches if this is the
> > case. I've learned that it's a mistake to try and make R be like SAS. It's
> > better to let R be R. Trouble is I'm not always sure how to do that.
> >
> > Thanks,
> >
> > Paul
> >
> >
> > d1 <- data.frame(workshop=rep(1:2,4),
> >                 gender=rep(c("f","m"),each=4))
> >
> > library(tibble)
> > library(plyr)
> >
> > d2 <- d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >   ddply("ID",
> >         within,
> >         if (gender == "f" & workshop == 1) {
> >           test1 <- 1
> >           test1 <- 6 + test1
> >           test2 <- 2 + test1
> >           test4 <- 1
> >           test5 <- 1
> >         } else {
> >           test1 <- test2 <- test4 <- test5 <- 0
> >         })
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >