[R] recode: how to avoid nested ifelse

Neal Fultz nfultz at gmail.com
Sat Jun 8 21:17:55 CEST 2013


rowSums and Reduce will have the same problems with the bad data you alluded to earlier, e.g. cg = 1, hs = 0.

But that's something to check for with crosstabs anyway.
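A crosstab check along those lines might look like the sketch below, which uses Paul's vectors with missing values. It is only an illustration. The name inconsistent is introduced here, and the rule it encodes is an assumption: a row counts as bad when a higher level is 1 but the level directly below it is 0. Because comparisons with NA yield NA, which() reports only rows that are definitely inconsistent, and is.na() picks out the rows that cannot be checked.

```r
## Paul's indicators, including the rows with missing values
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)

## Crosstabs of adjacent levels; useNA = "ifany" keeps NA as a category
table(es, hs, useNA = "ifany")
table(hs, cg, useNA = "ifany")

## Assumed rule (illustration only): a row is inconsistent when a higher
## level is 1 but the level directly below it is 0
inconsistent <- (cg == 1 & hs == 0) | (hs == 1 & es == 0)

which(inconsistent)         # definitely inconsistent rows (none in this data)
which(is.na(inconsistent))  # rows that cannot be checked because of NA: 9 10
```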


Side note: you should check out the microbenchmark package; it's quite handy.


R>require(microbenchmark)
R>microbenchmark(
+   f1(cg,hs,es),
+   f2(cg,hs,es),
+   f3(cg,hs,es),
+   f4(cg,hs,es)
+ )
Unit: microseconds
           expr       min         lq     median         uq       max neval
 f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112   100
 f2(cg, hs, es)   730.665   755.5750   811.7445   934.3320  6179.798   100
 f3(cg, hs, es)    85.029   101.6785   129.8605   196.2835  2820.187   100
 f4(cg, hs, es)   762.232   804.4850   843.7170  1079.0800 24869.548   100
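The timings are only comparable if the candidates return the same codes. Here is a rough check on Paul's complete-case vectors. It assumes f2 is written in terms of its own arguments, pmax(3*x1, 2*x2, x3, 0), which matches the original pmax call when called as f2(cg, hs, es). f1 is left out because it returns strings rather than numbers. The extra row cg = 1, hs = 0, es = 0 is hypothetical. It shows where pmax and the sum-based versions part ways on bad data.

```r
## Candidate recoders from the thread, called as f(cg, hs, es)
f2 <- function(x1, x2, x3) pmax(3*x1, 2*x2, x3, 0, na.rm = FALSE)
f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))

## Paul's complete (no NA, logically consistent) indicators
es <- c(0, 0, 1, 0, 1, 0, 1, 1)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0)

r2 <- f2(cg, hs, es)
r3 <- f3(cg, hs, es)
r4 <- f4(cg, hs, es)

## On consistent data all three give the same level codes
stopifnot(identical(r2, r3), identical(r3, unname(r4)))
r2  # 0 0 2 0 3 0 3 1

## Hypothetical inconsistent row (cg = 1 but hs = es = 0):
## pmax() reports level 3, the sums report level 1
f2(1, 0, 0)  # 3
f3(1, 0, 0)  # 1
f4(1, 0, 0)  # 1
```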

On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
> I still argue for na.rm=FALSE, but that is cute, and also substantially faster:
> 
> f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
> f2 <- function(x1, x2, x3) pmax(3*x1, 2*x2, x3, 0, na.rm=FALSE)
> f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
> f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))
> 
> es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
> hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
> cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)
> 
> system.time(replicate(1000, f1(cg, hs, es)))
> system.time(replicate(1000, f2(cg, hs, es)))
> system.time(replicate(1000, f3(cg, hs, es)))
> system.time(replicate(1000, f4(cg, hs, es)))
> 
> > system.time(replicate(1000, f1(cg, hs, es)))
>    user  system elapsed
>   22.73    0.03   22.76
> > system.time(replicate(1000, f2(cg, hs, es)))
>    user  system elapsed
>    0.92    0.04    0.95
> > system.time(replicate(1000, f3(cg, hs, es)))
>    user  system elapsed
>    0.19    0.02    0.20
> > system.time(replicate(1000, f4(cg, hs, es)))
>    user  system elapsed
>    0.95    0.03    0.98
> 
> 
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> 
> 
> 
> 
> On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfultz at gmail.com> wrote:
> > I would do this to get the highest non-missing level:
> >
> > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
> >
> > rock chalk...
> >
> > -nfultz
> >
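Josh's na.rm=FALSE and Neal's na.rm=TRUE differ only on rows with missing values. The short sketch below shows this on Paul's vectors with NAs. The names x_rm, x_na and labs are introduced here for illustration. With na.rm = TRUE, row 9 (NA, 1, NA) is coded "hs", and the all-missing row 10 falls through to the 0 floor and is coded "none". With na.rm = FALSE, both rows become NA.

```r
## Paul's indicators with missing values
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)

## na.rm = TRUE: highest level among the non-missing indicators;
## the trailing 0 acts as a floor, so an all-NA row becomes 0 ("none")
x_rm <- pmax(3*cg, 2*hs, es, 0, na.rm = TRUE)

## na.rm = FALSE: any missing indicator makes the whole row NA
x_na <- pmax(3*cg, 2*hs, es, 0, na.rm = FALSE)

labs <- c("none", "es", "hs", "cg")
data.frame(es, hs, cg,
           rm      = factor(x_rm, levels = 0:3, labels = labs),
           keep_na = factor(x_na, levels = 0:3, labels = labs))
```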
> > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
> >> Hi Paul,
> >>
> >> Unless you have truly offended the data-generating oracle*, the
> >> pattern NA, 1, NA should be a data entry error: graduating HS
> >> implies graduating ES, no? I would argue fringe cases like that
> >> should be corrected in the data, not through coding workarounds.
> >> Then you can just do:
> >>
> >> x <- do.call(paste0, list(es, hs, cg))
> >>
> >> > table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
> >> none   es   hs   cg
> >>    4    1    1    2
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> *Drawn from comments by Judea Pearl during one lively session.
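One property of the paste0 approach is worth knowing. paste0() converts NA to the string "NA", so some rows come out as NA in the factor instead of being silently recoded. This covers rows with missing values and rows whose pattern is outside the four listed levels (an inconsistent "001", say). Below is a quick check on Paul's vectors with missing values. Here useNA = "ifany" makes those rows visible in the table.

```r
es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)

## paste0() turns NA into the string "NA", e.g. "NA1NA" for row 9
x <- do.call(paste0, list(es, hs, cg))

## Any pattern outside the four valid ones becomes NA in the factor
edf <- factor(x, levels = c("000", "100", "110", "111"),
              labels = c("none", "es", "hs", "cg"))
table(edf, useNA = "ifany")  # none 4, es 1, hs 1, cg 2, <NA> 2
```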
> >>
> >>
> >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> >> > In our Summer Stats Institute, I was asked a question that amounts to
> >> > reversing the effect of the contrasts function (reconstructing an ordinal
> >> > predictor from a set of binary columns). The best I could think of was to
> >> > chain together several nested ifelse calls, and I would not want to do
> >> > that if the example became any more complicated.
> >> >
> >> > I can't remember a less error-prone method :). But I expect you might.
> >> >
> >> > Here's my working example code
> >> >
> >> > ## Paul Johnson <pauljohn at ku.edu>
> >> > ## 2013-06-07
> >> >
> >> > ## We need to create an ordinal factor from these indicators
> >> > ## completed elementary school
> >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1)
> >> > ## completed high school
> >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
> >> > ## graduated from college
> >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
> >> >
> >> > ed <- ifelse(cg == 1, 3,
> >> >              ifelse(hs == 1, 2,
> >> >                     ifelse(es == 1, 1, 0)))
> >> >
> >> > edf <- factor(ed, levels = 0:3,  labels = c("none", "es", "hs", "cg"))
> >> > data.frame(es, hs, cg, ed, edf)
> >> >
> >> > ## Looks OK, but what if there are missings?
> >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
> >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
> >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
> >> > ed <- ifelse(cg == 1, 3,
> >> >              ifelse(hs == 1, 2,
> >> >                     ifelse(es == 1, 1, 0)))
> >> > cbind(es, hs, cg, ed)
> >> >
> >> > ## That's bad, ifelse returns NA too frequently.
> >> > ## Revise (becoming tedious!)
> >> >
> >> > ed <- ifelse(!is.na(cg) & cg == 1, 3,
> >> >              ifelse(!is.na(hs) & hs == 1, 2,
> >> >                     ifelse(!is.na(es) & es == 1, 1,
> >> >                            ifelse(is.na(es), NA, 0))))
> >> > cbind(es, hs, cg, ed)
> >> >
> >> >
> >> > ## Does the project director want us to worry about
> >> > ## logical inconsistencies, such as es = 0 but cg = 1?
> >> > ## I hope not.
> >> >
> >> > Thanks in advance. I hope you are having a nice summer.
> >> >
> >> > pj
> >> >
> >> > --
> >> > Paul E. Johnson
> >> > Professor, Political Science      Assoc. Director
> >> > 1541 Lilac Lane, Room 504      Center for Research Methods
> >> > University of Kansas                 University of Kansas
> >> > http://pj.freefaculty.org               http://quant.ku.edu
> >> >
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >>
> >> --
> >> Joshua Wiley
> >> Ph.D. Student, Health Psychology
> >> University of California, Los Angeles
> >> http://joshuawiley.com/
> >> Senior Analyst - Elkhart Group Ltd.
> >> http://elkhartgroup.com
> >>


