[R] recode: how to avoid nested ifelse
Neal Fultz
nfultz at gmail.com
Sat Jun 8 21:17:55 CEST 2013
rowSums and Reduce will have the same problems with bad data you alluded to earlier, eg
cg = 1, hs = 0
But that's something to check for with crosstabs anyway.
Side note: you should check out the microbenchmark pkg, it's quite handy.
R>require(microbenchmark)
R>microbenchmark(
+ f1(cg,hs,es),
+ f2(cg,hs,es),
+ f3(cg,hs,es),
+ f4(cg,hs,es)
+ )
Unit: microseconds
expr min lq median uq max neval
f1(cg, hs, es) 23029.848 25279.9660 27024.9640 29996.6810 55444.112 100
f2(cg, hs, es) 730.665 755.5750 811.7445 934.3320 6179.798 100
f3(cg, hs, es) 85.029 101.6785 129.8605 196.2835 2820.187 100
f4(cg, hs, es) 762.232 804.4850 843.7170 1079.0800 24869.548 100
On Fri, Jun 07, 2013 at 08:03:26PM -0700, Joshua Wiley wrote:
> I still argue for na.rm=FALSE, but that is cute, also substantially faster
>
> f1 <- function(x1, x2, x3) do.call(paste0, list(x1, x2, x3))
> f2 <- function(x1, x2, x3) pmax(3*x3, 2*x2, es, 0, na.rm=FALSE)
> f3 <- function(x1, x2, x3) Reduce(`+`, list(x1, x2, x3))
> f4 <- function(x1, x2, x3) rowSums(cbind(x1, x2, x3))
>
> es <- rep(c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA), 1000)
> hs <- rep(c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA), 1000)
> cg <- rep(c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA), 1000)
>
> system.time(replicate(1000, f1(cg, hs, es)))
> system.time(replicate(1000, f2(cg, hs, es)))
> system.time(replicate(1000, f3(cg, hs, es)))
> system.time(replicate(1000, f4(cg, hs, es)))
>
> > system.time(replicate(1000, f1(cg, hs, es)))
> user system elapsed
> 22.73 0.03 22.76
> > system.time(replicate(1000, f2(cg, hs, es)))
> user system elapsed
> 0.92 0.04 0.95
> > system.time(replicate(1000, f3(cg, hs, es)))
> user system elapsed
> 0.19 0.02 0.20
> > system.time(replicate(1000, f4(cg, hs, es)))
> user system elapsed
> 0.95 0.03 0.98
>
>
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
>
>
>
> On Fri, Jun 7, 2013 at 7:25 PM, Neal Fultz <nfultz at gmail.com> wrote:
> > I would do this to get the highest non-missing level:
> >
> > x <- pmax(3*cg, 2*hs, es, 0, na.rm=TRUE)
> >
> > rock chalk...
> >
> > -nfultz
> >
> > On Fri, Jun 07, 2013 at 06:24:50PM -0700, Joshua Wiley wrote:
> >> Hi Paul,
> >>
> >> Unless you have truly offended the data generating oracle*, the
> >> pattern: NA, 1, NA, should be a data entry error --- graduating HS
> >> implies graduating ES, no? I would argue fringe cases like that
> >> should be corrected in the data, not through coding work arounds.
> >> Then you can just do:
> >>
> >> x <- do.call(paste0, list(es, hs, cg))
> >>
> >> > table(factor(x, levels = c("000", "100", "110", "111"), labels = c("none", "es","hs", "cg")))
> >> none es hs cg
> >> 4 1 1 2
> >>
> >> Cheers,
> >>
> >> Josh
> >>
> >> *Drawn from comments by Judea Pearl one lively session.
> >>
> >>
> >> On Fri, Jun 7, 2013 at 6:13 PM, Paul Johnson <pauljohn32 at gmail.com> wrote:
> >> > In our Summer Stats Institute, I was asked a question that amounts to
> >> > reversing the effect of the contrasts function (reconstruct an ordinal
> >> > predictor from a set of binary columns). The best I could think of was to
> >> > link together several ifelse functions, and I don't think I want to do this
> >> > if the example became any more complicated.
> >> >
> >> > I'm unable to remember a less error prone method :). But I expect you might.
> >> >
> >> > Here's my working example code
> >> >
> >> > ## Paul Johnson <pauljohn at ku.edu>
> >> > ## 2013-06-07
> >> >
> >> > ## We need to create an ordinal factor from these indicators
> >> > ## completed elementary school
> >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1)
> >> > ## completed high school
> >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0)
> >> > ## completed college graduate
> >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0)
> >> >
> >> > ed <- ifelse(cg == 1, 3,
> >> > ifelse(hs == 1, 2,
> >> > ifelse(es == 1, 1, 0)))
> >> >
> >> > edf <- factor(ed, levels = 0:3, labels = c("none", "es", "hs", "cg"))
> >> > data.frame(es, hs, cg, ed, edf)
> >> >
> >> > ## Looks OK, but what if there are missings?
> >> > es <- c(0, 0, 1, 0, 1, 0, 1, 1, NA, NA)
> >> > hs <- c(0, 0, 1, 0, 1, 0, 1, 0, 1, NA)
> >> > cg <- c(0, 0, 0, 0, 1, 0, 1, 0, NA, NA)
> >> > ed <- ifelse(cg == 1, 3,
> >> > ifelse(hs == 1, 2,
> >> > ifelse(es == 1, 1, 0)))
> >> > cbind(es, hs, cg, ed)
> >> >
> >> > ## That's bad, ifelse returns NA too frequently.
> >> > ## Revise (becoming tedious!)
> >> >
> >> > ed <- ifelse(!is.na(cg) & cg == 1, 3,
> >> > ifelse(!is.na(hs) & hs == 1, 2,
> >> > ifelse(!is.na(es) & es == 1, 1,
> >> > ifelse(is.na(es), NA, 0))))
> >> > cbind(es, hs, cg, ed)
> >> >
> >> >
> >> > ## Does the project director want us to worry about
> >> > ## logical inconsistencies, such as es = 0 but cg = 1?
> >> > ## I hope not.
> >> >
> >> > Thanks in advance, I hope you are having a nice summer.
> >> >
> >> > pj
> >> >
> >> > --
> >> > Paul E. Johnson
> >> > Professor, Political Science Assoc. Director
> >> > 1541 Lilac Lane, Room 504 Center for Research Methods
> >> > University of Kansas University of Kansas
> >> > http://pj.freefaculty.org http://quant.ku.edu
> >> >
> >> > [[alternative HTML version deleted]]
> >> >
> >> > ______________________________________________
> >> > R-help at r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
> >>
> >>
> >>
> >> --
> >> Joshua Wiley
> >> Ph.D. Student, Health Psychology
> >> University of California, Los Angeles
> >> http://joshuawiley.com/
> >> Senior Analyst - Elkhart Group Ltd.
> >> http://elkhartgroup.com
> >>
> >> ______________________________________________
> >> R-help at r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Joshua Wiley
> Ph.D. Student, Health Psychology
> University of California, Los Angeles
> http://joshuawiley.com/
> Senior Analyst - Elkhart Group Ltd.
> http://elkhartgroup.com
More information about the R-help
mailing list