[R] regex challenge

William Dunlap wdunlap at tibco.com
Fri Aug 16 17:52:20 CEST 2013


The following makes the name converter function an argument to ff (and restores the colon operator to the list of formula operators), but I'm not sure what you need the converter to do.

ff <- function(expr, convertName = function(name)paste0(toupper(name), "z")) {
    if (is.call(expr) && is.name(expr[[1]]) && is.element(as.character(expr[[1]]), c("~","+","-","*","/","%in%","(", ":"))) {
        for(i in seq_along(expr)[-1]) {
            expr[[i]] <- Recall(expr[[i]], convertName = convertName)
        }
    } else if (is.name(expr)) {
        expr <- as.name(convertName(expr))
    }
    expr
}

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Frank Harrell
> Sent: Thursday, August 15, 2013 7:47 PM
> To: RHELP
> Subject: Re: [R] regex challenge
> 
> Bill that is very impresive.  The only problem I'm having is that I want
> the paste0(toupper(...)) to be a general function that returns a
> character string that is a legal part of a formula object that can't be
> converted to a 'name'.
> 
> Frank
> 
> 
> -------------------------------
> Oops, I left "(" out of the list of operators.
> 
> 
> ff <- function(expr) {
>      if (is.call(expr) && is.name(expr[[1]]) &&
>           is.element(as.character(expr[[1]]),
> c("~","+","-","*","/","%in%","("))) {
>          for(i in seq_along(expr)[-1]) {
>              expr[[i]] <- Recall(expr[[i]])
>          }
>      } else if (is.name(expr)) {
>          expr <- as.name(paste0(toupper(as.character(expr)), "z"))
>      }
>      expr
> }
> 
>  > ff(a)
> CATz + (AGEz + Heading("Females") * (sex == "Female") * SBPz) *
>      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
>      COUNTRYz * Heading() * SEXz
> 
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
> 
> 
>  > -----Original Message-----
>  > From: [hidden email] [mailto:[hidden email]] On Behalf
>  > Of William Dunlap
>  > Sent: Thursday, August 15, 2013 6:03 PM
>  > To: Frank Harrell; RHELP
>  > Subject: Re: [R] regex challenge
>  >
>  > Try this one
>  >
>  > ff <- function (expr)
>  > {
>  >     if (is.call(expr) && is.name(expr[[1]]) &&
>  >          is.element(as.character(expr[[1]]),  c("~", "+", "-", "*",
> "/", ":", "%in%"))) {
>  >         # the above list should cover the standard formula operators.
>  >         for (i in seq_along(expr)[-1]) {
>  >             expr[[i]] <- Recall(expr[[i]])
>  >         }
>  >     }
>  >     else if (is.name(expr)) {
>  >        # the conversion itself
>  >         expr <- as.name(paste0(toupper(as.character(expr)), "z"))
>  >     }
>  >     expr
>  > }
>  >
>  > > ff(a)
>  > CATz + (age + Heading("Females") * (sex == "Female") * sbp) *
>  >     Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
>  >     COUNTRYz * Heading() * SEXz
>  >
>  > Bill Dunlap
>  > Spotfire, TIBCO Software
>  > wdunlap tibco.com
>  >
>  >
>  > > -----Original Message-----
>  > > From: [hidden email] [mailto:[hidden email]] On Behalf
>  > > Of Frank Harrell
>  > > Sent: Thursday, August 15, 2013 4:45 PM
>  > > To: RHELP
>  > > Subject: Re: [R] regex challenge
>  > >
>  > > I really appreciate the excellent ideas from Bill Dunlap and Greg
> Snow.
>  > >   Both suggestions almost work perfectly.  Greg's recognizes
> expressions
>  > > such as sex=='female' but not ones such as age > 21, age < 21, a - b >
>  > > 0, and possibly other legal R expressions.  Bill's idea is similar to
>  > > what Duncan Murdoch suggested to me.  Bill's doesn't catch the case
> when
>  > > a variable appears both in an expression and as a regular variable
> (sex
>  > > in the example below):
>  > >
>  > > f <- function(formula) {
>  > >    trms <- terms(formula)
>  > >    variables <- as.list(attr(trms, "variables"))[-1]
>  > >    ## the 'variables' attribute is stored as a call to list(),
>  > >    ## so we changed the call to a list and removed the first element
>  > >    ## to get the variables themselves.
>  > >    if (attr(trms, "response") == 1) {
>  > >      ## terms does not pull apart right hand side of formula,
>  > >      ## so we assume each non-function is to be renamed.
>  > >      responseVars <- lapply(all.vars(variables[[1]]), as.name)
>  > >      variables <- variables[-1]
>  > >    } else {
>  > >      responseVars <- list()
>  > >    }
>  > >    ## omit non-name variables from list of ones to change.
>  > >    ## This is where you could expand calls to certain functions.
>  > >    variables <- variables[vapply(variables, is.name, TRUE)]
>  > >    variables <- c(responseVars, variables) # all are names now
>  > >    names(variables) <- vapply(variables, as.character, "")
>  > >    newVars <- lapply(variables, function(v) as.name(paste0(toupper(v),
>  > > "z")))
>  > >    formula(do.call("substitute", list(formula, newVars)),
>  > > env=environment(formula))
>  > > }
>  > >
>  > > a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
>  > >      Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
>  > >      country * Heading() * sex
>  > > f(a)
>  > >
>  > > Output:
>  > >
>  > > CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
>  > >      Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
>  > >      COUNTRYz * Heading() * SEXz
>  > >
>  > > The method also doesn't work if I replace sex == 'Female' with x3 > 4,
>  > > converting to X3z > 4.  I'm not clear on how to code what kind of
>  > > expressions to ignore.
>  > >
>  > > Thanks!
>  > > Frank
>  > >
>  > > ______________________________________________
>  > > [hidden email] mailing list
>  > > https://stat.ethz.ch/mailman/listinfo/r-help
>  > > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>  > > and provide commented, minimal, self-contained, reproducible code.
>  >
>  > ______________________________________________
>  > [hidden email] mailing list
>  > https://stat.ethz.ch/mailman/listinfo/r-help
>  > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
>  > and provide commented, minimal, self-contained, reproducible code.
> ... [show rest of quote]
> 
> --
> Frank E Harrell Jr Professor and Chairman      School of Medicine
>                     Department of Biostatistics Vanderbilt University
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



More information about the R-help mailing list