[R] regex challenge
Frank Harrell
f.harrell at Vanderbilt.Edu
Fri Aug 16 19:06:57 CEST 2013
Thanks Bill. The problem is that one of the results of convertName might be
'Heading("Age in Years")*age' (this is for the tables package), and
as.name converts this to `Heading("...")*age`. The backticks leave the
final formula with a mixture of regular elements and backtick-quoted
elements, making the formula invalid.
Best,
Frank
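
The difference described above can be seen in a small sketch. It is an
illustration only, not code from the thread: it assumes the converter returns
the example string from this message, and it uses base R's parse() as one
possible alternative to as.name().

```r
# Example converter output, copied from the message above.
s <- 'Heading("Age in Years")*age'

# as.name() builds a single symbol whose name is the entire string,
# which is why it shows up backtick-quoted inside a formula.
sym <- as.name(s)
is.name(sym)   # TRUE

# parse() builds the call tree that the string describes instead.
cl <- parse(text = s, keep.source = FALSE)[[1]]
is.call(cl)    # TRUE
all.vars(cl)   # "age"
```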
-------------------------------------------------------------------
The following makes the name converter function an argument to ff (and
restores the colon operator to the list of formula operators), but I'm
not sure what you need the converter to do.
ff <- function(expr,
               convertName = function(name) paste0(toupper(name), "z")) {
  if (is.call(expr) && is.name(expr[[1]]) &&
      is.element(as.character(expr[[1]]),
                 c("~", "+", "-", "*", "/", "%in%", "(", ":"))) {
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- Recall(expr[[i]], convertName = convertName)
    }
  } else if (is.name(expr)) {
    expr <- as.name(convertName(expr))
  }
  expr
}
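
One possible way around the backtick problem raised at the top of this thread
is to parse the converter's string rather than pass it to as.name(). The
following is a minimal sketch under stated assumptions, not something posted
in the thread: the names ff2 and conv are hypothetical, and convertName is
assumed to always return a single, syntactically valid R expression as a
character string.

```r
# Hypothetical variant of ff: the converter's result is parsed, so it may be
# any legal expression (a call such as Heading("...")*age), not just a name.
ff2 <- function(expr,
                convertName = function(name) paste0(toupper(name), "z")) {
  ops <- c("~", "+", "-", "*", "/", "%in%", "(", ":")
  if (is.call(expr) && is.name(expr[[1]]) &&
      is.element(as.character(expr[[1]]), ops)) {
    # Recurse into the arguments of the standard formula operators.
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- Recall(expr[[i]], convertName = convertName)
    }
  } else if (is.name(expr)) {
    # Assumption: the converter returns exactly one valid expression.
    expr <- parse(text = convertName(as.character(expr)),
                  keep.source = FALSE)[[1]]
  }
  expr
}

# Hypothetical converter: relabel 'age', leave every other name unchanged.
conv <- function(nm) {
  if (nm == "age") 'Heading("Age in Years")*age' else nm
}
res <- ff2(y ~ age + sex, convertName = conv)
```

Here the replacement for age is spliced into the formula as a subtree (a call
to `*`), so no backtick-quoted symbol is created.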
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]] On Behalf
> Of Frank Harrell
> Sent: Thursday, August 15, 2013 7:47 PM
> To: RHELP
> Subject: Re: [R] regex challenge
>
> Bill, that is very impressive. The only problem I'm having is that I want
> the paste0(toupper(...)) to be a general function that returns a
> character string that is a legal part of a formula object but that can't
> be converted to a 'name'.
>
> Frank
>
>
> -------------------------------
> Oops, I left "(" out of the list of operators.
>
>
> ff <- function(expr) {
>   if (is.call(expr) && is.name(expr[[1]]) &&
>       is.element(as.character(expr[[1]]),
>                  c("~", "+", "-", "*", "/", "%in%", "("))) {
>     for (i in seq_along(expr)[-1]) {
>       expr[[i]] <- Recall(expr[[i]])
>     }
>   } else if (is.name(expr)) {
>     expr <- as.name(paste0(toupper(as.character(expr)), "z"))
>   }
>   expr
> }
>
> > ff(a)
> CATz + (AGEz + Heading("Females") * (sex == "Female") * SBPz) *
> Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
> COUNTRYz * Heading() * SEXz
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: [hidden email] [mailto:[hidden email]] On Behalf
> > Of William Dunlap
> > Sent: Thursday, August 15, 2013 6:03 PM
> > To: Frank Harrell; RHELP
> > Subject: Re: [R] regex challenge
> >
> > Try this one
> >
> > ff <- function (expr)
> > {
> >   if (is.call(expr) && is.name(expr[[1]]) &&
> >       is.element(as.character(expr[[1]]),
> >                  c("~", "+", "-", "*", "/", ":", "%in%"))) {
> >     # the above list should cover the standard formula operators.
> >     for (i in seq_along(expr)[-1]) {
> >       expr[[i]] <- Recall(expr[[i]])
> >     }
> >   }
> >   else if (is.name(expr)) {
> >     # the conversion itself
> >     expr <- as.name(paste0(toupper(as.character(expr)), "z"))
> >   }
> >   expr
> > }
> >
> > > ff(a)
> > CATz + (age + Heading("Females") * (sex == "Female") * sbp) *
> > Heading() * Gz + (age + sbp) * Heading() * TRIOz ~ Heading() *
> > COUNTRYz * Heading() * SEXz
> >
> > Bill Dunlap
> > Spotfire, TIBCO Software
> > wdunlap tibco.com
> >
> >
> > > -----Original Message-----
> > > From: [hidden email] [mailto:[hidden email]] On Behalf
> > > Of Frank Harrell
> > > Sent: Thursday, August 15, 2013 4:45 PM
> > > To: RHELP
> > > Subject: Re: [R] regex challenge
> > >
> > > I really appreciate the excellent ideas from Bill Dunlap and Greg Snow.
> > > Both suggestions almost work perfectly. Greg's recognizes expressions
> > > such as sex=='female' but not ones such as age > 21, age < 21, a - b >
> > > 0, and possibly other legal R expressions. Bill's idea is similar to
> > > what Duncan Murdoch suggested to me. Bill's doesn't catch the case when
> > > a variable appears both in an expression and as a regular variable (sex
> > > in the example below):
> > >
> > > f <- function(formula) {
> > >   trms <- terms(formula)
> > >   variables <- as.list(attr(trms, "variables"))[-1]
> > >   ## the 'variables' attribute is stored as a call to list(),
> > >   ## so we changed the call to a list and removed the first element
> > >   ## to get the variables themselves.
> > >   if (attr(trms, "response") == 1) {
> > >     ## terms does not pull apart the response (left hand side) of the
> > >     ## formula, so we assume each non-function is to be renamed.
> > >     responseVars <- lapply(all.vars(variables[[1]]), as.name)
> > >     variables <- variables[-1]
> > >   } else {
> > >     responseVars <- list()
> > >   }
> > >   ## omit non-name variables from list of ones to change.
> > >   ## This is where you could expand calls to certain functions.
> > >   variables <- variables[vapply(variables, is.name, TRUE)]
> > >   variables <- c(responseVars, variables) # all are names now
> > >   names(variables) <- vapply(variables, as.character, "")
> > >   newVars <- lapply(variables,
> > >                     function(v) as.name(paste0(toupper(v), "z")))
> > >   formula(do.call("substitute", list(formula, newVars)),
> > >           env=environment(formula))
> > > }
> > >
> > > a <- cat + (age + Heading("Females") * (sex == "Female") * sbp) *
> > > Heading() * g + (age + sbp) * Heading() * trio ~ Heading() *
> > > country * Heading() * sex
> > > f(a)
> > >
> > > Output:
> > >
> > > CATz + (AGEz + Heading("Females") * (SEXz == "Female") * SBPz) *
> > > Heading() * Gz + (AGEz + SBPz) * Heading() * TRIOz ~ Heading() *
> > > COUNTRYz * Heading() * SEXz
> > >
> > > The method also doesn't work if I replace sex == 'Female' with x3 > 4,
> > > converting to X3z > 4. I'm not clear on how to code what kind of
> > > expressions to ignore.
> > >
> > > Thanks!
> > > Frank
> > >
> > > ______________________________________________
> > > [hidden email] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.