[R] regex challenge
William Dunlap
wdunlap at tibco.com
Thu Aug 15 16:48:13 CEST 2013
I think substitute() or bquote() will do a better job here than gsub() because
they work on the parsed formula rather than on the raw string. The
terms() function will interpret the formula-specific operators like "+"
and ":" to come up with a list of the 'variables' (or 'terms') in the formula.
E.g., with the 'f' given below we get
> f(y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i)
Y1z + Y2z ~ Az * (Bz + Cz) + Dz + Fz * (h == 3) + (sex == "male") * Iz
Is that what you wanted?
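To see what terms() extracts from a formula like this one, a small sketch along these lines may help. The names 'fm', 'trms' and 'vars' are only illustrative; no data is needed because the formula contains no '.'.

```r
# Inspect the 'variables' attribute that terms() builds for the example formula.
fm <- y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i
trms <- terms(fm)

# The attribute is a call to list(); drop its first element (the symbol `list`)
# to get the variables themselves.
vars <- as.list(attr(trms, "variables"))[-1]

# Show each entry as text: some are plain names, others are calls.
print(vapply(vars, function(v) paste(deparse(v), collapse = " "), ""))

# TRUE for plain names (candidates for renaming), FALSE for calls
# such as the response y1 + y2 or a comparison like h == 3.
print(vapply(vars, is.name, TRUE))

# 1 when the formula has a response (left-hand side), 0 otherwise.
print(attr(trms, "response"))
```

The entries that are not plain names are the ones a name-based substitution leaves alone.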
If you only wanted to keep intact the expressions of the form
var==value
(calls to `==`) but transform things like log(a) to log(Az) you
could extend this code to do that as well.
f <- function(formula) {
    trms <- terms(formula)
    variables <- as.list(attr(trms, "variables"))[-1]
    # the 'variables' attribute is stored as a call to list(),
    # so we change the call to a list and remove the first element
    # to get the variables themselves.
    if (attr(trms, "response") == 1) {
        # terms does not pull apart the left hand side (the response)
        # of the formula, so we assume each non-function is to be renamed.
        responseVars <- lapply(all.vars(variables[[1]]), as.name)
        variables <- variables[-1]
    } else {
        responseVars <- list()
    }
    # omit non-name variables from list of ones to change.
    # This is where you could expand calls to certain functions.
    variables <- variables[vapply(variables, is.name, TRUE)]
    variables <- c(responseVars, variables) # all are names now
    names(variables) <- vapply(variables, as.character, "")
    newVars <- lapply(variables, function(v) as.name(paste0(toupper(v), "z")))
    formula(do.call("substitute", list(formula, newVars)), env=environment(formula))
}
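The extension mentioned above (leave calls to `==` intact but rename inside other calls, so that log(a) becomes log(Az)) might be sketched as below. This is a different technique from f(): it walks the parsed formula directly instead of using terms() and substitute(). The function name 'renameVars' and its arguments 'keep' and 'rename' are made up for illustration. It assumes the formula contains no '.' and no empty arguments (as in x[, 1]), and it renames every symbol used as an argument, including constants like pi.

```r
# Hypothetical helper: rename every variable in a formula, descending into
# calls such as log(a), but leaving calls to the functions in 'keep' untouched.
renameVars <- function(formula, keep = "==",
                       rename = function(nm) paste0(toupper(nm), "z")) {
    walk <- function(e) {
        if (is.name(e)) {
            # a bare variable name: rename it
            return(as.name(rename(as.character(e))))
        }
        if (is.call(e)) {
            fn <- e[[1]]
            if (is.name(fn) && as.character(fn) %in% keep) {
                return(e)  # e.g. h == 3 or sex == "male": leave intact
            }
            # Keep the function itself; transform only its arguments.
            # Constants (3, "male", NULL) are skipped and stay as they are.
            for (i in seq_along(e)[-1]) {
                arg <- e[[i]]
                if (is.name(arg) || is.call(arg)) e[[i]] <- walk(arg)
            }
        }
        e
    }
    # Elements are replaced in place, so the class and environment
    # attributes of the original formula are carried over.
    walk(formula)
}

fm <- y ~ log(a) + b * (h == 3)
res <- renameVars(fm)
print(res)
```

Passing something like keep = c("==", "%in%") would protect other kinds of expressions in the same way.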
Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
> -----Original Message-----
> From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf
> Of Frank Harrell
> Sent: Wednesday, August 14, 2013 8:14 PM
> To: RHELP
> Subject: [R] regex challenge
>
> I would like to be able to use gsub or gsubfn to process a formula and
> to translate the variables but to ignore expressions in the formula.
> Supposing that the R formula has already been transformed into a
> character string and that the transformation is to convert variable
> names to upper case and to append z to the names, an example would be to
> convert y1 + y2 ~ a*(b + c) + d + f * (h == 3) + (sex == 'male')*i to
> Y1z + Y2z ~ Az*(Bz + Cz) + Dz + Fz * (h == 3) + (sex == 'male')*Iz. Any
> expression that is not just a simple variable name would be left alone.
>
> Does anyone want to try their hand at creating a regex that would
> accomplish this?
>
> Thanks
> Frank
> --
> Frank E Harrell Jr Professor and Chairman School of Medicine
> Department of Biostatistics Vanderbilt University