[R] regular expressions, sub

Gabor Grothendieck ggrothendieck at gmail.com
Fri Jan 27 20:49:49 CET 2006


In this post:

	http://finzi.psych.upenn.edu/R/Rhelp02a/archive/30590.html

Thomas Lumley provided a function to traverse a formula recursively.
We can modify it as shown to transform ln(m)^n to ln^n(m) producing
proc2.  We then bundle everything up into proc3 which uses substitute
to translate log to ln and remove (, the calls proc2 to do the aforementioned
substitute and finally we use simple character processing to clean up the
rest.

Although this is substantially longer in terms of lines of code
we did not have to write many of them because proc2 is actually
just a modification of the code in the indicated post and the
character processing becomes extremely simple.  Also its more
powerful able to handle expressions like:

	log(D) ~ log(log(N)^2)^3




proc2 <-function(formula){
	process<-function(expr){
	    if (length(expr)==1)
	      return(expr)
	   if(length(expr)==2) {
	      expr[[2]] <- process(expr[[2]])
	      return(expr)
	   }
	   if ( expr[[1]]==as.name("^") && length(expr[[2]])==2 &&
		      expr[[2]][[1]] == as.name("ln") &&
		      class(idx <- expr[[3]]) == "numeric") {
		expr <- as.call(list(as.name(paste("ln",idx,sep = "^")),
		   expr[[2]][[2]]))
		expr[[2]] <- process(expr[[2]])
		return(expr)
	   }
	   expr[[2]]<-process(expr[[2]])
	   expr[[3]]<-process(expr[[3]])
	   return(expr)
	  }
   formula[[3]]<-process(formula[[3]])
   formula
}

proc3 <- function(f) {

	# replace log with ln
	result <- do.call("substitute", list(f, list(log = as.name("ln"))))

	# remove I
	result <- do.call("substitute", list(result, list(I = as.name("("))))

	# transform ln(m)^n to ln^n(m)
	result <- proc2(result)

	# now clean up using simple character substitutions
	result <- deparse(result)

	# ( -> space
	result <- gsub("[(]", " ", result)

	# remove " and )
	gsub("[\")]", "", result)
}

# tests

proc3( log(D) ~ log(N)+I(log(N)^2)+log(t) )     # "ln D ~ ln N +  ln^2 N + ln t"

proc3( log(D) ~ log(log(N)^2)^3)       # "ln D ~ ln^3 ln^2 N"



On 1/27/06, Christian Hoffmann <christian.hoffmann at wsl.ch> wrote:
> Hi,
>
> I am trying to use sub, regexpr on expressions like
>
>    log(D) ~ log(N)+I(log(N)^2)+log(t)
>
> being a model specification.
>
> The aim is to produce:
>
>    "ln D ~ ln N + ln^2 N + ln t"
>
> The variable names N, t may change, the number of terms too.
>
> I succeded only partially, help on regular expressions is hard to
> understand for me, examples on my case are rare. The help page on R-help
> for grep etc. and "regular expressions"
>
> What I am doing:
>
> (f <- log(D) ~ log(N)+I(log(N)^2)+log(t))
> (ft <- sub("","",f))   # creates string with parts of formula, how to do
> it simpler?
> (fu <- paste(ft[c(2,1,3)],collapse=" "))  # converts to one string
>
> Then I want to use \1 for backreferences something like
>
> (fv <- sub("log( [:alpha:] N  )^ [:alpha:)","ln \\1^\\2",fu))
>
> to change "log(g)^7" to "ln^7 g",
>
> and to eliminate I(): sub("I(blabla)","\\1",fv)  # I(xxx) -> xxx
>
> The special characters are making trouble, sub acceps "(", ")" only in
> pairs. Code for experimentation:
>
> trysub <- function(s,t,e) {
> ii<-0; for (i1 in c(TRUE,FALSE)) for (i2 in c(TRUE,FALSE)) for (i3 in
> c(TRUE,FALSE)) for (i4 in c(TRUE,FALSE))
> print(paste(ii<-ii+1,ifelse(i1,"  "," ~"),"ext",ifelse(i2,"  ","
> ~"),"perl",ifelse(i3,"  "," ~"),"fixed ",ifelse(i4,"  "," ~"),"useBytes:
> ", try(sub(s,t,e, extended=i1, perl=i2, fixed=i3,
> useBytes=i4)),sep=""));invisible(0) }
>
> trysub("I(log(N)^2)","ln n^2",fu) # A: desired result for cases
> 5,6,13..16, the rest unsubstituted
>
> trysub("log(","ln ",fu)           # B: no substitutions; errors for
> cases 1..4,7.. 12   # typical errors:
> "3  ext  perl ~fixed   useBytes: Error in sub.perl(pattern, replacement,
> x, ignore.case, useBytes) : \n\tinvalid regular expression 'log('\n"
>
> trysub("log\(","ln ",fu)          # C: same as A
>
> trysub("log\\(","ln ",fu)         # D: no substitutions; errors for
> cases 15,16        # typical errors:
> "15 ~ext ~perl ~fixed   useBytes: Error in sub(pattern, replacement, x,
> ignore.case, extended, fixed, useBytes) : \n\tinvalid regular expression
> 'log\\('\n"
>
> trysub("log\\(([:alpha:]+)\\)","ln \1",fu) # no substitutions, no errors
> # E: typical errors:
> "3  ext  perl ~fixed   useBytes: Error in sub.perl(pattern, replacement,
> x, ignore.case, useBytes) : \n\tinvalid regular expression
> 'log\\(([:alpha:]+)\\)'\n"
>
>
>
> Thanks for help
> Christian
>
> PS. The explanations in the documents
> --
> Dr. Christian W. Hoffmann,
> Swiss Federal Research Institute WSL
> Mathematics + Statistical Computing
> Zuercherstrasse 111
> CH-8903 Birmensdorf, Switzerland
>
> Tel +41-44-7392-277  (office)   -111(exchange)
> Fax +41-44-7392-215  (fax)
> christian.hoffmann at wsl.ch
> http://www.wsl.ch/staff/christian.hoffmann
>
> International Conference 5.-7.6.2006 Ekaterinburg Russia
> "Climate changes and their impact on boreal and temperate forests"
> http://ecoinf.uran.ru/conference/
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>




More information about the R-help mailing list