[R] Search and extract string function

Marc Schwartz marc_schwartz at me.com
Thu Jul 15 17:42:50 CEST 2010


On Jul 15, 2010, at 9:48 AM, AndrewPage wrote:

> 
> Hi all,
> 
> I'm trying to write a function that will search and extract from a long
> character string, but with a twist: I want to use the characters before and
> the characters after what I want to extract as reference points.  For
> example, say I'm working with data entries that look like this:
> 
> Drink=Coffee:Location=Office:Time=Morning:Market=Flat
> 
> Drink=Water:Location=Office:Time=Afternoon:Market=Up
> 
> Drink=Water:Location=Gym:Time=Evening:Market=Closed
> 
> Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed
> 
> 
> ...
> 
> For my function, I'd like to find what's located between "Location=", and
> ":Time=" in every instance, and extract it, to return something like
> "Office, Office, Gym, Restaurant".
> 
> In a previous discussion I found
> (http://tolstoy.newcastle.edu.au/R/help/05/03/0344.html), someone wrote a
> function where you could find and substitute characters in a string, based
> on "pre" and "post" variables:
> 
> interp <- function(x, e = parent.frame(), pre = "\\$", post = "" ) {
> 	for(el in ls(e)) {
> 		tag <- paste(pre, el, post, sep = "") 
> 		if (length(grep(tag, x))) x <- gsub(tag, eval(parse(text = el), e), x)
> 		}
> 	x
> }
> 
> I'm not sure how to modify it, however, to do what I want it to do.  Any
> suggestions?
> 
> Thanks in advance,
> 
> Andrew


> Vec
[1] "Drink=Coffee:Location=Office:Time=Morning:Market=Flat"        
[2] "Drink=Water:Location=Office:Time=Afternoon:Market=Up"         
[3] "Drink=Water:Location=Gym:Time=Evening:Market=Closed"          
[4] "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed"


> gsub(".*Location=(.+):Time=.*", "\\1", Vec)
[1] "Office"     "Office"     "Gym"        "Restaurant"


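The pattern matches the whole string, and the replacement "\\1" is the back reference to the text captured by the parentheses, i.e. whatever lies between "Location=" and ":Time=". Note that gsub() returns any element that does not match the pattern unchanged.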
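
If you would rather have this as a reusable function with "pre" and "post" arguments, something along the following lines might do. This is only a minimal sketch: extract_between() is a hypothetical name (not an existing R function), it assumes that "pre" and "post" contain no regex metacharacters, and it returns NA for elements that do not match rather than passing them through unchanged.

```r
# Hypothetical helper, not part of base R: extract the text found between
# the literal strings `pre` and `post` in each element of `x`.
extract_between <- function(x, pre, post) {
  # Assumption: `pre` and `post` contain no regex metacharacters.
  # `(.+?)` is non-greedy, so the capture stops at the first `post`
  # following `pre`; perl = TRUE selects PCRE, which supports `+?`.
  pattern <- paste0(".*", pre, "(.+?)", post, ".*")
  out <- sub(pattern, "\\1", x, perl = TRUE)
  # sub() leaves non-matching elements unchanged, so flag them as NA.
  out[!grepl(pattern, x, perl = TRUE)] <- NA_character_
  out
}

Vec <- c("Drink=Coffee:Location=Office:Time=Morning:Market=Flat",
         "Drink=Water:Location=Office:Time=Afternoon:Market=Up",
         "Drink=Water:Location=Gym:Time=Evening:Market=Closed",
         "Drink=Wine:Location=Restaurant:Time=LateEvening:Market=Closed")

res <- extract_between(Vec, "Location=", ":Time=")
print(res)
# [1] "Office"     "Office"     "Gym"        "Restaurant"

# To get a single comma-separated string, as in the original question:
paste(res, collapse = ", ")
```

The non-greedy capture only matters if the "post" string can occur more than once after "pre"; for the sample data it gives the same result as the gsub() call above.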

HTH,

Marc Schwartz


