[R] extracting quoted text from character string

Gabor Grothendieck ggrothendieck at myway.com
Tue Oct 14 04:00:02 CEST 2003



The first line in the body of the function splits the input, s, 
using the separator and makes it a list.

The second and third lines define a regexp which matches 
leading and trailing whitespace and a logical vector which
selects the odd-positioned elements.

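Splitting on the quote character makes the pieces alternate between
outside and inside the quotes, starting outside (this assumes the
quotes are balanced). Since we therefore know that the odd-positioned
elements are not between quotes, in the fourth line we remove any
leading and trailing whitespace from them and split them on whitespace.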

In the fifth line we convert s from list to vector.

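test <- function(s,sep="'") {
   s <- as.list(strsplit(s,sep)[[1]])   # split on the quote character; keep the pieces as a list
   re <- "^[[:space:]]+|[[:space:]]+$"   # leading or trailing whitespace
   odd <- seq_along(s)%%2 == 1   # TRUE for the pieces that lie outside the quotes
   s[odd] <- strsplit( gsub(re,"",unlist(s)[odd]), "[[:space:]]+" )   # trim, then split on whitespace
   unlist(s)   # flatten the list back to a character vector
}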
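
To see what each of the five steps described above produces, the body of
the function can be run one statement at a time. This is only a sketch:
the input is copied from the original question, and the names pieces,
outside, parts and result are made up for illustration and are not part
of the function itself.

```r
# Input copied from the original question
line <- "'this text has spaces' 'thisNot' 3 4 5 6 7 8 9 10"
sep <- "'"
re <- "^[[:space:]]+|[[:space:]]+$"

# Step 1: split on the quote character; the pieces alternate between
# outside and inside the quotes, starting outside
pieces <- strsplit(line, sep)[[1]]
print(pieces)

# Steps 2-3: the odd positions hold the pieces outside the quotes
odd <- seq_along(pieces) %% 2 == 1
print(pieces[odd])

# Step 4: trim the outside pieces, then split them on runs of whitespace
outside <- strsplit(gsub(re, "", pieces[odd]), "[[:space:]]+")

# Step 5: put the split pieces back in position and flatten to a vector
parts <- as.list(pieces)
parts[odd] <- outside
result <- unlist(parts)
print(result)
```

The empty and whitespace-only pieces outside the quotes split to
zero-length vectors, so they drop out when the list is flattened.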

test(line)
test(bad.line)
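
On the question of whether an existing function already does this: as a
possible alternative, and not the approach used above, base R's scan()
treats both ' and " as quote characters when reading character data. The
sketch below assumes a version of R in which scan() accepts a text=
argument; the two input strings are copied from the original question,
and parts and bad.parts are names chosen here for illustration.

```r
# Inputs copied from the original question
line <- "'this text has spaces' 'thisNot' 3 4 5 6 7 8 9 10"
bad.line <- "'this text has spaces' thisNot 3 4 5 6 7 8 9 10"

# what = "" asks for character data; quoted fields are kept together
# and the surrounding quotes are removed
parts <- scan(text = line, what = "", quiet = TRUE)
bad.parts <- scan(text = bad.line, what = "", quiet = TRUE)

print(parts)
print(identical(parts, bad.parts))
```

Whether this is preferable depends on the input: scan() follows its own
quoting and separator rules, which may not match every file format.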



---
From: Corey Moffet <cmoffet at nwrc.ars.usda.gov>
 
Hello all,

I am trying to solve a problem, and my solution is rather ugly and not very
general. The posts for "[R] help with gsub and grep functions" seemed
relevant and gave me hope for a more refined and more general solution.

The Problem:

line <- "'this text has spaces' 'thisNot' 3 4 5 6 7 8 9 10"
bad.line <- "'this text has spaces' thisNot 3 4 5 6 7 8 9 10"

The desired result of a process on "line" or "bad.line":

> parts <- some.function(line)

> parts
[1] "this text has spaces"
[2] "thisNot"
[3] "3"
[4] "4"
[5] "5"
[6] "6"
[7] "7"
[8] "8"
[9] "9"
[10] "10"

Current function to obtain a solution for "line" but not "bad.line":

"some.function" <- function(line, quote.char = "'") {
   quoted <- unlist(strsplit(line, quote.char))
   quoted <- quoted[quoted != ""]
   first <- quoted[1]
   second <- quoted[3]
   last <- quoted[4]
   last.parts <- unlist(strsplit(last, " "))
   last.parts <- last.parts[last.parts != ""]
   out <- c(first, second, last.parts)
   return(out)
}

This solution is not very good because the text parts of "line" are not 
required to be enclosed in quotes unless they contain a space. All the files
I currently have to process have the first two pieces enclosed in "'". But
it is future files that I worry about. Is there an existing function that
I have overlooked that splits strings, ignoring the delimiter when it is
enclosed in quotes? I know that I can do some testing on the length of
"quoted" in function "some.function" but it seems there should be a more
elegant way of doing this type of thing. Any suggestions?

With best wishes and kind regards I am






