[R] regexpr and parsing question
Kimpel, Mark William
mkimpel at iupui.edu
Tue Jan 30 23:23:45 CET 2007
The main problem I am trying to solve is this:
I am importing a tab delimited file whose first line contains only one
column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
colnames are not tab delimited but are separated by whitespace. I would
like to parse this first line so that it becomes the colnames
of the rest of the file, which I am reading into R using read.delim().
The file is so huge that I must do this in R.
My first question is this: What is the best way to accomplish what I
want to do?
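
One possible approach to the import problem described above, as a minimal sketch rather than a definitive answer: read the descriptor line on its own with readLines(), split it on whitespace, and pass the result to read.delim() as col.names while skipping that first line. The temporary file and its contents are hypothetical and exist only to make the example self-contained; the object names path, header and dat are likewise illustrative.

```r
# Hypothetical sample file, written to a temp location purely for illustration.
path <- tempfile(fileext = ".txt")
writeLines(c("col_1 col_2 col_3",
             "1\t2\t3",
             "4\t5\t6"), path)

# Read only the first line, trim stray whitespace, and split on runs of whitespace.
header <- strsplit(trimws(readLines(path, n = 1)), "[[:space:]]+")[[1]]

# Read the tab-delimited body, skipping the descriptor line,
# and supply the parsed names directly.
dat <- read.delim(path, header = FALSE, skip = 1, col.names = header)
print(dat)
```

Because only one line is read by readLines(), the size of the file should not matter for the header step; the body is still read once by read.delim().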
My other questions revolve around some failed attempts on my part to
solve the problem on my own using regular expressions. I thought that
perhaps I could change the first line to c("col_1", "col_2", "col_3")
using gsub. I was having trouble figuring out how R uses the backslash
character because I know that sometimes the backslash one would use in
Perl needs to be a double backslash in R.
Here is a sample of what I tried and what I got:
> a <- "col_1 col_2 col_3"
> gsub("\\s", " " , a)
[1] "col_1 col_2 col_3"
> gsub("\\s", "\\s" , a)
[1] "col_1scol_2scol_3"
As you can see, it looks like R is taking a regular expression for
"pattern", but not taking it for "replacement". Why is this?
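
A short sketch of what seems to be going on in the output above, offered as an illustration and not an authoritative account: the replacement argument is not itself a regular expression, so a class such as \s has no meaning there. As far as the documentation for gsub() describes, the backslash sequences that are special in the replacement are the backreferences \1 to \9, which refer to parenthesized groups in the pattern. The group pattern below is chosen only for the example.

```r
a <- "col_1 col_2 col_3"

# "\\s" typed in R source is a two-character string: a backslash and an "s".
nchar("\\s")

# In the pattern, that string is read as the whitespace class.
# In the replacement, it is not a class; the result matches the output above.
gsub("\\s", "\\s", a)

# Backreferences are what the replacement does understand:
# \\1 stands for whatever the first parenthesized group matched.
wrapped <- gsub("([[:alnum:]_]+)", "<\\1>", a)
print(wrapped)   # "<col_1> <col_2> <col_3>"
```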
Assuming that I did want to solve my original problem with gsub and then
turn the string into an R object, how would I get gsub to return
c("col_1", "col_2", "col_3") using my original string?
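
For the gsub route asked about here, one way it could be done is sketched below, under the assumption that the names contain no whitespace or quote characters. The names quoted and code_string are hypothetical. The last lines show that strsplit() gives the same vector without building and evaluating code, which is probably the simpler choice.

```r
a <- "col_1 col_2 col_3"

# Step 1: wrap each run of non-whitespace characters in double quotes.
quoted <- gsub("([^[:space:]]+)", "\"\\1\"", a)

# Step 2: turn the separating whitespace into ", " and add the c( ) wrapper.
code_string <- paste0("c(", gsub("[[:space:]]+", ", ", quoted), ")")
cat(code_string, "\n")   # c("col_1", "col_2", "col_3")

# Step 3: turn the string into an R object.
cols <- eval(parse(text = code_string))

# The same result without generating code:
cols2 <- strsplit(a, "[[:space:]]+")[[1]]
identical(cols, cols2)
```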
Finally, is there a way to declare a string as a regular expression so
that R sees it the same way other languages, such as Perl, do, i.e. make
the backslash be interpreted the same way? For someone who is just
learning regular expressions as I am, it is very frustrating to read
about them in references and then have to translate what I've learned
into R syntax. I was thinking that instead of enclosing the string in
"", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
use I() in formulae.
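
Something close to this wish exists in later versions of R, shown here as a hedged sketch: R 4.0.0 and newer accept raw string literals of the form r"(...)", in which backslashes are not processed by the R parser, so a pattern can be typed as it appears in a regex reference. This assumes R >= 4.0.0 and does not apply to older versions. Separately, the perl = TRUE argument of gsub() selects Perl-compatible regular expressions, although it does not change how R string literals are typed.

```r
a <- "col_1 col_2 col_3"

# A raw string holds the same two characters as the escaped form "\\s".
identical(r"(\s)", "\\s")

# The pattern can then be written with single backslashes.
joined <- gsub(r"(\s+)", "_", a)
print(joined)   # "col_1_col_2_col_3"

# Perl-compatible matching is requested with perl = TRUE.
joined_perl <- gsub(r"(\s+)", "_", a, perl = TRUE)
identical(joined, joined_perl)
```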
These are a bunch of questions, but obviously I have a lot to learn!
Thanks,
Mark
Mark W. Kimpel MD
(317) 490-5129 Work, & Mobile
(317) 663-0513 Home (no voice mail please)
1-(317)-536-2730 FAX