[R] regexpr and parsing question

Gabor Grothendieck ggrothendieck at gmail.com
Wed Jan 31 00:16:25 CET 2007


Both spaces and tabs are whitespace so this
should be good enough (unless you can
have empty fields):

read.table("myfile.dat", header = TRUE)

See the sep= argument in ?read.table: the default, sep = "",
treats any run of spaces or tabs as a field separator.
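
As a rough check of the above, here is a small sketch. The sample
lines are made up for illustration (a space-separated header followed
by tab-separated rows), and text= is used in place of a real file
name so that the example is self-contained:

```r
# Hypothetical sample data: space-separated header, tab-separated rows.
lines <- c("col_1 col_2 col_3",
           "1\t2\t3",
           "4\t5\t6")

# With the default sep = "", spaces and tabs both count as separators,
# so the header and the data rows are split the same way.
dat <- read.table(text = lines, header = TRUE)
colnames(dat)
```

If a row had an empty field between two tabs, the run of tabs would
be treated as a single separator and that row would come up one field
short, which is the caveat mentioned above.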

Although I don't think you really need this, here are
some regular expressions for processing a header
into the form you asked for.  The first line places
quotes around the names, the second one inserts
commas and the last one adds c( and ).

s <- gsub('(\\S+)', '"\\1"', 'col1 col2 col3')
s <- gsub("(\\S+) ", "\\1, ", s)
sub("(.*)", "c(\\1)", s)
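
If empty fields are possible, one alternative that avoids building a
c(...) string at all is to read the header line separately, split it
on whitespace, and pass the result as col.names. This is only a
sketch: the temporary file and its contents are made up for
illustration, and the real file name would go in place of tmp.

```r
# Hypothetical sample file: space-separated header, tab-separated rows,
# with an empty second field in the first data row.
tmp <- tempfile(fileext = ".dat")
writeLines(c("col_1 col_2 col_3", "1\t\t3", "4\t5\t6"), tmp)

# Read only the first line and split it on runs of whitespace.
hdr <- strsplit(readLines(tmp, n = 1), "\\s+")[[1]]

# Skip the header line and read the rest as tab-delimited,
# supplying the column names directly.
dat <- read.delim(tmp, header = FALSE, skip = 1, col.names = hdr)
```

Since read.delim splits strictly on tabs, the empty field is kept and
read as NA rather than shifting the remaining columns.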


On 1/30/07, Kimpel, Mark William <mkimpel at iupui.edu> wrote:
> The main problem I am trying to solve is this:
>
> I am importing a tab delimited file whose first line contains only one
> column, which is a descriptor of the form "col_1 col_2 col_3", i.e. the
> colnames are not tab delineated but are separated by whitespace. I would
> like to parse this first line so that it becomes the colnames
> of the rest of the file, which I am reading into R using read.delim().
> The file is so huge that I must do this in R.
>
> My first question is this: What is the best way to accomplish what I
> want to do?
>
> My other questions revolve around some failed attempts on my part to
> solve the problem on my own using regular expressions. I thought that
> perhaps I could change the first line to "c("col_1", "col_2", "col_3")
> using gsub. I was having trouble figuring out how R uses the backslash
> character because I know that sometimes the backslash one would use in
> Perl needs to be a double backslash in R.
>
> Here is a sample of what I tried and what I got:
>
> a<-"col_1 col_2 col_3"
>
> > gsub("\\s", " " , a)
>
> [1] "col_1 col_2 col_3"
>
> > gsub("\\s", "\\s" , a)
>
> [1] "col_1scol_2scol_3"
>
> As you can see, it looks like R is taking a regular expression for
> "pattern", but not taking it for "replacement". Why is this?
>
> Assuming that I did want to solve my original problem with gsub and then
> turn the string into an R object, how would I get gsub to return
> "c("col_1", "col_2", "col_3") using my original string?
>
> Finally, is there a way to declare a string as a regular expression so
> that R sees it the same way other languages, such as Perl do, i.e. make
> the backslash be interpreted the same way? For someone who is just
> learning regular expressions as I am, it is very frustrating to read
> about them in references and then have to translate what I've learned
> into R syntax. I was thinking that instead of enclosing the string in
> "", one could use THIS.IS.A.REGULAR.EXPRESSION(), similar to the way we
> use I() in formulae.
>
> These are a bunch of questions, but obviously I have a lot to learn!
>
> Thanks,
>
> Mark
>
> Mark W. Kimpel MD
>
>
>
> (317) 490-5129 Work, & Mobile
>
>
>
> (317) 663-0513 Home (no voice mail please)
>
> 1-(317)-536-2730 FAX
>