[R] Parsing
Paolo Sonego
paolo.sonego at gmail.com
Wed Jul 9 11:33:28 CEST 2008
Dear R users,
I have a big text file formatted like this:
x x_string
y y_string
id1 id1_string
id2 id2_string
z z_string
w w_string
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string1
y y_string1
z z_string1
w w_string1
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
x x_string2
y y_string2
id1 id1_string1
id2 id2_string1
z z_string2
w w_string2
stuff stuff stuff
stuff stuff stuff
stuff stuff stuff
//
...
...
I'd like to parse this file, retrieve the x, y, id1, id2, z and w fields,
and save them into a matrix object:
x y id1 id2 z w
x_string y_string id1_string id2_string z_string w_string
x_string1 y_string1 NA NA z_string1 w_string1
x_string2 y_string2 id1_string1 id2_string1 z_string2 w_string2
...
...
The id1 and id2 fields are not always present within a section (the lines
from x through the last stuff line). When they are absent I'd like to
insert an NA (see above), so that every column has the same length, i.e.
length(x) == length(y) == length(id1) == ... .
Without the id1 and id2 fields the task is easily solved by importing the
text file with readLines and retrieving the individual fields with grep:
input = readLines("file.txt")
x = grep("^x\\s", input, value = TRUE)
id1 = grep("^id1\\s", input, value = TRUE)
...
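To make the problem concrete, here is a small sketch of where the grep approach breaks down. The input vector is a made-up two-record sample following the layout above (it is not my real file); the second record has no id1/id2 lines.

```r
# Made-up sample in the layout shown above; the second record lacks id1/id2
input <- c("x x_string", "y y_string", "id1 id1_string", "id2 id2_string",
           "z z_string", "w w_string", "stuff stuff stuff", "//",
           "x x_string1", "y y_string1", "z z_string1", "w w_string1",
           "stuff stuff stuff", "//")

x   <- grep("^x\\s", input, value = TRUE)
id1 <- grep("^id1\\s", input, value = TRUE)

length(x)    # 2: one hit per record
length(id1)  # 1: only the first record has an id1 line
```

The vectors no longer line up, and the grep result alone does not say which record each id1 came from, so they cannot simply be bound together as columns.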
I'd like to accomplish this task entirely in R (no SQL, no perl
script), if possible without using loops.
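One direction that might work, as a rough sketch only: number the records by counting the // lines, then fill a matrix pre-filled with NA by (record, field) position. This assumes that every record ends with a // line, that the field name is the first whitespace-delimited token on its line, and that no "stuff" line starts with one of the field names. The names fields, rec, key, val, keep and out are just illustrative, and input is again a made-up sample.

```r
# Made-up sample in the layout shown above; the second record lacks id1/id2
input <- c("x x_string", "y y_string", "id1 id1_string", "id2 id2_string",
           "z z_string", "w w_string", "stuff stuff stuff", "//",
           "x x_string1", "y y_string1", "z z_string1", "w w_string1",
           "stuff stuff stuff", "//")

fields <- c("x", "y", "id1", "id2", "z", "w")

# Record number of each line: number of "//" terminators seen so far, plus 1
rec <- cumsum(input == "//") + 1

# Split each line into its first token (the key) and the remainder (the value)
key <- sub("\\s.*$", "", input)
val <- sub("^\\S+\\s+", "", input)

# Keep only the lines whose key is one of the wanted fields
keep <- key %in% fields

# Start from an all-NA matrix so that absent fields stay NA
out <- matrix(NA_character_, nrow = max(rec[keep]), ncol = length(fields),
              dimnames = list(NULL, fields))

# Fill by (row, column) pairs using a two-column index matrix
out[cbind(rec[keep], match(key[keep], fields))] <- val[keep]
out
```

If a field name occurred twice within one record, the later line would overwrite the earlier one, so this would need checking against the real file.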
Any suggestions are quite welcome!
Regards,
Paolo