[R] extracting data from unstructured (text?) file
frauke
fhoss at andrew.cmu.edu
Sun Mar 11 20:07:28 CET 2012
Dear R community,
I have the following problem I hoped you could help me with.
My data is saved in thousands of files with a weird extension consisting of
four digits and a z, for example *.1405z. With list.files I managed to load this
data into R. It looks like this (the row numbers are not in the original
file):
35 :LATEST STAGE 3.60 FT AT 730 AM CST ON 0102
36 .ER ARCT2 0102 C DC200001020813/DH12/HGIFF/DIH6
37 :QPF FORECAST 6AM NOON 6PM MDNT
38 .E1 :0102: / 3.5/ 3.4/ 3.5
39 .E2 :0103: / 3.5/ 3.0/ 2.5/ 2.1
40 .E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2
41 .E4 :0105: / 1.2/ 1.8/ 2.3/ 2.7
42 .E5 :0106: / 3.0/ 3.0/ 3.1/ 3.3
43 .E6 :0107: / 3.4
I need the table in rows 37 to 43 in a matrix, for example:
0102 NA 3.5 3.4 3.5
0103 3.5 3.0 2.5 2.1
0104 1.8 1.5 1.3 1.2
0105 1.2 1.8 2.3 2.7
0106 3.0 3.0 3.1 3.3
0107 3.4 NA NA NA
Unfortunately the row numbers vary per file. I can call up each line with
file[40,1] for line 40 for example. It returns:
[1] .E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2
38 Levels: .E1 :0102: / 3.5/ 3.4/ 3.5 ...
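The "Levels:" line in that output indicates that the lines are stored as a factor rather than as plain text. One possible alternative, given here only as a minimal sketch, is to read each file with readLines, which returns one character string per line. The temporary directory, the demo file name and the variable names (data_dir, paths, all_files) are assumptions made for illustration; only list.files and the "four digits and a z" extension come from the post.

```r
# Assumption: a temporary directory with one small demo file stands in
# for the real data folder.
data_dir <- tempfile("forecasts")
dir.create(data_dir)
writeLines(c(":QPF FORECAST 6AM NOON 6PM MDNT",
             ".E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2"),
           file.path(data_dir, "demo.1405z"))

# Match file names ending in a dot, four digits and a "z", e.g. "demo.1405z".
paths <- list.files(data_dir, pattern = "\\.[0-9]{4}z$", full.names = TRUE)

# readLines() returns a character vector with one element per line,
# so no factor conversion takes place.
all_files <- lapply(paths, readLines)
names(all_files) <- basename(paths)

# Second line of the demo file, as an ordinary character string.
all_files[["demo.1405z"]][2]
```

With the lines held as character vectors, pattern matching with grepl and splitting with strsplit can be applied to them directly.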
So I have two problems really:
1. How do I detect the table in the file (or rather, the line where the table
starts)?
2. How do I break up each line to write the values into a matrix?
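A possible approach to both problems, given only as a minimal sketch: find the table rows by their ".E<number> :<date>:" prefix instead of by line number, then split the rest of each row on "/". The sample lines are copied from the post. All names (row_pattern, parse_values, pad, forecast) are made up for illustration. The padding rule is an assumption inferred from the desired matrix: a short first row is filled with NA at the front, and any other short row is filled with NA at the end. The spacing in the real files may allow a more reliable rule.

```r
# Sample lines copied from the post (spacing as it appears there).
lines <- c(
  ":LATEST STAGE 3.60 FT AT 730 AM CST ON 0102",
  ".ER ARCT2 0102 C DC200001020813/DH12/HGIFF/DIH6",
  ":QPF FORECAST 6AM NOON 6PM MDNT",
  ".E1 :0102: / 3.5/ 3.4/ 3.5",
  ".E2 :0103: / 3.5/ 3.0/ 2.5/ 2.1",
  ".E3 :0104: / 1.8/ 1.5/ 1.3/ 1.2",
  ".E4 :0105: / 1.2/ 1.8/ 2.3/ 2.7",
  ".E5 :0106: / 3.0/ 3.0/ 3.1/ 3.3",
  ".E6 :0107: / 3.4"
)

# Problem 1: detect table rows by pattern, independent of line numbers.
row_pattern <- "^\\.E[0-9]+ +:([0-9]{4}):"
is_row <- grepl(row_pattern, lines)
table_lines <- lines[is_row]

# Problem 2: extract the date label, then split the remainder on "/".
dates <- sub(paste0(row_pattern, ".*$"), "\\1", table_lines)
rest <- sub(row_pattern, "", table_lines)

n_cols <- 4  # 6AM, NOON, 6PM, MDNT

parse_values <- function(x) {
  fields <- trimws(strsplit(x, "/", fixed = TRUE)[[1]])
  as.numeric(fields[-1])  # drop the empty text before the first "/"
}
values <- lapply(rest, parse_values)

# Assumption: the first row is padded at the front, later rows at the end.
pad <- function(v, at_front) {
  gap <- rep(NA_real_, max(0, n_cols - length(v)))
  if (at_front) c(gap, v) else c(v, gap)
}
padded <- Map(pad, values, seq_along(values) == 1)

# Stack the rows into a matrix with dates as row names.
forecast <- do.call(rbind, padded)
dimnames(forecast) <- list(dates, c("6AM", "NOON", "6PM", "MDNT"))
forecast
```

Wrapping these steps in a function and calling it on each file with lapply would handle files where the table starts at a different line, since the rows are located by pattern only.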
Feel free to suggest an entirely different approach if you think that is
helpful.
Thanks a lot! Frauke