[R] extracting data from unstructured (text?) file

frauke fhoss at andrew.cmu.edu
Sun Mar 11 20:07:28 CET 2012


Dear R community, 

I have the following problem I hoped you could help me with. 

My data is saved in thousands of files with an unusual extension consisting
of four digits and a z, for example *.1405z. With list.files I managed to load
this data into R. It looks like this (the row numbers are not in the original
file):

35  :LATEST STAGE     3.60 FT AT 730 AM CST ON 0102
36  .ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6
37  :QPF FORECAST        6AM       NOON        6PM       MDNT
38  .E1 :0102:              /       3.5/       3.4/       3.5
39  .E2 :0103:   /       3.5/       3.0/       2.5/       2.1
40  .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
41  .E4 :0105:   /       1.2/       1.8/       2.3/       2.7
42  .E5 :0106:   /       3.0/       3.0/       3.1/       3.3
43  .E6 :0107:   /       3.4

I need the table in rows 37 to 43 in a matrix, for example:
0102     NA     3.5    3.4    3.5
0103     3.5    3.0    2.5    2.1
0104     1.8    1.5    1.3    1.2
0105     1.2    1.8    2.3    2.7
0106     3.0    3.0    3.1    3.3
0107     3.4    NA     NA     NA

Unfortunately the row numbers vary from file to file. I can call up each line
with file[40,1], for line 40 for example. It returns:
[1] .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
38 Levels: .E1 :0102:              /       3.5/       3.4/       3.5 ...
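
The "Levels" line in this output indicates that the lines were stored as a
factor. A minimal sketch of reading a file as plain character strings with
readLines instead, assuming the files are ordinary text; the temporary file
and its two lines are made up here only to keep the example self-contained:

```r
# Hypothetical stand-in for one of the *.1405z files (illustration only)
path <- tempfile(fileext = ".1405z")
writeLines(c(":QPF FORECAST        6AM       NOON        6PM       MDNT",
             ".E3 :0104:   /       1.8/       1.5/       1.3/       1.2"),
           path)

# readLines returns one character string per line, with no factor conversion
lines <- readLines(path)
is.character(lines)
lines[2]
```

With a character vector, a line is addressed as lines[40] rather than
file[40,1], and string functions such as grepl and strsplit apply directly.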

So I really have two problems:
1. How do I detect the table in the file (that is, the line where the table
starts)?
2. How do I break up each line to write the values into a matrix?
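
One possible approach to both questions, as a rough sketch rather than a
definitive solution. It assumes that table rows are exactly the lines
starting with ".E" followed by a digit, that there are always four columns
(6AM, NOON, 6PM, MDNT), and that a short first row is missing its earliest
values while any other short row is missing its latest ones. The helper pad
and the variable names are made up for illustration:

```r
# Sample lines copied from the excerpt above (row numbers removed)
lines <- c(
  ":LATEST STAGE     3.60 FT AT 730 AM CST ON 0102",
  ".ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6",
  ":QPF FORECAST        6AM       NOON        6PM       MDNT",
  ".E1 :0102:              /       3.5/       3.4/       3.5",
  ".E2 :0103:   /       3.5/       3.0/       2.5/       2.1",
  ".E3 :0104:   /       1.8/       1.5/       1.3/       1.2",
  ".E4 :0105:   /       1.2/       1.8/       2.3/       2.7",
  ".E5 :0106:   /       3.0/       3.0/       3.1/       3.3",
  ".E6 :0107:   /       3.4"
)

# Question 1: table rows are the lines that start with ".E<digit>"
# (".ER ..." does not match because R is not a digit)
is_row <- grepl("^\\s*\\.E[0-9]+\\s", lines)
rows <- lines[is_row]

# Question 2: split each row into its date label and its numeric values
dates <- sub("^\\s*\\.E[0-9]+\\s+:([0-9]{4}):.*$", "\\1", rows)
rest  <- sub("^\\s*\\.E[0-9]+\\s+:[0-9]{4}:", "", rows)
vals  <- lapply(regmatches(rest, gregexpr("-?[0-9]+\\.?[0-9]*", rest)),
                as.numeric)

# Assumption: pad the first row with NA at the front, all others at the end
pad <- function(v, front) {
  fill <- rep(NA_real_, 4 - length(v))
  if (front) c(fill, v) else c(v, fill)
}
m <- t(mapply(pad, vals, seq_along(vals) == 1))
rownames(m) <- dates
colnames(m) <- c("6AM", "NOON", "6PM", "MDNT")
m
```

Because the rows are found by pattern and not by position, the varying row
numbers do not matter. which(is_row)[1] gives the line where the table
starts in a given file.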

Feel free to suggest an entirely different approach if you think that is
helpful. 

Thanks a lot! Frauke





