[R] extracting data from unstructured (text?) file

jim holtman jholtman at gmail.com
Sun Mar 11 20:35:01 CET 2012


Can you at least provide a subset of two files so we can see how the
data is really stored in the file and what the separators are between
the 'columns' of data?  Also, how do you determine where the data
actually starts for the rows that you want to pull out?  This will aid
in determining how to parse the data.

On Sun, Mar 11, 2012 at 3:07 PM, frauke <fhoss at andrew.cmu.edu> wrote:
> Dear R community,
>
> I have the following problem I hoped you could help me with.
>
> My data is saved in thousands of files with a weird extension consisting of
> four digits and a z, for example *.1405z. With list.files I managed to load this
> data into R. It looks like this (the row numbers are not in the original
> file):
>
> 35  :LATEST STAGE     3.60 FT AT 730 AM CST ON 0102
> 36  .ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6
> 37  :QPF FORECAST        6AM       NOON        6PM       MDNT
> 38  .E1 :0102:              /       3.5/       3.4/       3.5
> 39  .E2 :0103:   /       3.5/       3.0/       2.5/       2.1
> 40  .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
> 41  .E4 :0105:   /       1.2/       1.8/       2.3/       2.7
> 42  .E5 :0106:   /       3.0/       3.0/       3.1/       3.3
> 43  .E6 :0107:   /       3.4
>
> I need the table in rows 37 to 43 in a matrix, for example:
> 0102     NA    3.5    3.4    3.5
> 0103    3.5    3.0    2.5    2.1
> 0104    1.8    1.5    1.3    1.2
> 0105    1.2    1.8    2.3    2.7
> 0106    3.0    3.0    3.1    3.3
> 0107    3.4     NA     NA     NA
>
> Unfortunately, the row numbers vary from file to file. I can call up each line
> individually, for example file[40,1] for line 40. It returns:
> [1] .E3 :0104:   /       1.8/       1.5/       1.3/       1.2
> 38 Levels: .E1 :0102:              /       3.5/       3.4/       3.5 ...
>
>  So I have two problems really:
> 1. How do I detect the table in the file (that is, the line where the table
> starts)?
> 2. How do I break up each line to write the values into a matrix?
>
> Feel free to suggest an entirely different approach if you think that is
> helpful.
>
> Thanks a lot! Frauke
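
Pending real sample files, the two questions above (finding the table
and splitting its lines) can be sketched roughly as follows. This is only
a sketch under several assumptions that the actual files may not satisfy:
each table row starts with ".E<n> :<MMDD>:", values are separated by "/",
a blank before the first "/" on the first row is a missing value, and the
leading "/" on later rows is only a separator. The sample vector stands in
for readLines() on one file, and parse_forecast is a made-up helper name.

```r
# Hypothetical sample standing in for readLines("somefile.1405z"),
# with the wrapped lines of the post joined back together.
lines <- c(
  ":LATEST STAGE     3.60 FT AT 730 AM CST ON 0102",
  ".ER ARCT2    0102 C DC200001020813/DH12/HGIFF/DIH6",
  ":QPF FORECAST        6AM       NOON        6PM       MDNT",
  ".E1 :0102:              /       3.5/       3.4/       3.5",
  ".E2 :0103:   /       3.5/       3.0/       2.5/       2.1",
  ".E3 :0104:   /       1.8/       1.5/       1.3/       1.2",
  ".E4 :0105:   /       1.2/       1.8/       2.3/       2.7",
  ".E5 :0106:   /       3.0/       3.0/       3.1/       3.3",
  ".E6 :0107:   /       3.4"
)

# parse_forecast() is a hypothetical helper, not an existing function.
parse_forecast <- function(lines,
                           col_names = c("6AM", "NOON", "6PM", "MDNT")) {
  # Problem 1: find the table by pattern instead of by line number.
  # Assumed row layout: ".E<n> :<MMDD>:" followed by the values.
  pat  <- "^[ \t]*\\.E[0-9]+[ \t]+:([0-9]{4}):(.*)$"
  rows <- grep(pat, lines, value = TRUE)
  if (length(rows) == 0) return(NULL)

  dates <- sub(pat, "\\1", rows)  # the MMDD label of each row
  rest  <- sub(pat, "\\2", rows)  # everything after the label

  # Problem 2: split the remainder of each row on "/".
  fields <- strsplit(rest, "/", fixed = TRUE)
  vals <- lapply(seq_along(fields), function(i) {
    f <- trimws(fields[[i]])
    # Assumption: on rows after the first, the empty field before the
    # leading "/" is just a separator, so it is dropped.
    if (i > 1 && length(f) > 0 && f[1] == "") f <- f[-1]
    v <- as.numeric(f)            # "" becomes NA
    length(v) <- length(col_names) # pad short rows with NA
    v
  })

  m <- do.call(rbind, vals)
  dimnames(m) <- list(dates, col_names)
  m
}

m <- parse_forecast(lines)
print(m)

# For all files (directory name and pattern are assumptions):
# files  <- list.files("data_dir", pattern = "\\.[0-9]{4}z$", full.names = TRUE)
# tables <- lapply(files, function(f) parse_forecast(readLines(f)))
```

Reading the files with readLines() keeps each line as a plain character
string, which avoids the factor levels shown in the file[40,1] output.
If the real files mark missing values differently, the rule for the
leading "/" would need to change.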



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


