[R] extracting data from unstructured (text?) file
jim holtman
jholtman at gmail.com
Mon Mar 12 14:50:38 CET 2012
On Sun, Mar 11, 2012 at 9:21 PM, frauke <fhoss at andrew.cmu.edu> wrote:
> Wow Jim, this is much more than I expected. Thank you!!
>
> It took me a while to figure out what exactly you are doing in that code.
> But I think I understand and it definitely runs. May I ask you two follow up
> questions?
>
> First, some of my files have data from two or more cities in them. So I have
> trouble that it picks the right city. What makes it difficult is that not in
> all files will a city be called the same. Sometimes it might be "van Buren",
> other times "Arkansas River at van Buren". Sometimes the target city is the
> first in the file, other times further down. Here is an example:
> http://r.789695.n4.nabble.com/file/n4465068/sample3.txt
> Additionally, some files miss the city that I am looking for.
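You can add code to search for the city name that you want before
setting the 'inData' flag. A regular expression can pick out the
city's name even when it is written with a varying prefix ("van Buren"
vs. "Arkansas River at van Buren"). If no line matches, skip that file.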
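
Here is a rough sketch of that search. I am assuming that each city
block starts with a header line beginning with ':' (as in
":ARKANSAS RIVER AT VAN BUREN"). The helper name 'find_city' and the
"OTHERTOWN" block are made up for illustration; in practice 'lines'
would come from readLines() on your file.

```r
# Made-up sample with two city blocks; only the VAN BUREN header is
# taken from the post, the OTHERTOWN block is purely hypothetical.
lines <- c(
  ":EXAMPLE RIVER AT OTHERTOWN",
  ":FLOOD STAGE 30.0",
  ".E1 :0101: / 10.1/ 10.2/ 10.2",
  ":ARKANSAS RIVER AT VAN BUREN",
  ":FLOOD STAGE 22.0",
  ".E1 :0101: / 19.3/ 19.4/ 19.4"
)

# Return the index of the first header line that names the city,
# or NA if the city does not appear in the file.
find_city <- function(lines, city) {
  # Tolerate one or more spaces between the words of the city name
  city_rx <- gsub(" +", "\\\\s+", city)
  # Header starts with ':'; anything (e.g. "ARKANSAS RIVER AT") may
  # precede the city name
  pattern <- paste0("^:.*\\b", city_rx, "\\b")
  hits <- grep(pattern, lines, ignore.case = TRUE, perl = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

start <- find_city(lines, "van Buren")
if (is.na(start)) {
  message("city not found, skipping file")
} else {
  # Only lines after 'start' belong to the target city, so this is
  # the point at which to set inData <- TRUE
  city_lines <- lines[start:length(lines)]
}
```

This assumes the city name contains no regular expression
metacharacters; if it might, escape them first.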
>
> Second, I would like extract some more data from the files, printed in bold
> below. I thought of storing this data in an extra line appended to the main
> table or so. I do manage to extract one at a time, but of course it takes
> ages to run the process over and over again to get all the data.
>
> :ARKANSAS RIVER AT VAN BUREN
> :FLOOD STAGE * 22.0 *
> :
> :LATEST STAGE *19.25* FT AT *400 AM* CST ON *010100*
> .ER VBUA4 0101 C DC200001010823/DH12/HGIFF/DIH6
> :QPF FORECAST 6AM NOON 6PM MDNT
> .E1 :0101: / 19.3/ 19.4/ 19.4
> .E2 :0102: / 19.4/ 19.4/ 19.4/ 19.4
> .E3 :0103: / 19.4/ 19.4/ 19.4/ 19.4
> .E4 :0104: / 19.4/ 19.4/ 19.4/ 19.4
> .E5 :0105: / 19.4/ 19.4/ 19.4/ 19.4
> .E6 :0106: / 19.4
> .ER VBUA4 0101 C DC200001010823/DH12/PPQFZ/DIH6/ 0.00/0.00/0.00/0.00
> .ER VBUA4 0101 C DC200001010823/DH12/QTIFF/DIH6
> :QPF FORECAST 6AM NOON 6PM MDNT
> .E1 :0101: / 0.98/ 2.78/ 8.66
> .E2 :0102: / 9.88/ 8.70/ 7.36/ 7.48
> .E3 :0103: / 8.25/ 8.42/ 8.53/ 9.02
>
As you are reading the lines in, you can use regular expressions to
extract the data that you are interested in. I am not sure where you
want to store the data. Do you want it in a separate file?
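
As a sketch of what I mean, the following pulls all of the extra
values out in one pass over lines that are already in memory, so the
file is only read once. I am assuming the asterisks in your post only
mark the bold text and are not in the raw file, and that the fields
are separated by spaces. The helper 'first_match' and the column
names are my own invention.

```r
# Header lines as quoted in the post, with the bold markers removed
lines <- c(
  ":ARKANSAS RIVER AT VAN BUREN",
  ":FLOOD STAGE 22.0",
  ":",
  ":LATEST STAGE 19.25 FT AT 400 AM CST ON 010100",
  ".E1 :0101: / 19.3/ 19.4/ 19.4"
)

# Return the capture groups from the first line matching 'pattern',
# or a vector of NAs when no line matches.
first_match <- function(pattern, lines, n_groups) {
  m <- regmatches(lines, regexec(pattern, lines))
  m <- m[lengths(m) > 0]              # drop lines that did not match
  if (length(m) == 0) return(rep(NA_character_, n_groups))
  m[[1]][-1]                          # element 1 is the full match
}

flood  <- first_match("^:FLOOD STAGE +([0-9.]+)", lines, 1)
latest <- first_match(
  "^:LATEST STAGE +([0-9.]+) +FT AT +([0-9]+ +[AP]M) +([A-Z]+) +ON +([0-9]+)",
  lines, 4)

# One row of extra data per file; rows from many files can be
# combined with rbind() or written out with write.csv()
extra <- data.frame(
  flood_stage  = as.numeric(flood[1]),
  latest_stage = as.numeric(latest[1]),
  obs_time     = latest[2],
  tz           = latest[3],
  obs_date     = latest[4],
  stringsAsFactors = FALSE
)
print(extra)
```

Keeping these values in their own one-row data frame, rather than as
an extra line in the forecast table, avoids mixing column types, but
that is a design choice and depends on how you want to use them.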
> Please Jim, only answer these questions if you have time. I certainly
> appreciate any help very much.
>
> Thank you, Frauke
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.