[R] analyzing .txt file with R or another program
Daniel Malter
daniel at umd.edu
Sat Jul 23 21:51:39 CEST 2011
Hi,
The blunt answer is: by learning R. In particular, you will need pattern
matching techniques (see ?grep) and a somewhat advanced (some would call it
basic) knowledge of R. If you aren't familiar with either, I would suggest
starting with an introductory manual or one of the many tutorials available
online, and then digging deeper into pattern matching.
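To give an idea of what such pattern matching could look like, here is a
minimal sketch, not a finished solution. It assumes that the marker rows in
the real file begin with SBLINK, EBLINK or MSG (the asterisks and the trailing
x in the post look like markup added for emphasis), that blinks do not nest,
and that the file has been read with readLines(). The sample vector x and the
names is_start, is_end, is_msg, in_blink and msgs are made up for illustration.
It only flags whether each MSG row lies between an SBLINK and the following
EBLINK; the second case in the question would need additional logic.

```r
# Hypothetical sample, abbreviated from the rows quoted in the question.
# With real data one would use something like: x <- readLines("myfile.txt")
x <- c(
  "SBLINK R 5261507",
  "5261439 516.4 364.3 9148.0",
  "MSG 5261445 Prime 11_fe_ha",
  "5261445 516.7 363.8 9133.0",
  "EBLINK R 5261507 5261645 140",
  "5261509 . . 0.0",
  "MSG 5261512 Mask 8_ma_ma",
  "5261513 . . 0.0"
)

# Logical vectors marking the three kinds of marker rows
is_start <- grepl("^SBLINK", x)
is_end   <- grepl("^EBLINK", x)
is_msg   <- grepl("^MSG", x)

# Running count of opened minus closed blinks; > 0 means "inside a blink"
# (assumes every EBLINK closes the most recent SBLINK)
in_blink <- (cumsum(is_start) - cumsum(is_end)) > 0

# One row per message: line number, message text, and the blink flag
msgs <- data.frame(
  line     = which(is_msg),
  text     = sub("^MSG +[0-9]+ +", "", x[is_msg]),
  in_blink = in_blink[is_msg],
  stringsAsFactors = FALSE
)
print(msgs)
```

Because grepl() and cumsum() are vectorized, the same approach should remain
workable on a file with about a million rows without an explicit loop.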
Generally, please adhere to the posting guide (provide a self-contained,
i.e., copy/paste-able, example of code/data for people to work with). Also,
you will be much more likely to receive a response if you have shown your
own coding effort (contributors are willing to help solve problems but
unwilling to do other people's work).
Best,
Daniel
aRe wrote:
>
> Hello all,
>
> I have a .txt file with about 1 million rows.
>
> Sometimes the rows are in the following order (where the number of rows
> between the rows marked with an x varies):
>
> ...
> *SBLINK R 5261507*x
> 5261439 516.4 364.3 9148.0 ... 816.0 -1133.0 48.4 MA.C.TB...BL.
> 5261441 516.4 364.0 9145.0 ... 799.0 -1135.0 48.7 MA.C.TB...B..
> 5261443 516.4 363.9 9140.0 ... 817.0 -1171.0 49.3 MA.C.TB.....R
> *MSG 5261445 Prime 11_fe_ha*x
> 5261445 516.7 363.8 9133.0 ... 813.0 -1097.0 49.3 MA.C.TB......
> 5261447 517.0 363.8 9127.0 ... 818.0 -1144.0 49.9 MA.C.T.LRTB..
> *EBLINK R 5261507 5261645 140*x
> 5261509 . . 0.0 ... . . . .............
> 5261511 . . 0.0 ... . . . .............
> *MSG 5261512 Mask 8_ma_ma*x
> 5261513 . . 0.0 ... . . . .............
> 5261515 . . 0.0 ... . . . .............
> ...
>
> Here I would like to generate an output that gives me the two parts
> "...Prime 11_fe_ha" and "...Mask 8_ma_ma" if and only if "...Prime
> 11_fe_ha" is situated between "SBLINK..." and "EBLINK...".
>
>
>
>
> Sometimes the rows are in the following order (where the number of rows
> between the rows marked with an x varies):
>
> ...
> *MSG 5261445 Prime 11_fe_ha*x
> 5261439 516.4 364.3 9148.0 ... 816.0 -1133.0 48.4 MA.C.TB...BL.
> 5261441 516.4 364.0 9145.0 ... 799.0 -1135.0 48.7 MA.C.TB...B..
> 5261443 516.4 363.9 9140.0 ... 817.0 -1171.0 49.3 MA.C.TB.....R
> *SBLINK R 5261507*x
> 5261445 516.7 363.8 9133.0 ... 813.0 -1097.0 49.3 MA.C.TB......
> 5261447 517.0 363.8 9127.0 ... 818.0 -1144.0 49.9 MA.C.T.LRTB..
> *EBLINK R 5261507 5261645 140*x
> 5261509 . . 0.0 ... . . . .............
> 5261511 . . 0.0 ... . . . .............
> *MSG 5261512 Mask 8_ma_ma*x
> 5261513 . . 0.0 ... . . . .............
> 5261515 . . 0.0 ... . . . .............
> ...
>
> Here I would like to generate an output that consists of the two parts
> "...Prime 11_fe_ha" and "...Mask 8_ma_ma" if and only if "SBLINK..." is
> situated between "...Prime 11_fe_ha" and "...Mask 8_ma_ma". The position of
> "EBLINK..." is not important, which means the following structure should
> also lead to the same output:
>
> ...
> *MSG 5261445 Prime 11_fe_ha*x
> 5261439 516.4 364.3 9148.0 ... 816.0 -1133.0 48.4 MA.C.TB...BL.
> 5261441 516.4 364.0 9145.0 ... 799.0 -1135.0 48.7 MA.C.TB...B..
> 5261443 516.4 363.9 9140.0 ... 817.0 -1171.0 49.3 MA.C.TB.....R
> *SBLINK R 5261507*x
> 5261445 516.7 363.8 9133.0 ... 813.0 -1097.0
> 5261447 517.0 363.8 9127.0 ... 818.0 -1144.0 49.9 MA.C.T.LRTB..
> 5261509 . . 0.0 ... . . . .............
> 5261511 . . 0.0 ... . . . .............
> *MSG 5261512 Mask 8_ma_ma*x
> 5261513 . . 0.0 ... . . . .............
> 5261515 . . 0.0 ... . . . .............
> *EBLINK R 5261507 5261645 140*x
> ...
>
>
> Can someone give me advice on how I could manage this task?
>
> Thanks
>
> Best
>
--
View this message in context: http://r.789695.n4.nabble.com/analizing-txt-file-with-R-or-an-other-program-tp3689025p3689393.html
Sent from the R help mailing list archive at Nabble.com.