[R] Preprocessing troublesome files in R - looking for some perl like functionality

Andy Bunn abunn at whrc.org
Thu Jun 2 15:43:05 CEST 2005


Hi all:

I have acquired hundreds of data files that I need to preprocess to get them
usable in R. The files are fixed width (to a point) and contain 1 to 3 lines
of header, followed by a variable number of fixed width data lines (that I
can read with read.fwf). I want to read through the files and remove every
_line_ where the characters in columns 83-86 do not equal "STD". If I can do that
and store it in a text file, then I can get the data I need using read.fwf.
I can't figure out how to do this because of the irregularity of the header
info buried in the file. It seems like the kind of thing perl or emacs would
be good at but I'd like to do it all in R if possible. Any pointers
appreciated.
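
For reference, the filtering step described above can be sketched in base R
with readLines(), substr() and writeLines(). This is only a minimal sketch
under some assumptions: keep_std_lines is a hypothetical helper name, the
tag is assumed to sit in columns 83-86 of the unwrapped line, and because
that field is four characters wide while "STD" is three, surrounding blanks
are trimmed before comparing. Any header line that happens to carry "STD"
in those columns would be kept as well.

```r
# Keep only the lines whose columns 83-86 (blank-trimmed) equal "STD".
# keep_std_lines is a hypothetical helper, not a base R function.
keep_std_lines <- function(infile, outfile, first = 83, last = 86,
                           tag = "STD") {
  lines <- readLines(infile)
  # substr() returns "" for lines shorter than `first`, so short
  # header lines fail the comparison and are dropped.
  field <- gsub("^ +| +$", "", substr(lines, first, last))
  kept <- lines[field == tag]
  writeLines(kept, outfile)
  # Return the number of lines kept, invisibly.
  invisible(length(kept))
}
```

Applied over a directory (the directory name "data" and the ".std" suffix
are placeholders), something like
for (f in list.files("data", full.names = TRUE)) keep_std_lines(f, paste(f, ".std", sep = ""))
would leave one filtered file per input, ready for read.fwf.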

-Andy

R > version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a snippet of one of the data files:

929    2 Russia   Dahurian larch 150  6946-11249 1830 1990 -
RAW
RUSS061830 568  11122  1 806  1 843  2 862  3 902  31244  3 986  31210
31074  3  RAW
RUSS0618401369  4 937  41154  4 869  4 702  4 716  4 972  4 682  5 878  5
582  5  RAW
929    2 Russia   Dahurian larch 150  6946-11249 1830 1990 -
STD
RUSS061830 568  11122  1 806  1 843  2 862  3 902  31244  3 986  31210
31074  3  STD
RUSS0618401369  4 937  41154  4 869  4 702  4 716  4 972  4 682  5 878  5
582  5  STD
RUSS0619701158 26 906 26 954 26 746 26 629 26 858 261268 261345 261102
261298 26  STD
RUSS061980 483 26 780 26 995 261273 261391 26 996 261621 26 878 261418 26
514 26  STD
RUSS0619901071 269990  09990  09990  09990  09990  09990  09990  09990
09990  0  STD
929    2 Russia   Dahurian larch 150  6946-11249 1830 1990 -
RES
RUSS061830 604  11215  1 889  1 828  2 909  3 982  31294  3 947  31091
31030  3  RES
RUSS0618401290  4 858  41057  4 917  4 712  4 824  41077  4 709  5 911  5
747  5  RES
RUSS061850 873  5 994  51179  71040  71028  7 923  71120  7 846 101146 11
854 13  RES
RUSS0618601609 141209 16 780 16 758 171238 171191 17 858 17 903 17 930 18
334 18  RES
929    2 Russia   Dahurian larch 150  6946-11249 1830 1990 -
ARS
RUSS061850 873  5 994  51179  71040  71028  7 923  71120  7 846 101146 11
854 13  ARS
RUSS0618601609 141209 16 780 16 758 171238 171191 17 858 17 903 17 930 18
334 18  ARS



