[R] Preprocessing troublesome files in R - looking for some Perl-like functionality
Andy Bunn
abunn at whrc.org
Thu Jun 2 15:43:05 CEST 2005
Hi all:
I have acquired hundreds of data files that I need to preprocess to make them
usable in R. The files are fixed width (to a point) and contain 1 to 3 lines
of header, followed by a variable number of fixed-width data lines (that I
can read with read.fwf). I want to read through the files and remove every
_line_ where the characters in columns 83-86 do not equal "STD". If I can do
that and store the result in a text file, then I can get the data I need using
read.fwf. I can't figure out how to do this because of the irregular header
info buried in the file. It seems like the kind of thing Perl or Emacs would
be good at, but I'd like to do it all in R if possible. Any pointers
appreciated.
-Andy
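A minimal base-R sketch of the filtering described above, assuming the tag really does sit at a fixed position: columns 83-86, left-aligned, so the field reads "STD" followed by a blank (or just "STD" if trailing blanks were trimmed). The function names `keep_std_lines`, `filter_std_file` and `filter_std_dir` are hypothetical; only `readLines`, `substr`, `gsub`, `writeLines`, `list.files` and `file.path` are standard R. Because the test is applied line by line, the irregular headers need no special handling: any header line without "STD" in that field simply drops out.

```r
# Sketch: keep only lines whose columns 83-86 hold the tag "STD".
# Assumption: the tag is left-aligned in that 4-character field, so the
# field may read "STD " or, if trailing blanks were trimmed, just "STD".
keep_std_lines <- function(lines, first = 83, last = 86, tag = "STD") {
  field <- substr(lines, first, last)   # "" for lines shorter than 'first'
  field <- gsub("^ +| +$", "", field)   # drop blanks around the tag
  lines[field == tag]
}

# Filter one file and write the surviving lines to 'outfile'.
# Extra arguments (first, last, tag) are passed on to keep_std_lines().
filter_std_file <- function(infile, outfile, ...) {
  lines <- readLines(infile)
  writeLines(keep_std_lines(lines, ...), outfile)
  invisible(outfile)
}

# Hypothetical batch helper: filter every file in 'in_dir' into 'out_dir',
# keeping the original file names. Returns the output paths invisibly.
filter_std_dir <- function(in_dir, out_dir, ...) {
  if (!file.exists(out_dir)) dir.create(out_dir, recursive = TRUE)
  infiles <- list.files(in_dir, full.names = TRUE)
  outfiles <- file.path(out_dir, basename(infiles))
  for (i in seq_along(infiles)) filter_std_file(infiles[i], outfiles[i], ...)
  invisible(outfiles)
}
```

For example, `filter_std_dir("raw", "std")` would write filtered copies of every file in a directory `raw` (a hypothetical name) into `std`; each filtered file can then be passed to `read.fwf()` with the `widths` that match the data lines.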
R > version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status
major    2
minor    1.0
year     2005
month    04
day      18
language R
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a snippet of one of the data files:
929 2 Russia Dahurian larch 150 6946-11249 1830 1990 -
RAW
RUSS061830 568 11122 1 806 1 843 2 862 3 902 31244 3 986 31210
31074 3 RAW
RUSS0618401369 4 937 41154 4 869 4 702 4 716 4 972 4 682 5 878 5
582 5 RAW
929 2 Russia Dahurian larch 150 6946-11249 1830 1990 -
STD
RUSS061830 568 11122 1 806 1 843 2 862 3 902 31244 3 986 31210
31074 3 STD
RUSS0618401369 4 937 41154 4 869 4 702 4 716 4 972 4 682 5 878 5
582 5 STD
RUSS0619701158 26 906 26 954 26 746 26 629 26 858 261268 261345 261102
261298 26 STD
RUSS061980 483 26 780 26 995 261273 261391 26 996 261621 26 878 261418 26
514 26 STD
RUSS0619901071 269990 09990 09990 09990 09990 09990 09990 09990
09990 0 STD
929 2 Russia Dahurian larch 150 6946-11249 1830 1990 -
RES
RUSS061830 604 11215 1 889 1 828 2 909 3 982 31294 3 947 31091
31030 3 RES
RUSS0618401290 4 858 41057 4 917 4 712 4 824 41077 4 709 5 911 5
747 5 RES
RUSS061850 873 5 994 51179 71040 71028 7 923 71120 7 846 101146 11
854 13 RES
RUSS0618601609 141209 16 780 16 758 171238 171191 17 858 17 903 17 930 18
334 18 RES
929 2 Russia Dahurian larch 150 6946-11249 1830 1990 -
ARS
RUSS061850 873 5 994 51179 71040 71028 7 923 71120 7 846 101146 11
854 13 ARS
RUSS0618601609 141209 16 780 16 758 171238 171191 17 858 17 903 17 930 18
334 18 ARS
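In the snippet as it appears here, the tags do not line up at columns 83-86 (a line such as `31074 3 RAW` is only 11 characters long), most likely because the mail wrapped the long records and collapsed the spacing. If the real files turn out not to be strictly aligned either, a looser alternative is to keep lines whose last blank-separated field is the tag. This is a different test from the column check, and the function name `keep_by_trailing_tag` is hypothetical.

```r
# Sketch: keep lines whose final whitespace-separated field equals 'tag',
# regardless of which column it starts in.
keep_by_trailing_tag <- function(lines, tag = "STD") {
  # tag preceded by start-of-line or whitespace, then only trailing blanks
  pattern <- paste("(^|[[:space:]])", tag, "[[:space:]]*$", sep = "")
  grep(pattern, lines, value = TRUE)
}
```

Applied to the snippet above, this keeps the lone `STD` line and the lines ending in `STD`, and drops the `RAW`, `RES` and `ARS` lines. If a record really is split across two physical lines in the file, only the half carrying the tag survives, so this only helps when each record sits on one line.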