[R] Dealing with data of varying format
mpb170 at psu.edu
Thu Jun 6 16:18:12 CEST 2002
I'm trying to read some data files into data frames, but the formatting of their
headers is creating problems when I try to reference certain columns. For
instance, below I've pasted four of the headers to give an idea of how they vary:
A 90404001 POLARSTERN GERMANY MIZEX
A 819300 HUDSON CANADA unknown
A 823109 METEOR (POST 7/64) GERMANY unknown
A 900002034 LOUIS ST: LAURENT UNITED STATES AO1994
Basically, I want each of these four headers to be read into a data frame
row with 5 columns. This is no problem for the first two headers. But the
last two headers are automatically being read into rows that are 7 and
8 columns long respectively, because of the extra words (such as "United States"
for a country name as opposed to just "Germany").
The headers are set up with equal spacing so that if you were programming in
Fortran you could simply create variables of lengths 3, 13, 27, 24, and 15 and
the data would be read in properly. For instance, the third line above would
be read into 5 variables as demonstrated below, with underscores representing spaces:
A_ _
823109_ _ _ _ _ _ _
METEOR_(POST_7/64)_ _ _ _ _ _ _ _ _
GERMANY_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
unknown_ _ _ _ _ _ _ _
Is there anything in R that would do something similar? Ideally I'd like to read
the header files straight into data frames, but I realize this may not be
entirely possible if I need to define the length of each element to be read in.
So I'm willing to read the header files into variables first and then transfer
them to a data frame if that makes the task more doable.
Any other suggestions to deal with this problem would be much appreciated. Thanks.
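For reference, the fixed-width split described above can be sketched with
read.fwf() from base R (package utils), which takes a vector of field widths.
This is only a minimal sketch: the widths are the ones quoted above, but the
sample lines are rebuilt with sprintf() padding and the column names are made
up for illustration, so the real files may need different values.

```r
# Field widths as quoted in the post: 3, 13, 27, 24 and 15 characters.
widths <- c(3, 13, 27, 24, 15)

# Hypothetical sample lines, left-justified and padded to those widths.
# The real header files are assumed to follow the same layout.
fmt <- "%-3s%-13s%-27s%-24s%-15s"
lines <- c(
  sprintf(fmt, "A", "90404001",  "POLARSTERN",         "GERMANY",       "MIZEX"),
  sprintf(fmt, "A", "823109",    "METEOR (POST 7/64)", "GERMANY",       "unknown"),
  sprintf(fmt, "A", "900002034", "LOUIS ST: LAURENT",  "UNITED STATES", "AO1994")
)

# Write the lines to a temporary file to stand in for a real header file.
tf <- tempfile()
writeLines(lines, tf)

# Split each line by character position rather than by whitespace.
# The column names are illustrative only; they are not from the data files.
headers <- read.fwf(tf, widths = widths,
                    col.names = c("flag", "id", "ship", "country", "project"),
                    colClasses = "character", strip.white = TRUE)
unlink(tf)

print(headers)
```

Because the split is by position, multi-word fields such as "UNITED STATES"
stay in a single column, and strip.white = TRUE drops the padding.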