[R] reading data from web data sources
tim+r-project.org at coote.org
Sat Feb 27 12:32:15 CET 2010
I'm trying to read some time series of meteorological records
that are available on the web (e.g. http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat)
. I'd like to read the data directly into R.
However, I cannot work out the right function and set of parameters to
use. It could be that the only practical route is to write a parser,
possibly in some other language, reformat the files and then read
these into R. As far as I can tell, the informal grammar of the file is:
<comments terminated by a blank line>
[<year number on a line on its own>
<daily readings lines> ]+
and the <daily readings> are of the form:
<whitespace> <day number> [<whitespace> <reading on day of month>]{12}
(i.e. one reading for that day in each of the 12 months).
Days that do not exist in a given month (e.g. 30 February) carry one
special sentinel value; genuinely missing readings carry a different one.
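Something like the following readLines()-based sketch is what I have in
mind; the sentinel codes (-990, -999) are placeholders for whatever the
file actually uses, and the column layout is assumed from the grammar above:

```r
parse_soil <- function(lines, na_codes = c(-990, -999)) {
  # drop the comment header: everything up to and including the first blank line
  blank <- which(grepl("^\\s*$", lines))[1]
  if (!is.na(blank)) lines <- lines[-seq_len(blank)]
  lines <- lines[!grepl("^\\s*$", lines)]

  # a year line is a lone 4-digit number; day rows belong to the
  # most recent year line above them
  year_rows <- grepl("^\\s*\\d{4}\\s*$", lines)
  years <- as.integer(lines[year_rows])
  grp <- cumsum(year_rows)

  out <- do.call(rbind, lapply(seq_along(years), function(i) {
    day_lines <- lines[grp == i & !year_rows]
    # each day row: day number followed by 12 monthly readings
    vals <- do.call(rbind, lapply(strsplit(trimws(day_lines), "\\s+"),
                                  as.numeric))
    data.frame(year  = years[i],
               day   = vals[, 1],
               month = rep(1:12, each = nrow(vals)),
               value = as.vector(vals[, -1]))
  }))
  out$value[out$value %in% na_codes] <- NA
  out
}
```

That would give one long data frame with year/month/day/value columns,
which is easy to turn into a time series afterwards.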
And then I've got the problem of iterating over all relevant files to
get a whole timeseries.
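For the iteration, the file names look like they follow a per-decade
pattern, so building the URL list should be mechanical; the year range
and pattern below are extrapolated from the single URL above, and
parse_one() stands for whatever per-file parser ends up being used:

```r
# assumption: decade files named dsoil100_cal_YYY0-YYY9.dat, 1910-1999
base <- "http://climate.arm.ac.uk/calibrated/soil/"
decades <- seq(1910, 1990, by = 10)
urls <- sprintf("%sdsoil100_cal_%d-%d.dat", base, decades, decades + 9)

# readLines() accepts URLs directly, so each file can be fetched and
# handed to a hypothetical per-file parser, then the pieces bound together:
# all_data <- do.call(rbind, lapply(urls, function(u) parse_one(readLines(u))))
```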
Is there a way to read this type of file into R? I've read all of
the examples that I can find, but cannot work out how to do it. I
don't think that read.table can handle the separate sections of data
representing each year. read.ftable can perhaps be coerced into
parsing the data, but after reading the documentation and
experimenting with the parameters I cannot see how.
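One partial idea: read.table() can cope with one year's block at a time
if given skip and nrows for that block, which might be enough with a loop
over the block offsets. A toy illustration (the two-day block and its
values are invented, not the real file's layout):

```r
# a toy block in the shape described above: year line, then day rows
txt <- c("1910",
         " 1  5.0 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1",
         " 2  4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1")

con <- textConnection(txt)
# skip the year line, read the two day rows of this block
block <- read.table(con, skip = 1, nrows = 2,
                    col.names = c("day", month.abb))
close(con)
```

The awkward part is still computing the skip/nrows offsets for every
year block, which is why a readLines()-based parser may be simpler.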
I'm using R 2.10.1 on OS X 10.5.8 and R 2.10.0 on Fedora 10.
Any help/suggestions would be greatly appreciated. I can see that this
type of issue is likely to grow in importance, and I'd also like to
give the data owners suggestions on how to reformat their data so that
it is easier for machines to consume while remaining easy for people to read.
The early records are a serious machine-parsing challenge, as they are
TIFF images of old notebooks ;-)
tim at coote.org