[R] reading data from web data sources
ggrothendieck at gmail.com
Sat Feb 27 13:15:00 CET 2010
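Try this. First we read the raw lines into R and use grepl to drop any
line containing a character other than a digit, space or decimal point,
which removes the comment header. Then we find the year lines (the only
lines with a single field, so V2 is NA on them) and use cumsum to repeat
each year down a new Year column. Finally we drop the year lines.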
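myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
raw.lines <- readLines(myURL)
# keep only lines made up of digits, spaces and decimal points
con <- textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)])
DF <- read.table(con, fill = TRUE)
close(con)
# year lines have a single field, so V2..V13 are NA on those lines
is.year <- is.na(DF$V2)
# cumsum numbers the year blocks; use it to repeat each year down its block
DF$Year <- DF$V1[is.year][cumsum(is.year)]
# drop the year lines themselves (V1 keeps the day number)
DF <- DF[!is.year, ]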
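The same idea (filter with grepl, read with fill = TRUE, propagate the years with cumsum) can be tried without network access on a small inline sample. The lines below are invented and only mimic the layout described in the question: a text header, a blank line, a year on its own line, then day lines with 12 monthly readings. The real file's header and special values may differ. Renaming V1..V13 to Day and month.abb at the end is an addition for readability, not part of the original answer.

```r
# Invented sample mimicking the layout described in the question.
raw.lines <- c(
  "Soil temperature at 100 cm (invented sample header)",
  "Units: degrees C",
  "",
  "1910",
  " 1 5.1 4.8 5.0 6.2 8.9 11.3 13.0 13.9 13.1 11.0 8.4 6.5",
  " 2 5.0 4.8 5.0 6.3 9.0 11.4 13.1 13.9 13.0 10.9 8.3 6.4",
  "1911",
  " 1 5.3 4.9 5.1 6.4 9.1 11.6 13.4 14.2 13.3 11.2 8.6 6.7",
  " 2 5.2 4.9 5.2 6.5 9.2 11.7 13.5 14.2 13.2 11.1 8.5 6.6"
)

# Header lines contain letters, so they are filtered out; the blank line
# survives the filter but is skipped by read.table.
con <- textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)])
DF <- read.table(con, fill = TRUE)
close(con)

# A year line has one field, so fill = TRUE leaves V2..V13 as NA there.
is.year <- is.na(DF$V2)
DF$Year <- DF$V1[is.year][cumsum(is.year)]
DF <- DF[!is.year, ]

# Readable names: day number, then one column per month.
names(DF)[1:13] <- c("Day", month.abb)
print(DF)
```

read.table decides the number of columns from the first five lines of input, so this relies on day lines (13 fields) appearing among the first five lines that survive the filter.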
On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote <tim+r-project.org at coote.org> wrote:
> I'm trying to read some time series data of meteorological records that are
> available on the web (e.g.
> http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat). I'd
> like to be able to read the digital data directly into R. However, I
> cannot work out the right function and set of parameters to use. It could
> be that the only practical route is to write a parser, possibly in some
> other language, reformat the files and then read these into R. As far as I
> can tell, the informal grammar of the file is:
> <comments terminated by a blank line>
> [<year number on a line on its own>
> <daily readings lines> ]+
> and the <daily readings> are of the form:
> <whitespace> <day number> [<whitespace> <reading on day of month>] x 12 (one per month)
> Readings for days in months where a day does not exist have special values.
> Missing values have a different special value.
> And then I've got the problem of iterating over all relevant files to get a
> whole time series.
> Is there a way to read in this type of file into R? I've read all of the
> examples that I can find, but cannot work out how to do it. I don't think
> that read.table can handle the separate sections of data representing each
> year. read.ftable might be coaxed into parsing the data, but even after
> reading the documentation and experimenting with the parameters I cannot see how.
> I'm using R 2.10.1 on Mac OS X 10.5.8 and R 2.10.0 on Fedora 10.
> Any help/suggestions would be greatly appreciated. I can see that this type
> of issue is likely to grow in importance, and I'd also like to give the data
> owners suggestions on how to reformat their data so that it is easier for
> machines to consume while remaining easy for humans to read.
> The early records are a serious machine parsing challenge as they are TIFF
> images of old notebooks ;-)
> Tim Coote
> tim at coote.org
> vincit veritas
> R-help at r-project.org mailing list
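For the remaining points in the question (special values, and stitching several files into one time series), here is a minimal sketch, assuming each file has already been parsed into a data frame with one row per year and day and columns named Year, Day and Jan through Dec. The thread does not say what the special values are, so -999 below is a hypothetical missing-value code. Nonexistent days such as 30 February are dropped by checking that each constructed date round-trips through as.Date, so their special code does not need to be known. The helper names to_daily and make_demo are hypothetical, and the demo values are invented.

```r
# Wide (Year, Day, Jan..Dec) -> long daily series with real dates.
# missing.codes: hypothetical values that mean "no reading".
to_daily <- function(DF, missing.codes = numeric(0)) {
  long <- data.frame(
    Year  = rep(DF$Year, times = 12),
    Month = rep(1:12, each = nrow(DF)),   # unlist() stacks Jan, then Feb, ...
    Day   = rep(DF$Day, times = 12),
    Value = unlist(DF[month.abb], use.names = FALSE)
  )
  long$Value[long$Value %in% missing.codes] <- NA

  # Impossible dates (e.g. 30 Feb) either fail to parse or fail the
  # round-trip comparison, so they are discarded whatever their code is.
  txt  <- sprintf("%04d-%02d-%02d", long$Year, long$Month, long$Day)
  date <- as.Date(txt, format = "%Y-%m-%d")
  ok   <- !is.na(date) & format(date, "%Y-%m-%d") == txt

  out <- data.frame(Date = date[ok], Value = long$Value[ok])
  out <- out[order(out$Date), ]
  rownames(out) <- NULL
  out
}

# Hypothetical stand-in for one parsed file: days 29-31 of every month.
make_demo <- function(year, days, value) {
  m <- matrix(value, nrow = length(days), ncol = 12,
              dimnames = list(NULL, month.abb))
  data.frame(Year = year, Day = days, m)
}

d1 <- make_demo(1910, 29:31, 5)
d2 <- make_demo(1911, 29:31, 6)
d2$Jan[1] <- -999                      # hypothetical missing-value code

# One data frame per file, converted and stacked into a single series.
series <- do.call(rbind, lapply(list(d1, d2), to_daily, missing.codes = -999))
head(series)
```

With the real data, each element of the list would come from running the parsing code on one file, for example via lapply over a vector of URLs. Only the 1910-1919 URL appears in this thread, so the names of the other files are not assumed here.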