[R] Downloading tab separated data from internet

Prof Brian Ripley ripley at stats.ox.ac.uk
Sat Dec 3 09:09:05 CET 2011


AFAICS what you mean is 'how can I fill in an HTML form using R'.
Answer: use package RCurl.
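
As a minimal sketch of what that could look like (not tested against the USGS site): the action URL and the field names in the usage comment are placeholders, not the real ones, so read the actual values from the <form> and <input> tags in the page source. fetch_form_data() and parse_tsv() are hypothetical helper names; postForm() and getForm() are the RCurl functions that submit the fields.

```r
# Parse tab-separated text (first row assumed to be the header)
# into a data frame.
parse_tsv <- function(txt) {
  read.delim(text = as.character(txt), header = TRUE,
             stringsAsFactors = FALSE)
}

# Submit form fields with RCurl and parse the tab-separated response.
# 'fields' is a named character vector: names are the form's input names.
fetch_form_data <- function(action_url, fields, method = c("post", "get")) {
  if (!requireNamespace("RCurl", quietly = TRUE))
    stop("package 'RCurl' is needed for this sketch")
  method <- match.arg(method)
  txt <- if (method == "post") {
    # style = "POST" sends application/x-www-form-urlencoded data
    RCurl::postForm(action_url, .params = as.list(fields), style = "POST")
  } else {
    RCurl::getForm(action_url, .params = fields)
  }
  parse_tsv(txt)
}

# Usage (placeholder URL and field names, both assumptions):
# dat <- fetch_form_data("http://example.org/form_action",
#                        c(sn = "03015795",
#                          fromdate = mydates[1],
#                          todate = mydates[2]))
```

Whether to use "post" or "get" is given by the method attribute of the <form> tag, and the URL to submit to by its action attribute.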

Do study the posting guide: none of the 'at a minimum' information was 
given here.


On 03/12/2011 04:47, HC wrote:
> Hi all,
>
> I am trying to download some tab separated data from the internet. The data
> is not available directly at a URL that can be known a priori. There is
> an intermediate form where start and end dates have to be given to get to
> the required page.
>
> For example, I want to download data for a station 03015795. The form for
> this station is at:
>
> http://ida.water.usgs.gov/ida/available_records.cfm?sn=03015795
>
> I could get the start date and end date from this form using:
>
> #
> # Specifying station and reading from the opening form
> stn<-"03015795"
> myurl<-paste("http://ida.water.usgs.gov/ida/available_records.cfm?sn=",stn,sep="")
> mypage1 = readLines(myurl)
>
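> # Getting the start and end dates
> # Each date sits in a <td align="center">...</td> cell; capture the cell text
> mypattern = '<td align="center">([^<]*)</td>'
> # NOTE: line 124 is hard-coded and depends on the page's current layout
> datalines = grep(mypattern, mypage1[124], value=TRUE)
> # Helper: extract the substrings located by gregexpr() match positions
> getexpr = function(s,g)substring(s,g,g+attr(g,'match.length')-1)
> gg = gregexpr(mypattern, datalines)
> matches = mapply(getexpr,datalines,gg)
> # Keep only the captured cell contents
> result = gsub(mypattern,'\\1',matches)
> names(result)=NULL
> # The first two cells are the start and end dates
> mydates<-result[1:2]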
>
> I want to know how I can feed these start and end dates to the form and
> submit it to reach the data page, and then download the data, either as
> displayed in the browser or saved to a file.
>
> Any help on this is most appreciated.
>
> Thanks.
> HC


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


