[R] R and HTTP get 'has file changed'
Dirk Eddelbuettel
edd at debian.org
Fri Jul 13 05:04:18 CEST 2007
On 12 July 2007 at 19:46, Seth Falcon wrote:
| Hi Dirk,
|
| Dirk Eddelbuettel <edd at debian.org> writes:
| > Is there a way, maybe using Duncan TL's RCurl, to efficiently test whether
| > a URL such as
| >
| > http://$CRAN/src/contrib/
| >
| > has changed? I.e. one way is via a test of a page in that directory as per
| > (sorry about the long line, and this would be on Linux with links and awk
| > installed)
| >
| > > strptime(system("links -width 160 -dump http://cran.r-project.org/src/contrib/ | awk '/PACKAGES.html/ {print $3,$4}\'", intern=TRUE), "%d-%b-%Y %H:%M")
| > [1] "2007-07-12 18:16:00"
| > >
| >
| > and one can then compare the POSIXt with a cached value --- but requesting
| > the header would presumably be more efficient.
| >
| > Is there a way to request the 'has changed' part of the http 1.1 spec
| > directly in R?
|
| Here's a way to use RCurl to obtain HTTP headers:
|
| h <- basicTextGatherer()
| junk <- getURI(url, writeheader=h$update, header=TRUE, nobody=TRUE)
| h <- h$value()
Sweet:
> library(RCurl)
> h <- basicTextGatherer()
> junk <- getURI("http://cran.r-project.org/src/contrib/PACKAGES.html", writeheader=h$update, header=TRUE, nobody=TRUE)
> h <- h$value()
> h
[1] "HTTP/1.1 200 OK\r\nDate: Fri, 13 Jul 2007 02:58:03 GMT\r\nServer: Apache/2.2.3 (Debian)\r\nLast-Modified: Thu, 12 Jul 2007 16:16:08 GMT\r\nETag: \"a7c11e-21f34-4fe68200\"\r\nAccept-Ranges: bytes\r\nContent-Length: 139060\r\nContent-Type: text/html\r\n\r\n"
>
So I can just filter the Date and Last-Modified fields from here, without
having to worry about the particular header request. Nice!
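
As a minimal sketch of that filtering step, the header string can be split
on "\r\n" and the date fields converted to POSIXct. The helper name
parse_http_date is hypothetical (it is not part of RCurl), and the header
text is hard-coded from the session above so the example runs without a
network connection. Month names are matched against R's built-in month.abb
so the result should not depend on the locale.

```r
# Header text copied from the h$value() output shown above.
hdr <- paste0(
  "HTTP/1.1 200 OK\r\n",
  "Date: Fri, 13 Jul 2007 02:58:03 GMT\r\n",
  "Server: Apache/2.2.3 (Debian)\r\n",
  "Last-Modified: Thu, 12 Jul 2007 16:16:08 GMT\r\n",
  "ETag: \"a7c11e-21f34-4fe68200\"\r\n",
  "Accept-Ranges: bytes\r\n",
  "Content-Length: 139060\r\n",
  "Content-Type: text/html\r\n\r\n"
)

# Hypothetical helper: extract one date-valued header field as POSIXct (GMT).
# Assumes the common "Thu, 12 Jul 2007 16:16:08 GMT" date layout.
parse_http_date <- function(hdr, field) {
  lines <- strsplit(hdr, "\r\n", fixed = TRUE)[[1]]
  hit <- grep(paste0("^", field, ":"), lines, ignore.case = TRUE, value = TRUE)
  if (length(hit) == 0) return(as.POSIXct(NA))
  value <- sub("^[^:]+:[[:space:]]*", "", hit[1])
  # Pieces: weekday, day, month abbreviation, year, hour, minute, second, zone
  parts <- strsplit(value, "[ :]+")[[1]]
  ISOdatetime(year  = as.integer(parts[4]),
              month = match(parts[3], month.abb),
              day   = as.integer(parts[2]),
              hour  = as.integer(parts[5]),
              min   = as.integer(parts[6]),
              sec   = as.integer(parts[7]),
              tz    = "GMT")
}

last_modified <- parse_http_date(hdr, "Last-Modified")
server_date   <- parse_http_date(hdr, "Date")
```

A field that is absent from the header comes back as NA, which the caller
can treat as "unknown" rather than "unchanged".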
| If you want to check many URLs, I think you will find the following
I don't. I just want something 'light and easy' as the script (to feed
CRANberries) may get run a few times from crontab and should stop early if
no new data is there to be processed.
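
One way such an early exit might look is sketched below, assuming the
Last-Modified time is already available as a POSIXct value and that the
previous value is kept in a small cache file. The function name
has_changed and the use of an .rds cache file are assumptions made for
illustration, not how CRANberries actually does it.

```r
# Hypothetical helper: TRUE if last_modified is newer than the cached
# timestamp (or nothing is cached yet); updates the cache in that case.
has_changed <- function(last_modified, cache_file) {
  # No usable timestamp: be conservative and report a change.
  if (is.na(last_modified)) return(TRUE)
  if (file.exists(cache_file)) {
    cached <- readRDS(cache_file)
    if (!is.na(cached) && last_modified <= cached) return(FALSE)
  }
  saveRDS(last_modified, cache_file)
  TRUE
}

# Demonstration with a temporary cache file and the timestamp from above.
cache_file <- tempfile(fileext = ".rds")
stamp <- ISOdatetime(2007, 7, 12, 16, 16, 8, tz = "GMT")

first_run  <- has_changed(stamp, cache_file)         # nothing cached yet
second_run <- has_changed(stamp, cache_file)         # same timestamp again
later_run  <- has_changed(stamp + 3600, cache_file)  # one hour newer
```

In a script run from cron, the check could sit at the top, for example
if (!has_changed(stamp, cache_file)) quit(save = "no", status = 0), so
that nothing else runs when the file is unchanged.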
Thanks!
Dirk
--
Hell, there are no rules here - we're trying to accomplish something.
-- Thomas A. Edison