[R] R and HTTP get 'has file changed'

Dirk Eddelbuettel edd at debian.org
Fri Jul 13 05:04:18 CEST 2007


On 12 July 2007 at 19:46, Seth Falcon wrote:
| Hi Dirk,
| 
| Dirk Eddelbuettel <edd at debian.org> writes:
| > Is there a way, maybe using Duncan TL's RCurl, to efficiently test whether
| > a URL such as 
| >
| > 	http://$CRAN/src/contrib/ 
| >
| > has changed?  I.e. one way is to test a page in that directory, as in the
| > following (sorry about the long line; this assumes Linux with links and awk
| > installed)
| >
| >    > strptime(system("links -width 160 -dump http://cran.r-project.org/src/contrib/ | awk '/PACKAGES.html/ {print $3,$4}\'", intern=TRUE), "%d-%b-%Y %H:%M")
| >    [1] "2007-07-12 18:16:00"
| >    > 
| >
| > and one can then compare the POSIXt with a cached value --- but requesting
| > the header would presumably be more efficient.
| >
| > Is there a way to request the 'has changed' part of the HTTP/1.1 spec
| > directly in R?
| 
| Here's a way to use RCurl to obtain HTTP headers:
| 
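|         library(RCurl)
|         url <- "http://cran.r-project.org/src/contrib/PACKAGES.html"
|         h <- basicTextGatherer()   # collects the header text
|         junk <- getURI(url, writeheader=h$update, header=TRUE, nobody=TRUE)
|         h <- h$value()             # nobody=TRUE: headers only, no body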
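HTTP/1.1 also defines the conditional request asked about above: a GET carrying an If-Modified-Since header is answered with "304 Not Modified" and no body when the resource has not changed since that time. Below is a minimal sketch in R, assuming the previously seen Last-Modified time is kept as a POSIXct. The helpers `http_date()` and `fetch_if_modified()` are hypothetical names, and passing the header through RCurl's `httpheader` option is an assumption worth checking against the installed RCurl version.

```r
# Sketch of an HTTP/1.1 conditional GET (If-Modified-Since).
# Assumptions: `last_seen` is a POSIXct holding the previously cached
# Last-Modified time; http_date() and fetch_if_modified() are
# hypothetical helper names, not part of RCurl.

# Format a time as an RFC 1123 HTTP-date, e.g. "Thu, 12 Jul 2007 16:16:08 GMT".
# Day and month names must be English, so LC_TIME is temporarily set to "C".
http_date <- function(t) {
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old))
  Sys.setlocale("LC_TIME", "C")
  format(t, "%a, %d %b %Y %H:%M:%S GMT", tz = "GMT")
}

# Ask for the page only if it changed since `last_seen`.
# The server replies "304 Not Modified" with an empty body when it has not.
# RCurl is referenced with :: so this file can be sourced without loading it.
fetch_if_modified <- function(url, last_seen) {
  h <- RCurl::basicTextGatherer()
  content <- RCurl::getURI(url,
                           httpheader = c("If-Modified-Since" = http_date(last_seen)),
                           writeheader = h$update, header = FALSE)
  # First header line is the status line, e.g. "HTTP/1.1 304 Not Modified"
  status_line <- strsplit(h$value(), "\r\n", fixed = TRUE)[[1]][1]
  list(changed = !grepl(" 304 ", status_line, fixed = TRUE), content = content)
}
```

Compared with a HEAD request plus a separate GET, this needs a single round trip: the server decides whether anything changed and only sends the body when it did.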

Sweet:

> library(RCurl)
> h <- basicTextGatherer()
> junk <- getURI("http://cran.r-project.org/src/contrib/PACKAGES.html", writeheader=h$update, header=TRUE, nobody=TRUE)
> h <- h$value()
> h
[1] "HTTP/1.1 200 OK\r\nDate: Fri, 13 Jul 2007 02:58:03 GMT\r\nServer: Apache/2.2.3 (Debian)\r\nLast-Modified: Thu, 12 Jul 2007 16:16:08 GMT\r\nETag: \"a7c11e-21f34-4fe68200\"\r\nAccept-Ranges: bytes\r\nContent-Length: 139060\r\nContent-Type: text/html\r\n\r\n"
> 

So I can just filter the Date and Last-Modified fields from here, without
having to worry about the particular header request. Nice!
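Filtering those fields amounts to splitting the raw header text on CRLF and each line on its first colon. A minimal sketch, using the header string printed above as input; `parse_headers()` and `last_modified()` are hypothetical helper names, and header names are lower-cased because HTTP treats them case-insensitively.

```r
# Sketch: turn the raw header text from h$value() into a named character
# vector, then read Last-Modified as a POSIXct (GMT).
# parse_headers() and last_modified() are hypothetical helper names.

parse_headers <- function(raw) {
  lines <- strsplit(raw, "\r\n", fixed = TRUE)[[1]]
  lines <- lines[grepl(":", lines, fixed = TRUE)]  # drops status line and blanks
  pos <- regexpr(":", lines, fixed = TRUE)         # split on the FIRST colon only
  keys <- tolower(substr(lines, 1, pos - 1))
  vals <- trimws(substr(lines, pos + 1, nchar(lines)))
  setNames(vals, keys)
}

last_modified <- function(raw) {
  hdr <- parse_headers(raw)
  if (is.na(hdr["last-modified"])) return(NULL)
  # HTTP-dates use English day/month names, so parse in the "C" locale
  old <- Sys.getlocale("LC_TIME")
  on.exit(Sys.setlocale("LC_TIME", old))
  Sys.setlocale("LC_TIME", "C")
  # strptime ignores the trailing " GMT"; tz = "GMT" sets the zone
  as.POSIXct(strptime(hdr[["last-modified"]], "%a, %d %b %Y %H:%M:%S", tz = "GMT"))
}

# Usage with the gatherer from above (requires network):
#   h <- basicTextGatherer()
#   junk <- getURI(url, writeheader = h$update, header = TRUE, nobody = TRUE)
#   last_modified(h$value())
```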

| If you want to check many URLs, I think you will find the following

I don't. I just want something 'light and easy', as the script (to feed
CRANberries) may get run a few times from crontab and should stop early if
there is no new data to process.
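For a cron job, stopping early reduces to comparing the current Last-Modified value with the one cached by the previous run. A minimal sketch, assuming the value is kept as the raw header string in a small text file; `has_changed()`, `remember()` and the cache path in the usage comment are hypothetical. The cache is only updated after processing succeeds, so a failed run is retried next time.

```r
# Sketch of an early-exit check for a script run from cron.
# Assumptions: `current` is the Last-Modified header value (a string);
# has_changed(), remember() and the cache path below are hypothetical.

# TRUE unless the cache file holds exactly the same Last-Modified string
has_changed <- function(current, cache_file) {
  if (!file.exists(cache_file)) return(TRUE)
  previous <- readLines(cache_file, n = 1, warn = FALSE)
  !(length(previous) == 1 && identical(previous, current))
}

# Record the value; call this only after processing succeeded
remember <- function(current, cache_file) {
  writeLines(current, cache_file)
  invisible(current)
}

# Usage in the cron script (network part omitted):
#   current <- "Thu, 12 Jul 2007 16:16:08 GMT"   # e.g. the Last-Modified field
#   cache <- "~/.cranberries-lastmod"             # hypothetical path
#   if (!has_changed(current, cache)) quit(save = "no", status = 0)
#   ## ... process the new packages ...
#   remember(current, cache)
```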

Thanks!

Dirk

-- 
Hell, there are no rules here - we're trying to accomplish something. 
                                                  -- Thomas A. Edison
