[Rd] [R] HTTP User-Agent header

Robert Gentleman rgentlem at fhcrc.org
Mon Jul 31 18:45:30 CEST 2006


should appear at an R-devel near you...
thanks Seth


Seth Falcon wrote:
> Robert Gentleman <rgentlem at fhcrc.org> writes:
>> OK, that suggests setting at the options level would solve both of your 
>> problems and that seems like the best approach. I don't really want to 
>> pass this around as a parameter through the maze of functions that might 
>> actually download something if we don't have to.
> 
> I have an updated patch that adds an HTTPUserAgent option.  The
> default is a string like:
> 
>     R (2.4.0 x86_64-unknown-linux-gnu x86_64 linux-gnu)
> 
> If the HTTPUserAgent option is NULL, no user agent header is added to
> HTTP requests (this is the current behavior).  This option allows R to
> use an arbitrary user agent header.
> 
> The patch adds two non-exported functions to utils: 
>    1) defaultUserAgent - returns a string like above
>    2) makeUserAgent - formats content of HTTPUserAgent option for use
>       as part of an HTTP request header.
> 
> I've tested on OSX and Linux, but not on Windows.  When USE_WININET is
> defined, a user agent string of "R" was already being used.  With this
> patch, the HTTPUserAgent options is used.  I'm unsure if NULL is
> allowed.
> 
> Also, in src/main/internet.c there is a comment:
>   "Next 6 are for use by libxml, only"
> and then a definition for R_HTTPOpen.  Not sure how/when these get
> used.  The user agent for these calls remains unspecified with this
> patch.
> 
> + seth
> 
> 
> Patch summary:
>  src/include/R_ext/R-ftp-http.h   |    2 +-
>  src/include/Rmodules/Rinternet.h |    2 +-
>  src/library/base/man/options.Rd  |    5 +++++
>  src/library/utils/R/readhttp.R   |   25 +++++++++++++++++++++++++
>  src/library/utils/R/zzz.R        |    3 ++-
>  src/main/internet.c              |    2 +-
>  src/modules/internet/internet.c  |   37 +++++++++++++++++++++++++------------
>  src/modules/internet/nanohttp.c  |    8 ++++++--
>  8 files changed, 66 insertions(+), 18 deletions(-)
> 
> 
> 
> Index: src/include/R_ext/R-ftp-http.h
> ===================================================================
> --- src/include/R_ext/R-ftp-http.h	(revision 38715)
> +++ src/include/R_ext/R-ftp-http.h	(working copy)
> @@ -36,7 +36,7 @@
>  int   R_FTPRead(void *ctx, char *dest, int len);
>  void  R_FTPClose(void *ctx);
>  
> -void *	RxmlNanoHTTPOpen(const char *URL, char **contentType, int cacheOK);
> +void *	RxmlNanoHTTPOpen(const char *URL, char **contentType, const char *headers, int cacheOK);
>  int	RxmlNanoHTTPRead(void *ctx, void *dest, int len);
>  void	RxmlNanoHTTPClose(void *ctx);
>  int 	RxmlNanoHTTPReturnCode(void *ctx);
> Index: src/include/Rmodules/Rinternet.h
> ===================================================================
> --- src/include/Rmodules/Rinternet.h	(revision 38715)
> +++ src/include/Rmodules/Rinternet.h	(working copy)
> @@ -9,7 +9,7 @@
>  typedef Rconnection (*R_NewUrlRoutine)(char *description, char *mode);
>  typedef Rconnection (*R_NewSockRoutine)(char *host, int port, int server, char *mode); 
>  
> -typedef void * (*R_HTTPOpenRoutine)(const char *url, const int cacheOK);
> +typedef void * (*R_HTTPOpenRoutine)(const char *url, const char *headers, const int cacheOK);
>  typedef int    (*R_HTTPReadRoutine)(void *ctx, char *dest, int len);
>  typedef void   (*R_HTTPCloseRoutine)(void *ctx);
>  	      
> Index: src/main/internet.c
> ===================================================================
> --- src/main/internet.c	(revision 38715)
> +++ src/main/internet.c	(working copy)
> @@ -129,7 +129,7 @@
>  {
>      if(!initialized) internet_Init();
>      if(initialized > 0)
> -	return (*ptr->HTTPOpen)(url, 0);
> +	return (*ptr->HTTPOpen)(url, NULL, 0);
>      else {
>  	error(_("internet routines cannot be loaded"));
>  	return NULL;
> Index: src/library/utils/R/zzz.R
> ===================================================================
> --- src/library/utils/R/zzz.R	(revision 38715)
> +++ src/library/utils/R/zzz.R	(working copy)
> @@ -9,7 +9,8 @@
>               internet.info = 2,
>               pkgType = .Platform$pkgType,
>               str = list(strict.width = "no"),
> -             example.ask = "default")
> +             example.ask = "default",
> +             HTTPUserAgent = defaultUserAgent())
>      extra <-
>          if(.Platform$OS.type == "windows") {
>              list(mailer = "none",
> Index: src/library/utils/R/readhttp.R
> ===================================================================
> --- src/library/utils/R/readhttp.R	(revision 38715)
> +++ src/library/utils/R/readhttp.R	(working copy)
> @@ -6,3 +6,28 @@
>          stop("transfer failure")
>      file.show(file, delete.file = delete.file, title = title, ...)
>  }
> +
> +
> +
> +defaultUserAgent <- function()
> +{
> +    Rver <- paste(R.version$major, R.version$minor, sep=".")
> +    Rdetails <- paste(Rver, R.version$platform, R.version$arch,
> +                      R.version$os)
> +    paste("R (", Rdetails, ")", sep="")
> +}
> +
> +
> +makeUserAgent <- function(format = TRUE) {
> +    agent <- getOption("HTTPUserAgent")
> +    if (is.null(agent)) {
> +        return(NULL)
> +    }
> +    if (length(agent) != 1)
> +      stop(sQuote("HTTPUserAgent"),
> +           " option must be a length one character vector or NULL")
> +    if (format)
> +      paste("User-Agent: ", agent[1], "\r\n", sep = "")
> +    else
> +      agent[1]
> +}
> Index: src/library/base/man/options.Rd
> ===================================================================
> --- src/library/base/man/options.Rd	(revision 38715)
> +++ src/library/base/man/options.Rd	(working copy)
> @@ -368,6 +368,11 @@
>      \item{\code{help.try.all.packages}:}{default for an argument of
>        \code{\link{help}}.}
>  
> +    \item{\code{HTTPUserAgent}:}{string used as the user agent in HTTP
> +      requests.  If \code{NULL}, HTTP requests will be made without a
> +      user agent header.  The default is \code{R (<version> <platform>
> +      <arch> <os>)}}
> +
>      \item{\code{internet.info}:}{The minimum level of information to be
>        printed on URL downloads etc.  Default is 2, for failure causes.
>        Set to 1 or 0 to get more information.}
> Index: src/modules/internet/internet.c
> ===================================================================
> --- src/modules/internet/internet.c	(revision 38715)
> +++ src/modules/internet/internet.c	(working copy)
> @@ -28,7 +28,7 @@
>  #include <Rconnections.h>
>  #include <R_ext/R-ftp-http.h>
>  
> -static void *in_R_HTTPOpen(const char *url, const int cacheOK);
> +static void *in_R_HTTPOpen(const char *url, const char *headers, const int cacheOK);
>  static int   in_R_HTTPRead(void *ctx, char *dest, int len);
>  static void  in_R_HTTPClose(void *ctx);
>  
> @@ -70,7 +70,7 @@
>  
>      switch(type) {
>      case HTTPsh:
> -	ctxt = in_R_HTTPOpen(url, 0);
> +	ctxt = in_R_HTTPOpen(url, NULL, 0);
>  	if(ctxt == NULL) {
>  	  /* if we call error() we get a connection leak*/
>  	  /* so do_url has to raise the error*/
> @@ -238,14 +238,14 @@
>  }
>  #endif
>  
> -/* download(url, destfile, quiet, mode, cacheOK) */
> +/* download(url, destfile, quiet, mode, headers, cacheOK) */
>  
>  #define CPBUFSIZE 65536
>  #define IBUFSIZE 4096
>  static SEXP in_do_download(SEXP call, SEXP op, SEXP args, SEXP env)
>  {
> -    SEXP ans, scmd, sfile, smode;
> -    char *url, *file, *mode;
> +    SEXP ans, scmd, sfile, smode, sheaders, agentFun;
> +    char *url, *file, *mode, *headers;
>      int quiet, status = 0, cacheOK;
>  
>      checkArity(op, args);
> @@ -271,6 +271,17 @@
>      cacheOK = asLogical(CAR(args));
>      if(cacheOK == NA_LOGICAL)
>  	error(_("invalid '%s' argument"), "cacheOK");
> +#ifdef USE_WININET
> +    PROTECT(agentFun = lang2(install("makeUserAgent"), ScalarLogical(0)));
> +#else
> +    PROTECT(agentFun = lang1(install("makeUserAgent")));
> +#endif
> +    PROTECT(sheaders = eval(agentFun, R_FindNamespace(mkString("utils"))));
> +    UNPROTECT(1);
> +    if(TYPEOF(sheaders) == NILSXP)
> +        headers = NULL;
> +    else 
> +        headers = CHAR(STRING_ELT(sheaders, 0));
>  #ifdef Win32
>      if (!pbar.wprog) {
>  	pbar.wprog = newwindow(_("Download progress"), rect(0, 0, 540, 100),
> @@ -319,7 +330,7 @@
>  #ifdef Win32
>  	R_FlushConsole();
>  #endif
> -	ctxt = in_R_HTTPOpen(url, cacheOK);
> +	ctxt = in_R_HTTPOpen(url, headers, cacheOK);
>  	if(ctxt == NULL) status = 1;
>  	else {
>  	    if(!quiet) REprintf(_("opened URL\n"), url);
> @@ -466,14 +477,14 @@
>  
>      PROTECT(ans = allocVector(INTSXP, 1));
>      INTEGER(ans)[0] = status;
> -    UNPROTECT(1);
> +    UNPROTECT(2);
>      return ans;
>  }
>  
>  
>  #if defined(SUPPORT_LIBXML) && !defined(USE_WININET)
>  
> -void *in_R_HTTPOpen(const char *url, int cacheOK)
> +void *in_R_HTTPOpen(const char *url, const char *headers, const int cacheOK)
>  {
>      inetconn *con;
>      void *ctxt;
> @@ -484,7 +495,7 @@
>      if(timeout == NA_INTEGER || timeout <= 0) timeout = 60;
>  
>      RxmlNanoHTTPTimeout(timeout);
> -    ctxt = RxmlNanoHTTPOpen(url, NULL, cacheOK);
> +    ctxt = RxmlNanoHTTPOpen(url, NULL, headers, cacheOK);
>      if(ctxt != NULL) {
>  	int rc = RxmlNanoHTTPReturnCode(ctxt);
>  	if(rc != 200) {
> @@ -605,7 +616,8 @@
>  }
>  #endif /* USE_WININET_ASYNC */
>  
> -static void *in_R_HTTPOpen(const char *url, const int cacheOK)
> +static void *in_R_HTTPOpen(const char *url, const char *headers, 
> +                           const int cacheOK)
>  {
>      WIctxt  wictxt;
>      DWORD status, d1 = 4, d2 = 0, d3 = 100;
> @@ -622,7 +634,7 @@
>      wictxt->length = -1;
>      wictxt->type = NULL;
>      wictxt->hand =
> -	InternetOpen("R", INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL,
> +	InternetOpen(headers, INTERNET_OPEN_TYPE_PRECONFIG, NULL, NULL,
>  #ifdef USE_WININET_ASYNC
>  		     INTERNET_FLAG_ASYNC
>  #else
> @@ -870,7 +882,8 @@
>  #endif
>  
>  #ifndef HAVE_INTERNET
> -static void *in_R_HTTPOpen(const char *url, const int cacheOK)
> +static void *in_R_HTTPOpen(const char *url, const char *headers, 
> +                           const int cacheOK)
>  {
>      return NULL;
>  }
> Index: src/modules/internet/nanohttp.c
> ===================================================================
> --- src/modules/internet/nanohttp.c	(revision 38715)
> +++ src/modules/internet/nanohttp.c	(working copy)
> @@ -1034,6 +1034,9 @@
>   * @contentType:  if available the Content-Type information will be
>   *                returned at that location
>   *
> + * @headers: headers to be used in the HTTP request.  These must be name/value
> + *           pairs separated by ':', each on their own line.
> + *
>   * This function try to open a connection to the indicated resource
>   * via HTTP GET.
>   *
> @@ -1042,10 +1045,11 @@
>   */
>  
>  void*
> -RxmlNanoHTTPOpen(const char *URL, char **contentType, int cacheOK)
> +RxmlNanoHTTPOpen(const char *URL, char **contentType, const char *headers,
> +                 int cacheOK)
>  {
>      if (contentType != NULL) *contentType = NULL;
> -    return RxmlNanoHTTPMethod(URL, NULL, NULL, contentType, NULL, cacheOK);
> +    return RxmlNanoHTTPMethod(URL, NULL, NULL, contentType, headers, cacheOK);
>  }
>  
>  /**
> 
> ______________________________________________
> R-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 

-- 
Robert Gentleman, PhD
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M2-B876
PO Box 19024
Seattle, Washington 98109-1024
206-667-7700
rgentlem at fhcrc.org



More information about the R-devel mailing list