[R] Behaviour of 'source' with URLs and proxy

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 5 13:45:58 CEST 2011


On Wed, 5 Oct 2011, Renaud Gaujoux wrote:

> From the help page ?file I -- had -- read the following:
>
> "For ‘url’ the description is a complete URL, including scheme
> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."

So you should have known that it was the same as url()!

> From the internet.info messages it seems that the proxy is actually used, but 
> somehow differently than what download.file does (via wget).

No, somewhat differently than *wget* does.  As that help page says, 
the section on proxies only refers to the internal method.
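Because source() reads a URL through a url() connection and does not call download.file(), one way to make it follow options(download.file.method = 'wget') is to download the script to a temporary file first and then source the local copy. This is a minimal sketch using only base R functions (download.file, tempfile, source). The helper name source_via_download is hypothetical and is not an existing R function.

```r
# Hypothetical helper (not part of base R): fetch a script with
# download.file(), which honours options(download.file.method) (e.g. "wget"),
# then source() the local copy instead of reading through a url() connection.
source_via_download <- function(url,
                                method = getOption("download.file.method", "auto"),
                                local = FALSE, ...) {
  dest <- tempfile(fileext = ".R")
  on.exit(unlink(dest), add = TRUE)  # remove the temporary copy afterwards
  status <- download.file(url, dest, method = method, quiet = TRUE)
  if (status != 0) stop("download.file() failed with status ", status)
  # Any extra arguments (e.g. echo = TRUE) are passed on to source().
  source(dest, local = local, ...)
}
```

With options(download.file.method = 'wget') set, a call such as source_via_download("http://*OUR_HOST*/~renaud/R/setrepos.R") would fetch the file through wget and its own proxy handling rather than through R's internal url() code.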

> Is source supposed to work through a proxy?

Yes, and it has been tested to do so.  But not tested on your proxy ....
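The workaround described in the original message (clear http_proxy, source, then restore the proxy setting) can be wrapped so that the previous value is always put back, even if source() fails. This is a sketch under assumptions: the helper with_proxy_unset is hypothetical, and the default variable names are an assumption about what the internal method consults. Note also that ?download.file says the proxy variables for the internal method must be set before the download code is first used, so in practice the wrapper probably needs to surround the first URL access of the session (for example the source() call in .Rprofile).

```r
# Hypothetical helper (not part of base R): evaluate `expr` with the proxy
# environment variables unset, then restore whatever was there before.
# Assumption: `vars` lists the variable names the internal method consults.
with_proxy_unset <- function(expr, vars = c("http_proxy", "HTTP_PROXY")) {
  # NA marks a variable that was not set at all before the call.
  old <- Sys.getenv(vars, unset = NA, names = TRUE)
  on.exit({
    was_set <- !is.na(old)
    # Put back the variables that had a value ...
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    # ... and unset again any that were originally absent.
    if (any(!was_set)) Sys.unsetenv(vars[!was_set])
  }, add = TRUE)
  Sys.unsetenv(vars)
  force(expr)  # `expr` is lazily evaluated, so it only runs now
}
```

For example, with_proxy_unset(source("http://*OUR_HOST*/~renaud/R/setrepos.R")) in .Rprofile, where *OUR_HOST* is the redacted host from the log in the original message. Afterwards http_proxy is restored, so download.file(method = 'wget') still reaches internet URLs through the proxy.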


>
> -- 
> Renaud Gaujoux
> Computational Biology - University of Cape Town
> South Africa
>
>
> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>> 
>>> Hi,
>>> 
>>> I am having trouble sourcing a file on our local network from R.
>>> It looks like such files are not properly accessed by 'source', even though they 
>>> can be downloaded with download.file (see my settings and some 
>>> tests below). I ended up with a workaround, but I would like to 
>>> understand what is going on.
>>> 
>>> Doesn't source/readLines use the same mechanism as download.file to 
>>> access URLs?
>> 
>> No. They use url() connections. See ?file.
>> 
>>> 
>>> Thank you.
>>> 
>>> Renaud
>>> 
>>> My setting:
>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>> - I am accessing the internet through a proxy (set up with cntlm; I am not sure 
>>> whether this is the issue, but I don't know how to check without it). This means 
>>> that http_proxy='http://localhost:8080/'.
>>> - We have local CRAN/Bioconductor mirrors that can be accessed without 
>>> going through the proxy.
>>> - My .Rprofile sources a file 'setrepos.R' on the local network, that sets 
>>> all relevant repos to our local mirrors.
>>> 
>>> From the shell:
>>> - I can wget any URL (local or internet) from command line without a 
>>> problem.
>>> - In particular I can wget the file 'setrepos.R' from command line.
>>> 
>>> Symptoms:
>>> - with options(download.file.method='wget'), I can download any URL (local 
>>> or internet) with download.file
>>> - I _cannot_ source any local or internet URL if http_proxy is set: it 
>>> simply freezes. Setting options(internet.info=0) gives the following messages:
>>> ############
>>> Warning messages:
>>> 1: In file(file, "r", encoding = encoding) :
>>> using HTTP proxy 'http://localhost:8080/'
>>> 2: In file(file, "r", encoding = encoding) :
>>> connected to 'localhost' on port 8080.
>>> 3: In file(file, "r", encoding = encoding) :
>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>> Host: *OUR_HOST*
>>> Pragma: no-cache
>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>> 
>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004
>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive
>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: 
>>> Keep-Alive
>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>> 9: In file(file, "r", encoding = encoding) :
>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: text/plain
>>> 11: In file(file, "r", encoding = encoding) :
>>> <- ETag: "30b8018-63d-4a627b821c980"
>>> 12: In file(file, "r", encoding = encoding) :
>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6 with 
>>> Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g 
>>> mod_perl/2.0.4 Perl/v5.10.0
>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes
>>> 14: In file(file, "r", encoding = encoding) :
>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>> 15: In file(file, "r", encoding = encoding) : Code 200, content-type 
>>> 'text/plain'
>>> ############
>>> 
>>> - Setting options(download.file.method='wget') before sourcing does not 
>>> change the behaviour.
>>> - However, I can source any local URL if http_proxy='', without changing 
>>> download.file.method. But then download.file no longer works for internet 
>>> URLs, since the proxy settings are wrong. I could set 
>>> http_proxy='', then source, then restore the proxy settings and set 
>>> options(download.file.method='wget'). But this is just a workaround and I 
>>> would like to understand what is going on.
>>> 
>>> Session Info:
>>> 
>>> R version 2.13.2 (2011-09-30)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> 
>>> locale:
>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>> 
>>> attached base packages:
>>> [1] stats graphics grDevices utils datasets methods base
>>> 
>>> other attached packages:
>>> [1] devtools_0.4
>>> 
>>> loaded via a namespace (and not attached):
>>> [1] RCurl_1.6-10 tools_2.13.2
>>> 
>>> ###
>>> 
>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies and 
>>> e-mai...
>>> 
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide 
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>> 
>> 
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

