[R] Behaviour of 'source' with URLs and proxy

Prof Brian Ripley ripley at stats.ox.ac.uk
Wed Oct 5 14:49:49 CEST 2011


On Wed, 5 Oct 2011, Renaud Gaujoux wrote:

>
> On 05/10/2011 13:45, Prof Brian Ripley wrote:
>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>> 
>>> From the help page ?file I -- had -- read the following:
>>> 
>>> "For ‘url’ the description is a complete URL, including scheme
>>> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
>>> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."
>> 
>> So you should have known that it was the same as url()!
>
> I agree. I just thought -- incorrectly -- that any attempt to download a file 
> from R would eventually call the same C code as download.file. Or maybe

It does.  But download.file(method="wget") does not call that C code 
....

> source() does not download and source, but reads the file on the fly?

That is true too, but then downloading a file is always done in 
chunks.
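
To make the distinction above concrete, here is a minimal sketch in R of the two routes: source() on a URL reads from a url() connection, whereas download.file() writes a local copy that can then be sourced. The helper name source_via_download and the example URL are hypothetical (neither comes from R nor from this thread), and whether method = "wget" gets through a particular proxy depends on the local setup.

```r
# Hypothetical helper (not part of R): fetch the script with download.file(),
# which can be told to use an external tool such as wget, then source the
# local copy instead of reading it through a url() connection.
source_via_download <- function(u,
                                method = getOption("download.file.method", "auto"),
                                ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp))                      # remove the temporary copy afterwards
  status <- download.file(u, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  source(tmp, ...)                          # extra arguments are passed on to source()
}

# Usage (hypothetical URL), assuming wget is installed and on the PATH:
# source_via_download("http://example.org/setrepos.R", method = "wget")
```

This only sidesteps the hang by avoiding the internal HTTP code; it does not explain why the internal method stalls after the headers are received.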

>> 
>>> From the internet.info messages it seems that the proxy is actually used, 
>>> but somehow differently than what download.file does (via wget).
>> 
>> No, somewhat differently than *wget* does.  As that help page says, the 
>> section on proxies only refers to the internal method.
>> 
>>> Is source supposed to work through a proxy?
>> 
>> Yes, and it has been tested to do so.  But not tested on your proxy ....
>
> OK, I agree that my settings look special, but in the end it is supposed to 
> be a plain local proxy with no authentication.
> The proxy is effectively used by the internal method and, from the messages 
> (below), the remote file is opened, http headers are returned, but nothing 
> else happens and I have to cancel the command (Ctrl-C).
>
> This is where I would like to have some input, so that I can work out the 
> issue.
> I tried to go through the C code for internet without much luck: it seems that 
> in_R_HTTPRead and RxmlNanoHTTPRead would be the places to look.
>
> Any idea on what would cause these functions to hang (infinite loop, 
> communication problem, ...)?
> I know, I am too curious.
>
> Thank you
>
>
>> Sys.getenv('http_proxy')
> [1] "http://localhost:8080/"
>> Sys.getenv('no_proxy')
> [1] "localhost,127.0.0.0/8,*.local"
>> options(internet.info=0)
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt")
> trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt'
> Content type 'text/plain' length 1209 bytes
> opened URL
> ^C
> There were 15 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>  connected to 'localhost' on port 8080.
> 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>  -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0
> Host: lib.stat.cmu.edu
> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>
> 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- HTTP/1.1 200 OK
> 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004
> 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Connection: Keep-Alive
> 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Proxy-Connection: Keep-Alive
> 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Content-Length: 1209
> 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Age: 747
> 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Date: Wed, 05 Oct 2011 11:52:25 GMT
> 10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Content-Type: text/plain
> 11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- ETag: "5c700f3-4b9-399383c0"
> 12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Server: Apache
> 13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Accept-Ranges: bytes
> 14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
> <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT
> 15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",  ... :
>  Code 200, content-type 'text/plain'
>
>> 
>> 
>>> 
>>> -- 
>>> Renaud Gaujoux
>>> Computational Biology - University of Cape Town
>>> South Africa
>>> 
>>> 
>>> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am having trouble sourcing a file on our local network from R.
>>>>> It looks like such files are not properly accessed by 'source', even though 
>>>>> they can be downloaded with download.file (see below for my settings and 
>>>>> some tests I did). I ended up with a workaround, but I would like to 
>>>>> understand what is going on.
>>>>> 
>>>>> Don't source/readLines use the same mechanism as download.file to 
>>>>> access URLs?
>>>> 
>>>> No. They use url() connections. See ?file.
>>>> 
>>>>> 
>>>>> Thank you.
>>>>> 
>>>>> Renaud
>>>>> 
>>>>> My setting:
>>>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>>>> - I am accessing internet through a proxy (set up with cntlm, not sure 
>>>>> if this is the issue but I don't know how to check without it). This 
>>>>> means that http_proxy='http://localhost:8080/'.
>>>>> - We have local CRAN/BioConductor mirrors that can be accessed without 
>>>>> going through the proxy.
>>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, that 
>>>>> sets all relevant repos to our local mirrors.
>>>>> 
>>>>> From the shell:
>>>>> - I can wget any URL (local or internet) from command line without a 
>>>>> problem.
>>>>> - In particular I can wget the file 'setrepos.R' from command line.
>>>>> 
>>>>> Symptoms:
>>>>> - with options(download.file.method='wget'), I can download any URL 
>>>>> (local or internet) with download.file
>>>>> - I _cannot_ source any local or internet URL if http_proxy is set. It 
>>>>> simply freezes. Using internet.info=0 gives the following messages:
>>>>> ############
>>>>> Warning messages:
>>>>> 1: In file(file, "r", encoding = encoding) :
>>>>> using HTTP proxy 'http://localhost:8080/'
>>>>> 2: In file(file, "r", encoding = encoding) :
>>>>> connected to 'localhost' on port 8080.
>>>>> 3: In file(file, "r", encoding = encoding) :
>>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>>>> Host: *OUR_HOST*
>>>>> Pragma: no-cache
>>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>>>> 
>>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004
>>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive
>>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection: 
>>>>> Keep-Alive
>>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>>>> 9: In file(file, "r", encoding = encoding) :
>>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type: 
>>>>> text/plain
>>>>> 11: In file(file, "r", encoding = encoding) :
>>>>> <- ETag: "30b8018-63d-4a627b821c980"
>>>>> 12: In file(file, "r", encoding = encoding) :
>>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6 
>>>>> with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9 
>>>>> OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0
>>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes
>>>>> 14: In file(file, "r", encoding = encoding) :
>>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>>>> 15: In file(file, "r", encoding = encoding) : Code 200, content-type 
>>>>> 'text/plain'
>>>>> ############
>>>>> 
>>>>> - Setting options(download.file.method='wget') before sourcing does not 
>>>>> change the behaviour.
>>>>> - However, I can source any local URL if http_proxy='', without changing 
>>>>> download.file.method. But then download.file no longer works for internet 
>>>>> URLs, since the proxy settings are wrong. I could set 
>>>>> http_proxy='', then source, then restore the proxy settings and set 
>>>>> options(download.file.method='wget'). But this is just a workaround and 
>>>>> I would like to understand what is going on.
>>>>> 
>>>>> Session Info:
>>>>> 
>>>>> R version 2.13.2 (2011-09-30)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>> 
>>>>> locale:
>>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>>>> 
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>> 
>>>>> other attached packages:
>>>>> [1] devtools_0.4
>>>>> 
>>>>> loaded via a namespace (and not attached):
>>>>> [1] RCurl_1.6-10 tools_2.13.2
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> -- 
>>>>> 
>>>>> Renaud Gaujoux
>>>>> Computational Biology - University of Cape Town
>>>>> South Africa
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> ###
>>>>> 
>>>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies 
>>>>> and e-mai...{{dropped:5}}
>>>>> 
>>>> 
>>> 
>> 
>
>

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
