[R] Behaviour of 'source' with URLs and proxy
Prof Brian Ripley
ripley at stats.ox.ac.uk
Wed Oct 5 14:49:49 CEST 2011
On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>
> On 05/10/2011 13:45, Prof Brian Ripley wrote:
>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>
>>> From the help page ?file I -- had -- read the following:
>>>
>>> "For ‘url’ the description is a complete URL, including scheme
>>> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
>>> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."
>>
>> So you should have known that it was the same as url()!
>
> I agree. I just thought -- incorrectly -- that any attempt to download a file
> from R would eventually call the same C code as download.file. Or maybe
It does. But download.file(method="wget") does not call that C code
....
> source() does not download and source, but reads the file on the fly?
That is true too, but then downloading a file is always done in
chunks.
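
To illustrate the point about url() connections, here is a minimal sketch
of roughly what source() does when given a URL: it opens a connection and
reads from it, rather than saving the file first. The temporary script and
the file:// URL are assumptions made only so that the example needs no
network or proxy; an http:// URL would go through the same url() call.

```r
# Write a small script to a temporary file so the example needs no network.
script <- tempfile()
writeLines(c("x <- 1 + 1", "y <- x * 10"), script)

# Assumption: a Unix-alike, where "file://" plus an absolute path is a valid URL.
script_url <- paste0("file://", normalizePath(script))

# Open a url() connection and read the text directly from it.
con <- url(script_url, open = "r")
code <- readLines(con)
close(con)

# Parse and evaluate what was read, in a fresh environment.
env <- new.env()
eval(parse(text = code), envir = env)
```

With an http:// URL and options(internet.info=0) set, it is the url() call
that produces the proxy messages quoted further down.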
>>
>>> From the internet.info messages it seems that the proxy is actually used,
>>> but somehow differently than what download.file does (via wget).
>>
>> No, somewhat differently than *wget* does. As that help page says, the
>> section on proxies only refers to the internal method.
>>
>>> Is source supposed to work through a proxy?
>>
>> Yes, and it has been tested to do so. But not tested on your proxy ....
>
> OK, I agree that my settings look special, but in the end it is supposed to
> be a plain local proxy with no authentication.
> The proxy is effectively used by the internal method and, from the messages
> (below), the remote file is opened, http headers are returned, but nothing
> else happens and I have to cancel the command (Ctrl-C).
>
> This is where I would like to have some input, so that I can work out the
> issue.
> I tried to go through the internet C code without much luck: it seems that
> in_R_HTTPRead and RxmlNanoHTTPRead would be the places to look.
>
> Any idea on what would cause these functions to hang (infinite loop,
> communication problem, ...)?
> I know, I am too curious.
>
> Thank you
>
>
>> Sys.getenv('http_proxy')
> [1] "http://localhost:8080/"
>> Sys.getenv('no_proxy')
> [1] "localhost,127.0.0.0/8,*.local"
>> options(internet.info=0)
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt")
> trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt'
> Content type 'text/plain' length 1209 bytes
> opened URL
> ^C
> There were 15 warnings (use warnings() to see them)
>> warnings()
> Warning messages:
> 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> connected to 'localhost' on port 8080.
> 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0
> Host: lib.stat.cmu.edu
> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>
> 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- HTTP/1.1 200 OK
> 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004
> 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Connection: Keep-Alive
> 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Proxy-Connection: Keep-Alive
> 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Content-Length: 1209
> 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Age: 747
> 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Date: Wed, 05 Oct 2011 11:52:25 GMT
> 10: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Content-Type: text/plain
> 11: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- ETag: "5c700f3-4b9-399383c0"
> 12: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Server: Apache
> 13: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Accept-Ranges: bytes
> 14: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT
> 15: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
> Code 200, content-type 'text/plain'
>
>>
>>
>>>
>>> --
>>> Renaud Gaujoux
>>> Computational Biology - University of Cape Town
>>> South Africa
>>>
>>>
>>> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am having trouble sourcing a file on our local network from R.
>>>>> It looks like such files are not properly accessed by 'source', even though
>>>>> they can be downloaded with download.file. (See below for my settings and
>>>>> some tests I did.) I ended up with a workaround, but I would like to
>>>>> understand what is going on.
>>>>>
>>>>> Don't source/readLines use the same mechanism as download.file to
>>>>> access URLs?
>>>>
>>>> No. They use url() connections. See ?file.
>>>>
>>>>>
>>>>> Thank you.
>>>>>
>>>>> Renaud
>>>>>
>>>>> My setting:
>>>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>>>> - I am accessing internet through a proxy (set up with cntlm, not sure
>>>>> if this is the issue but I don't know how to check without it). This
>>>>> means that http_proxy='http://localhost:8080/'.
>>>>> - We have local CRAN/Bioconductor mirrors that can be accessed without
>>>>> going through the proxy.
>>>>> - My .Rprofile sources a file 'setrepos.R' on the local network, that
>>>>> sets all relevant repos to our local mirrors.
>>>>>
>>>>> From the shell:
>>>>> - I can wget any URL (local or internet) from command line without a
>>>>> problem.
>>>>> - In particular I can wget the file 'setrepos.R' from command line.
>>>>>
>>>>> Symptoms:
>>>>> - with options(download.file.method='wget'), I can download any URL
>>>>> (local or internet) with download.file
>>>>> - I _cannot_ source any local or internet URL if http_proxy is set. It
>>>>> simply freezes. Using internet.info=0 gives the following messages:
>>>>> ############
>>>>> Warning messages:
>>>>> 1: In file(file, "r", encoding = encoding) :
>>>>> using HTTP proxy 'http://localhost:8080/'
>>>>> 2: In file(file, "r", encoding = encoding) :
>>>>> connected to 'localhost' on port 8080.
>>>>> 3: In file(file, "r", encoding = encoding) :
>>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>>>> Host: *OUR_HOST*
>>>>> Pragma: no-cache
>>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>>>>
>>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1 SRVWINTMG004
>>>>> 6: In file(file, "r", encoding = encoding) : <- Connection: Keep-Alive
>>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection:
>>>>> Keep-Alive
>>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>>>> 9: In file(file, "r", encoding = encoding) :
>>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type:
>>>>> text/plain
>>>>> 11: In file(file, "r", encoding = encoding) :
>>>>> <- ETag: "30b8018-63d-4a627b821c980"
>>>>> 12: In file(file, "r", encoding = encoding) :
>>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1 PHP/5.2.6-2ubuntu4.6
>>>>> with Suhosin-Patch mod_python/3.3.1 Python/2.5.2 mod_ssl/2.2.9
>>>>> OpenSSL/0.9.8g mod_perl/2.0.4 Perl/v5.10.0
>>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges: bytes
>>>>> 14: In file(file, "r", encoding = encoding) :
>>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>>>> 15: In file(file, "r", encoding = encoding) : Code 200, content-type
>>>>> 'text/plain'
>>>>> ############
>>>>>
>>>>> - Setting options(download.file.method='wget') before sourcing does not
>>>>> change the behaviour.
>>>>> - However, I can source any local URL if http_proxy='', without changing
>>>>> download.file.method. But then download.file no longer works for internet
>>>>> URLs, since the proxy settings are wrong. I could set
>>>>> http_proxy='', then source, then restore the proxy settings and set
>>>>> options(download.file.method='wget'). But this is just a workaround and
>>>>> I would like to understand what is going on.
>>>>>
>>>>> Session Info:
>>>>>
>>>>> R version 2.13.2 (2011-09-30)
>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>>>>
>>>>> attached base packages:
>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>
>>>>> other attached packages:
>>>>> [1] devtools_0.4
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] RCurl_1.6-10 tools_2.13.2
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Renaud Gaujoux
>>>>> Computational Biology - University of Cape Town
>>>>> South Africa
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ###
>>>>>
>>>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT policies
>>>>> and e-mai...{{dropped:5}}
>>>>>
>>>>> ______________________________________________
>>>>> R-help at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>
>>>>
>>>
>>
>
>
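
Since download.file with method "wget" is reported above to work through
this proxy while url() connections hang, one possible workaround is to
download to a temporary file and source that. The sketch below is only an
illustration: the helper name source_via_download is hypothetical, and the
default method is taken from the download.file.method option.

```r
# Hypothetical helper: fetch `url` to a temporary file with download.file(),
# then source() the local copy instead of reading from a url() connection.
source_via_download <- function(url,
                                method = getOption("download.file.method", "auto"),
                                ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary copy afterwards

  # download.file() returns 0 (invisibly) on success.
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)

  # Further arguments (e.g. local, echo) are passed on to source().
  source(tmp, ...)
}
```

After options(download.file.method='wget'), a call such as
source_via_download("http://*OUR_HOST*/~renaud/R/setrepos.R") would then go
through wget rather than the internal HTTP code.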
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595