[R] Behaviour of 'source' with URLs and proxy
Renaud Gaujoux
renaud at mancala.cbio.uct.ac.za
Wed Oct 5 15:07:24 CEST 2011
So source() always reads a URL using the internal method, because it
reads it chunk by chunk, which I suppose the other methods of
download.file (wget, etc.) do not support (?).
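A possible workaround follows from this: since download.file(method = "wget") does get through the proxy here, the script can be downloaded to a temporary file and the local copy sourced, so the url() connection code is never involved. This is a minimal sketch under that assumption. source_via_download() is a hypothetical helper, not an R function, and the default method needs wget to be installed.

```r
# Hypothetical helper (not part of R): download a script with
# download.file() and source the local copy. Per this thread, calling
# source() on a URL instead opens a url() connection, which in R 2.13
# uses the internal HTTP code regardless of download.file.method.
source_via_download <- function(url, method = "wget", ...) {
  tmp <- tempfile(fileext = ".R")
  on.exit(unlink(tmp), add = TRUE)  # delete the temporary copy on exit
  status <- download.file(url, destfile = tmp, method = method, quiet = TRUE)
  if (status != 0L)
    stop("download of '", url, "' failed with status ", status)
  source(tmp, ...)                  # remaining arguments go to source()
}
```

For example, source_via_download("http://intranet.example/R/setrepos.R") (a placeholder URL) would fetch the script with wget and then source it.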
I guess the only way to find out where the reading process gets stuck
is to dig into the C code and add more tracing messages. I will try this.
Thank you.
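The other workaround from the original post (clear http_proxy, source, then restore the proxy settings) can be wrapped so that the restore always happens, even if source() fails. This is an untested sketch: with_no_proxy() is a hypothetical helper, and it assumes the URL code reads http_proxy each time a connection is opened, which may not hold for every R version or download method.

```r
# Hypothetical helper (not part of R): evaluate `expr` with the HTTP proxy
# environment variables unset, then restore them, even if `expr` fails.
# Both spellings are cleared as a precaution; which one a given download
# method actually reads is not checked here.
with_no_proxy <- function(expr, vars = c("http_proxy", "HTTP_PROXY")) {
  old <- Sys.getenv(vars, unset = NA, names = TRUE)  # NA marks "was not set"
  on.exit({
    was_set <- !is.na(old)
    if (any(was_set)) do.call(Sys.setenv, as.list(old[was_set]))
    if (any(!was_set)) Sys.unsetenv(vars[!was_set])
  }, add = TRUE)
  Sys.unsetenv(vars)
  expr  # lazy evaluation: `expr` is only evaluated here, with no proxy set
}
```

Usage, with a placeholder intranet URL: with_no_proxy(source("http://intranet.example/R/setrepos.R")).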
On 05/10/2011 14:49, Prof Brian Ripley wrote:
> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>
>>
>> On 05/10/2011 13:45, Prof Brian Ripley wrote:
>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>
>>>> From the help page ?file I -- had -- read the following:
>>>>
>>>> "For ‘url’ the description is a complete URL, including scheme
>>>> (such as ‘http://’, ‘ftp://’ or ‘file://’). Proxies can be
>>>> specified for HTTP and FTP ‘url’ connections: see ‘download.file’."
>>>
>>> So you should have known that it was the same as url()!
>>
>> I agree. I just thought -- incorrectly -- that any attempt to
>> download a file from R would eventually call the same C code as
>> download.file. Or maybe
>
> It does. But download.file(method="wget") does not call that C code ....
>
>> source() does not download and source, but reads the file on the fly?
>
> That is true too, but then downloading a file is always done in chunks.
>
>>>
>>>> From the internet.info messages it seems that the proxy is actually
>>>> used, but somehow differently than what download.file does (via wget).
>>>
>>> No, somewhat differently than *wget* does. As that help page says,
>>> the section on proxies only refers to the internal method.
>>>
>>>> Is source supposed to work through a proxy?
>>>
>>> Yes, and it has been tested to do so. But not tested on your proxy
>>> ....
>>
>> OK, I agree that my settings look special, but in the end it is
>> supposed to be a plain local proxy with no authentication.
>> The proxy is indeed used by the internal method and, from the
>> messages (below), the remote file is opened, http headers are
>> returned, but nothing else happens and I have to cancel the command
>> (Ctrl-C).
>>
>> This is where I would like to have some input, so that I can work out
>> the issue.
>> I tried to go through the C code of the internet module, without much
>> luck: it seems that in_R_HTTPRead and RxmlNanoHTTPRead would be the
>> place to look.
>>
>> Any idea what could cause these functions to hang (infinite loop,
>> communication problem, ...)?
>> I know, I am too curious.
>>
>> Thank you
>>
>>
>>> Sys.getenv('http_proxy')
>> [1] "http://localhost:8080/"
>>> Sys.getenv('no_proxy')
>> [1] "localhost,127.0.0.0/8,*.local"
>>> options(internet.info=0)
>>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt")
>> trying URL 'http://lib.stat.cmu.edu/datasets/csb/ch3a.txt'
>> Content type 'text/plain' length 1209 bytes
>> opened URL
>> ^C
>> There were 15 warnings (use warnings() to see them)
>>> warnings()
>> Warning messages:
>> 1: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> connected to 'localhost' on port 8080.
>> 2: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> -> (Proxy) GET http://lib.stat.cmu.edu/datasets/csb/ch3a.txt HTTP/1.0
>> Host: lib.stat.cmu.edu
>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>
>> 3: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- HTTP/1.1 200 OK
>> 4: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Via: 1.1 SRVWINTMG003, 1.1 SRVWINTMG004
>> 5: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Connection: Keep-Alive
>> 6: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Proxy-Connection: Keep-Alive
>> 7: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Content-Length: 1209
>> 8: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Age: 747
>> 9: In download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt",
>> ... :
>> <- Date: Wed, 05 Oct 2011 11:52:25 GMT
>> 10: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> <- Content-Type: text/plain
>> 11: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> <- ETag: "5c700f3-4b9-399383c0"
>> 12: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> <- Server: Apache
>> 13: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> <- Accept-Ranges: bytes
>> 14: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> <- Last-Modified: Fri, 29 Jul 1994 14:21:11 GMT
>> 15: In
>> download.file("http://lib.stat.cmu.edu/datasets/csb/ch3a.txt", ... :
>> Code 200, content-type 'text/plain'
>>
>>>
>>>
>>>>
>>>> --
>>>> Renaud Gaujoux
>>>> Computational Biology - University of Cape Town
>>>> South Africa
>>>>
>>>>
>>>> On 05/10/2011 12:26, Prof Brian Ripley wrote:
>>>>> On Wed, 5 Oct 2011, Renaud Gaujoux wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am having trouble sourcing files on our local network from R.
>>>>>> It looks like these files are not properly accessed by 'source',
>>>>>> even though they can be downloaded with download.file (see my
>>>>>> settings and some tests below). I ended up with a workaround,
>>>>>> but I would like to understand what is going on.
>>>>>>
>>>>>> Doesn't source/readLines use the same mechanism as download.file
>>>>>> to access URLs?
>>>>>
>>>>> No. They use url() connections. See ?file.
>>>>>
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Renaud
>>>>>>
>>>>>> My setting:
>>>>>> - I am using R 2.13.2 on Ubuntu 11.04.
>>>>>> - I am accessing internet through a proxy (set up with cntlm, not
>>>>>> sure if this is the issue but I don't know how to check without
>>>>>> it). This means that http_proxy='http://localhost:8080/'.
>>>>>> - We have local CRAN/Bioconductor mirrors that can be accessed
>>>>>> without going through the proxy.
>>>>>> - My .Rprofile sources a file 'setrepos.R' on the local network,
>>>>>> that sets all relevant repos to our local mirrors.
>>>>>>
>>>>>> From the shell:
>>>>>> - I can wget any URL (local or internet) from command line
>>>>>> without a problem.
>>>>>> - In particular I can wget the file 'setrepos.R' from command line.
>>>>>>
>>>>>> Symptoms:
>>>>>> - with options(download.file.method='wget'), I can download any
>>>>>> URL (local or internet) with download.file
>>>>>> - I _cannot_ source any local or internet URL if http_proxy is
>>>>>> set. It simply freezes. Using internet.info=0 gives the following
>>>>>> messages:
>>>>>> ############
>>>>>> Warning messages:
>>>>>> 1: In file(file, "r", encoding = encoding) :
>>>>>> using HTTP proxy 'http://localhost:8080/'
>>>>>> 2: In file(file, "r", encoding = encoding) :
>>>>>> connected to 'localhost' on port 8080.
>>>>>> 3: In file(file, "r", encoding = encoding) :
>>>>>> -> (Proxy) GET http://*OUR_HOST*/~renaud/R/setrepos.R HTTP/1.0
>>>>>> Host: *OUR_HOST*
>>>>>> Pragma: no-cache
>>>>>> User-Agent: R (2.13.2 x86_64-pc-linux-gnu x86_64 linux-gnu)
>>>>>>
>>>>>> 4: In file(file, "r", encoding = encoding) : <- HTTP/1.1 200 OK
>>>>>> 5: In file(file, "r", encoding = encoding) : <- Via: 1.1
>>>>>> SRVWINTMG004
>>>>>> 6: In file(file, "r", encoding = encoding) : <- Connection:
>>>>>> Keep-Alive
>>>>>> 7: In file(file, "r", encoding = encoding) : <- Proxy-Connection:
>>>>>> Keep-Alive
>>>>>> 8: In file(file, "r", encoding = encoding) : <- Content-Length: 1597
>>>>>> 9: In file(file, "r", encoding = encoding) :
>>>>>> <- Date: Wed, 05 Oct 2011 06:43:13 GMT
>>>>>> 10: In file(file, "r", encoding = encoding) : <- Content-Type:
>>>>>> text/plain
>>>>>> 11: In file(file, "r", encoding = encoding) :
>>>>>> <- ETag: "30b8018-63d-4a627b821c980"
>>>>>> 12: In file(file, "r", encoding = encoding) :
>>>>>> <- Server: Apache/2.2.9 (Ubuntu) DAV/2 SVN/1.5.1
>>>>>> PHP/5.2.6-2ubuntu4.6 with Suhosin-Patch mod_python/3.3.1
>>>>>> Python/2.5.2 mod_ssl/2.2.9 OpenSSL/0.9.8g mod_perl/2.0.4
>>>>>> Perl/v5.10.0
>>>>>> 13: In file(file, "r", encoding = encoding) : <- Accept-Ranges:
>>>>>> bytes
>>>>>> 14: In file(file, "r", encoding = encoding) :
>>>>>> <- Last-Modified: Mon, 20 Jun 2011 17:03:50 GMT
>>>>>> 15: In file(file, "r", encoding = encoding) : Code 200,
>>>>>> content-type 'text/plain'
>>>>>> ############
>>>>>>
>>>>>> - Setting options(download.file.method='wget') before sourcing
>>>>>> does not change the behaviour.
>>>>>> - However, I can source any local URL if http_proxy='', without
>>>>>> changing download.file.method. But then download.file does not
>>>>>> work for internet URL any more since the proxy settings are
>>>>>> wrong. I could set http_proxy='', then source, then restore the
>>>>>> proxy settings and set options(download.file.method='wget'). But
>>>>>> this is just a workaround and I would like to understand what is
>>>>>> going on.
>>>>>>
>>>>>> Session Info:
>>>>>>
>>>>>> R version 2.13.2 (2011-09-30)
>>>>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>>>>
>>>>>> locale:
>>>>>> [1] LC_CTYPE=en_ZA.UTF-8 LC_NUMERIC=C
>>>>>> [3] LC_TIME=en_ZA.UTF-8 LC_COLLATE=en_ZA.UTF-8
>>>>>> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
>>>>>> [7] LC_PAPER=en_ZA.UTF-8 LC_NAME=C
>>>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>>>> [11] LC_MEASUREMENT=en_ZA.UTF-8 LC_IDENTIFICATION=C
>>>>>>
>>>>>> attached base packages:
>>>>>> [1] stats graphics grDevices utils datasets methods base
>>>>>>
>>>>>> other attached packages:
>>>>>> [1] devtools_0.4
>>>>>>
>>>>>> loaded via a namespace (and not attached):
>>>>>> [1] RCurl_1.6-10 tools_2.13.2
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Renaud Gaujoux
>>>>>> Computational Biology - University of Cape Town
>>>>>> South Africa
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ###
>>>>>>
>>>>>> UNIVERSITY OF CAPE TOWN This e-mail is subject to the UCT ICT
>>>>>> policies and e-mai...{{dropped:5}}
>>>>>>
>>>>>> ______________________________________________
>>>>>> R-help at r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>>> PLEASE do read the posting guide
>>>>>> http://www.R-project.org/posting-guide.html
>>>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>>>
>>>>>
>>>>
>>>
>>
>