[Rd] Connections to https: URLs -- IE expert help needed

Duncan Temple Lang duncan at wald.ucdavis.edu
Sat Jan 6 23:54:02 CET 2007



Prof Brian Ripley wrote:
> On Mon, 1 Jan 2007, Duncan Temple Lang wrote:
> 
>> Kurt Hornik wrote:
>>>>>>>> Duncan Temple Lang writes:
>>>> Prof Brian Ripley wrote:
>>>>> I've added to R-devel the ability to use download.file() and url() to
>>>>> https: URLs, *only* if --internet2 is used on Windows.
>>>>>
>>>>> This uses the Internet Explorer internals, and only works if the
>>>>> certificate is accepted (so e.g. does not work for
>>>>> https://svn.r-project.org).
>>>>>
>>>>> Now I use IE (and Windows for that matter) only when really necessary, and
>>>>> Firefox has simple ways to permanently accept non-verifiable certificates.
>>>>> I would be grateful if someone who is much more familiar with IE could
>>>>> write a note explaining how to deal with this that we could add to the
>>>>> rw-FAQ.
>>>>>
>>>>> To forestall the inevitable question: there are no plans to add https:
>>>>> support on any other platform, but it is something that would make a nice
>>>>> project for a user contribution.  The current internal code is based on
>>>>> libxml2, and that AFAICS still does not have https: support.
>>>>>
>>>> Generally (i.e. not in particular response to Brian but related to
>>>> this thread)
>>> With a similar disclaimer: Brian's efforts were triggered by me asking
>>> how to use url() to read R's mailing list archive files, such as
>>>
>>>   https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz
>>>
>>> directly into R.  Turns out we cannot ... which, in a way, is a shame
>>> ("R cannot read its own web pages") :-(
>> Indeed, it is a shame.  That said, when I process mail messages,
>> I use Perl's very rich collection of modules for processing
>> mail in so many different formats. And then I use RSPerl
>> to control this and get the data into R pretty quickly.
>> So we can do it from R, and delegating to mail-processing
>> software is probably a good idea given the number of special
>> cases, etc.
>>
>> And even if we had HTTPS in R, we would still want to deal with
>> the certificate on that page, which gets us into more details.
>> That is why I think leaving things to libcurl,
>> libwww, etc. will be best, as they continue to evolve
>> to handle new protocols and settings.
> 
> The issue here is the same as it ever was, that of event-loops and not 
> blocking the R process.  I think that is where the missing extensibility 
> is, and it has been raised for at least 6 years now.

Of course, that is one area where extensibility is needed.
Attempts have been made to address this generally over the last 6
years, but the architecture of, and the focus on, the numerous
current R front-ends are not necessarily ideal for solving this
properly.

But your sentence suggests that the extensibility of the
connection API is not an issue, and we don't agree
on that. I think both extensibility issues are
relevant. Blocking is important, but not being able
to explore or add new facilities is fundamental
and, I believe, of immense importance. Having to extend
the R engine at the system level rather than in the
interpreted language is a major impediment to the evolution
of R, IMHO.

> 
> If I try to get that example URI with RCurl it
> 
> 1) blocks the R process for a long time.
> 2) fails to retrieve the URI as it is unable to handle the certificate.

2) is, as you would put it, "user error" ;-)
You need to tell libcurl what options you want in the request.
Whether to ignore certificates, where the CA certificates
are, etc. are all query-specific options.
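
For example, here is a minimal, untested sketch of reading that
archive file with RCurl. The helper name fetch_archive is
hypothetical; it assumes getBinaryURL() accepts libcurl options via
.opts, and that RCurl ships a CA bundle at CurlSSL/cacert.pem.

```r
## Hypothetical helper (not part of RCurl): read a gzip-compressed
## mailing-list archive over https: and return its lines.
## verify = FALSE sets the libcurl option ssl.verifypeer to FALSE,
## i.e. the server's certificate is NOT checked.  With verify = TRUE
## the certificate is checked against the CA bundle in `cainfo`
## (assumed here to be the copy shipped with RCurl).
fetch_archive <- function(url, verify = TRUE,
                          cainfo = system.file("CurlSSL", "cacert.pem",
                                               package = "RCurl")) {
  opts <- if (verify) {
    list(ssl.verifypeer = TRUE, cainfo = cainfo)
  } else {
    list(ssl.verifypeer = FALSE)
  }
  ## Download the raw, still compressed, bytes.
  bytes <- RCurl::getBinaryURL(url, .opts = opts)
  ## Decompress in memory and split into lines.
  con <- gzcon(rawConnection(bytes))
  on.exit(close(con))
  readLines(con)
}
```

Something like
fetch_archive("https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz",
verify = FALSE) would then skip the certificate check. Turning
verification off gives up the protection the certificate is meant to
provide, so pointing cainfo at a bundle that contains the issuing CA
is the safer choice.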

> 
> Can you please point us to an extension package that behaves better?
> 

Well, as regards point 1), libcurl does have facilities for
non-blocking calls, and RCurl exposes them through libcurl's multi_
interface, both in the function getURIAsynchronous() and in the
lower-level functions. One could also merge the basic libcurl
interface into our select() calls. I seem to recall that libwww has
features we could also integrate manually into our event loop.
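
A minimal sketch of using the multi interface from R. The helper
fetch_concurrently is hypothetical, it assumes getURIAsynchronous()
passes extra arguments on as libcurl options, and the two index-page
URIs are only illustrative. The transfers run concurrently inside
libcurl, but the R call itself still waits until all of them have
finished, so this is not yet integration with R's event loop.

```r
## Hypothetical helper (not part of RCurl): fetch several https: URIs
## concurrently through libcurl's multi interface, as wrapped by
## RCurl::getURIAsynchronous().  The downloads are interleaved with
## each other, but this function does not return until every transfer
## has completed.
fetch_concurrently <- function(urls, verify = TRUE) {
  stopifnot(is.character(urls), length(urls) > 0)
  RCurl::getURIAsynchronous(urls, ssl.verifypeer = verify)
}

## Illustrative URIs (assumed to exist): two mailing-list index pages.
index_pages <- c("https://stat.ethz.ch/pipermail/r-help/",
                 "https://stat.ethz.ch/pipermail/r-devel/")
```

Calling fetch_concurrently(index_pages) would then request both pages
in one call; true non-blocking behaviour would need the lower-level
multi-handle functions driven from R's own event loop.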

The key thing I am trying to get across is that if we
are going to include these things in R and we have
to do things manually, then we should try to integrate
them in an evolvable, extensible manner that leverages
libraries that do things properly.

> [When Kurt first sent me the example, I was surprised that wget handled 
> it. I then checked, and wget < 1.10 does not check certificates at all.]
>


