[Rd] Connections to https: URLs -- IE expert help needed
Prof Brian Ripley
ripley at stats.ox.ac.uk
Fri Jan 5 11:00:09 CET 2007
On Mon, 1 Jan 2007, Duncan Temple Lang wrote:
> Kurt Hornik wrote:
>>>>>>> Duncan Temple Lang writes:
>>
>>> Prof Brian Ripley wrote:
>>>> I've added to R-devel the ability to use download.file() and url() to
>>>> https: URLs, *only* if --internet2 is used on Windows.
>>>>
>>>> This uses the Internet Explorer internals, and only works if the
>>>> certificate is accepted (so e.g. does not work for
>>>> https://svn.r-project.org).
>>>>
>>>> Now I use IE (and Windows for that matter) only when really necessary, and
>>>> Firefox has simple ways to permanently accept non-verifiable certificates.
>>>> I would be grateful if someone who is much more familiar with IE could
>>>> write a note explaining how to deal with this that we could add to the
>>>> rw-FAQ.
>>>>
>>>> To forestall the inevitable question: there are no plans to add https:
>>>> support on any other platform, but it is something that would make a nice
>>>> project for a user contribution. The current internal code is based on
>>>> libxml2, and that AFAICS still does not have https: support.
>>>>
>>
>>> Generally (i.e. not in particular response to Brian but related to
>>> this thread)
>>
>> With a similar disclaimer: Brian's efforts were triggered by me asking
>> how to use url() to read R's mailing list archive files, such as
>>
>> https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz
>>
>> directly into R. Turns out we cannot ... which, in a way, is a shame
>> ("R cannot read its own web pages") :-(
>
> Indeed, it is a shame. Although, when I process mail messages,
> I use Perl's very rich collection of modules for processing
> mail in so many different formats. And then I use RSPerl
> to control this and get the data into R pretty quickly.
> So we can do it from R, and delegating to mail-processing
> software is probably a good idea given the number of special
> cases, etc.
>
> And even if we had HTTPS in R, we would still have to deal with
> the certificate on that page, which gets us into more details.
> That is why I think leaving things to libcurl,
> libwww, etc. will be best, as they continue to evolve
> to handle new protocols and settings.

The issue here is the same as it ever was, that of event-loops and not
blocking the R process. I think that is where the missing extensibility
is, and it has been raised for at least 6 years now.

If I try to get that example URI with RCurl it
1) blocks the R process for a long time.
2) fails to retrieve the URI as it is unable to handle the certificate.

Can you please point us to an extension package that behaves better?

[When Kurt first sent me the example, I was surprised that wget handled
it. I then checked, and wget < 1.10 does not check certificates at all.]
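
To make the certificate point concrete, here is a minimal sketch in Python, used only for illustration: it is not R, RCurl or Internet Explorer code, and the helper names `make_context` and `fetch` are hypothetical. It contrasts a client that verifies the server certificate and hostname, which is why a fetch fails when the certificate cannot be verified, with one that accepts any certificate, roughly the behaviour described above for wget < 1.10. The optional `extra_ca` file is an assumed way of trusting one additional certificate authority, loosely analogous to permanently accepting a certificate in a browser.

```python
import ssl
import urllib.request


def make_context(verify=True, extra_ca=None):
    """Return a client-side TLS context (hypothetical helper).

    verify=True: check the server certificate against the system CA store
    and check that it matches the hostname (the safe default).
    verify=False: accept any certificate, roughly what wget < 1.10 did.
    extra_ca: optional path to an additional CA certificate file to trust
    (an assumption, for certificates the system store cannot verify).
    """
    ctx = ssl.create_default_context()
    if extra_ca is not None:
        ctx.load_verify_locations(cafile=extra_ca)
    if not verify:
        # check_hostname must be turned off before verify_mode can be CERT_NONE
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def fetch(url, verify=True, extra_ca=None, timeout=30):
    """Download url and return its bytes (hypothetical helper).

    The call blocks the caller; timeout only limits how long each
    blocking socket operation may wait, not the total download time.
    """
    ctx = make_context(verify=verify, extra_ca=extra_ca)
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()


strict = make_context()
lax = make_context(verify=False)
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
print(lax.verify_mode == ssl.CERT_NONE, lax.check_hostname)            # True False
```

Usage might be `fetch("https://stat.ethz.ch/pipermail/r-help/2007-January.txt.gz")`. Whether that succeeds depends on the server's current certificate and the local CA store, and turning verification off should be a deliberate, last-resort choice.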
--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595