[Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)

Kevin Ushey kevinushey at gmail.com
Wed Aug 26 00:58:55 CEST 2015


(final post, sorry to be spamming everyone all day...)

As kindly pointed out by Martin off-list, I was in fact using an old
version of R-devel (it looks like the binaries provided at
http://r.research.att.com/ are currently stale -- although the page
lists r69167 as the current version, the binaries being distributed
are for r69078).

Building R locally from trunk (r69180) and testing confirms that
errors no longer clobber the whole `install.packages()` process.
Having the various download methods respect HTTP status / error codes
when using `libcurl` is still an issue, but one I imagine R-core is
aware of.
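
For anyone hitting this in the meantime, here is a minimal sketch of a
user-level workaround. It assumes R >= 3.2.0 built with libcurl support,
so that `curlGetHeaders()` is available. It requests the headers first
and refuses to download when the status is 400 or above. The helper
names `http_status_ok()` and `download_checked()` are hypothetical and
not part of R.

```r
# Hypothetical helpers (not part of R): check the HTTP status before
# handing a URL to download.file(method = "libcurl").

# TRUE when the status is a single, non-missing code below 400.
http_status_ok <- function(status) {
  is.numeric(status) && length(status) == 1L && !is.na(status) && status < 400
}

download_checked <- function(url, destfile, ...) {
  # curlGetHeaders() returns the response headers as a character vector,
  # with the last-received HTTP status code in the "status" attribute.
  status <- attr(curlGetHeaders(url), "status")
  if (!http_status_ok(status))
    stop(sprintf("cannot open URL '%s': HTTP status was '%s'", url, status),
         call. = FALSE)
  download.file(url, destfile, method = "libcurl", ...)
}
```

This costs one extra request per file, and a server may answer a
headers-only request differently from a full download, so treat it as a
stopgap rather than a fix.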

Thanks, and apologies again for the spam,
Kevin

On Tue, Aug 25, 2015 at 2:41 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
> In fact, this does reproduce on R-devel:
>
>     > options(download.file.method = "libcurl")
>     > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
>     + "http://www.stats.ox.ac.uk/pub/RWin"))
>     > install.packages("lattice") ## could be any package
>     Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
>     (as ‘lib’ is unspecified)
>     Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>
>     > sessionInfo()
>     R Under development (unstable) (2015-08-14 r69078)
>     Platform: x86_64-apple-darwin13.4.0 (64-bit)
>     Running under: OS X 10.10.4 (Yosemite)
>
> I think this could be problematic for users with custom CRAN
> repositories. For example, if I have a CRAN repository that only
> serves source packages (no binary packages), this implies that any R
> session configured to download binary packages would fail to download
> any packages at all (as it would barf on attempting to read the
> non-existent PACKAGES file for the 'binary' branch of the custom
> repository).
>
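One way to guard against this at the user level is sketched below,
assuming each repository index can be fetched independently. It wraps
the per-repository lookup in `tryCatch()` so that a repository returning
an error page is skipped with a warning instead of aborting the rest.
`available_safely()` and its `fetch` argument are hypothetical names
introduced for illustration; the default `fetch` simply calls
`available.packages()` on the repository's contrib URL.

```r
# Hypothetical helper (not part of R): query each repository separately
# and skip the ones whose index cannot be read.
available_safely <- function(repos,
                             fetch = function(repo)
                               available.packages(contrib.url(repo))) {
  results <- lapply(repos, function(repo) {
    tryCatch(
      fetch(repo),
      error = function(e) {
        # Report which repository failed, then carry on with the others.
        warning(sprintf("skipping repository '%s': %s",
                        repo, conditionMessage(e)), call. = FALSE)
        NULL
      }
    )
  })
  # Drop the repositories that failed and stack the remaining indices.
  results <- Filter(Negate(is.null), unname(results))
  if (length(results)) do.call(rbind, results) else NULL
}
```

Usage would be along the lines of `available_safely(getOption("repos"))`.
The result is NULL when every repository fails.
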
> This can also be seen by attempting to install a package using current
> R-devel (since no binaries are made available for R 3.3):
>
>     > options(download.file.method = "libcurl")
>     > options(repos = c(CRAN = "https://cran.rstudio.com/"))
>     > print(getOption("pkgType"))
>     [1] "both"
>     > install.packages("lattice")
>     Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
>     (as ‘lib’ is unspecified)
>     Error in install.packages : Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>
> The same error (with a different, XML response) is returned when using
> e.g. `https://cran.fhcrc.org`.
>
> Kevin
>
> On Tue, Aug 25, 2015 at 1:33 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>> On 08/25/2015 01:30 PM, Kevin Ushey wrote:
>>>
>>> Hi Martin,
>>>
>>> Indeed it does (and I should have confirmed myself with R-patched and
>>> R-devel before posting...)
>>
>>
>> Actually, I don't know that it does -- it addresses the symptom, but I
>> think there should be an error from libcurl on the 403 / 404 rather than
>> from read.dcf on the error page...
>>
>> Martin
>>
>>
>>>
>>> Thanks, and sorry for the noise.
>>> Kevin
>>>
>>>
>>> On Tue, Aug 25, 2015, 13:11 Martin Morgan <mtmorgan at fredhutch.org> wrote:
>>>
>>>     On 08/25/2015 12:54 PM, Kevin Ushey wrote:
>>>      > Hi all,
>>>      >
>>>      > The following fails for me (on OS X, although I imagine it's the
>>>      > same on other platforms using libcurl):
>>>      >
>>>      >      options(download.file.method = "libcurl")
>>>      >      options(repos = c(CRAN = "https://cran.rstudio.com/",
>>>      >          CRANextra = "http://www.stats.ox.ac.uk/pub/RWin"))
>>>      >      install.packages("lattice") ## could be any package
>>>      >
>>>      > gives me:
>>>      >
>>>      >      > options(download.file.method = "libcurl")
>>>      >      > options(repos = c(CRAN = "https://cran.rstudio.com/",
>>>      >      +     CRANextra = "http://www.stats.ox.ac.uk/pub/RWin"))
>>>      >      > install.packages("lattice") ## could be any package
>>>      >      Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
>>>      >      (as ‘lib’ is unspecified)
>>>      >      Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>>      >
>>>      > This seems to come from a call to `available.packages()` to a URL
>>>      > that doesn't exist on the server (likely when querying PACKAGES on
>>>      > the CRANextra repo)
>>>      >
>>>      > Eg.
>>>      >
>>>      >      > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>>      >      > available.packages(URL, method = "internal")
>>>      >      Warning: unable to access index for repository
>>>      >      http://www.stats.ox.ac.uk/pub/RWin
>>>      >           Package Version Priority Depends Imports LinkingTo Suggests
>>>      >           Enhances License License_is_FOSS License_restricts_use
>>>      >           OS_type Archs MD5sum NeedsCompilation File Repository
>>>      >      > available.packages(URL, method = "libcurl")
>>>      >      Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>>      >
>>>      > It looks like libcurl downloads and retrieves the 403 page itself,
>>>      > rather than reporting that it was actually forbidden, e.g.:
>>>      >
>>>      >      > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
>>>      >      +     tempfile(), method = "libcurl")
>>>      >      trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
>>>      >      Content type 'text/html; charset=iso-8859-1' length 339 bytes
>>>      >      ==================================================
>>>      >      downloaded 339 bytes
>>>      >
>>>      > Using `method = "internal"` gives an error related to the inability
>>>      > to access that URL due to the HTTP status 403.
>>>      >
>>>      > The overarching issue here is that package installation shouldn't
>>>      > fail even if libcurl fails to access one of the repositories set.
>>>      >
>>>
>>>     With
>>>
>>>       > R.version.string
>>>     [1] "R version 3.2.2 Patched (2015-08-25 r69179)"
>>>
>>>     the behavior is to warn with an indication of the repository for which
>>>     the problem occurs
>>>
>>>       > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>>       > available.packages(URL, method="libcurl")
>>>     Warning: unable to access index for repository
>>>     http://www.stats.ox.ac.uk/pub/RWin:
>>>         Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>>            Package Version Priority Depends Imports LinkingTo Suggests Enhances
>>>            License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>>>            NeedsCompilation File Repository
>>>       > available.packages(URL, method="internal")
>>>     Warning: unable to access index for repository
>>>     http://www.stats.ox.ac.uk/pub/RWin:
>>>         cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES'
>>>            Package Version Priority Depends Imports LinkingTo Suggests Enhances
>>>            License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>>>            NeedsCompilation File Repository
>>>
>>>     Does that work for you / address the problem?
>>>
>>>     Martin
>>>
>>>      > > sessionInfo()
>>>      > R version 3.2.2 (2015-08-14)
>>>      > Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>>      > Running under: OS X 10.10.4 (Yosemite)
>>>      >
>>>      > locale:
>>>      > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>>      >
>>>      > attached base packages:
>>>      > [1] stats     graphics  grDevices utils     datasets  methods   base
>>>      >
>>>      > other attached packages:
>>>      > [1] testthat_0.8.1.0.99  knitr_1.11           devtools_1.5.0.9001
>>>      > [4] BiocInstaller_1.15.5
>>>      >
>>>      > loaded via a namespace (and not attached):
>>>      >   [1] httr_1.0.0     R6_2.0.0.9000  tools_3.2.2    parallel_3.2.2 whisker_0.3-2
>>>      >   [6] RCurl_1.95-4.1 memoise_0.2.1  stringr_0.6.2  digest_0.6.4   evaluate_0.7.2
>>>      >
>>>      > Thanks,
>>>      > Kevin
>>>      >
>>>      > ______________________________________________
>>>      > R-devel at r-project.org mailing list
>>>      > https://stat.ethz.ch/mailman/listinfo/r-devel
>>>      >
>>>
>>>
>>>     --
>>>     Computational Biology / Fred Hutchinson Cancer Research Center
>>>     1100 Fairview Ave. N.
>>>     PO Box 19024 Seattle, WA 98109
>>>
>>>     Location: Arnold Building M1 B861
>>>     Phone: (206) 667-2793
>>>
>>
>>


