[Rd] Issues with libcurl + HTTP status codes (eg. 403, 404)
Kevin Ushey
kevinushey at gmail.com
Wed Aug 26 00:58:55 CEST 2015
(final post, sorry to be spamming everyone all day...)
As kindly pointed out by Martin off-list, I was in fact using an old
version of R-devel (it looks like the binaries provided at
http://r.research.att.com/ are currently stale -- although the page
lists r69167 as the current version, the binaries being distributed
are for r69078).
Building R locally from trunk (r69180) and testing confirms that these
errors no longer abort the whole `install.packages()` process. The
`libcurl` download method still does not treat HTTP error statuses
(e.g. 403, 404) as failures, but I imagine R-core is aware of that.
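
For anyone hitting this in the meantime, here is a minimal sketch of a
defensive check, not a fix for the underlying behaviour. It assumes the
server's error page has been saved in place of the PACKAGES index, as in
the transcripts quoted below. The helper names `looks_like_error_page()`
and `read_index_or_null()` are hypothetical and not part of R:

```r
# Hypothetical helper (not part of R): TRUE if the file at `path` starts
# like an HTML or XML document rather than a DCF package index.
looks_like_error_page <- function(path) {
  first <- tryCatch(
    readLines(path, n = 1L, warn = FALSE),
    error = function(e) character()
  )
  length(first) == 1L &&
    grepl("^[[:space:]]*<(!DOCTYPE|html|\\?xml)", first,
          ignore.case = TRUE, useBytes = TRUE)
}

# Hypothetical helper: parse a downloaded index with read.dcf(), returning
# NULL with a warning instead of an error when the file is unusable.
read_index_or_null <- function(path) {
  if (looks_like_error_page(path)) {
    warning("'", path, "' looks like an HTML/XML error page; skipping")
    return(NULL)
  }
  tryCatch(
    read.dcf(path),
    error = function(e) {
      warning("could not parse '", path, "': ", conditionMessage(e))
      NULL
    }
  )
}
```

The intended use would be to call `read_index_or_null(dest)` after
`download.file(url, dest, method = "libcurl")`, so that a repository
returning an error page is skipped rather than stopping the session. This
only inspects the file contents; it does not recover the HTTP status code.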
Thanks, and apologies again for the spam,
Kevin
On Tue, Aug 25, 2015 at 2:41 PM, Kevin Ushey <kevinushey at gmail.com> wrote:
> In fact, this does reproduce on R-devel:
>
> > options(download.file.method = "libcurl")
> > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
> + "http://www.stats.ox.ac.uk/pub/RWin"))
> > install.packages("lattice") ## could be any package
> Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
> (as ‘lib’ is unspecified)
> Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>
> > sessionInfo()
> R Under development (unstable) (2015-08-14 r69078)
> Platform: x86_64-apple-darwin13.4.0 (64-bit)
> Running under: OS X 10.10.4 (Yosemite)
>
> I think this could be problematic for users with custom CRAN
> repositories. For example, if I have a CRAN repository that only
> serves source packages (no binary packages), this implies that any R
> session configured to download binary packages would fail to download
> any packages at all (as it would barf on attempting to read the
> non-existent PACKAGES file for the 'binary' branch of the custom
> repository).
>
> This can also be seen by attempting to install a package using current
> R-devel (since no binaries are made available for R 3.3):
>
> > options(download.file.method = "libcurl")
> > options(repos = c(CRAN = "https://cran.rstudio.com/"))
> > print(getOption("pkgType"))
> [1] "both"
> > install.packages("lattice")
> Installing package into ‘/Users/kevinushey/Library/R/3.3/library’
> (as ‘lib’ is unspecified)
> Error in install.packages : Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>
> The same error (with a different, XML response) is returned when using
> e.g. `https://cran.fhcrc.org`.
>
> Kevin
>
> On Tue, Aug 25, 2015 at 1:33 PM, Martin Morgan <mtmorgan at fredhutch.org> wrote:
>> On 08/25/2015 01:30 PM, Kevin Ushey wrote:
>>>
>>> Hi Martin,
>>>
>>> Indeed it does (and I should have confirmed myself with R-patched and
>>> R-devel before posting...)
>>
>>
>> Actually, I don't know that it does -- it addresses the symptom, but I think
>> there should be an error from libcurl on the 403 / 404 rather than from
>> read.dcf on the error page...
>>
>> Martin
>>
>>
>>>
>>> Thanks, and sorry for the noise.
>>> Kevin
>>>
>>>
>>> On Tue, Aug 25, 2015, 13:11 Martin Morgan <mtmorgan at fredhutch.org> wrote:
>>>
>>> On 08/25/2015 12:54 PM, Kevin Ushey wrote:
>>> > Hi all,
>>> >
>>> > The following fails for me (on OS X, although I imagine it's the same
>>> > on other platforms using libcurl):
>>> >
>>> > options(download.file.method = "libcurl")
>>> > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
>>> > "http://www.stats.ox.ac.uk/pub/RWin"))
>>> > install.packages("lattice") ## could be any package
>>> >
>>> > gives me:
>>> >
>>> > > options(download.file.method = "libcurl")
>>> > > options(repos = c(CRAN = "https://cran.rstudio.com/", CRANextra =
>>> > + "http://www.stats.ox.ac.uk/pub/RWin"))
>>> > > install.packages("lattice") ## could be any package
>>> > Installing package into ‘/Users/kevinushey/Library/R/3.2/library’
>>> > (as ‘lib’ is unspecified)
>>> > Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>> >
>>> > This seems to come from a call to `available.packages()` to a URL that
>>> > doesn't exist on the server (likely when querying PACKAGES on the
>>> > CRANextra repo)
>>> >
>>> > Eg.
>>> >
>>> > > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>> > > available.packages(URL, method = "internal")
>>> > Warning: unable to access index for repository
>>> > http://www.stats.ox.ac.uk/pub/RWin
>>> > Package Version Priority Depends Imports LinkingTo Suggests
>>> > Enhances License License_is_FOSS License_restricts_use OS_type
>>> > Archs MD5sum NeedsCompilation File Repository
>>> > > available.packages(URL, method = "libcurl")
>>> > Error: Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>> >
>>> > It looks like libcurl downloads and retrieves the 403 page itself,
>>> > rather than reporting that it was actually forbidden, e.g.:
>>> >
>>> > > download.file("http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz",
>>> > + tempfile(), method = "libcurl")
>>> > trying URL 'http://www.stats.ox.ac.uk/pub/RWin/bin/macosx/mavericks/contrib/3.2/PACKAGES.gz'
>>> > Content type 'text/html; charset=iso-8859-1' length 339 bytes
>>> > ==================================================
>>> > downloaded 339 bytes
>>> >
>>> > Using `method = "internal"` gives an error related to the inability to
>>> > access that URL due to the HTTP status 403.
>>> >
>>> > The overarching issue here is that package installation shouldn't fail
>>> > even if libcurl fails to access one of the repositories set.
>>> >
>>>
>>> With
>>>
>>> > R.version.string
>>> [1] "R version 3.2.2 Patched (2015-08-25 r69179)"
>>>
>>> the behavior is to warn with an indication of the repository for which the
>>> problem occurs
>>>
>>> > URL <- "http://www.stats.ox.ac.uk/pub/RWin"
>>> > available.packages(URL, method="libcurl")
>>> Warning: unable to access index for repository
>>> http://www.stats.ox.ac.uk/pub/RWin:
>>> Line starting '<!DOCTYPE HTML PUBLI ...' is malformed!
>>> Package Version Priority Depends Imports LinkingTo Suggests Enhances
>>> License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>>> NeedsCompilation File Repository
>>> > available.packages(URL, method="internal")
>>> Warning: unable to access index for repository
>>> http://www.stats.ox.ac.uk/pub/RWin:
>>> cannot open URL 'http://www.stats.ox.ac.uk/pub/RWin/PACKAGES'
>>> Package Version Priority Depends Imports LinkingTo Suggests Enhances
>>> License License_is_FOSS License_restricts_use OS_type Archs MD5sum
>>> NeedsCompilation File Repository
>>>
>>> Does that work for you / address the problem?
>>>
>>> Martin
>>>
>>> > > sessionInfo()
>>> > R version 3.2.2 (2015-08-14)
>>> > Platform: x86_64-apple-darwin13.4.0 (64-bit)
>>> > Running under: OS X 10.10.4 (Yosemite)
>>> >
>>> > locale:
>>> > [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
>>> >
>>> > attached base packages:
>>> > [1] stats graphics grDevices utils datasets methods base
>>> >
>>> > other attached packages:
>>> > [1] testthat_0.8.1.0.99 knitr_1.11 devtools_1.5.0.9001
>>> > [4] BiocInstaller_1.15.5
>>> >
>>> > loaded via a namespace (and not attached):
>>> > [1] httr_1.0.0 R6_2.0.0.9000 tools_3.2.2 parallel_3.2.2 whisker_0.3-2
>>> > [6] RCurl_1.95-4.1 memoise_0.2.1 stringr_0.6.2 digest_0.6.4 evaluate_0.7.2
>>> >
>>> > Thanks,
>>> > Kevin
>>> >
>>> > ______________________________________________
>>> > R-devel at r-project.org mailing list
>>> > https://stat.ethz.ch/mailman/listinfo/r-devel
>>> >
>>>
>>>
>>> --
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
>>>
>>
>>