[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive
Leonardo Collado Torres
lcollado at jhu.edu
Tue May 31 19:55:31 CEST 2016
Hi Michael,
We tried getting things to work with Amazon Cloud Drive (see Abhi's
efforts at https://github.com/nellore/duffel/commits/master). But we
now have the data hosted elsewhere where the links work properly.
I just noticed a small mistake in rtracklayer:::expandPath(). See:
> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
[1] FALSE
> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
[1] TRUE
The fix is simple. At
https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
change it to:
expandPath <- function(x) {
    if (startsWith(x, "http") | startsWith(x, "ftp"))
        expandURL(x)
    else path.expand(x)
}
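
For anyone who wants to check the behaviour without touching rtracklayer, here is a minimal sketch. It relies only on base R; is_remote_url() and expand_path_sketch() are hypothetical names made up for illustration, and the remote branch simply returns its input instead of calling rtracklayer's internal expandURL(). startsWith() compares literal prefixes, so 'http||ftp' is never read as "http or ftp". A regular expression is one alternative way to express that check, not necessarily what rtracklayer should adopt.

```r
# Sketch only: both function names below are hypothetical, not rtracklayer API.
is_remote_url <- function(x) {
    # startsWith() does literal prefix matching; a regex can express
    # "starts with http or ftp" in a single test.
    grepl("^(http|ftp)", x)
}

expand_path_sketch <- function(x) {
    # Stand-in for expandPath(): the remote branch returns x unchanged here,
    # whereas rtracklayer would call its internal expandURL().
    if (is_remote_url(x)) x else path.expand(x)
}

is_remote_url("http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw")
is_remote_url("ftp://example.org/file.bw")  # example.org is a placeholder host
is_remote_url("~/data/file.bw")             # local path, takes the path.expand() branch
```

Like the startsWith() version, this also matches https and ftps URLs, since only the prefix is tested.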
Best,
Leo
On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> I checked in something that tries to find openssl automatically on the Mac.
>
> It looks like AWS is for some reason returning 404 for the HEAD command that
> the UCSC library uses to get info about the file like the content size.
> Same thing happens when I play around in Firefox's developer tools. The
> error response header claims a JSON content type, but no JSON is actually
> sent, so there is no further description of the error. I think this is a bug
> in Amazon.
>
> Seems like for now you'll need to download the file first.
>
> Michael
>
> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
> wrote:
>>
>> Hi Michael,
>>
>> I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed
>> the OS X binary would work out of the box.
>>
>> Anyhow, I installed rtracklayer (release) manually and got another
>> error (slightly different message now).
>>
>>
>>
>>
>> $ svn co
>> https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>> $ R CMD INSTALL rtracklayer
>> Loading required package: colorout
>> * installing to library
>> ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>> * installing *source* package ‘rtracklayer’ ...
>> checking for pkg-config... /usr/local/bin/pkg-config
>> checking pkg-config is at least version 0.9.0... yes
>> checking for OPENSSL... yes
>> ## more output
>>
>> $ R
>> > library('rtracklayer')
>> > unshorten_url <- function(uri) {
>> + require('RCurl')
>> + opts <- list(
>> + followlocation = TRUE, # resolve redirects
>> + ssl.verifyhost = FALSE, # suppress certain SSL errors
>> + ssl.verifypeer = FALSE,
>> + nobody = TRUE, # perform HEAD request
>> + verbose = FALSE
>> + )
>> + curlhandle <- getCurlHandle(.opts = opts)
>> + getURL(uri, curl = curlhandle)
>> + info <- getCurlInfo(curlhandle)
>> + rm(curlhandle) # release the curlhandle!
>> + info$effective.url
>> + }
>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>> Loading required package: RCurl
>> Loading required package: bitops
>> > url
>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>> > x <- import.bw(url, as = 'RleList')
>> Error in seqinfo(ranges) : UCSC library operation failed
>> In addition: Warning message:
>> In seqinfo(ranges) :
>> Couldn't open
>>
>> https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>> Error in seqinfo(ranges) : UCSC library operation failed
>> In addition: Warning messages:
>> 1: In seqinfo(ranges) :
>> TCP non-blocking connect() to content-na.drive.amazonaws.com
>> timed-out in select() after 10000 milliseconds - Cancelling!
>> 2: In seqinfo(ranges) :
>> Couldn't open
>>
>> http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>> > ## Reproducibility info
>> > message(Sys.time())
>> 2016-05-05 17:38:30
>> > options(width = 120)
>> > devtools::session_info()
>> Session info
>> -----------------------------------------------------------------------------------------------------------
>> setting value
>> version R version 3.3.0 RC (2016-05-01 r70572)
>> system x86_64, darwin13.4.0
>> ui X11
>> language (EN)
>> collate en_US.UTF-8
>> tz America/New_York
>> date 2016-05-05
>>
>> Packages
>> ---------------------------------------------------------------------------------------------------------------
>> package * version date source
>> Biobase 2.32.0 2016-05-04 Bioconductor
>> BiocGenerics * 0.18.0 2016-05-04 Bioconductor
>> BiocParallel 1.6.0 2016-05-04 Bioconductor
>> Biostrings 2.40.0 2016-05-04 Bioconductor
>> bitops * 1.0-6 2013-08-17 CRAN (R 3.3.0)
>> colorout * 1.1-2 2016-05-05 Github
>> (jalvesaq/colorout at 6538970)
>> devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
>> digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
>> GenomeInfoDb * 1.8.0 2016-05-04 Bioconductor
>> GenomicAlignments 1.8.0 2016-05-04 Bioconductor
>> GenomicRanges * 1.24.0 2016-05-04 Bioconductor
>> IRanges * 2.6.0 2016-05-04 Bioconductor
>> memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
>> RCurl * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>> Rsamtools 1.24.0 2016-05-04 Bioconductor
>> rtracklayer * 1.32.0 2016-05-05 Bioconductor
>> S4Vectors * 0.10.0 2016-05-04 Bioconductor
>> SummarizedExperiment 1.2.0 2016-05-04 Bioconductor
>> withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
>> XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>> XVector 0.12.0 2016-05-04 Bioconductor
>> zlibbioc 1.18.0 2016-05-04 Bioconductor
>> >
>>
>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>> > The URL redirection is something I can try to add. For the other error, you
>> > need to get openssl installed and made visible to pkg-config, so that
>> > rtracklayer finds it during its build process.
>> >
>> > Michael
>> >
>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>> > <lcollado at jhu.edu>
>> > wrote:
>> >>
>> >> Hi Michael,
>> >>
>> >> I have a use case that is similar to
>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>> >> it might need some changes in rtracklayer to work. That's why I'm
>> >> posting it here this time.
>> >>
>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>> >> the web which is in a different type of url than before. Using
>> >> utils::download.file() with the defaults doesn't work, I have to use
>> >> method = 'curl' and extra = '-L'.
>> >>
>> >> More specifically, the original url
>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>> >> effective url
>> >>
>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>> >>
>> >> Now, using the second url with utils::download.file() and default
>> >> methods also doesn't work. It does on the browser though.
>> >>
>> >>
>> >> As you can see, downloading the file doesn't work out of the box, so
>> >> I guess it's not surprising that with rtracklayer I get errors
>> >> like:
>> >>
>> >> In seqinfo(ranges) :
>> >> No openssl available in netConnectHttps for
>> >> content-na.drive.amazonaws.com : 443
>> >>
>> >> You can find further details (code and log file) at
>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>> >>
>> >> Thanks,
>> >> Leo
>> >
>> >
>
>