[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive
Michael Lawrence
lawrence.michael at gene.com
Tue May 31 20:02:53 CEST 2016
Thanks for pointing out that buglet. Fixed.
On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
<lcollado at jhu.edu> wrote:
> Hi Michael,
>
> We tried getting things to work with Amazon Cloud Drive (see Abhi's
> efforts at https://github.com/nellore/duffel/commits/master). But we
> now have the data hosted elsewhere where the links work properly.
>
> I just noticed a small mistake in rtracklayer:::expandPath(). See:
>
>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
> [1] FALSE
>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
> [1] TRUE
>
>
> The fix is simple. At
> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
> change it to:
>
> expandPath <- function(x) {
>     if (startsWith(x, "http") | startsWith(x, "ftp"))
>         expandURL(x)
>     else path.expand(x)
> }
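
A minimal sketch of why the original check failed, plus a stricter regex-based alternative. Note that `is_remote_path` is a hypothetical helper used only for illustration; it is not a function in rtracklayer and is not necessarily the change that was committed.

```r
# startsWith() compares literal prefixes; it does not interpret "||" as "or",
# so the prefix "http||ftp" can never match a real URL.
literal <- startsWith("http://example.org/a.bw", "http||ftp")

# Hypothetical helper (an assumption, not rtracklayer code): TRUE when the
# path begins with an http, https, or ftp scheme followed by "://".
is_remote_path <- function(x) {
  grepl("^(https?|ftp)://", x)
}

# Vectorised over its input: two URLs and one local path.
remote <- is_remote_path(c("http://example.org/a.bw",
                           "ftp://example.org/a.bw",
                           "~/data/a.bw"))
```

Anchoring on the scheme and "://" also avoids treating a local file whose name merely begins with "http" or "ftp" as a URL, which a bare prefix test would do.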
>
> Best,
> Leo
>
> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> I checked in something that tries to find openssl automatically on the Mac.
>>
>> It looks like AWS is for some reason returning 404 for the HEAD command that
>> the UCSC library uses to get info about the file, like the content size.
>> Same thing happens when I play around in Firefox's developer tools. The
>> error response header claims a JSON content type, but no JSON is actually
>> sent, so there is no further description of the error. I think this is a bug
>> in Amazon.
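
One way to reproduce this check from R is to issue a HEAD request and read back the status code. This is a rough sketch that assumes RCurl is installed; `head_status` is a hypothetical helper, not part of rtracklayer or the UCSC library.

```r
# Hypothetical helper (an assumption for illustration): send a HEAD request
# and return the HTTP status code of the final response.
head_status <- function(url) {
  handle <- RCurl::getCurlHandle(.opts = list(
    nobody = TRUE,          # HEAD request: ask for headers only, no body
    followlocation = TRUE   # follow redirects to the effective URL
  ))
  # Perform the request; the status code is read from the handle afterwards.
  RCurl::getURL(url, curl = handle)
  RCurl::getCurlInfo(handle)$response.code
}
```

A status of 404 from this helper, alongside a working download in the browser, would match the behaviour described above.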
>>
>> Seems like for now you'll need to download the file first.
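
A sketch of that workaround, assuming the command-line curl tool is on the PATH and using the `method = 'curl'` and `extra = '-L'` settings reported elsewhere in this thread. `import_bw_via_download` is a hypothetical wrapper, not an rtracklayer function.

```r
# Hypothetical wrapper (an assumption for illustration): download a remote
# BigWig file to a temporary location, then import the local copy.
import_bw_via_download <- function(url, ...) {
  destfile <- tempfile(fileext = ".bw")
  # Remove the temporary file once the data has been read into memory.
  on.exit(unlink(destfile), add = TRUE)

  # "-L" tells curl to follow redirects; "wb" keeps the binary file intact.
  status <- utils::download.file(url, destfile, method = "curl",
                                 extra = "-L", mode = "wb")
  if (status != 0) {
    stop("download failed with status ", status)
  }

  # Extra arguments such as as = "RleList" are passed through to import.bw().
  rtracklayer::import.bw(destfile, ...)
}
```

It could then be called as `import_bw_via_download(url, as = 'RleList')` in place of a direct `import.bw(url, ...)` on the remote link.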
>>
>> Michael
>>
>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>> wrote:
>>>
>>> Hi Michael,
>>>
>>> I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed
>>> the OS X binary would work out of the box.
>>>
>>> Anyhow, I installed rtracklayer (release) manually and got another
>>> error (slightly different message now).
>>>
>>>
>>>
>>>
>>> $ svn co
>>> https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>> $ R CMD INSTALL rtracklayer
>>> Loading required package: colorout
>>> * installing to library
>>> ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>> * installing *source* package ‘rtracklayer’ ...
>>> checking for pkg-config... /usr/local/bin/pkg-config
>>> checking pkg-config is at least version 0.9.0... yes
>>> checking for OPENSSL... yes
>>> ## more output
>>>
>>> $ R
>>> > library('rtracklayer')
>>> > unshorten_url <- function(uri) {
>>> + require('RCurl')
>>> + opts <- list(
>>> + followlocation = TRUE, # resolve redirects
>>> + ssl.verifyhost = FALSE, # suppress certain SSL errors
>>> + ssl.verifypeer = FALSE,
>>> + nobody = TRUE, # perform HEAD request
>>> + verbose = FALSE
>>> + )
>>> + curlhandle <- getCurlHandle(.opts = opts)
>>> + getURL(uri, curl = curlhandle)
>>> + info <- getCurlInfo(curlhandle)
>>> + rm(curlhandle) # release the curlhandle!
>>> + info$effective.url
>>> + }
>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>> Loading required package: RCurl
>>> Loading required package: bitops
>>> > url
>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>> > x <- import.bw(url, as = 'RleList')
>>> Error in seqinfo(ranges) : UCSC library operation failed
>>> In addition: Warning message:
>>> In seqinfo(ranges) :
>>> Couldn't open
>>>
>>> https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>> Error in seqinfo(ranges) : UCSC library operation failed
>>> In addition: Warning messages:
>>> 1: In seqinfo(ranges) :
>>> TCP non-blocking connect() to content-na.drive.amazonaws.com
>>> timed-out in select() after 10000 milliseconds - Cancelling!
>>> 2: In seqinfo(ranges) :
>>> Couldn't open
>>>
>>> http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>> > ## Reproducibility info
>>> > message(Sys.time())
>>> 2016-05-05 17:38:30
>>> > options(width = 120)
>>> > devtools::session_info()
>>> Session info
>>> -----------------------------------------------------------------------------------------------------------
>>> setting value
>>> version R version 3.3.0 RC (2016-05-01 r70572)
>>> system x86_64, darwin13.4.0
>>> ui X11
>>> language (EN)
>>> collate en_US.UTF-8
>>> tz America/New_York
>>> date 2016-05-05
>>>
>>> Packages
>>> ---------------------------------------------------------------------------------------------------------------
>>> package * version date source
>>> Biobase 2.32.0 2016-05-04 Bioconductor
>>> BiocGenerics * 0.18.0 2016-05-04 Bioconductor
>>> BiocParallel 1.6.0 2016-05-04 Bioconductor
>>> Biostrings 2.40.0 2016-05-04 Bioconductor
>>> bitops * 1.0-6 2013-08-17 CRAN (R 3.3.0)
>>> colorout * 1.1-2 2016-05-05 Github
>>> (jalvesaq/colorout at 6538970)
>>> devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
>>> digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
>>> GenomeInfoDb * 1.8.0 2016-05-04 Bioconductor
>>> GenomicAlignments 1.8.0 2016-05-04 Bioconductor
>>> GenomicRanges * 1.24.0 2016-05-04 Bioconductor
>>> IRanges * 2.6.0 2016-05-04 Bioconductor
>>> memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
>>> RCurl * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>> Rsamtools 1.24.0 2016-05-04 Bioconductor
>>> rtracklayer * 1.32.0 2016-05-05 Bioconductor
>>> S4Vectors * 0.10.0 2016-05-04 Bioconductor
>>> SummarizedExperiment 1.2.0 2016-05-04 Bioconductor
>>> withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
>>> XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>> XVector 0.12.0 2016-05-04 Bioconductor
>>> zlibbioc 1.18.0 2016-05-04 Bioconductor
>>> >
>>>
>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>> <lawrence.michael at gene.com> wrote:
>>> > The URL redirection is something I can try to add. For the other error,
>>> > you need to get openssl installed and made visible to pkg-config, so that
>>> > rtracklayer finds it during its build process.
>>> >
>>> > Michael
>>> >
>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>> > <lcollado at jhu.edu>
>>> > wrote:
>>> >>
>>> >> Hi Michael,
>>> >>
>>> >> I have a use case that is similar to
>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>> >> posting it here this time.
>>> >>
>>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>>> >> the web which is in a different type of url than before. Using
>>> >> utils::download.file() with the defaults doesn't work, I have to use
>>> >> method = 'curl' and extra = '-L'.
>>> >>
>>> >> More specifically, the original url
>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>>> >> effective url
>>> >>
>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>> >>
>>> >> Now, using the second url with utils::download.file() and default
>>> >> methods also doesn't work. It does on the browser though.
>>> >>
>>> >>
>>> >> As you can see, downloading the file doesn't work out of the box,
>>> >> so I guess it's not surprising that using rtracklayer I get
>>> >> errors like:
>>> >>
>>> >> In seqinfo(ranges) :
>>> >> No openssl available in netConnectHttps for
>>> >> content-na.drive.amazonaws.com : 443
>>> >>
>>> >> You can find further details (code and log file) at
>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>> >>
>>> >> Thanks,
>>> >> Leo
>>> >
>>> >
>>
>>
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel