[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive
Leonardo Collado Torres
lcollado at jhu.edu
Tue May 31 23:31:12 CEST 2016
Awesome, thanks!
On Tue, May 31, 2016 at 4:11 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Sure, done.
>
> On Tue, May 31, 2016 at 11:18 AM, Leonardo Collado Torres
> <lcollado at jhu.edu> wrote:
>> Hi Michael,
>>
>> Thanks!
>>
>> Actually, it looks like there are a few more quick changes I need you
>> to do. Simply at
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
>> replace path.expand() with expandPath(). I'm not sure this applies to
>> all current path.expand() calls, but at least it does for
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
>>
>> Best,
>> Leo
>>
>>
>>
>>
>>> library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
>> 2016-05-31 14:11:52 loadCoverage: loading BigWig file
>> http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
>> Error in seqinfo(con) : UCSC library operation failed
>> In addition: Warning message:
>> In seqinfo(con) :
>> Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
>> Timing stopped at: 0.068 0.009 0.817
>>> traceback()
>> 14: .Call(BWGFile_seqlengths, path.expand(path(x)))
>> 13: seqinfo(con)
>> 12: seqinfo(con)
>> 11: .local(con, format, text, ...)
>> 10: import(file, selection = range, as = "RleList")
>> 9: import(file, selection = range, as = "RleList")
>> 8: FUN(X[[i]], ...)
>> 7: lapply(as.list(X), FUN = FUN, ...)
>> 6: lapply(as.list(X), FUN = FUN, ...)
>> 5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>> verbose = verbose)
>> 4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>> verbose = verbose)
>> 3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
>> 2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
>> 1: system.time(regions <- expressed_regions("SRP009615", "chrY",
>> cutoff = 5L))
>>> options(width = 120); devtools::session_info()
>> Session info -----------------------------------------------------------------------------------------------------------
>> setting value
>> version R version 3.3.0 RC (2016-05-01 r70572)
>> system x86_64, darwin13.4.0
>> ui AQUA
>> language (EN)
>> collate en_US.UTF-8
>> tz America/New_York
>> date 2016-05-31
>>
>> Packages ---------------------------------------------------------------------------------------------------------------
>> package * version date source
>> acepack 1.3-3.3 2014-11-24 CRAN (R 3.3.0)
>> AnnotationDbi 1.35.3 2016-05-27 Bioconductor
>> Biobase 2.33.0 2016-05-05 Bioconductor
>> BiocGenerics * 0.19.0 2016-05-05 Bioconductor
>> BiocParallel 1.7.2 2016-05-20 Bioconductor
>> biomaRt 2.29.2 2016-05-30 Bioconductor
>> Biostrings 2.41.1 2016-05-27 Bioconductor
>> bitops 1.0-6 2013-08-17 CRAN (R 3.3.0)
>> BSgenome 1.41.0 2016-05-05 Bioconductor
>> bumphunter 1.13.0 2016-05-05 Bioconductor
>> chron 2.3-47 2015-06-24 CRAN (R 3.3.0)
>> cluster 2.0.4 2016-04-18 CRAN (R 3.3.0)
>> codetools 0.2-14 2015-07-15 CRAN (R 3.3.0)
>> colorspace 1.2-6 2015-03-11 CRAN (R 3.3.0)
>> data.table 1.9.6 2015-09-19 CRAN (R 3.3.0)
>> DBI 0.4-1 2016-05-08 CRAN (R 3.3.0)
>> derfinder * 1.7.5 2016-05-20 Bioconductor
>> derfinderHelper 1.7.3 2016-05-20 Bioconductor
>> devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
>> digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
>> doRNG 1.6 2014-03-07 CRAN (R 3.3.0)
>> foreach 1.4.3 2015-10-13 CRAN (R 3.3.0)
>> foreign 0.8-66 2015-08-19 CRAN (R 3.3.0)
>> Formula 1.2-1 2015-04-07 CRAN (R 3.3.0)
>> GenomeInfoDb * 1.9.1 2016-05-13 Bioconductor
>> GenomicAlignments 1.9.0 2016-05-05 Bioconductor
>> GenomicFeatures 1.25.12 2016-05-21 Bioconductor
>> GenomicFiles 1.9.7 2016-05-27 Bioconductor
>> GenomicRanges * 1.25.0 2016-05-05 Bioconductor
>> ggplot2 2.1.0 2016-03-01 CRAN (R 3.3.0)
>> gridExtra 2.2.1 2016-02-29 CRAN (R 3.3.0)
>> gtable 0.2.0 2016-02-26 CRAN (R 3.3.0)
>> Hmisc 3.17-4 2016-05-02 CRAN (R 3.3.0)
>> IRanges * 2.7.1 2016-05-27 Bioconductor
>> iterators 1.0.8 2015-10-13 CRAN (R 3.3.0)
>> lattice 0.20-33 2015-07-14 CRAN (R 3.3.0)
>> latticeExtra 0.6-28 2016-02-09 CRAN (R 3.3.0)
>> locfit 1.5-9.1 2013-04-20 CRAN (R 3.3.0)
>> magrittr 1.5 2014-11-22 CRAN (R 3.3.0)
>> Matrix 1.2-6 2016-05-02 CRAN (R 3.3.0)
>> matrixStats 0.50.2 2016-04-24 CRAN (R 3.3.0)
>> memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
>> munsell 0.4.3 2016-02-13 CRAN (R 3.3.0)
>> nnet 7.3-12 2016-02-02 CRAN (R 3.3.0)
>> pkgmaker 0.22 2014-05-14 CRAN (R 3.3.0)
>> plyr 1.8.3 2015-06-12 CRAN (R 3.3.0)
>> qvalue 2.5.2 2016-05-20 Bioconductor
>> RColorBrewer 1.1-2 2014-12-07 CRAN (R 3.3.0)
>> Rcpp 0.12.5 2016-05-14 CRAN (R 3.3.0)
>> RCurl 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>> recount * 0.99.0 2016-05-31 Bioconductor
>> registry 0.3 2015-07-08 CRAN (R 3.3.0)
>> reshape2 1.4.1 2014-12-06 CRAN (R 3.3.0)
>> rngtools 1.2.4 2014-03-06 CRAN (R 3.3.0)
>> rpart 4.1-10 2015-06-29 CRAN (R 3.3.0)
>> Rsamtools 1.25.0 2016-05-05 Bioconductor
>> RSQLite 1.0.0 2014-10-25 CRAN (R 3.3.0)
>> rtracklayer 1.33.2 2016-05-31 Github (Bioconductor-mirror/rtracklayer@917973e)
>> S4Vectors * 0.11.2 2016-05-27 Bioconductor
>> scales 0.4.0 2016-02-26 CRAN (R 3.3.0)
>> stringi 1.0-1 2015-10-22 CRAN (R 3.3.0)
>> stringr 1.0.0 2015-04-30 CRAN (R 3.3.0)
>> SummarizedExperiment 1.3.2 2016-05-20 Bioconductor
>> survival 2.39-4 2016-05-11 CRAN (R 3.3.0)
>> VariantAnnotation 1.19.1 2016-05-20 Bioconductor
>> withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
>> XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>> xtable 1.8-2 2016-02-05 CRAN (R 3.3.0)
>> XVector 0.13.0 2016-05-05 Bioconductor
>> zlibbioc 1.19.0 2016-05-05 Bioconductor
>>
>>
>>
>>
>> On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>> Thanks for pointing out that buglet. Fixed.
>>>
>>> On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
>>> <lcollado at jhu.edu> wrote:
>>>> Hi Michael,
>>>>
>>>> We tried getting things to work with Amazon Cloud Drive (see Abhi's
>>>> efforts at https://github.com/nellore/duffel/commits/master). But we
>>>> now have the data hosted elsewhere where the links work properly.
>>>>
>>>> I just noticed a small mistake in rtracklayer:::expandPath(). See:
>>>>
>>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
>>>> [1] FALSE
>>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
>>>> [1] TRUE
>>>>
>>>>
>>>> The fix is simple. At
>>>> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
>>>> change it to:
>>>>
>>>> expandPath <- function(x) {
>>>>     if (startsWith(x, "http") | startsWith(x, "ftp"))
>>>>         expandURL(x)
>>>>     else path.expand(x)
>>>> }
>>>>
>>>> Best,
>>>> Leo
>>>>
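The original check failed because startsWith() compares against a literal prefix, so the string 'http||ftp' is never a prefix of a real URL. Below is a minimal sketch of an alternative check, assuming a single anchored regular expression is acceptable. The helper name is_remote_path() and the ftp://example.org address are hypothetical illustrations, not part of rtracklayer.

```r
# Hypothetical helper (not part of rtracklayer): TRUE when a path looks
# like an http, https, or ftp URL rather than a local file.
is_remote_path <- function(x) {
    # Anchor at the start and require "://" so that a local file such as
    # "httpd.conf" is not mistaken for a URL.
    grepl("^(https?|ftp)://", x)
}

# startsWith() matches its prefix literally, so this is FALSE:
startsWith("http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw",
           "http||ftp")

# The regular expression handles all three schemes in one vectorized call:
is_remote_path(c("http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw",
                 "ftp://example.org/file.bw",  # placeholder address
                 "~/data/file.bw"))
# [1]  TRUE  TRUE FALSE
```

Requiring "://" is a design choice of this sketch. The fix quoted above only tests the "http" and "ftp" prefixes.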
>>>> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
>>>> <lawrence.michael at gene.com> wrote:
>>>>> I checked in something that tries to find openssl automatically on the Mac.
>>>>>
>>>>> It looks like AWS is for some reason returning 404 for the HEAD command that
>>>>> the UCSC library uses to get info about the file, like the content size.
>>>>> Same thing happens when I play around in Firefox's developer tools. The
>>>>> error response header claims a JSON content type, but no JSON is actually
>>>>> sent, so there is no further description of the error. I think this is a bug
>>>>> in Amazon.
>>>>>
>>>>> Seems like for now you'll need to download the file first.
>>>>>
>>>>> Michael
>>>>>
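The download-first workaround can be sketched as follows. It assumes the curl command-line tool is installed, and it reuses the method = 'curl' and extra = '-L' settings reported in the first message of this thread. The function name import_bw_via_download() is hypothetical and is not part of rtracklayer.

```r
# Hypothetical wrapper (not part of rtracklayer): download a BigWig file
# to a temporary location, following redirects, then import the local copy.
import_bw_via_download <- function(url, ...) {
    local_bw <- tempfile(fileext = ".bw")
    on.exit(unlink(local_bw), add = TRUE)  # remove the temporary copy afterwards
    # method = "curl" with extra = "-L" makes curl follow HTTP redirects
    status <- utils::download.file(url, destfile = local_bw,
                                   method = "curl", extra = "-L")
    if (status != 0) stop("download failed for ", url)
    # Extra arguments such as as = "RleList" are passed on to import.bw()
    rtracklayer::import.bw(local_bw, ...)
}

# Usage (requires network access and the rtracklayer package):
# x <- import_bw_via_download(
#     "http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw",
#     as = "RleList")
```

This downloads the whole file even when only one region is needed, so it is a stopgap rather than a replacement for remote access.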
>>>>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>>>>> wrote:
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed
>>>>>> the OS X binary would work out of the box.
>>>>>>
>>>>>> Anyhow, I installed rtracklayer (release) manually and got another
>>>>>> error (slightly different message now).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> $ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>>>>> $ R CMD INSTALL rtracklayer
>>>>>> Loading required package: colorout
>>>>>> * installing to library ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>>>>> * installing *source* package ‘rtracklayer’ ...
>>>>>> checking for pkg-config... /usr/local/bin/pkg-config
>>>>>> checking pkg-config is at least version 0.9.0... yes
>>>>>> checking for OPENSSL... yes
>>>>>> ## more output
>>>>>>
>>>>>> $ R
>>>>>> > library('rtracklayer')
>>>>>> > unshorten_url <- function(uri) {
>>>>>> + require('RCurl')
>>>>>> + opts <- list(
>>>>>> + followlocation = TRUE, # resolve redirects
>>>>>> + ssl.verifyhost = FALSE, # suppress certain SSL errors
>>>>>> + ssl.verifypeer = FALSE,
>>>>>> + nobody = TRUE, # perform HEAD request
>>>>>> + verbose = FALSE
>>>>>> + )
>>>>>> + curlhandle <- getCurlHandle(.opts = opts)
>>>>>> + getURL(uri, curl = curlhandle)
>>>>>> + info <- getCurlInfo(curlhandle)
>>>>>> + rm(curlhandle) # release the curlhandle!
>>>>>> + info$effective.url
>>>>>> + }
>>>>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>>>>> Loading required package: RCurl
>>>>>> Loading required package: bitops
>>>>>> > url
>>>>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>>>>> > x <- import.bw(url, as = 'RleList')
>>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>>> In addition: Warning message:
>>>>>> In seqinfo(ranges) :
>>>>>> Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>>> In addition: Warning messages:
>>>>>> 1: In seqinfo(ranges) :
>>>>>> TCP non-blocking connect() to content-na.drive.amazonaws.com
>>>>>> timed-out in select() after 10000 milliseconds - Cancelling!
>>>>>> 2: In seqinfo(ranges) :
>>>>>> Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>>> > ## Reproducibility info
>>>>>> > message(Sys.time())
>>>>>> 2016-05-05 17:38:30
>>>>>> > options(width = 120)
>>>>>> > devtools::session_info()
>>>>>> Session info -----------------------------------------------------------------------------------------------------------
>>>>>> setting value
>>>>>> version R version 3.3.0 RC (2016-05-01 r70572)
>>>>>> system x86_64, darwin13.4.0
>>>>>> ui X11
>>>>>> language (EN)
>>>>>> collate en_US.UTF-8
>>>>>> tz America/New_York
>>>>>> date 2016-05-05
>>>>>>
>>>>>> Packages ---------------------------------------------------------------------------------------------------------------
>>>>>> package * version date source
>>>>>> Biobase 2.32.0 2016-05-04 Bioconductor
>>>>>> BiocGenerics * 0.18.0 2016-05-04 Bioconductor
>>>>>> BiocParallel 1.6.0 2016-05-04 Bioconductor
>>>>>> Biostrings 2.40.0 2016-05-04 Bioconductor
>>>>>> bitops * 1.0-6 2013-08-17 CRAN (R 3.3.0)
>>>>>> colorout * 1.1-2 2016-05-05 Github (jalvesaq/colorout@6538970)
>>>>>> devtools 1.11.1 2016-04-21 CRAN (R 3.3.0)
>>>>>> digest 0.6.9 2016-01-08 CRAN (R 3.3.0)
>>>>>> GenomeInfoDb * 1.8.0 2016-05-04 Bioconductor
>>>>>> GenomicAlignments 1.8.0 2016-05-04 Bioconductor
>>>>>> GenomicRanges * 1.24.0 2016-05-04 Bioconductor
>>>>>> IRanges * 2.6.0 2016-05-04 Bioconductor
>>>>>> memoise 1.0.0 2016-01-29 CRAN (R 3.3.0)
>>>>>> RCurl * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>>>>> Rsamtools 1.24.0 2016-05-04 Bioconductor
>>>>>> rtracklayer * 1.32.0 2016-05-05 Bioconductor
>>>>>> S4Vectors * 0.10.0 2016-05-04 Bioconductor
>>>>>> SummarizedExperiment 1.2.0 2016-05-04 Bioconductor
>>>>>> withr 1.0.1 2016-02-04 CRAN (R 3.3.0)
>>>>>> XML 3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>>>>> XVector 0.12.0 2016-05-04 Bioconductor
>>>>>> zlibbioc 1.18.0 2016-05-04 Bioconductor
>>>>>> >
>>>>>>
>>>>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>>>>> <lawrence.michael at gene.com> wrote:
>>>>>> > The URL redirection is something I can try to add. For the other error, you
>>>>>> > need to get openssl installed and made visible to pkg-config, so that
>>>>>> > rtracklayer finds it during its build process.
>>>>>> >
>>>>>> > Michael
>>>>>> >
>>>>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>>>>> > <lcollado at jhu.edu>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hi Michael,
>>>>>> >>
>>>>>> >> I have a use case that is similar to
>>>>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>>>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>>>>> >> posting it here this time.
>>>>>> >>
>>>>>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>>>>>> >> the web which is in a different type of url than before. Using
>>>>>> >> utils::download.file() with the defaults doesn't work, I have to use
>>>>>> >> method = 'curl' and extra = '-L'.
>>>>>> >>
>>>>>> >> More specifically, the original url
>>>>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>>>>>> >> effective url
>>>>>> >>
>>>>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>>>>> >>
>>>>>> >> Now, using the second url with utils::download.file() and default
>>>>>> >> methods also doesn't work. It does work in the browser, though.
>>>>>> >>
>>>>>> >>
>>>>>> >> As you can see, downloading the file doesn't work out of the box, so I
>>>>>> >> guess it's not surprising that using rtracklayer I get errors like:
>>>>>> >>
>>>>>> >> In seqinfo(ranges) :
>>>>>> >> No openssl available in netConnectHttps for
>>>>>> >> content-na.drive.amazonaws.com : 443
>>>>>> >>
>>>>>> >> You can find further details (code and log file) at
>>>>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Leo
>>>>>> >
>>>>>> >
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel