[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive

Leonardo Collado Torres lcollado at jhu.edu
Tue May 31 20:18:16 CEST 2016


Hi Michael,

Thanks!

Actually, it looks like a few more quick changes are needed. In
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
path.expand() should be replaced with expandPath(). I'm not sure this
applies to every current path.expand() call, but it does at least for
https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
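To illustrate why the swap matters, here is a minimal sketch. It is not rtracklayer's actual code: expand_path_sketch() and its resolve_url argument are hypothetical stand-ins for the internal expandPath() and expandURL(), whose real behavior may differ. It only shows that path.expand() passes a URL through untouched, so any URL handling has to happen in a separate branch, and that startsWith() matches a literal prefix rather than a pattern.

```r
# Hypothetical stand-in for rtracklayer's internal expandPath(); the real
# helper may differ. `resolve_url` is an assumed hook for URL handling
# (for example, following redirects) and defaults to doing nothing.
expand_path_sketch <- function(x, resolve_url = identity) {
    stopifnot(is.character(x), length(x) == 1L)
    # startsWith() matches a literal prefix, so each scheme is tested separately
    if (startsWith(x, "http") || startsWith(x, "ftp")) {
        resolve_url(x)
    } else {
        path.expand(x)
    }
}

bw_url <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"

# path.expand() only performs tilde expansion, so a URL comes back unchanged
unchanged <- identical(path.expand(bw_url), bw_url)

# A toy resolver that tags its input, to show which branch was taken
tagged <- expand_path_sketch(
    bw_url,
    resolve_url = function(u) paste0("resolved:", u)
)

# Local paths still go through path.expand()
local_path <- expand_path_sketch("~/data/example.bw")
```

With the toy resolver, the URL takes the resolve_url branch while the local path is tilde-expanded as before; a call site that uses path.expand() directly would never reach the URL branch.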

Best,
Leo




> library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
2016-05-31 14:11:52 loadCoverage: loading BigWig file http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Error in seqinfo(con) : UCSC library operation failed
In addition: Warning message:
In seqinfo(con) :
  Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
Timing stopped at: 0.068 0.009 0.817
> traceback()
14: .Call(BWGFile_seqlengths, path.expand(path(x)))
13: seqinfo(con)
12: seqinfo(con)
11: .local(con, format, text, ...)
10: import(file, selection = range, as = "RleList")
9: import(file, selection = range, as = "RleList")
8: FUN(X[[i]], ...)
7: lapply(as.list(X), FUN = FUN, ...)
6: lapply(as.list(X), FUN = FUN, ...)
5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
       verbose = verbose)
4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
       verbose = verbose)
3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
1: system.time(regions <- expressed_regions("SRP009615", "chrY",
       cutoff = 5L))
> options(width = 120); devtools::session_info()
Session info -----------------------------------------------------------------------------------------------------------
 setting  value
 version  R version 3.3.0 RC (2016-05-01 r70572)
 system   x86_64, darwin13.4.0
 ui       AQUA
 language (EN)
 collate  en_US.UTF-8
 tz       America/New_York
 date     2016-05-31

Packages ---------------------------------------------------------------------------------------------------------------
 package              * version  date       source
 acepack                1.3-3.3  2014-11-24 CRAN (R 3.3.0)
 AnnotationDbi          1.35.3   2016-05-27 Bioconductor
 Biobase                2.33.0   2016-05-05 Bioconductor
 BiocGenerics         * 0.19.0   2016-05-05 Bioconductor
 BiocParallel           1.7.2    2016-05-20 Bioconductor
 biomaRt                2.29.2   2016-05-30 Bioconductor
 Biostrings             2.41.1   2016-05-27 Bioconductor
 bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
 BSgenome               1.41.0   2016-05-05 Bioconductor
 bumphunter             1.13.0   2016-05-05 Bioconductor
 chron                  2.3-47   2015-06-24 CRAN (R 3.3.0)
 cluster                2.0.4    2016-04-18 CRAN (R 3.3.0)
 codetools              0.2-14   2015-07-15 CRAN (R 3.3.0)
 colorspace             1.2-6    2015-03-11 CRAN (R 3.3.0)
 data.table             1.9.6    2015-09-19 CRAN (R 3.3.0)
 DBI                    0.4-1    2016-05-08 CRAN (R 3.3.0)
 derfinder            * 1.7.5    2016-05-20 Bioconductor
 derfinderHelper        1.7.3    2016-05-20 Bioconductor
 devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
 digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
 doRNG                  1.6      2014-03-07 CRAN (R 3.3.0)
 foreach                1.4.3    2015-10-13 CRAN (R 3.3.0)
 foreign                0.8-66   2015-08-19 CRAN (R 3.3.0)
 Formula                1.2-1    2015-04-07 CRAN (R 3.3.0)
 GenomeInfoDb         * 1.9.1    2016-05-13 Bioconductor
 GenomicAlignments      1.9.0    2016-05-05 Bioconductor
 GenomicFeatures        1.25.12  2016-05-21 Bioconductor
 GenomicFiles           1.9.7    2016-05-27 Bioconductor
 GenomicRanges        * 1.25.0   2016-05-05 Bioconductor
 ggplot2                2.1.0    2016-03-01 CRAN (R 3.3.0)
 gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.0)
 gtable                 0.2.0    2016-02-26 CRAN (R 3.3.0)
 Hmisc                  3.17-4   2016-05-02 CRAN (R 3.3.0)
 IRanges              * 2.7.1    2016-05-27 Bioconductor
 iterators              1.0.8    2015-10-13 CRAN (R 3.3.0)
 lattice                0.20-33  2015-07-14 CRAN (R 3.3.0)
 latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.0)
 locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.0)
 magrittr               1.5      2014-11-22 CRAN (R 3.3.0)
 Matrix                 1.2-6    2016-05-02 CRAN (R 3.3.0)
 matrixStats            0.50.2   2016-04-24 CRAN (R 3.3.0)
 memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
 munsell                0.4.3    2016-02-13 CRAN (R 3.3.0)
 nnet                   7.3-12   2016-02-02 CRAN (R 3.3.0)
 pkgmaker               0.22     2014-05-14 CRAN (R 3.3.0)
 plyr                   1.8.3    2015-06-12 CRAN (R 3.3.0)
 qvalue                 2.5.2    2016-05-20 Bioconductor
 RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.0)
 Rcpp                   0.12.5   2016-05-14 CRAN (R 3.3.0)
 RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
 recount              * 0.99.0   2016-05-31 Bioconductor
 registry               0.3      2015-07-08 CRAN (R 3.3.0)
 reshape2               1.4.1    2014-12-06 CRAN (R 3.3.0)
 rngtools               1.2.4    2014-03-06 CRAN (R 3.3.0)
 rpart                  4.1-10   2015-06-29 CRAN (R 3.3.0)
 Rsamtools              1.25.0   2016-05-05 Bioconductor
 RSQLite                1.0.0    2014-10-25 CRAN (R 3.3.0)
 rtracklayer            1.33.2   2016-05-31 Github (Bioconductor-mirror/rtracklayer at 917973e)
 S4Vectors            * 0.11.2   2016-05-27 Bioconductor
 scales                 0.4.0    2016-02-26 CRAN (R 3.3.0)
 stringi                1.0-1    2015-10-22 CRAN (R 3.3.0)
 stringr                1.0.0    2015-04-30 CRAN (R 3.3.0)
 SummarizedExperiment   1.3.2    2016-05-20 Bioconductor
 survival               2.39-4   2016-05-11 CRAN (R 3.3.0)
 VariantAnnotation      1.19.1   2016-05-20 Bioconductor
 withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
 XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
 xtable                 1.8-2    2016-02-05 CRAN (R 3.3.0)
 XVector                0.13.0   2016-05-05 Bioconductor
 zlibbioc               1.19.0   2016-05-05 Bioconductor




On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Thanks for pointing out that buglet. Fixed.
>
> On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
> <lcollado at jhu.edu> wrote:
>> Hi Michael,
>>
>> We tried getting things to work with Amazon Cloud Drive (see Abhi's
>> efforts at https://github.com/nellore/duffel/commits/master). But we
>> now have the data hosted elsewhere where the links work properly.
>>
>> I just noted a small mistake on rtracklayer:::expandPath(). See:
>>
>> > startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
>> [1] FALSE
>> > startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
>> [1] TRUE
>>
>>
>> The fix is simple. At
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
>> change it to:
>>
>> expandPath <- function(x) {
>>     ## startsWith() matches a literal prefix, so test each scheme separately
>>     if (startsWith(x, "http") || startsWith(x, "ftp"))
>>         expandURL(x)
>>     else
>>         path.expand(x)
>> }
>>
>> Best,
>> Leo
>>
>> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>> I checked in something that tries to find openssl automatically on the Mac.
>>>
>>> It looks like AWS is for some reason returning 404 for the HEAD command that
>>> the UCSC library uses to get info about the file, like the content size.
>>> Same thing happens when I play around in Firefox's developer tools. The
>>> error response header claims a JSON content type, but no JSON is actually
>>> sent, so there is no further description of the error. I think this is a bug
>>> in Amazon.
>>>
>>> Seems like for now you'll need to download the file first.
>>>
>>> Michael
>>>
>>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>>> wrote:
>>>>
>>>> Hi Michael,
>>>>
>>>> I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed
>>>> the OS X binary would work out of the box.
>>>>
>>>> Anyhow, I installed rtracklayer (release) manually and got another
>>>> error (slightly different message now).
>>>>
>>>>
>>>>
>>>>
>>>> $ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>>> $ R CMD INSTALL rtracklayer
>>>> Loading required package: colorout
>>>> * installing to library ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>>> * installing *source* package ‘rtracklayer’ ...
>>>> checking for pkg-config... /usr/local/bin/pkg-config
>>>> checking pkg-config is at least version 0.9.0... yes
>>>> checking for OPENSSL... yes
>>>> ## more output
>>>>
>>>> $ R
>>>> > library('rtracklayer')
>>>> > unshorten_url <- function(uri) {
>>>> +     require('RCurl')
>>>> +     opts <- list(
>>>> +         followlocation = TRUE,  # resolve redirects
>>>> +         ssl.verifyhost = FALSE, # suppress certain SSL errors
>>>> +         ssl.verifypeer = FALSE,
>>>> +         nobody = TRUE, # perform HEAD request
>>>> +         verbose = FALSE
>>>> +     )
>>>> +     curlhandle <- getCurlHandle(.opts = opts)
>>>> +     getURL(uri, curl = curlhandle)
>>>> +     info <- getCurlInfo(curlhandle)
>>>> +     rm(curlhandle)  # release the curlhandle!
>>>> +     info$effective.url
>>>> + }
>>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>>> Loading required package: RCurl
>>>> Loading required package: bitops
>>>> > url
>>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>>> > x <- import.bw(url, as = 'RleList')
>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>> In addition: Warning message:
>>>> In seqinfo(ranges) :
>>>>   Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>> In addition: Warning messages:
>>>> 1: In seqinfo(ranges) :
>>>>   TCP non-blocking connect() to content-na.drive.amazonaws.com timed-out in select() after 10000 milliseconds - Cancelling!
>>>> 2: In seqinfo(ranges) :
>>>>   Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>> > ## Reproducibility info
>>>> > message(Sys.time())
>>>> 2016-05-05 17:38:30
>>>> > options(width = 120)
>>>> > devtools::session_info()
>>>> Session info -----------------------------------------------------------------------------------------------------------
>>>>  setting  value
>>>>  version  R version 3.3.0 RC (2016-05-01 r70572)
>>>>  system   x86_64, darwin13.4.0
>>>>  ui       X11
>>>>  language (EN)
>>>>  collate  en_US.UTF-8
>>>>  tz       America/New_York
>>>>  date     2016-05-05
>>>>
>>>> Packages ---------------------------------------------------------------------------------------------------------------
>>>>  package              * version  date       source
>>>>  Biobase                2.32.0   2016-05-04 Bioconductor
>>>>  BiocGenerics         * 0.18.0   2016-05-04 Bioconductor
>>>>  BiocParallel           1.6.0    2016-05-04 Bioconductor
>>>>  Biostrings             2.40.0   2016-05-04 Bioconductor
>>>>  bitops               * 1.0-6    2013-08-17 CRAN (R 3.3.0)
>>>>  colorout             * 1.1-2    2016-05-05 Github (jalvesaq/colorout at 6538970)
>>>>  devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
>>>>  digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
>>>>  GenomeInfoDb         * 1.8.0    2016-05-04 Bioconductor
>>>>  GenomicAlignments      1.8.0    2016-05-04 Bioconductor
>>>>  GenomicRanges        * 1.24.0   2016-05-04 Bioconductor
>>>>  IRanges              * 2.6.0    2016-05-04 Bioconductor
>>>>  memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
>>>>  RCurl                * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>>>  Rsamtools              1.24.0   2016-05-04 Bioconductor
>>>>  rtracklayer          * 1.32.0   2016-05-05 Bioconductor
>>>>  S4Vectors            * 0.10.0   2016-05-04 Bioconductor
>>>>  SummarizedExperiment   1.2.0    2016-05-04 Bioconductor
>>>>  withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
>>>>  XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>>>  XVector                0.12.0   2016-05-04 Bioconductor
>>>>  zlibbioc               1.18.0   2016-05-04 Bioconductor
>>>> >
>>>>
>>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>>> <lawrence.michael at gene.com> wrote:
>>>> > The URL redirection is something I can try to add. For the other error,
>>>> > you need to get openssl installed and made visible to pkg-config, so
>>>> > that rtracklayer finds it during its build process.
>>>> >
>>>> > Michael
>>>> >
>>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>>> > <lcollado at jhu.edu>
>>>> > wrote:
>>>> >>
>>>> >> Hi Michael,
>>>> >>
>>>> >> I have a use case that is similar to
>>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>>> >> posting it here this time.
>>>> >>
>>>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>>>> >> the web which is in a different type of url than before. Using
>>>> >> utils::download.file() with the defaults doesn't work, I have to use
>>>> >> method = 'curl' and extra = '-L'.
>>>> >>
>>>> >> More specifically, the original url
>>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>>>> >> effective url
>>>> >>
>>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>>> >>
>>>> >> Now, using the second url with utils::download.file() and default
>>>> >> methods also doesn't work. It does on the browser though.
>>>> >>
>>>> >>
>>>> >> As you can see, downloading the file doesn't work out of the box,
>>>> >> so I guess it's not surprising that with rtracklayer I get
>>>> >> errors like:
>>>> >>
>>>> >> In seqinfo(ranges) :
>>>> >>   No openssl available in netConnectHttps for
>>>> >> content-na.drive.amazonaws.com : 443
>>>> >>
>>>> >> You can find further details (code and log file) at
>>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>>> >>
>>>> >> Thanks,
>>>> >> Leo
>>>> >
>>>> >
>>>
>>>
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


