[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive

Michael Lawrence lawrence.michael at gene.com
Tue May 31 22:11:17 CEST 2016


Sure, done.

On Tue, May 31, 2016 at 11:18 AM, Leonardo Collado Torres
<lcollado at jhu.edu> wrote:
> Hi Michael,
>
> Thanks!
>
> Actually, it looks like there are a few more quick changes I need you
> to do. Simply at
> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
> replace path.expand() with expandPath(). I'm not sure this applies to
> all current path.expand() calls, but at least it does for
> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
>
> Best,
> Leo
>
>
>
>
>> library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
> 2016-05-31 14:11:52 loadCoverage: loading BigWig file
> http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
> Error in seqinfo(con) : UCSC library operation failed
> In addition: Warning message:
> In seqinfo(con) :
>   Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
> Timing stopped at: 0.068 0.009 0.817
>> traceback()
> 14: .Call(BWGFile_seqlengths, path.expand(path(x)))
> 13: seqinfo(con)
> 12: seqinfo(con)
> 11: .local(con, format, text, ...)
> 10: import(file, selection = range, as = "RleList")
> 9: import(file, selection = range, as = "RleList")
> 8: FUN(X[[i]], ...)
> 7: lapply(as.list(X), FUN = FUN, ...)
> 6: lapply(as.list(X), FUN = FUN, ...)
> 5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>        verbose = verbose)
> 4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>        verbose = verbose)
> 3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
> 2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
> 1: system.time(regions <- expressed_regions("SRP009615", "chrY",
>        cutoff = 5L))
>> options(width = 120); devtools::session_info()
> Session info -----------------------------------------------------------------------------------------------------------
>  setting  value
>  version  R version 3.3.0 RC (2016-05-01 r70572)
>  system   x86_64, darwin13.4.0
>  ui       AQUA
>  language (EN)
>  collate  en_US.UTF-8
>  tz       America/New_York
>  date     2016-05-31
>
> Packages ---------------------------------------------------------------------------------------------------------------
>  package              * version  date       source
>  acepack                1.3-3.3  2014-11-24 CRAN (R 3.3.0)
>  AnnotationDbi          1.35.3   2016-05-27 Bioconductor
>  Biobase                2.33.0   2016-05-05 Bioconductor
>  BiocGenerics         * 0.19.0   2016-05-05 Bioconductor
>  BiocParallel           1.7.2    2016-05-20 Bioconductor
>  biomaRt                2.29.2   2016-05-30 Bioconductor
>  Biostrings             2.41.1   2016-05-27 Bioconductor
>  bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
>  BSgenome               1.41.0   2016-05-05 Bioconductor
>  bumphunter             1.13.0   2016-05-05 Bioconductor
>  chron                  2.3-47   2015-06-24 CRAN (R 3.3.0)
>  cluster                2.0.4    2016-04-18 CRAN (R 3.3.0)
>  codetools              0.2-14   2015-07-15 CRAN (R 3.3.0)
>  colorspace             1.2-6    2015-03-11 CRAN (R 3.3.0)
>  data.table             1.9.6    2015-09-19 CRAN (R 3.3.0)
>  DBI                    0.4-1    2016-05-08 CRAN (R 3.3.0)
>  derfinder            * 1.7.5    2016-05-20 Bioconductor
>  derfinderHelper        1.7.3    2016-05-20 Bioconductor
>  devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
>  digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
>  doRNG                  1.6      2014-03-07 CRAN (R 3.3.0)
>  foreach                1.4.3    2015-10-13 CRAN (R 3.3.0)
>  foreign                0.8-66   2015-08-19 CRAN (R 3.3.0)
>  Formula                1.2-1    2015-04-07 CRAN (R 3.3.0)
>  GenomeInfoDb         * 1.9.1    2016-05-13 Bioconductor
>  GenomicAlignments      1.9.0    2016-05-05 Bioconductor
>  GenomicFeatures        1.25.12  2016-05-21 Bioconductor
>  GenomicFiles           1.9.7    2016-05-27 Bioconductor
>  GenomicRanges        * 1.25.0   2016-05-05 Bioconductor
>  ggplot2                2.1.0    2016-03-01 CRAN (R 3.3.0)
>  gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.0)
>  gtable                 0.2.0    2016-02-26 CRAN (R 3.3.0)
>  Hmisc                  3.17-4   2016-05-02 CRAN (R 3.3.0)
>  IRanges              * 2.7.1    2016-05-27 Bioconductor
>  iterators              1.0.8    2015-10-13 CRAN (R 3.3.0)
>  lattice                0.20-33  2015-07-14 CRAN (R 3.3.0)
>  latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.0)
>  locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.0)
>  magrittr               1.5      2014-11-22 CRAN (R 3.3.0)
>  Matrix                 1.2-6    2016-05-02 CRAN (R 3.3.0)
>  matrixStats            0.50.2   2016-04-24 CRAN (R 3.3.0)
>  memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
>  munsell                0.4.3    2016-02-13 CRAN (R 3.3.0)
>  nnet                   7.3-12   2016-02-02 CRAN (R 3.3.0)
>  pkgmaker               0.22     2014-05-14 CRAN (R 3.3.0)
>  plyr                   1.8.3    2015-06-12 CRAN (R 3.3.0)
>  qvalue                 2.5.2    2016-05-20 Bioconductor
>  RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.0)
>  Rcpp                   0.12.5   2016-05-14 CRAN (R 3.3.0)
>  RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>  recount              * 0.99.0   2016-05-31 Bioconductor
>  registry               0.3      2015-07-08 CRAN (R 3.3.0)
>  reshape2               1.4.1    2014-12-06 CRAN (R 3.3.0)
>  rngtools               1.2.4    2014-03-06 CRAN (R 3.3.0)
>  rpart                  4.1-10   2015-06-29 CRAN (R 3.3.0)
>  Rsamtools              1.25.0   2016-05-05 Bioconductor
>  RSQLite                1.0.0    2014-10-25 CRAN (R 3.3.0)
>  rtracklayer            1.33.2   2016-05-31 Github (Bioconductor-mirror/rtracklayer@917973e)
>  S4Vectors            * 0.11.2   2016-05-27 Bioconductor
>  scales                 0.4.0    2016-02-26 CRAN (R 3.3.0)
>  stringi                1.0-1    2015-10-22 CRAN (R 3.3.0)
>  stringr                1.0.0    2015-04-30 CRAN (R 3.3.0)
>  SummarizedExperiment   1.3.2    2016-05-20 Bioconductor
>  survival               2.39-4   2016-05-11 CRAN (R 3.3.0)
>  VariantAnnotation      1.19.1   2016-05-20 Bioconductor
>  withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
>  XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>  xtable                 1.8-2    2016-02-05 CRAN (R 3.3.0)
>  XVector                0.13.0   2016-05-05 Bioconductor
>  zlibbioc               1.19.0   2016-05-05 Bioconductor
>
>
>
>
> On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
> <lawrence.michael at gene.com> wrote:
>> Thanks for pointing out that buglet. Fixed.
>>
>> On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
>> <lcollado at jhu.edu> wrote:
>>> Hi Michael,
>>>
>>> We tried getting things to work with Amazon Cloud Drive (see Abhi's
>>> efforts at https://github.com/nellore/duffel/commits/master). But we
>>> now have the data hosted elsewhere where the links work properly.
>>>
>>> I just noted a small mistake on rtracklayer:::expandPath(). See:
>>>
>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
>>> [1] FALSE
>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
>>> [1] TRUE
>>>
>>>
>>> The fix is simple. At
>>> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
>>> change it to:
>>>
>>> expandPath <- function(x) {
>>>     if (startsWith(x, "http") | startsWith(x, "ftp"))
>>>         expandURL(x)
>>>     else path.expand(x)
>>> }
>>>
>>> Best,
>>> Leo
>>>
>>> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
>>> <lawrence.michael at gene.com> wrote:
>>>> I checked in something that tries to find openssl automatically on the Mac.
>>>>
>>>> It looks like AWS is for some reason returning 404 for the HEAD command that
>>>> the UCSC library uses to get info about the file, such as the content size.
>>>> Same thing happens when I play around in Firefox's developer tools. The
>>>> error response header claims a JSON content type, but no JSON is actually
>>>> sent, so there is no further description of the error. I think this is a bug
>>>> in Amazon.
>>>>
>>>> Seems like for now you'll need to download the file first.
>>>>
>>>> Michael
>>>>
>>>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>>>> wrote:
>>>>>
>>>>> Hi Michael,
>>>>>
>>>>> I forgot about pkg-util (just did a fresh BioC 3.3 install). I assumed
>>>>> the OS X binary would work out of the box.
>>>>>
>>>>> Anyhow, I installed rtracklayer (release) manually and got another
>>>>> error (slightly different message now).
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> $ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>>>> $ R CMD INSTALL rtracklayer
>>>>> Loading required package: colorout
>>>>> * installing to library
>>>>> ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>>>> * installing *source* package ‘rtracklayer’ ...
>>>>> checking for pkg-config... /usr/local/bin/pkg-config
>>>>> checking pkg-config is at least version 0.9.0... yes
>>>>> checking for OPENSSL... yes
>>>>> ## more output
>>>>>
>>>>> $ R
>>>>> > library('rtracklayer')
>>>>> > unshorten_url <- function(uri) {
>>>>> +     require('RCurl')
>>>>> +     opts <- list(
>>>>> +         followlocation = TRUE,  # resolve redirects
>>>>> +         ssl.verifyhost = FALSE, # suppress certain SSL errors
>>>>> +         ssl.verifypeer = FALSE,
>>>>> +         nobody = TRUE, # perform HEAD request
>>>>> +         verbose = FALSE
>>>>> +     )
>>>>> +     curlhandle <- getCurlHandle(.opts = opts)
>>>>> +     getURL(uri, curl = curlhandle)
>>>>> +     info <- getCurlInfo(curlhandle)
>>>>> +     rm(curlhandle)  # release the curlhandle!
>>>>> +     info$effective.url
>>>>> + }
>>>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>>>> Loading required package: RCurl
>>>>> Loading required package: bitops
>>>>> > url
>>>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>>>> > x <- import.bw(url, as = 'RleList')
>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>> In addition: Warning message:
>>>>> In seqinfo(ranges) :
>>>>>   Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>> In addition: Warning messages:
>>>>> 1: In seqinfo(ranges) :
>>>>>   TCP non-blocking connect() to content-na.drive.amazonaws.com timed-out in select() after 10000 milliseconds - Cancelling!
>>>>> 2: In seqinfo(ranges) :
>>>>>   Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>> > ## Reproducibility info
>>>>> > message(Sys.time())
>>>>> 2016-05-05 17:38:30
>>>>> > options(width = 120)
>>>>> > devtools::session_info()
>>>>> Session info -----------------------------------------------------------------------------------------------------------
>>>>>  setting  value
>>>>>  version  R version 3.3.0 RC (2016-05-01 r70572)
>>>>>  system   x86_64, darwin13.4.0
>>>>>  ui       X11
>>>>>  language (EN)
>>>>>  collate  en_US.UTF-8
>>>>>  tz       America/New_York
>>>>>  date     2016-05-05
>>>>>
>>>>> Packages ---------------------------------------------------------------------------------------------------------------
>>>>>  package              * version  date       source
>>>>>  Biobase                2.32.0   2016-05-04 Bioconductor
>>>>>  BiocGenerics         * 0.18.0   2016-05-04 Bioconductor
>>>>>  BiocParallel           1.6.0    2016-05-04 Bioconductor
>>>>>  Biostrings             2.40.0   2016-05-04 Bioconductor
>>>>>  bitops               * 1.0-6    2013-08-17 CRAN (R 3.3.0)
>>>>>  colorout             * 1.1-2    2016-05-05 Github (jalvesaq/colorout@6538970)
>>>>>  devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
>>>>>  digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
>>>>>  GenomeInfoDb         * 1.8.0    2016-05-04 Bioconductor
>>>>>  GenomicAlignments      1.8.0    2016-05-04 Bioconductor
>>>>>  GenomicRanges        * 1.24.0   2016-05-04 Bioconductor
>>>>>  IRanges              * 2.6.0    2016-05-04 Bioconductor
>>>>>  memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
>>>>>  RCurl                * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>>>>  Rsamtools              1.24.0   2016-05-04 Bioconductor
>>>>>  rtracklayer          * 1.32.0   2016-05-05 Bioconductor
>>>>>  S4Vectors            * 0.10.0   2016-05-04 Bioconductor
>>>>>  SummarizedExperiment   1.2.0    2016-05-04 Bioconductor
>>>>>  withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
>>>>>  XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>>>>  XVector                0.12.0   2016-05-04 Bioconductor
>>>>>  zlibbioc               1.18.0   2016-05-04 Bioconductor
>>>>> >
>>>>>
>>>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>>>> <lawrence.michael at gene.com> wrote:
>>>>> > The URL redirection is something I can try to add. For the other
>>>>> > error, you need to get openssl installed and made visible to
>>>>> > pkg-config, so that rtracklayer finds it during its build process.
>>>>> >
>>>>> > Michael
>>>>> >
>>>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>>>> > <lcollado at jhu.edu>
>>>>> > wrote:
>>>>> >>
>>>>> >> Hi Michael,
>>>>> >>
>>>>> >> I have a use case that is similar to
>>>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>>>> >> posting it here this time.
>>>>> >>
>>>>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>>>>> >> the web which is in a different type of url than before. Using
>>>>> >> utils::download.file() with the defaults doesn't work, I have to use
>>>>> >> method = 'curl' and extra = '-L'.
>>>>> >>
>>>>> >> More specifically, the original url
>>>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>>>>> >> effective url
>>>>> >>
>>>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>>>> >>
>>>>> >> Now, using the second url with utils::download.file() and default
>>>>> >> methods also doesn't work. It does on the browser though.
>>>>> >>
>>>>> >>
>>>>> >> As you can see, downloading the file doesn't work out of the box.
>>>>> >> So I guess it's not surprising that using rtracklayer I get
>>>>> >> errors like:
>>>>> >>
>>>>> >> In seqinfo(ranges) :
>>>>> >>   No openssl available in netConnectHttps for content-na.drive.amazonaws.com : 443
>>>>> >>
>>>>> >> You can find further details (code and log file) at
>>>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>>>> >>
>>>>> >> Thanks,
>>>>> >> Leo
>>>>> >
>>>>> >
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Bioc-devel at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
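
The startsWith() behaviour reported earlier in the thread can be reproduced
without rtracklayer. The following is only a minimal sketch: expandURL_stub()
is a hypothetical stand-in for rtracklayer's internal URL-expansion helper
(it does no network access here), and is_remote() and expand_path_sketch()
are illustrative names, not part of any package.

```r
# startsWith() matches a literal prefix, so "http||ftp" is treated as one
# string rather than as "http or ftp".
url <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"
startsWith(url, "http||ftp")   # FALSE, as reported in the thread
startsWith(url, "http")        # TRUE

# Hypothetical stand-in for the real helper; it returns its input unchanged.
expandURL_stub <- function(x) x

# One regular expression covers both prefixes in a single test.
is_remote <- function(x) grepl("^(http|ftp)", x)

# Remote paths go through URL expansion; local paths get tilde expansion.
expand_path_sketch <- function(x) {
    if (is_remote(x)) expandURL_stub(x) else path.expand(x)
}
```

The grepl() form is just one alternative to testing each prefix with its own
startsWith() call; both give the same answer for a single path.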

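
The "download the file first" workaround suggested in the thread could look
roughly like the sketch below. It assumes the curl command-line tool is on
the PATH and that rtracklayer is installed; import_bw_via_download() is a
hypothetical helper name, not an rtracklayer function.

```r
# Download a remote BigWig file to a temporary location, following
# redirects with curl's -L flag, then import the local copy.
import_bw_via_download <- function(url, ...) {
    destfile <- tempfile(fileext = ".bw")
    # Remove the temporary copy once the data has been read into memory.
    on.exit(unlink(destfile), add = TRUE)
    status <- utils::download.file(url, destfile, method = "curl",
                                   extra = "-L", mode = "wb")
    if (status != 0) stop("download failed for ", url)
    # Extra arguments such as as = "RleList" are passed on to import.bw().
    rtracklayer::import.bw(destfile, ...)
}
```

A call such as import_bw_via_download(url, as = "RleList") would then stand
in for import.bw(url, as = "RleList"), at the cost of transferring the whole
file even when only one chromosome is needed.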

