[Bioc-devel] Issue importing bigwig files with rtracklayer from Amazon Cloud Drive

Leonardo Collado Torres lcollado at jhu.edu
Tue May 31 23:31:12 CEST 2016


Awesome, thanks!

On Tue, May 31, 2016 at 4:11 PM, Michael Lawrence
<lawrence.michael at gene.com> wrote:
> Sure, done.
>
> On Tue, May 31, 2016 at 11:18 AM, Leonardo Collado Torres
> <lcollado at jhu.edu> wrote:
>> Hi Michael,
>>
>> Thanks!
>>
>> Actually, it looks like there are a few more quick changes I need you
>> to make. In
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R
>> simply replace path.expand() with expandPath(). I'm not sure this applies to
>> all current path.expand() calls, but it does at least for
>> https://github.com/Bioconductor-mirror/rtracklayer/blob/917973eb7e9f16bbcd6f6e4b9452f9e40d9a1e94/R/bigWig.R#L20
>>
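For context on why the swap matters: base R's path.expand() only acts on a leading "~" and returns any other string unchanged. A URL passed through it therefore reaches the UCSC C code as-is, with its redirect still unresolved (this reading of the traceback below is our interpretation). A minimal check, using the URL from the log:

```r
# path.expand() only rewrites a leading "~"; a URL comes back unchanged,
# so any HTTP redirect behind it is left for the C code to deal with.
url <- "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw"
expanded <- path.expand(url)
identical(expanded, url)
# [1] TRUE
```
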
>> Best,
>> Leo
>>
>>
>>
>>
>>> library(recount); system.time( regions <- expressed_regions('SRP009615', 'chrY', cutoff = 5L) )
>> 2016-05-31 14:11:52 loadCoverage: loading BigWig file
>> http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
>> Error in seqinfo(con) : UCSC library operation failed
>> In addition: Warning message:
>> In seqinfo(con) :
>>   Couldn't open http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw
>> Timing stopped at: 0.068 0.009 0.817
>>> traceback()
>> 14: .Call(BWGFile_seqlengths, path.expand(path(x)))
>> 13: seqinfo(con)
>> 12: seqinfo(con)
>> 11: .local(con, format, text, ...)
>> 10: import(file, selection = range, as = "RleList")
>> 9: import(file, selection = range, as = "RleList")
>> 8: FUN(X[[i]], ...)
>> 7: lapply(as.list(X), FUN = FUN, ...)
>> 6: lapply(as.list(X), FUN = FUN, ...)
>> 5: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>>        verbose = verbose)
>> 4: lapply(bList, .loadCoverageBigWig, range = which, chr = chr,
>>        verbose = verbose)
>> 3: loadCoverage(files = meanFile, chr = chr, chrlen = chrlen)
>> 2: expressed_regions("SRP009615", "chrY", cutoff = 5L)
>> 1: system.time(regions <- expressed_regions("SRP009615", "chrY",
>>        cutoff = 5L))
>>> options(width = 120); devtools::session_info()
>> Session info -----------------------------------------------------------------------------------------------------------
>>  setting  value
>>  version  R version 3.3.0 RC (2016-05-01 r70572)
>>  system   x86_64, darwin13.4.0
>>  ui       AQUA
>>  language (EN)
>>  collate  en_US.UTF-8
>>  tz       America/New_York
>>  date     2016-05-31
>>
>> Packages ---------------------------------------------------------------------------------------------------------------
>>  package              * version  date       source
>>  acepack                1.3-3.3  2014-11-24 CRAN (R 3.3.0)
>>  AnnotationDbi          1.35.3   2016-05-27 Bioconductor
>>  Biobase                2.33.0   2016-05-05 Bioconductor
>>  BiocGenerics         * 0.19.0   2016-05-05 Bioconductor
>>  BiocParallel           1.7.2    2016-05-20 Bioconductor
>>  biomaRt                2.29.2   2016-05-30 Bioconductor
>>  Biostrings             2.41.1   2016-05-27 Bioconductor
>>  bitops                 1.0-6    2013-08-17 CRAN (R 3.3.0)
>>  BSgenome               1.41.0   2016-05-05 Bioconductor
>>  bumphunter             1.13.0   2016-05-05 Bioconductor
>>  chron                  2.3-47   2015-06-24 CRAN (R 3.3.0)
>>  cluster                2.0.4    2016-04-18 CRAN (R 3.3.0)
>>  codetools              0.2-14   2015-07-15 CRAN (R 3.3.0)
>>  colorspace             1.2-6    2015-03-11 CRAN (R 3.3.0)
>>  data.table             1.9.6    2015-09-19 CRAN (R 3.3.0)
>>  DBI                    0.4-1    2016-05-08 CRAN (R 3.3.0)
>>  derfinder            * 1.7.5    2016-05-20 Bioconductor
>>  derfinderHelper        1.7.3    2016-05-20 Bioconductor
>>  devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
>>  digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
>>  doRNG                  1.6      2014-03-07 CRAN (R 3.3.0)
>>  foreach                1.4.3    2015-10-13 CRAN (R 3.3.0)
>>  foreign                0.8-66   2015-08-19 CRAN (R 3.3.0)
>>  Formula                1.2-1    2015-04-07 CRAN (R 3.3.0)
>>  GenomeInfoDb         * 1.9.1    2016-05-13 Bioconductor
>>  GenomicAlignments      1.9.0    2016-05-05 Bioconductor
>>  GenomicFeatures        1.25.12  2016-05-21 Bioconductor
>>  GenomicFiles           1.9.7    2016-05-27 Bioconductor
>>  GenomicRanges        * 1.25.0   2016-05-05 Bioconductor
>>  ggplot2                2.1.0    2016-03-01 CRAN (R 3.3.0)
>>  gridExtra              2.2.1    2016-02-29 CRAN (R 3.3.0)
>>  gtable                 0.2.0    2016-02-26 CRAN (R 3.3.0)
>>  Hmisc                  3.17-4   2016-05-02 CRAN (R 3.3.0)
>>  IRanges              * 2.7.1    2016-05-27 Bioconductor
>>  iterators              1.0.8    2015-10-13 CRAN (R 3.3.0)
>>  lattice                0.20-33  2015-07-14 CRAN (R 3.3.0)
>>  latticeExtra           0.6-28   2016-02-09 CRAN (R 3.3.0)
>>  locfit                 1.5-9.1  2013-04-20 CRAN (R 3.3.0)
>>  magrittr               1.5      2014-11-22 CRAN (R 3.3.0)
>>  Matrix                 1.2-6    2016-05-02 CRAN (R 3.3.0)
>>  matrixStats            0.50.2   2016-04-24 CRAN (R 3.3.0)
>>  memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
>>  munsell                0.4.3    2016-02-13 CRAN (R 3.3.0)
>>  nnet                   7.3-12   2016-02-02 CRAN (R 3.3.0)
>>  pkgmaker               0.22     2014-05-14 CRAN (R 3.3.0)
>>  plyr                   1.8.3    2015-06-12 CRAN (R 3.3.0)
>>  qvalue                 2.5.2    2016-05-20 Bioconductor
>>  RColorBrewer           1.1-2    2014-12-07 CRAN (R 3.3.0)
>>  Rcpp                   0.12.5   2016-05-14 CRAN (R 3.3.0)
>>  RCurl                  1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>  recount              * 0.99.0   2016-05-31 Bioconductor
>>  registry               0.3      2015-07-08 CRAN (R 3.3.0)
>>  reshape2               1.4.1    2014-12-06 CRAN (R 3.3.0)
>>  rngtools               1.2.4    2014-03-06 CRAN (R 3.3.0)
>>  rpart                  4.1-10   2015-06-29 CRAN (R 3.3.0)
>>  Rsamtools              1.25.0   2016-05-05 Bioconductor
>>  RSQLite                1.0.0    2014-10-25 CRAN (R 3.3.0)
>>  rtracklayer            1.33.2   2016-05-31 Github (Bioconductor-mirror/rtracklayer at 917973e)
>>  S4Vectors            * 0.11.2   2016-05-27 Bioconductor
>>  scales                 0.4.0    2016-02-26 CRAN (R 3.3.0)
>>  stringi                1.0-1    2015-10-22 CRAN (R 3.3.0)
>>  stringr                1.0.0    2015-04-30 CRAN (R 3.3.0)
>>  SummarizedExperiment   1.3.2    2016-05-20 Bioconductor
>>  survival               2.39-4   2016-05-11 CRAN (R 3.3.0)
>>  VariantAnnotation      1.19.1   2016-05-20 Bioconductor
>>  withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
>>  XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>  xtable                 1.8-2    2016-02-05 CRAN (R 3.3.0)
>>  XVector                0.13.0   2016-05-05 Bioconductor
>>  zlibbioc               1.19.0   2016-05-05 Bioconductor
>>
>>
>>
>>
>> On Tue, May 31, 2016 at 2:02 PM, Michael Lawrence
>> <lawrence.michael at gene.com> wrote:
>>> Thanks for pointing out that buglet. Fixed.
>>>
>>> On Tue, May 31, 2016 at 10:55 AM, Leonardo Collado Torres
>>> <lcollado at jhu.edu> wrote:
>>>> Hi Michael,
>>>>
>>>> We tried getting things to work with Amazon Cloud Drive (see Abhi's
>>>> efforts at https://github.com/nellore/duffel/commits/master). But we
>>>> now have the data hosted elsewhere where the links work properly.
>>>>
>>>> I just noticed a small mistake in rtracklayer:::expandPath(). See:
>>>>
>>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http||ftp')
>>>> [1] FALSE
>>>>> startsWith('http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw', 'http')
>>>> [1] TRUE
>>>>
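The reason the first call returns FALSE is that startsWith() compares its prefix literally and does not interpret regular expressions, so it looks for the exact text 'http||ftp'. If you want a single test that covers several URL schemes, a regular expression with grepl() is one option. The helper name is_remote_path below is hypothetical and not part of rtracklayer:

```r
# Hypothetical helper (not part of rtracklayer): TRUE for strings that start
# with an http(s):// or ftp(s):// scheme. grepl() is vectorised over x.
is_remote_path <- function(x) {
  grepl("^(https?|ftps?)://", x)
}

# The first two inputs are URLs; the third is a local path.
is_remote_path(c(
  "http://duffel.rail.bio/recount/SRP009615/bw/mean_SRP009615.bw",
  "ftp://example.org/mean_SRP009615.bw",
  "~/mean_SRP009615.bw"
))
# [1]  TRUE  TRUE FALSE
```

Unlike startsWith(x, "http"), this pattern also requires the "://" separator, so a local file whose name happens to begin with "http" is not mistaken for a URL.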
>>>>
>>>> The fix is simple. At
>>>> https://github.com/Bioconductor-mirror/rtracklayer/blob/c4b842bc4daa4b9db26cb86f3284cf8cf5c32ebd/R/web.R#L62-L66,
>>>> change it to:
>>>>
>>>> expandPath <- function(x) {
>>>>   if (startsWith(x, "http") | startsWith(x, "ftp"))
>>>>     expandURL(x)
>>>>   else path.expand(x)
>>>> }
>>>>
>>>> Best,
>>>> Leo
>>>>
>>>> On Thu, May 5, 2016 at 8:10 PM, Michael Lawrence
>>>> <lawrence.michael at gene.com> wrote:
>>>>> I checked in something that tries to find openssl automatically on the Mac.
>>>>>
>>>>> It looks like AWS is for some reason returning 404 for the HEAD request that
>>>>> the UCSC library uses to get info about the file, such as the content size.
>>>>> Same thing happens when I play around in Firefox's developer tools. The
>>>>> error response header claims a JSON content type, but no JSON is actually
>>>>> sent, so there is no further description of the error. I think this is a bug
>>>>> in Amazon.
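You can inspect the status codes Michael describes from within R using utils::curlGetHeaders() (R >= 3.3.0), which returns the raw response header lines. The sketch below extracts the three-digit status from every "HTTP/..." line, so each hop of a redirect chain shows up. The helper name http_status_codes and the sample header lines are made up for illustration and are not real Amazon responses:

```r
# Hypothetical helper: extract the HTTP status code from each status line
# in a character vector of raw response headers (e.g. from curlGetHeaders()).
http_status_codes <- function(headers) {
  status_lines <- grep("^HTTP/", headers, value = TRUE)
  # The first run of three digits on a status line is the status code;
  # protocol versions such as "1.1" or "2" never contain three digits in a row.
  as.integer(regmatches(status_lines, regexpr("[0-9]{3}", status_lines)))
}

# Illustrative headers: a redirect followed by a failing request that
# claims a JSON content type.
sample_headers <- c(
  "HTTP/1.1 302 Found\r\n",
  "Location: https://example.org/file.bw\r\n",
  "\r\n",
  "HTTP/1.1 404 Not Found\r\n",
  "Content-Type: application/json\r\n"
)
http_status_codes(sample_headers)
# [1] 302 404

# Against a live URL (requires network access):
# http_status_codes(curlGetHeaders("http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw"))
```
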
>>>>>
>>>>> Seems like for now you'll need to download the file first.
>>>>>
>>>>> Michael
>>>>>
>>>>> On Thu, May 5, 2016 at 2:46 PM, Leonardo Collado Torres <lcollado at jhu.edu>
>>>>> wrote:
>>>>>>
>>>>>> Hi Michael,
>>>>>>
>>>>>> I forgot about pkg-config (I just did a fresh BioC 3.3 install). I assumed
>>>>>> the OS X binary would work out of the box.
>>>>>>
>>>>>> Anyhow, I installed rtracklayer (release) manually and got another
>>>>>> error (slightly different message now).
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> $ svn co https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_3_3/madman/Rpacks/rtracklayer
>>>>>> $ R CMD INSTALL rtracklayer
>>>>>> Loading required package: colorout
>>>>>> * installing to library ‘/Library/Frameworks/R.framework/Versions/3.3release/Resources/library’
>>>>>> * installing *source* package ‘rtracklayer’ ...
>>>>>> checking for pkg-config... /usr/local/bin/pkg-config
>>>>>> checking pkg-config is at least version 0.9.0... yes
>>>>>> checking for OPENSSL... yes
>>>>>> ## more output
>>>>>>
>>>>>> $ R
>>>>>> > library('rtracklayer')
>>>>>> > unshorten_url <- function(uri) {
>>>>>> +     require('RCurl')
>>>>>> +     opts <- list(
>>>>>> +         followlocation = TRUE,  # resolve redirects
>>>>>> +         ssl.verifyhost = FALSE, # suppress certain SSL errors
>>>>>> +         ssl.verifypeer = FALSE,
>>>>>> +         nobody = TRUE, # perform HEAD request
>>>>>> +         verbose = FALSE
>>>>>> +     )
>>>>>> +     curlhandle <- getCurlHandle(.opts = opts)
>>>>>> +     getURL(uri, curl = curlhandle)
>>>>>> +     info <- getCurlInfo(curlhandle)
>>>>>> +     rm(curlhandle)  # release the curlhandle!
>>>>>> +     info$effective.url
>>>>>> + }
>>>>>> > url <- unshorten_url('http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw')
>>>>>> Loading required package: RCurl
>>>>>> Loading required package: bitops
>>>>>> > url
>>>>>> [1] "https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5"
>>>>>> > x <- import.bw(url, as = 'RleList')
>>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>>> In addition: Warning message:
>>>>>> In seqinfo(ranges) :
>>>>>>   Couldn't open https://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>>> > x <- import.bw('http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5')
>>>>>> Error in seqinfo(ranges) : UCSC library operation failed
>>>>>> In addition: Warning messages:
>>>>>> 1: In seqinfo(ranges) :
>>>>>>   TCP non-blocking connect() to content-na.drive.amazonaws.com
>>>>>> timed-out in select() after 10000 milliseconds - Cancelling!
>>>>>> 2: In seqinfo(ranges) :
>>>>>>   Couldn't open http://content-na.drive.amazonaws.com/cdproxy/templink/usTQCr2pAaI3tTps4AFQuz1H9kmm23EDYy39SQ3ke5EuFiZq5
>>>>>> > ## Reproducibility info
>>>>>> > message(Sys.time())
>>>>>> 2016-05-05 17:38:30
>>>>>> > options(width = 120)
>>>>>> > devtools::session_info()
>>>>>> Session info -----------------------------------------------------------------------------------------------------------
>>>>>>  setting  value
>>>>>>  version  R version 3.3.0 RC (2016-05-01 r70572)
>>>>>>  system   x86_64, darwin13.4.0
>>>>>>  ui       X11
>>>>>>  language (EN)
>>>>>>  collate  en_US.UTF-8
>>>>>>  tz       America/New_York
>>>>>>  date     2016-05-05
>>>>>>
>>>>>> Packages ---------------------------------------------------------------------------------------------------------------
>>>>>>  package              * version  date       source
>>>>>>  Biobase                2.32.0   2016-05-04 Bioconductor
>>>>>>  BiocGenerics         * 0.18.0   2016-05-04 Bioconductor
>>>>>>  BiocParallel           1.6.0    2016-05-04 Bioconductor
>>>>>>  Biostrings             2.40.0   2016-05-04 Bioconductor
>>>>>>  bitops               * 1.0-6    2013-08-17 CRAN (R 3.3.0)
>>>>>>  colorout             * 1.1-2    2016-05-05 Github (jalvesaq/colorout at 6538970)
>>>>>>  devtools               1.11.1   2016-04-21 CRAN (R 3.3.0)
>>>>>>  digest                 0.6.9    2016-01-08 CRAN (R 3.3.0)
>>>>>>  GenomeInfoDb         * 1.8.0    2016-05-04 Bioconductor
>>>>>>  GenomicAlignments      1.8.0    2016-05-04 Bioconductor
>>>>>>  GenomicRanges        * 1.24.0   2016-05-04 Bioconductor
>>>>>>  IRanges              * 2.6.0    2016-05-04 Bioconductor
>>>>>>  memoise                1.0.0    2016-01-29 CRAN (R 3.3.0)
>>>>>>  RCurl                * 1.95-4.8 2016-03-01 CRAN (R 3.3.0)
>>>>>>  Rsamtools              1.24.0   2016-05-04 Bioconductor
>>>>>>  rtracklayer          * 1.32.0   2016-05-05 Bioconductor
>>>>>>  S4Vectors            * 0.10.0   2016-05-04 Bioconductor
>>>>>>  SummarizedExperiment   1.2.0    2016-05-04 Bioconductor
>>>>>>  withr                  1.0.1    2016-02-04 CRAN (R 3.3.0)
>>>>>>  XML                    3.98-1.4 2016-03-01 CRAN (R 3.3.0)
>>>>>>  XVector                0.12.0   2016-05-04 Bioconductor
>>>>>>  zlibbioc               1.18.0   2016-05-04 Bioconductor
>>>>>> >
>>>>>>
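The unshorten_url() function in the session above depends on RCurl. A base-R alternative is to take the last Location header from the output of utils::curlGetHeaders(url, redirect = TRUE). This sketch assumes that call returns the header lines of every response in the redirect chain. The helper name last_location and the sample headers are hypothetical, and relative Location values are returned as-is rather than resolved:

```r
# Hypothetical helper: return the target of the last "Location:" header in a
# vector of raw response headers, or NA if there was no redirect.
last_location <- function(headers) {
  loc <- grep("^location:", headers, ignore.case = TRUE, value = TRUE)
  if (length(loc) == 0) return(NA_character_)
  # Drop the header name, then strip surrounding whitespace and the CRLF.
  trimws(sub("^location:[[:space:]]*", "", loc[length(loc)], ignore.case = TRUE))
}

# Illustrative two-hop redirect chain.
hdrs <- c(
  "HTTP/1.1 301 Moved Permanently\r\n",
  "Location: https://example.org/step1\r\n",
  "\r\n",
  "HTTP/1.1 302 Found\r\n",
  "location: https://example.org/final.bw\r\n",
  "\r\n",
  "HTTP/1.1 200 OK\r\n"
)
last_location(hdrs)
# [1] "https://example.org/final.bw"

# Against a live URL (requires network access):
# last_location(curlGetHeaders("http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw"))
```
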
>>>>>> On Thu, May 5, 2016 at 5:24 PM, Michael Lawrence
>>>>>> <lawrence.michael at gene.com> wrote:
>>>>>> > The URL redirection is something I can try to add. For the other error, you
>>>>>> > need to get openssl installed and made visible to pkg-config, so that
>>>>>> > rtracklayer finds it during its build process.
>>>>>> >
>>>>>> > Michael
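You can check Michael's suggestion from R before rebuilding rtracklayer. This is a minimal sketch that assumes pkg-config is on the PATH and that OpenSSL is registered with it under the module name "openssl". The helper name is hypothetical:

```r
# Hypothetical helper: TRUE if pkg-config is installed and reports an
# "openssl" module, FALSE otherwise.
openssl_visible_to_pkg_config <- function() {
  pkg_config <- Sys.which("pkg-config")
  if (!nzchar(pkg_config)) return(FALSE)
  # `pkg-config --exists <module>` exits with status 0 when the module is found;
  # with stdout/stderr = FALSE, system2() returns that exit status.
  status <- suppressWarnings(
    system2(pkg_config, c("--exists", "openssl"), stdout = FALSE, stderr = FALSE)
  )
  status == 0
}

openssl_visible_to_pkg_config()
```

A FALSE result would be consistent with the "No openssl available in netConnectHttps" warning quoted further down.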
>>>>>> >
>>>>>> > On Thu, May 5, 2016 at 2:01 PM, Leonardo Collado Torres
>>>>>> > <lcollado at jhu.edu>
>>>>>> > wrote:
>>>>>> >>
>>>>>> >> Hi Michael,
>>>>>> >>
>>>>>> >> I have a use case that is similar to
>>>>>> >> https://support.bioconductor.org/p/81267/#82142 and looks to me like
>>>>>> >> it might need some changes in rtracklayer to work. That's why I'm
>>>>>> >> posting it here this time.
>>>>>> >>
>>>>>> >> Basically, I'm trying to use rtracklayer to import a bigwig file over
>>>>>> >> the web whose URL is of a different type than before. Using
>>>>>> >> utils::download.file() with the defaults doesn't work; I have to use
>>>>>> >> method = 'curl' and extra = '-L'.
>>>>>> >>
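The download.file() arguments above (method = 'curl', with extra = '-L' so curl follows redirects) also point to a workaround for rtracklayer: download the file first, then import the local copy. Michael suggests the same thing elsewhere in the thread. This is a minimal sketch that assumes a curl binary is installed and rtracklayer is available; import_bw_via_download is a hypothetical name:

```r
# Hypothetical helper: download a (possibly redirected) BigWig file with curl,
# following redirects via "-L", then import it from the local copy.
import_bw_via_download <- function(url, ...) {
  destfile <- tempfile(fileext = ".bw")
  on.exit(unlink(destfile), add = TRUE)  # remove the temporary copy afterwards
  status <- utils::download.file(url, destfile, method = "curl",
                                 extra = "-L", mode = "wb", quiet = TRUE)
  if (status != 0) stop("download failed with status ", status)
  # Extra arguments (e.g. as = "RleList") are passed on to import.bw().
  rtracklayer::import.bw(destfile, ...)
}

# Usage (requires network access and rtracklayer):
# cov <- import_bw_via_download(
#   "http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw", as = "RleList")
```
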
>>>>>> >> More specifically, the original url
>>>>>> >> http://duffel.rail.bio/recount/DRP000366/bw/DRR000897.bw has an
>>>>>> >> effective url
>>>>>> >>
>>>>>> >> https://content-na.drive.amazonaws.com/cdproxy/templink/i_aQAPZJkJ9d9lN1NO5DJJtlbpvAdgbNuc1SkqSTHFouFiZq5
>>>>>> >>
>>>>>> >> Now, using the second URL with utils::download.file() and the default
>>>>>> >> methods also doesn't work. It does work in the browser, though.
>>>>>> >>
>>>>>> >>
>>>>>> >> As you can see, downloading the file doesn't work out of the box, so
>>>>>> >> I guess it's not surprising that I get errors like this when using
>>>>>> >> rtracklayer:
>>>>>> >>
>>>>>> >> In seqinfo(ranges) :
>>>>>> >>   No openssl available in netConnectHttps for
>>>>>> >> content-na.drive.amazonaws.com : 443
>>>>>> >>
>>>>>> >> You can find further details (code and log file) at
>>>>>> >> https://gist.github.com/lcolladotor/c500dd79d49aed1ef33ade5417111453
>>>>>> >>
>>>>>> >> Thanks,
>>>>>> >> Leo
>>>>>> >
>>>>>> >
>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel


