[BioC] Rsamtools BAM File Opening Takes Long Time
Martin Morgan
mtmorgan at fhcrc.org
Tue Jan 17 03:48:13 CET 2012
On 01/16/2012 06:00 PM, Dario Strbenac wrote:
> Hello,
>
> I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.
>
> Here is a small example:
>
> library(Rsamtools)
> fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
>> system.time(file <- open(BamFile(fName)))
> user system elapsed
> 0.09 0.02 989.95
>
> There is a pulmonary.bam.bai file in the same server directory.
>
> Does anyone else have web-accessible BAM files to test this out on?
The opposite of what you asked for, but maybe a useful data point anyway:
> system.time(file <- open(BamFile(fName)))
user system elapsed
0.024 0.016 0.294
Warning message:
In open.BamFile(BamFile(fName)) :
[knet_seek] SEEK_END is not supported for HTTP. Offset is unchanged.
and
> system.time(countBam(file,
+     param=ScanBamParam(which=GRanges("chr18", IRanges(1, 1000000)))))
user system elapsed
0.040 0.008 0.682
As Paul alludes to, using the remote BAM might be a false economy if,
over the course of your analysis, you end up downloading a substantial
amount of the file anyway.
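A minimal sketch of the download-once alternative, assuming the index sits at the same URL with ".bai" appended (as Dario says it does for pulmonary.bam). The helper name fetchBamLocally() is hypothetical and not part of Rsamtools; download.file(), BamFile() and countBam() are the real base R / Rsamtools functions.

```r
library(Rsamtools)

## Hypothetical helper (not an Rsamtools function): copy a remote BAM and
## its index into a local directory once, then return a BamFile for it.
## Assumes the index is available at <url>.bai.
fetchBamLocally <- function(url, destdir = tempdir()) {
    bam <- file.path(destdir, basename(url))
    bai <- paste0(bam, ".bai")
    ## mode = "wb" keeps the binary BAM/BAI files intact on Windows
    download.file(url, bam, mode = "wb", quiet = TRUE)
    download.file(paste0(url, ".bai"), bai, mode = "wb", quiet = TRUE)
    ## The index is saved as <bam>.bai, so the default index lookup finds it
    BamFile(bam)
}
```

For example, bf <- fetchBamLocally(fName) followed by the same countBam(bf, param=...) call as above. The trade-off is one up-front transfer of the whole file; after that, every query runs against local disk.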
Martin
>
>> sessionInfo()
> R version 2.14.0 (2011-10-31)
> Platform: x86_64-pc-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
> [5] LC_TIME=English_Australia.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] Rsamtools_1.6.3 Biostrings_2.22.0 GenomicRanges_1.6.4 IRanges_1.12.5
> [5] RCurl_1.6-10.1 bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.22.0 rtracklayer_1.14.0 tools_2.14.0 XML_3.4-2.2
> [5] zlibbioc_1.0.0
>
> --------------------------------------
> Dario Strbenac
> Research Assistant
> Cancer Epigenetics
> Garvan Institute of Medical Research
> Darlinghurst NSW 2010
> Australia
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793