[BioC] Rsamtools BAM File Opening Takes Long Time
Paul Leo
p.leo at uq.edu.au
Tue Jan 17 03:28:34 CET 2012
It was a while back that I tried this, but at the time I used
ftpBase <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/"
which was faster (at the time) than
ftpBase <- "ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/"
There are sub-directories in those folders that contain the BAM and BAI
files, which you can test on, e.g.
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/pilot_data/data/NA06984/alignment/
I'm not aware of a 1000 Genomes mirror in Australia.
My experience with this was that it took several minutes per region to
get the data back, and I had to do a lot of extra error checking because
of drop-outs.
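The kind of error checking mentioned above could look roughly like the sketch below. It is only an illustration in base R, not the code that was actually used: with_retry, max_tries, wait_secs and flaky_read are hypothetical names, and flaky_read merely simulates a remote read that drops out twice before succeeding.

```r
# Hypothetical helper (not part of Rsamtools): call fun() and try again
# if it throws an error, e.g. because the connection dropped.
with_retry <- function(fun, max_tries = 3, wait_secs = 5) {
  for (attempt in seq_len(max_tries)) {
    # Capture any error as a condition object instead of aborting.
    result <- tryCatch(fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    message("Attempt ", attempt, " failed: ", conditionMessage(result))
    # Pause before the next attempt, but not after the last one.
    if (attempt < max_tries) Sys.sleep(wait_secs)
  }
  stop("All ", max_tries, " attempts failed; last error: ",
       conditionMessage(result))
}

# Stand-in for a remote region read: fails on the first two calls.
calls <- 0
flaky_read <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("connection dropped")
  "region data"
}

out <- with_retry(flaky_read, max_tries = 5, wait_secs = 0)
```

For a real remote BAM file, the function passed in would presumably wrap the Rsamtools call for one region, for example function() scanBam(fName, param = param), with param built by ScanBamParam(which = ...) for the region of interest. The retry count and wait time here are arbitrary and would need tuning for the server.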
For larger data sets I just use the VCF files.
Cheers
Paul
Dr Paul Leo
Senior Bioinformatician
UQ Diamantina Institute for Cancer, Immunology and Metabolic Medicine
-----Original Message-----
From: Dario Strbenac <D.Strbenac at garvan.org.au>
Reply-to: "D.Strbenac at garvan.org.au" <D.Strbenac at garvan.org.au>
To: bioconductor at r-project.org <bioconductor at r-project.org>
Subject: [BioC] Rsamtools BAM File Opening Takes Long Time
Date: Tue, 17 Jan 2012 12:00:10 +1000
Hello,
I'm trying to open a connection to a BAM file and it takes 16 minutes just to open the connection.
Here is a small example:
library(Rsamtools)
fName <- "http://genomesavant.com/savant//data/examples/pulmonary.bam"
> system.time(file <- open(BamFile(fName)))
   user  system elapsed
   0.09    0.02  989.95
There is a pulmonary.bam.bai file in the same server directory.
Does anyone else have web-accessible BAM files to test this out on?
> sessionInfo()
R version 2.14.0 (2011-10-31)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
[3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Rsamtools_1.6.3 Biostrings_2.22.0 GenomicRanges_1.6.4 IRanges_1.12.5
[5] RCurl_1.6-10.1 bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] BSgenome_1.22.0 rtracklayer_1.14.0 tools_2.14.0 XML_3.4-2.2
[5] zlibbioc_1.0.0
--------------------------------------
Dario Strbenac
Research Assistant
Cancer Epigenetics
Garvan Institute of Medical Research
Darlinghurst NSW 2010
Australia