[Bioc-devel] very slow to use intronsByTranscript in GenomicFeatures

Hervé Pagès hpages at fhcrc.org
Fri Dec 20 18:05:37 CET 2013


Hi Jianhong,

According to my timings, it's only a little slower than exonsBy().
It also has to do a bit more work, because the introns are not
explicitly stored in the SQLite db (the exons are) but are inferred
from the exons and the transcript boundaries.
So intronsByTranscript() has to retrieve all the exons and all the
transcripts from the db.
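
To make "inferred from the exons and transcript boundaries" concrete,
here is a toy base-R sketch. It is not the GenomicFeatures
implementation: the function name infer_introns() and the coordinates
are made up for illustration, and it assumes 1-based inclusive
coordinates and non-overlapping exons within a single transcript.

```r
# Hypothetical helper (not part of GenomicFeatures): return the stretches
# of a transcript that are not covered by any of its exons.
infer_introns <- function(tx_start, tx_end, exon_start, exon_end) {
  # Sort exons by start position so that gaps can be read off in order.
  o <- order(exon_start)
  exon_start <- exon_start[o]
  exon_end   <- exon_end[o]

  # A candidate gap begins at the transcript start or right after an exon,
  # and ends right before the next exon or at the transcript end.
  gap_start <- c(tx_start, exon_end + 1)
  gap_end   <- c(exon_start - 1, tx_end)

  # Drop empty gaps (e.g. when the first exon starts at the transcript start).
  keep <- gap_start <= gap_end
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# Made-up transcript spanning 100-500 with three exons.
toy_introns <- infer_introns(100, 500,
                             exon_start = c(100, 250, 400),
                             exon_end   = c(150, 300, 500))
print(toy_introns)
```

For this toy transcript the two gaps between consecutive exons
(151-249 and 301-399) come out as the introns. The point is only that
both the exon ranges and the transcript range are needed as input.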

intronsByTranscript():

   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   system.time(introns <-
     intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  9.165   0.076   9.263
   system.time(introns <-
     intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  4.824   0.064   4.896

exonsBy():

   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  7.720   0.072   7.812
   system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  4.229   0.028   4.265

transcripts():

   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
   system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  1.424   0.008   1.436
   system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
   #   user  system elapsed
   #  0.776   0.012   0.790

Less than 10 sec. to retrieve all the exons and transcripts from disk
and compute the 659327 introns. It's actually not that bad.

Cheers,
H.


On 12/20/2013 08:25 AM, Ou, Jianhong wrote:
> Dear all,
>
> When I try to use intronsByTranscript to get introns for hg19 known genes, I find it unacceptably slow. Does anybody have the same problem?
>
> My code:
> library(GenomicFeatures)
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
>
>> sessionInfo()
> R Under development (unstable) (2013-12-12 r64453)
> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
> [3] AnnotationDbi_1.25.9                     Biobase_2.23.3
> [5] GenomicRanges_1.15.15                    XVector_0.3.5
> [7] IRanges_1.21.17                          BiocGenerics_0.9.2
>
> loaded via a namespace (and not attached):
>   [1] biomaRt_2.19.1           Biostrings_2.31.5        bitops_1.0-6             BSgenome_1.31.7
>   [5] DBI_0.2-7                GenomicAlignments_0.99.9 RCurl_1.95-4.1           Rsamtools_1.15.15
>   [9] RSQLite_0.11.4           rtracklayer_1.23.6       stats4_3.1.0             tools_3.1.0
> [13] XML_3.98-1.1             zlibbioc_1.9.0
>
> Yours sincerely,
>
> Jianhong Ou
>
> LRB 670A
> Program in Gene Function and Expression
> 364 Plantation Street Worcester,
> MA 01605
>
> _______________________________________________
> Bioc-devel at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


