[Bioc-devel] very slow to use intronsByTranscript in GenomicFeatures

Ou, Jianhong Jianhong.Ou at umassmed.edu
Fri Dec 20 18:31:10 CET 2013


In my case, it looks like it never finishes.

I need to check my R installation first.

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605




On 12/20/13 12:05 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:

>Hi Jianhong,
>
>According to my timings, it's a little bit slower than exonsBy() but
>not that much. It has to do a little bit more work too as the introns
>are not explicitly stored in the SQLite db (the exons are) but are
>inferred from the exons and transcript boundaries.
>So intronsByTranscript() has to retrieve all the exons + all the
>transcripts from the db.
>
>intronsByTranscript():
>
>   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>   system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  9.165   0.076   9.263
>   system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  4.824   0.064   4.896
>
>exonsBy():
>
>   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>   system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  7.720   0.072   7.812
>   system.time(exons <- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  4.229   0.028   4.265
>
>transcripts():
>
>   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>   system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  1.424   0.008   1.436
>   system.time(tx <- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>   #   user  system elapsed
>   #  0.776   0.012   0.790
>
>Less than 10 sec. to retrieve all the exons and transcripts from disk
>and compute the 659327 introns. It's actually not that bad.
>
>Cheers,
>H.
>
>
>On 12/20/2013 08:25 AM, Ou, Jianhong wrote:
>> Dear all,
>>
>> When I try to use intronsByTranscript to get introns for hg19 known
>> genes, I find it unacceptably slow. Does anybody have the same
>> problem?
>>
>> My code:
>> library(GenomicFeatures)
>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>
>> > sessionInfo()
>> R Under development (unstable) (2013-12-12 r64453)
>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
>> [3] AnnotationDbi_1.25.9                     Biobase_2.23.3
>> [5] GenomicRanges_1.15.15                    XVector_0.3.5
>> [7] IRanges_1.21.17                          BiocGenerics_0.9.2
>>
>> loaded via a namespace (and not attached):
>>  [1] biomaRt_2.19.1           Biostrings_2.31.5        bitops_1.0-6             BSgenome_1.31.7
>>  [5] DBI_0.2-7                GenomicAlignments_0.99.9 RCurl_1.95-4.1           Rsamtools_1.15.15
>>  [9] RSQLite_0.11.4           rtracklayer_1.23.6       stats4_3.1.0             tools_3.1.0
>> [13] XML_3.98-1.1             zlibbioc_1.9.0
>>
>> Yours sincerely,
>>
>> Jianhong Ou
>>
>> LRB 670A
>> Program in Gene Function and Expression
>> 364 Plantation Street Worcester,
>> MA 01605
>>
>> _______________________________________________
>> Bioc-devel at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>
>
>-- 
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone:  (206) 667-5791
>Fax:    (206) 667-1319
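
As the quoted reply explains, the introns are not stored in the SQLite db but are inferred from the exons and the transcript boundaries. A minimal base-R sketch of that inference for a single transcript is given below. It is only an illustration, not the GenomicFeatures implementation (which works on range objects for all transcripts at once): the helper name infer_introns, the toy coordinates, and the assumption that exons within a transcript do not overlap are all hypothetical.

```r
# Hypothetical helper (not part of GenomicFeatures): infer the introns of one
# transcript from its boundaries and its exons.
# Coordinates are 1-based closed intervals; exons are assumed not to overlap.
infer_introns <- function(tx_start, tx_end, exon_start, exon_end) {
  stopifnot(length(exon_start) == length(exon_end))
  # Sort the exons by start position.
  ord <- order(exon_start)
  exon_start <- exon_start[ord]
  exon_end <- exon_end[ord]
  # Candidate gaps: transcript start to first exon, between consecutive
  # exons, and last exon to transcript end.
  gap_start <- c(tx_start, exon_end + 1L)
  gap_end <- c(exon_start - 1L, tx_end)
  # Drop empty gaps (e.g. when the first exon begins at the transcript start).
  keep <- gap_start <= gap_end
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# Toy transcript spanning 100..500 with three exons (made-up coordinates).
introns <- infer_introns(100L, 500L,
                         exon_start = c(100L, 250L, 400L),
                         exon_end   = c(150L, 300L, 500L))
print(introns)
```

Doing this for every transcript requires both the full exon table and the full transcript table, which is consistent with intronsByTranscript() costing somewhat more than exonsBy() alone in the timings above.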


