[Bioc-devel] very slow to use intronsByTranscript in GenomicFeatures

Ou, Jianhong Jianhong.Ou at umassmed.edu
Fri Dec 20 21:23:45 CET 2013


Thanks Herve, Robert,

Now it works well.

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605




On 12/20/13 2:25 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:

>Hi Robert, Jianhong,
>
>This could be related to some changes to the relist() and split() code
>that I made a few days ago in IRanges. I didn't immediately make the
>corresponding changes to GenomicRanges and GenomicAlignments so
>relisting or splitting GRanges and GAlignments objects was broken for
>a couple of days, which had all kinds of nasty consequences in many
>places (relist() and split() are used a lot internally).
>
>Updated versions of GenomicRanges and GenomicAlignments are now online
>so please make sure you have the latest versions (1.15.17 for
>GenomicRanges and 0.99.10 for GenomicAlignments).
>
>Sorry for the inconvenience and please let me know if you still run
>into problems with this.
>
>H.
>
>On 12/20/2013 10:15 AM, Robert Castelo wrote:
>> hi,
>>
>> i can reproduce what Jianhong says. i noticed it earlier this week but
>> didn't mention it because we all know devel is a moving target and so on,
>> but since this has been raised now i'll report what i'm getting.
>>
>> so, this is for Jianhong, if you downgrade the following packages to
>> these particular versions:
>>
>> Biostrings_2.31.3.tar.gz
>> GenomicRanges_1.15.15.tar.gz
>> IRanges_1.21.13.tar.gz
>> XVector_0.3.2.tar.gz
>>
>> you'll be fine, unless you need some functionality from later versions
>> of them. here is the test with the session information:
>>
>> suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
>> Warning messages:
>> 1: multiple methods tables found for ‘rname’
>> 2: multiple methods tables found for ‘rname<-’
>> 3: multiple methods tables found for ‘cigar’
>> 4: multiple methods tables found for ‘qwidth’
>> 5: multiple methods tables found for ‘introns’
>> system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene"))
>>     user  system elapsed
>>    2.524   0.046   2.575
>>
>> sessionInfo()
>> R Under development (unstable) (2013-10-20 r64082)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C
>>   [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
>>   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8
>>   [7] LC_PAPER=en_US.UTF8       LC_NAME=C
>>   [9] LC_ADDRESS=C              LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods
>> [8] base
>>
>> other attached packages:
>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>>   [2] GenomicFeatures_1.15.4
>>   [3] AnnotationDbi_1.25.9
>>   [4] Biobase_2.23.3
>>   [5] GenomicRanges_1.15.11
>>   [6] XVector_0.3.2
>>   [7] IRanges_1.21.13
>>   [8] BiocGenerics_0.9.2
>>   [9] vimcom_0.9-92
>> [10] setwidth_1.0-3
>> [11] colorout_1.0-1
>>
>> loaded via a namespace (and not attached):
>>   [1] biomaRt_2.19.1           Biostrings_2.31.3        bitops_1.0-6
>>   [4] BSgenome_1.31.7          DBI_0.2-7                GenomicAlignments_0.99.9
>>   [7] RCurl_1.95-4.1           Rsamtools_1.15.15        RSQLite_0.11.4
>> [10] rtracklayer_1.23.6       stats4_3.1.0             tools_3.1.0
>> [13] XML_3.98-1.1             zlibbioc_1.9.0
>>
>>
>> however, if you go to the bleeding edge of devel BioC:
>>
>> suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
>> system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, "gene"))
>>
>> the previous call does not return until you press CTRL+C:
>>
>> ^C
>> Error in unlist(lapply(c("seqnames", "ranges", "strand", "mcols"),
>> checkCoreGetterReturnedLength)) :
>>    error in evaluating the argument 'x' in selecting a method for
>> function 'unlist': Error in NROW(get(getter)(x)) :
>>    error in evaluating the argument 'x' in selecting a method for
>> function 'NROW': Error in get(getter)(x) :
>>    error in evaluating the argument 'x' in selecting a method for
>> function 'ranges':
>> Timing stopped at: 24.5 0.072 24.619
>>
>> sessionInfo()
>> R Under development (unstable) (2013-10-20 r64082)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>   [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C              LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8
>>   [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8    LC_PAPER=en_US.UTF8       LC_NAME=C
>>   [9] LC_ADDRESS=C              LC_TELEPHONE=C            LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>>   [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
>>   [3] AnnotationDbi_1.25.9                     Biobase_2.23.3
>>   [5] GenomicRanges_1.15.15                    XVector_0.3.5
>>   [7] IRanges_1.21.17                          BiocGenerics_0.9.2
>>   [9] vimcom_0.9-92                            setwidth_1.0-3
>> [11] colorout_1.0-1
>>
>> loaded via a namespace (and not attached):
>>   [1] biomaRt_2.19.1           Biostrings_2.31.5        bitops_1.0-6             BSgenome_1.31.7
>>   [5] DBI_0.2-7                GenomicAlignments_0.99.9 RCurl_1.95-4.1           Rsamtools_1.15.15
>>   [9] RSQLite_0.11.4           rtracklayer_1.23.6       stats4_3.1.0             tools_3.1.0
>> [13] XML_3.98-1.1             zlibbioc_1.9.0
>>
>>
>>
>> cheers,
>> robert.
>>
>>
>> On 12/20/2013 06:31 PM, Ou, Jianhong wrote:
>>> In my case, it looks like it never ends.
>>>
>>> I need to check my R first.
>>>
>>> Yours sincerely,
>>>
>>> Jianhong Ou
>>>
>>> On 12/20/13 12:05 PM, "Hervé Pagès"<hpages at fhcrc.org>  wrote:
>>>
>>>> Hi Jianhong,
>>>>
>>>> According to my timings, it's a little slower than exonsBy(), but
>>>> not by much. It also has to do a bit more work, because the introns
>>>> are not explicitly stored in the SQLite db (the exons are) but are
>>>> inferred from the exons and transcript boundaries.
>>>> So intronsByTranscript() has to retrieve all the exons + all the
>>>> transcripts from the db.
>>>>
>>>> intronsByTranscript():
>>>>
>>>>    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>    system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  9.165   0.076   9.263
>>>>    system.time(introns <- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  4.824   0.064   4.896
>>>>
>>>> exonsBy():
>>>>
>>>>    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>    system.time(exons<- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  7.720   0.072   7.812
>>>>    system.time(exons<- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  4.229   0.028   4.265
>>>>
>>>> transcripts():
>>>>
>>>>    library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>    system.time(tx<- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  1.424   0.008   1.436
>>>>    system.time(tx<- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>>    #   user  system elapsed
>>>>    #  0.776   0.012   0.790
>>>>
>>>> Less than 10 sec. to retrieve all the exons and transcripts from disk
>>>> and compute the 659327 introns. It's actually not that bad.
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>>
>>>> On 12/20/2013 08:25 AM, Ou, Jianhong wrote:
>>>>> Dear all,
>>>>>
>>>>> When I use intronsByTranscript to get the introns for hg19 known
>>>>> genes, I find it unacceptably slow. Does anybody have the same
>>>>> problem?
>>>>>
>>>>> My code:
>>>>> library(GenomicFeatures)
>>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>> introns<- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>>
>>>>>> sessionInfo()
>>>>> R Under development (unstable) (2013-12-12 r64453)
>>>>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>>>>>
>>>>> other attached packages:
>>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
>>>>> [3] AnnotationDbi_1.25.9                     Biobase_2.23.3
>>>>> [5] GenomicRanges_1.15.15                    XVector_0.3.5
>>>>> [7] IRanges_1.21.17                          BiocGenerics_0.9.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>    [1] biomaRt_2.19.1           Biostrings_2.31.5        bitops_1.0-6             BSgenome_1.31.7
>>>>>    [5] DBI_0.2-7                GenomicAlignments_0.99.9 RCurl_1.95-4.1           Rsamtools_1.15.15
>>>>>    [9] RSQLite_0.11.4           rtracklayer_1.23.6       stats4_3.1.0             tools_3.1.0
>>>>> [13] XML_3.98-1.1             zlibbioc_1.9.0
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> Jianhong Ou
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone:  (206) 667-5791
>>>> Fax:    (206) 667-1319
>>>
>>
>
>-- 
>Hervé Pagès

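Hervé's fix depends on having GenomicRanges 1.15.17 and GenomicAlignments 0.99.10 or later. A minimal way to check this from R, assuming only base R's `system.file()` and `utils::packageVersion()`, is sketched below. The helper `meets_minimum()` is hypothetical and is not part of any Bioconductor package.

```r
# meets_minimum() is a hypothetical helper, not a Bioconductor function.
# It returns TRUE only if `pkg` is installed at a version >= `min_version`.
meets_minimum <- function(pkg, min_version) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) return(FALSE)
  # package_version objects compare correctly against version strings;
  # isTRUE() reduces the result to a plain TRUE/FALSE
  isTRUE(utils::packageVersion(pkg) >= min_version)
}

# Minimum versions named in the thread as containing the fix
required <- c(GenomicRanges = "1.15.17", GenomicAlignments = "0.99.10")

ok <- vapply(names(required),
             function(p) meets_minimum(p, required[[p]]),
             logical(1))
print(ok)
```

Any `FALSE` entry in `ok` means that package is missing or older than the version mentioned in the thread.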

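Hervé notes in the thread that introns are not stored in the TxDb SQLite database but are inferred from the exons and the transcript boundaries, which is why intronsByTranscript() has to fetch both. A minimal base-R sketch of that idea for a single transcript is given here, assuming 1-based closed coordinates and non-overlapping exons. `infer_introns()` is a hypothetical helper for illustration only, not the code GenomicFeatures actually uses.

```r
# infer_introns() is a hypothetical helper, not the GenomicFeatures
# implementation. Coordinates are 1-based and closed (start and end both
# included); exons of one transcript are assumed not to overlap.
infer_introns <- function(tx_start, tx_end, exon_starts, exon_ends) {
  # Sort exons by start position
  o <- order(exon_starts)
  s <- exon_starts[o]
  e <- exon_ends[o]
  # Candidate gaps: before the first exon, between consecutive exons,
  # and after the last exon, bounded by the transcript start and end
  gap_start <- c(tx_start, e + 1)
  gap_end   <- c(s - 1, tx_end)
  keep <- gap_start <= gap_end  # drop empty gaps
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# One transcript spanning 100-500 with three exons given in arbitrary order
introns <- infer_introns(100, 500,
                         exon_starts = c(400, 100, 200),
                         exon_ends   = c(500, 150, 300))
print(introns)
```

For this example the result is the two introns 151-199 and 301-399. Doing this for every transcript means every exon and every transcript must be read first, which matches the extra work Hervé describes.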
