[Bioc-devel] very slow to use intronsByTranscript in GenomicFeatures
Ou, Jianhong
Jianhong.Ou at umassmed.edu
Fri Dec 20 21:23:45 CET 2013
Thanks Herve, Robert,
Now it works well.
Yours sincerely,
Jianhong Ou
LRB 670A
Program in Gene Function and Expression
364 Plantation Street Worcester,
MA 01605
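
For anyone who lands on this thread with the same hang: a minimal sketch for checking whether the installed packages meet the versions named in the reply below (1.15.17 for GenomicRanges, 0.99.10 for GenomicAlignments). The helper name meets_minimum is made up for illustration, and the loop assumes nothing beyond base R.

```r
# Hypothetical helper: TRUE if an installed version string meets the minimum.
# package_version() compares component by component, so "0.99.10" > "0.99.9".
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

# Minimum versions taken from the reply quoted below
required <- c(GenomicRanges = "1.15.17", GenomicAlignments = "0.99.10")

for (pkg in names(required)) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    message(pkg, " is not installed")
    next
  }
  v <- as.character(packageVersion(pkg))
  status <- if (meets_minimum(v, required[[pkg]])) "recent enough" else "too old"
  message(pkg, " ", v, " is ", status)
}
```

Comparing through package_version() rather than as plain strings matters here, because a string comparison would rank "0.99.9" above "0.99.10".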
On 12/20/13 2:25 PM, "Hervé Pagès" <hpages at fhcrc.org> wrote:
>Hi Robert, Jianhong,
>
>This could be related to some changes to the relist() and split() code
>that I made a few days ago in IRanges. I didn't immediately make the
>corresponding changes to GenomicRanges and GenomicAlignments so
>relisting or splitting GRanges and GAlignments objects was broken for
>a couple of days, which had all kinds of nasty consequences in many
>places (relist() and split() are used a lot internally).
>
>Updated versions of GenomicRanges and GenomicAlignments are now online
>so please make sure you have the latest versions (1.15.17 for
>GenomicRanges and 0.99.10 for GenomicAlignments).
>
>Sorry for the inconvenience and please let me know if you still run
>into problems with this.
>
>H.
>
>On 12/20/2013 10:15 AM, Robert Castelo wrote:
>> hi,
>>
>> i can reproduce what Jianhong says, i noticed it earlier this week but
>> didn't mention because we all know devel is a moving target and so on,
>> but since this has been raised now i'll report what i'm getting.
>>
>> so, this is for Jianhong, if you downgrade the following packages to
>> these particular versions:
>>
>> Biostrings_2.31.3.tar.gz
>> GenomicRanges_1.15.15.tar.gz
>> IRanges_1.21.13.tar.gz
>> XVector_0.3.2.tar.gz
>>
>> you'll be all fine, unless you need some functionality of later versions
>> of them, here is the test with the session information:
>>
>>
>> suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
>> Warning messages:
>> 1: multiple methods tables found for ‘rname’
>> 2: multiple methods tables found for ‘rname<-’
>> 3: multiple methods tables found for ‘cigar’
>> 4: multiple methods tables found for ‘qwidth’
>> 5: multiple methods tables found for ‘introns’
>> system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene,
>> "gene"))
>> user system elapsed
>> 2.524 0.046 2.575
>>
>> sessionInfo()
>> R Under development (unstable) (2013-10-20 r64082)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> [7] LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>> [2] GenomicFeatures_1.15.4
>> [3] AnnotationDbi_1.25.9
>> [4] Biobase_2.23.3
>> [5] GenomicRanges_1.15.11
>> [6] XVector_0.3.2
>> [7] IRanges_1.21.13
>> [8] BiocGenerics_0.9.2
>> [9] vimcom_0.9-92
>> [10] setwidth_1.0-3
>> [11] colorout_1.0-1
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.19.1 Biostrings_2.31.3 bitops_1.0-6
>> [4] BSgenome_1.31.7 DBI_0.2-7 GenomicAlignments_0.99.9
>> [7] RCurl_1.95-4.1 Rsamtools_1.15.15 RSQLite_0.11.4
>> [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
>> [13] XML_3.98-1.1 zlibbioc_1.9.0
>>
>>
>> however, if you go to the bleeding edge of devel BioC:
>>
>>
>> suppressPackageStartupMessages(library(TxDb.Hsapiens.UCSC.hg19.knownGene))
>> system.time(txbygene <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene,
>> "gene"))
>>
>> the previous call never ends until you press CTRL+C:
>>
>> ^C
>> Error in unlist(lapply(c("seqnames", "ranges", "strand", "mcols"),
>> checkCoreGetterReturnedLength)) :
>> error in evaluating the argument 'x' in selecting a method for
>> function 'unlist': Error in NROW(get(getter)(x)) :
>> error in evaluating the argument 'x' in selecting a method for
>> function 'NROW': Error in get(getter)(x) :
>> error in evaluating the argument 'x' in selecting a method for
>> function 'ranges':
>> Timing stopped at: 24.5 0.072 24.619
>>
>> sessionInfo()
>> R Under development (unstable) (2013-10-20 r64082)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>> [1] LC_CTYPE=en_US.UTF8 LC_NUMERIC=C
>> [3] LC_TIME=en_US.UTF8 LC_COLLATE=en_US.UTF8
>> [5] LC_MONETARY=en_US.UTF8 LC_MESSAGES=en_US.UTF8
>> [7] LC_PAPER=en_US.UTF8 LC_NAME=C
>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
>> [3] AnnotationDbi_1.25.9 Biobase_2.23.3
>> [5] GenomicRanges_1.15.15 XVector_0.3.5
>> [7] IRanges_1.21.17 BiocGenerics_0.9.2
>> [9] vimcom_0.9-92 setwidth_1.0-3
>> [11] colorout_1.0-1
>>
>> loaded via a namespace (and not attached):
>> [1] biomaRt_2.19.1 Biostrings_2.31.5 bitops_1.0-6
>> [4] BSgenome_1.31.7 DBI_0.2-7 GenomicAlignments_0.99.9
>> [7] RCurl_1.95-4.1 Rsamtools_1.15.15 RSQLite_0.11.4
>> [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
>> [13] XML_3.98-1.1 zlibbioc_1.9.0
>>
>>
>>
>> cheers,
>> robert.
>>
>>
>> On 12/20/2013 06:31 PM, Ou, Jianhong wrote:
>>> In my case, it looks like it never ends.
>>>
>>> I need to check my R first.
>>>
>>> Yours sincerely,
>>>
>>> Jianhong Ou
>>>
>>> LRB 670A
>>> Program in Gene Function and Expression
>>> 364 Plantation Street Worcester,
>>> MA 01605
>>>
>>>
>>>
>>>
>>> On 12/20/13 12:05 PM, "Hervé Pagès"<hpages at fhcrc.org> wrote:
>>>
>>>> Hi Jianhong,
>>>>
>>>> According to my timings, it's a little bit slower than exonsBy() but
>>>> not that much. It has to do a little bit more work too as the introns
>>>> are not explicitly stored in the SQLite db (the exons are) but are
>>>> inferred from the exons and transcript boundaries.
>>>> So intronsByTranscript() has to retrieve all the exons + all the
>>>> transcripts from the db.
>>>>
>>>> intronsByTranscript():
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> system.time(introns<-
>>>> intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 9.165 0.076 9.263
>>>> system.time(introns<-
>>>> intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 4.824 0.064 4.896
>>>>
>>>> exonsBy():
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> system.time(exons<- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 7.720 0.072 7.812
>>>> system.time(exons<- exonsBy(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 4.229 0.028 4.265
>>>>
>>>> transcripts():
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> system.time(tx<- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 1.424 0.008 1.436
>>>> system.time(tx<- transcripts(TxDb.Hsapiens.UCSC.hg19.knownGene))
>>>> # user system elapsed
>>>> # 0.776 0.012 0.790
>>>>
>>>> Less than 10 sec. to retrieve all the exons and transcripts from disk
>>>> and compute the 659327 introns. It's actually not that bad.
>>>>
>>>> Cheers,
>>>> H.
>>>>
>>>>
>>>> On 12/20/2013 08:25 AM, Ou, Jianhong wrote:
>>>>> Dear all,
>>>>>
>>>>> When I try to use intronsByTranscript to get introns for hg19 known
>>>>> genes, I find it unacceptably slow. Does anybody have the same
>>>>> problem?
>>>>>
>>>>> My code:
>>>>> library(GenomicFeatures)
>>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>> introns<- intronsByTranscript(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>>>
>>>>>> sessionInfo()
>>>>> R Under development (unstable) (2013-12-12 r64453)
>>>>> Platform: x86_64-apple-darwin12.5.0 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>>>>
>>>>> attached base packages:
>>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>>> [8] base
>>>>>
>>>>> other attached packages:
>>>>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1 GenomicFeatures_1.15.4
>>>>> [3] AnnotationDbi_1.25.9 Biobase_2.23.3
>>>>> [5] GenomicRanges_1.15.15 XVector_0.3.5
>>>>> [7] IRanges_1.21.17 BiocGenerics_0.9.2
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>> [1] biomaRt_2.19.1 Biostrings_2.31.5 bitops_1.0-6
>>>>> [4] BSgenome_1.31.7 DBI_0.2-7 GenomicAlignments_0.99.9
>>>>> [7] RCurl_1.95-4.1 Rsamtools_1.15.15 RSQLite_0.11.4
>>>>> [10] rtracklayer_1.23.6 stats4_3.1.0 tools_3.1.0
>>>>> [13] XML_3.98-1.1 zlibbioc_1.9.0
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> Jianhong Ou
>>>>>
>>>>> LRB 670A
>>>>> Program in Gene Function and Expression
>>>>> 364 Plantation Street Worcester,
>>>>> MA 01605
>>>>>
>>>>> _______________________________________________
>>>>> Bioc-devel at r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel
>>>>>
>>>>
>>>> --
>>>> Hervé Pagès
>>>>
>>>> Program in Computational Biology
>>>> Division of Public Health Sciences
>>>> Fred Hutchinson Cancer Research Center
>>>> 1100 Fairview Ave. N, M1-B514
>>>> P.O. Box 19024
>>>> Seattle, WA 98109-1024
>>>>
>>>> E-mail: hpages at fhcrc.org
>>>> Phone: (206) 667-5791
>>>> Fax: (206) 667-1319
>>>
>>>
>>
>
>--
>Hervé Pagès
>
>Program in Computational Biology
>Division of Public Health Sciences
>Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N, M1-B514
>P.O. Box 19024
>Seattle, WA 98109-1024
>
>E-mail: hpages at fhcrc.org
>Phone: (206) 667-5791
>Fax: (206) 667-1319
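
Hervé's point in the quoted thread, that introns are not stored in the SQLite db but are inferred from the exons and the transcript boundaries, can be illustrated with a small base-R sketch. This is not the GenomicFeatures implementation: the function name introns_from_exons and the coordinates are made up, and the sketch assumes one transcript with non-overlapping exons and 1-based inclusive coordinates.

```r
# Hypothetical helper: introns of one transcript as the gaps between its exons.
# Assumes non-overlapping exons and 1-based, inclusive coordinates.
introns_from_exons <- function(tx_start, tx_end, exon_start, exon_end) {
  ord <- order(exon_start)
  exon_start <- exon_start[ord]
  exon_end <- exon_end[ord]
  # Candidate gaps: transcript start to first exon, between consecutive
  # exons, and last exon to transcript end.
  gap_start <- c(tx_start, exon_end + 1L)
  gap_end <- c(exon_start - 1L, tx_end)
  keep <- gap_start <= gap_end  # drop empty gaps
  data.frame(start = gap_start[keep], end = gap_end[keep])
}

# Made-up transcript spanning 100..500 with three exons
introns_from_exons(100L, 500L,
                   exon_start = c(100L, 250L, 400L),
                   exon_end   = c(150L, 300L, 500L))
```

For this toy transcript the two gaps are 151..249 and 301..399. Doing this for every transcript requires both the exon and the transcript coordinates, which is consistent with the remark that intronsByTranscript() has to retrieve both from the db.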