[Bioc-devel] RE : AnnotationDbi and select function
Marc Carlson
mcarlson at fhcrc.org
Wed Mar 12 23:09:59 CET 2014
Also,
There is nothing wrong with using GENEID the way that you initially
did. It was just a small bug that prevented some internal subsetting
from working properly and that is now fixed.
It just happened that GENEID was equivalent to ENTREZID in this case.
And that ends up making it a slower choice just because the software has
to do more work (in case GENEID is something else). So since you know
that these are in fact ENTREZIDs, you can take Jim's suggestion as a
shortcut and thus get a little performance boost.
But it's still a less specific thing to request than GENEID (which could
potentially be another kind of ID). So the two things (GENEID and
ENTREZID) are not always the same kind of thing. They just happened to
both be ENTREZID in *this* case. In a different scenario GENEID from
the associated TranscriptDb might be something like an Ensembl gene ID.
And then to use a shortcut would mean using ENSEMBL instead of ENTREZID
to do the shortcut...
In contrast: GENEID should normally always work (but it should also be a
tiny bit slower).
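To make the extra work concrete, here is a minimal base R sketch. It is
not the actual OrganismDbi/GenomicFeatures code: the two data frames are
made-up stand-ins for the TxDb and the org package, and every ID and
symbol in them is a placeholder. It only illustrates that the GENEID
route needs a lookup plus a merge on the shared key, while the ENTREZID
route reads a single table.

```r
# Hypothetical stand-ins for the two databases; all values are invented.
txdb_tbl <- data.frame(GENEID = c("1", "10", "100"),
                       TXID   = c(101L, 102L, 103L),
                       stringsAsFactors = FALSE)
org_tbl  <- data.frame(ENTREZID = c("1", "10", "100"),
                       SYMBOL   = c("symA", "symB", "symC"),
                       stringsAsFactors = FALSE)

keys <- c("1", "100")

# GENEID route: look the keys up in the TxDb stand-in first, then merge
# the hits with the org stand-in on the foreign key (GENEID <-> ENTREZID).
txdb_hits  <- txdb_tbl[txdb_tbl$GENEID %in% keys, "GENEID", drop = FALSE]
via_geneid <- merge(txdb_hits, org_tbl, by.x = "GENEID", by.y = "ENTREZID")

# ENTREZID route: a single lookup in the org stand-in, no merge needed.
via_entrezid <- org_tbl[org_tbl$ENTREZID %in% keys, ]

print(via_geneid)
print(via_entrezid)
```

Both routes return the same symbols here only because the placeholder
GENEIDs were chosen to match the placeholder ENTREZIDs, which mirrors
the situation described above.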
Sorry if you know all this stuff, but I think it's better to be explicit
than to say too little.
Marc
On 03/12/2014 02:53 PM, Marc Carlson wrote:
> I just checked a fix in for this bug to GenomicFeatures (which happens
> to be where the problem was). It should percolate out to the build
> system soon.
>
> Marc
>
>
> On 03/12/2014 02:19 PM, Servant Nicolas wrote:
>> Hi guys,
>>
>> Thanks for your feedback.
>> Indeed I put GENEID because it is used in the txdb database.
>>
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>> columns(txdb)
>> [1] "CDSID" "CDSNAME" "CDSCHROM" "CDSSTRAND" "CDSSTART"
>> [6] "CDSEND" "EXONID" "EXONNAME" "EXONCHROM" "EXONSTRAND"
>> [11] "EXONSTART" "EXONEND" "GENEID" "TXID" "EXONRANK"
>> [16] "TXNAME" "TXCHROM" "TXSTRAND" "TXSTART" "TXEND"
>>
>> I will move to ENTREZID, which is much faster!
>> I'm glad it could help.
>> Nicolas
>>
>> ________________________________________
>> From: bioc-devel-bounces at r-project.org
>> [bioc-devel-bounces at r-project.org] on behalf of Marc Carlson
>> [mcarlson at fhcrc.org]
>> Sent: Wednesday, March 12, 2014 20:18
>> To: bioc-devel at r-project.org
>> Subject: Re: [Bioc-devel] AnnotationDbi and select function
>>
>> Thanks Nicolas! That's a good bug. I will work on a fix. The reason
>> why James's work-around functions here is that the number of databases
>> it has to query is fewer by one. It is also faster for this
>> reason. So when you say GENEID you mean the IDs used in the associated
>> txdb database, which means that these have to be checked against that DB
>> (and anything related to it extracted) and then merged with the results
>> of the symbol information by joining on the foreign key for these two
>> DBs. So that's actually much more complex than just extracting all the
>> same data from just the org package, even though the end result (in this
>> case) is the same. The bug is probably happening in the associated
>> merge step.
>>
>> Marc
>>
>>
>>
>> On 03/12/2014 10:06 AM, James W. MacDonald wrote:
>>> Hi Nicolas,
>>>
>>> On 3/12/2014 12:39 PM, Servant Nicolas wrote:
>>>> Dear all,
>>>>
>>>> I have an error using the select function from the AnnotationDbi
>>>> package.
>>>> I am trying to convert some gene IDs into symbols, but for some
>>>> strange reason it crashes.
>>>>
>>>>
>>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>>>> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
>>>> isActiveSeq(txdb)[seqlevels(txdb)] <- FALSE
>>>> isActiveSeq(txdb)[c("chr16","chr1")] <- TRUE
>>>> geneGR <- exonsBy(txdb, "gene")
>>>> library(Homo.sapiens)
>>>> symbol <- select(Homo.sapiens, keys = names(geneGR), keytype =
>>>> "GENEID", columns = "SYMBOL")
>>>> Error in head(select(Homo.sapiens, keys = names(geneGR)[1:1001],
>>>> keytype = "GENEID", :
>>>> error in evaluating the argument 'x' in selecting a method for
>>>> function 'head': Error in res[,
>>>> .reverseColAbbreviations(x, cnames), drop = FALSE] :
>>>>
>>>>> length(geneGR)
>>>> [1] 3269
>>>> ## The first 1K work
>>>>> symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1000], keytype
>>>>> = "GENEID", columns = "SYMBOL")
>>>> ## The 1K+1 does not !
>>>>> symbol <- select(Homo.sapiens, keys = names(geneGR)[1:1001], keytype
>>>>> = "GENEID", columns = "SYMBOL")
>>>> Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
>>>> incorrect number of dimensions
>>>>
>>>> It looks like I cannot convert more than 1K elements. Is there any
>>>> reason for that?
>>>> Thank you very much
>>>> Nicolas
>>> Not sure what 'GENEID' is in this context - it appears to be Entrez
>>> Gene. But anyway, if you use "ENTREZID" instead, it works fine:
>>>
>>>> symbol <- select(Homo.sapiens, names(geneGR), "SYMBOL", "ENTREZID")
>>>> symbol <- select(Homo.sapiens, names(geneGR), "GENEID", "ENTREZID")
>>> Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
>>> incorrect number of dimensions
>>>> symbol <- select(Homo.sapiens, names(geneGR)[1:1000], "GENEID",
>>> "ENTREZID")
>>>> symbol <- select(Homo.sapiens, names(geneGR)[1:1001], "GENEID",
>>> "ENTREZID")
>>> Error in res[, .reverseColAbbreviations(x, cnames), drop = FALSE] :
>>> incorrect number of dimensions
>>>
>>> Best,
>>>
>>> Jim
>>>
>>>
>>>
>>>>> sessionInfo()
>>>> R Under development (unstable) (2014-03-05 r65125)
>>>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
>>>> [3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
>>>> [5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
>>>> [7] LC_PAPER=fr_FR.UTF-8 LC_NAME=C
>>>> [9] LC_ADDRESS=C LC_TELEPHONE=C
>>>> [11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
>>>>
>>>> attached base packages:
>>>> [1] parallel stats graphics grDevices utils datasets methods
>>>> [8] base
>>>>
>>>> other attached packages:
>>>> [1] Homo.sapiens_1.1.2
>>>> [2] org.Hs.eg.db_2.10.1
>>>> [3] GO.db_2.10.1
>>>> [4] RSQLite_0.11.4
>>>> [5] DBI_0.2-7
>>>> [6] OrganismDbi_1.5.3
>>>> [7] XVector_0.3.7
>>>> [8] TxDb.Hsapiens.UCSC.hg19.knownGene_2.10.1
>>>> [9] GenomicFeatures_1.15.9
>>>> [10] AnnotationDbi_1.25.14
>>>> [11] GenomeInfoDb_0.99.17
>>>> [12] Biobase_2.23.6
>>>> [13] GenomicRanges_1.15.32
>>>> [14] IRanges_1.21.32
>>>> [15] BiocGenerics_0.9.3
>>>> [16] RColorBrewer_1.0-5
>>>> [17] reshape2_1.2.2
>>>> [18] reshape_0.8.4
>>>> [19] plyr_1.8.1
>>>> [20] ggplot2_0.9.3.1
>>>> [21] Matrix_1.1-2-2
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] BatchJobs_1.2 BBmisc_1.5
>>>> [3] BiocParallel_0.5.16 biomaRt_2.19.3
>>>> [5] Biostrings_2.31.14 bitops_1.0-6
>>>> [7] brew_1.0-6 BSgenome_1.31.12
>>>> [9] codetools_0.2-8 colorspace_1.2-4
>>>> [11] dichromat_2.0-0 digest_0.6.4
>>>> [13] fail_1.2 foreach_1.4.1
>>>> [15] GenomicAlignments_0.99.29 graph_1.41.3
>>>> [17] grid_3.1.0 gtable_0.1.2
>>>> [19] iterators_1.0.6 labeling_0.2
>>>> [21] lattice_0.20-27 MASS_7.3-29
>>>> [23] munsell_0.4.2 proto_0.3-10
>>>> [25] RBGL_1.39.2 Rcpp_0.11.0
>>>> [27] RCurl_1.95-4.1 Rsamtools_1.15.32
>>>> [29] rtracklayer_1.23.15 scales_0.2.3
>>>> [31] sendmailR_1.1-2 stats4_3.1.0
>>>> [33] stringr_0.6.2 tools_3.1.0
>>>> [35] XML_3.98-1.1 zlibbioc_1.9.0
>>>>
>>>> _______________________________________________
>>>> Bioc-devel at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel