[BioC] problem with TXNAME -> SYMBOL mapping in Homo.sapiens library

Marc Carlson mcarlson at fhcrc.org
Fri Sep 27 23:53:36 CEST 2013


Hi Aleksandra,

That's a good question!

So first of all, you may want to know that the newer packages don't even
have a name for that transcript: it has been dropped from the latest
transcriptome builds coming out of UCSC.

But it's still a great question, so allow me to also explain what is
happening in the older data that you are using.  In these older
packages, there was a transcript name from UCSC, but it was *not*
associated with any gene IDs.  It is therefore a valid key, because it
can be mapped to some values inside the transcriptome (TXSTART, for
example), but it cannot be mapped to anything outside of the
transcriptome, such as the gene-level data that Homo.sapiens gets from
org.Hs.eg.db.  You almost had enough information to see this for
yourself with the select queries that you ran.

So, for example, if you run the following select:

select(Homo.sapiens, cols = c("GENEID", "TXSTART"), keys = "uc021wml.1",
       keytype = "TXNAME")

you will get:
       TXNAME GENEID  TXSTART
1 uc021wml.1   <NA> 22385572

This tells you that while there *is* transcript information for this
name ("TXCHROM" etc. will also work), there is still no GENEID
associated with it.  Unfortunately, no gene ID also means there is no
way to look up the gene SYMBOL or any other data that is associated at
the gene level.

So the short answer is that there is no gene symbol for this transcript 
name because we don't have any way to know what gene it belongs to.
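The two-hop chain (TXNAME to GENEID, then GENEID to SYMBOL) can be sketched in plain base R. This is a minimal toy model, not how AnnotationDbi's select() is implemented: the tables tx2gene and gene2sym, the function tx_to_symbol(), and every ID except uc021wml.1 are hypothetical. The missing GENEID for uc021wml.1 is taken from the select() output above. The sketch shows a transcript with no gene ID dropping out at the second hop, so its SYMBOL comes back as NA (the real select() in this thread raised an error instead).

```r
# Minimal base-R sketch of a two-hop lookup TXNAME -> GENEID -> SYMBOL.
# Hypothetical toy tables: only "uc021wml.1" and its missing GENEID come
# from the thread; all other IDs and symbols are made up for illustration.
tx2gene <- data.frame(
  TXNAME = c("txA_hypothetical", "uc021wml.1"),
  GENEID = c("geneA_hypothetical", NA),
  stringsAsFactors = FALSE
)
gene2sym <- data.frame(
  GENEID = "geneA_hypothetical",
  SYMBOL = "SYMA_hypothetical",
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up each transcript's gene, then that gene's symbol.
tx_to_symbol <- function(txnames, tx2gene, gene2sym) {
  # Hop 1: transcript -> gene; NA if the transcript has no gene link
  geneid <- tx2gene$GENEID[match(txnames, tx2gene$TXNAME)]
  # Hop 2: gene -> symbol; an NA gene ID cannot be looked up.
  # incomparables = NA keeps match() from pairing NA with an NA in the table.
  symbol <- gene2sym$SYMBOL[match(geneid, gene2sym$GENEID, incomparables = NA)]
  data.frame(TXNAME = txnames, GENEID = geneid, SYMBOL = symbol,
             stringsAsFactors = FALSE)
}

res <- tx_to_symbol(c("txA_hypothetical", "uc021wml.1"), tx2gene, gene2sym)
print(res)
```

With the real packages, the practical equivalent is to ask for GENEID first and drop the transcripts whose GENEID is NA before asking for SYMBOL.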

Hope this helps,


   Marc


On 09/27/2013 02:28 AM, Aleksandra Pfeifer [guest] wrote:
> Hello,
>    I have a problem with mapping from txname to the gene symbol. For most transcripts it works OK, but for some it doesn't:
>
>> library(Homo.sapiens)
>> select(Homo.sapiens, cols="SYMBOL", keys= "uc021wml.1", keytype="TXNAME")
> Error in .testIfKeysAreOfProposedKeytype(x, keys, keytype) :
>    None of the keys entered are valid keys for the keytype specified.
>
> The traceback is as follows:
>> traceback()
> 10: stop("None of the keys entered are valid keys for the keytype specified.")
> 9: .testIfKeysAreOfProposedKeytype(x, keys, keytype)
> 8: .select(x, keys, cols, keytype, jointype = jointype)
> 7: .local(x, keys, cols, keytype, ...)
> 6: select(.makeReal(nodeName), keys = fromKeys, cols = needCols[[nodeName]],
>         keytype = toKey)
> 5: select(.makeReal(nodeName), keys = fromKeys, cols = needCols[[nodeName]],
>         keytype = toKey)
> 4: .getSelects(x, keytype, keys, needCols, visitNodes)
> 3: .select(x, keys, cols, keytype, ...)
> 2: select(Homo.sapiens, cols = "SYMBOL", keys = "uc021wml.1", keytype = "TXNAME")
> 1: select(Homo.sapiens, cols = "SYMBOL", keys = "uc021wml.1", keytype = "TXNAME")
>
>
> However, when I check whether the problematic txname is present in the Homo.sapiens database, it turns out that it is there. I can also find some other information about this transcript:
>> "uc021wml.1" %in% keys(Homo.sapiens, keytype="TXNAME")
> [1] TRUE
>> select(Homo.sapiens, cols="TXSTART", keys= "uc021wml.1", keytype="TXNAME")
>        TXNAME  TXSTART
> 1 uc021wml.1 22385572
>
> Is there a way to solve this problem? I would appreciate your help.
>
> Best regards,
> Aleksandra Pfeifer
>
>
>
>   -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] C
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] Homo.sapiens_1.1.1
>   [2] TxDb.Hsapiens.UCSC.hg19.knownGene_2.9.2
>   [3] org.Hs.eg.db_2.9.0
>   [4] GO.db_2.9.0
>   [5] RSQLite_0.11.4
>   [6] DBI_0.2-7
>   [7] OrganismDbi_1.2.0
>   [8] GenomicFeatures_1.12.4
>   [9] GenomicRanges_1.12.5
> [10] IRanges_1.18.4
> [11] AnnotationDbi_1.22.6
> [12] Biobase_2.20.1
> [13] BiocGenerics_0.6.0
>
> loaded via a namespace (and not attached):
>   [1] BSgenome_1.28.0    Biostrings_2.28.0  RBGL_1.36.2        RCurl_1.95-4.1
>   [5] Rsamtools_1.12.4   XML_3.98-1.1       biomaRt_2.16.0     bitops_1.0-6
>   [9] graph_1.38.3       rtracklayer_1.20.4 stats4_3.0.1       tools_3.0.1
> [13] zlibbioc_1.6.0
>
> --
> Sent via the guest posting facility at bioconductor.org.
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
