[BioC] known gene => gene symbol for UCSC
Ido Tamir
tamir at imp.ac.at
Tue Apr 2 16:20:55 CEST 2013
Dear James,
thank you very much. It did not work in either a fresh session (a) or my
old session (b).
Your packages seem newer, but my IDs are from the old package, so they
should be consistent.
I installed the Bioconductor packages Mus.musculus and
TxDb.Mmusculus.UCSC.mm10.ensGene just today.
best,
ido
a) fresh session
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Mus.musculus_1.0.0
[2] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
[3] org.Mm.eg.db_2.8.0
[4] GO.db_2.8.0
[5] RSQLite_0.11.2
[6] DBI_0.2-5
[7] OrganismDbi_1.0.3
[8] GenomicFeatures_1.10.2
[9] GenomicRanges_1.10.7
[10] IRanges_1.16.6
[11] AnnotationDbi_1.20.6
[12] Biobase_2.18.0
[13] BiocGenerics_0.4.0
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 Biostrings_2.26.3 bitops_1.0-5 BSgenome_1.26.1
[5] graph_1.36.2 parallel_2.15.1 RBGL_1.34.0 RCurl_1.95-4.1
[9] Rsamtools_1.10.2 rtracklayer_1.18.2 stats4_2.15.1 tools_2.15.1
[13] XML_3.95-0.2 zlibbioc_1.4.0
b) my old session:
> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-unknown-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] TxDb.Mmusculus.UCSC.mm10.knownGene_2.8.0
[2] BiocInstaller_1.8.3
[3] Mus.musculus_1.0.0
[4] TxDb.Mmusculus.UCSC.mm10.ensGene_2.8.0
[5] org.Mm.eg.db_2.8.0
[6] GO.db_2.8.0
[7] RSQLite_0.11.2
[8] DBI_0.2-5
[9] OrganismDbi_1.0.3
[10] TxDb.Mmusculus.UCSC.mm9.knownGene_2.8.0
[11] GenomicFeatures_1.10.2
[12] AnnotationDbi_1.20.6
[13] Biobase_2.18.0
[14] Rsamtools_1.10.2
[15] Biostrings_2.26.3
[16] TransView_1.0.7
[17] Repitools_1.4.2
[18] GenomicRanges_1.10.7
[19] IRanges_1.16.6
[20] BiocGenerics_0.4.0
[21] ggbio_1.6.6
[22] ggplot2_0.9.3.1
loaded via a namespace (and not attached):
[1] biomaRt_2.14.0 biovizBase_1.6.2 bitops_1.0-5
[4] BSgenome_1.26.1 cluster_1.14.3 colorspace_1.2-1
[7] dichromat_2.0-0 digest_0.6.3 edgeR_3.0.8
[10] gdata_2.12.0 gplots_2.11.0 graph_1.36.2
[13] grid_2.15.1 gridExtra_0.9.1 gtable_0.1.2
[16] gtools_2.7.0 Hmisc_3.10-1 labeling_0.1
[19] lattice_0.20-13 limma_3.14.4 MASS_7.3-23
[22] munsell_0.4 parallel_2.15.1 plyr_1.8
[25] proto_0.3-10 RBGL_1.34.0 RColorBrewer_1.0-5
[28] RCurl_1.95-4.1 reshape2_1.2.2 rtracklayer_1.18.2
[31] scales_0.2.3 stats4_2.15.1 stringr_0.6.2
[34] tools_2.15.1 VariantAnnotation_1.4.12 XML_3.95-0.2
[37] zlibbioc_1.4.0
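A note for readers following the thread: both sessions above attach Mus.musculus 1.0.0 together with TxDb.Mmusculus.UCSC.mm10.ensGene, while James's session (quoted below) attaches Mus.musculus 1.1.0 with TxDb.Mmusculus.UCSC.mm10.knownGene. That difference may be why the UCSC transcript name is rejected as a TXNAME key here, though the thread itself does not confirm it.

A minimal sketch of a two-step workaround follows. It assumes the knownGene TxDb of that era uses UCSC names such as uc009veu.1 as TXNAME and Entrez gene IDs as GENEID. The transcript is first mapped to its gene ID in the TxDb, and the gene ID is then mapped to a symbol in org.Mm.eg.db. Arguments are passed positionally, as in James's call, because the third argument of select() is named cols in the AnnotationDbi version shown here and columns in later releases. The variable names (txdb, tx2gene, gene2sym, result) are illustrative only.

```r
# Sketch only: assumes TxDb.Mmusculus.UCSC.mm10.knownGene and org.Mm.eg.db
# are installed and that TXNAME holds UCSC known gene names in this release.
library(TxDb.Mmusculus.UCSC.mm10.knownGene)
library(org.Mm.eg.db)

txdb <- TxDb.Mmusculus.UCSC.mm10.knownGene

# Step 1: UCSC known gene transcript name -> Entrez gene ID.
# Positional arguments: object, keys, columns to return, keytype.
tx2gene <- select(txdb, "uc009veu.1", "GENEID", "TXNAME")

# Step 2: Entrez gene ID -> gene symbol.
gene2sym <- select(org.Mm.eg.db, as.character(tx2gene$GENEID),
                   "SYMBOL", "ENTREZID")

# Join the two lookups so each transcript name sits next to its symbol.
result <- merge(tx2gene, gene2sym, by.x = "GENEID", by.y = "ENTREZID")
result
```

If step 1 returns NA for GENEID, the transcript has no Entrez gene assigned in that TxDb and step 2 cannot resolve a symbol for it.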
On Apr 2, 2013, at 3:49 PM, James W. MacDonald wrote:
> Hi Ido,
>
> You don't give sessionInfo() results, but this works for me
>
>> select(Mus.musculus, "uc009veu.1", "SYMBOL","TXNAME")
> TXNAME SYMBOL
> 1 uc009veu.1 Zglp1
>
>> sessionInfo()
> R Under development (unstable) (2013-01-22 r61734)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] parallel stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] Mus.musculus_1.1.0
> [2] TxDb.Mmusculus.UCSC.mm10.knownGene_2.9.0
> [3] org.Mm.eg.db_2.9.0
> [4] GO.db_2.9.0
> [5] RSQLite_0.11.2
> [6] DBI_0.2-5
> [7] OrganismDbi_1.1.14
> [8] GenomicFeatures_1.11.16
> [9] GenomicRanges_1.11.44
> [10] IRanges_1.17.42
> [11] AnnotationDbi_1.21.16
> [12] Biobase_2.19.3
> [13] BiocGenerics_0.5.6
>
> loaded via a namespace (and not attached):
> [1] biomaRt_2.15.1 Biostrings_2.27.14 bitops_1.0-5
> [4] BSgenome_1.27.1 graph_1.37.7 RBGL_1.35.0
> [7] RCurl_1.95-4.1 Rsamtools_1.11.27 rtracklayer_1.19.11
> [10] stats4_3.0.0 tools_3.0.0 XML_3.96-1.1
> [13] zlibbioc_1.5.0
>
>
>
> On 4/2/2013 9:43 AM, Ido Tamir wrote:
>> Hi,
>> how is one supposed to go from a UCSC known gene ID to a gene symbol?
>>
>>> cols(TxDb.Mmusculus.UCSC.mm9.knownGene)
>> [1] "CDSID" "CDSNAME" "CDSCHROM" "CDSSTRAND" "CDSSTART"
>> [6] "CDSEND" "EXONID" "EXONNAME" "EXONCHROM" "EXONSTRAND"
>> [11] "EXONSTART" "EXONEND" "GENEID" "TXID" "EXONRANK"
>> [16] "TXNAME" "TXCHROM" "TXSTRAND" "TXSTART" "TXEND"
>>
>> I don't see anything that would allow me to link this with, e.g., Mus.musculus.
>>
>>> select(txdb, keys=c(100009600), cols=cols(txdb) ,keytype="GENEID")
>>      GENEID  CDSID CDSNAME CDSCHROM CDSSTRAND CDSSTART   CDSEND EXONID EXONNAME
>> 1 100009600 112799    <NA>     chr9         - 20871384 20871523 129355     <NA>
>> 2 100009600 112798    <NA>     chr9         - 20870468 20870821 129354     <NA>
>> 3 100009600 112797    <NA>     chr9         - 20867758 20867840 129353     <NA>
>> 4 100009600 112796    <NA>     chr9         - 20867338 20867431 129352     <NA>
>> 5 100009600 112795    <NA>     chr9         - 20867032 20867161 129351     <NA>
>>   EXONCHROM EXONSTRAND EXONSTART  EXONEND  TXID EXONRANK     TXNAME TXCHROM
>> 1      chr9          -  20871384 20872369 28943        1 uc009veu.1    chr9
>> 2      chr9          -  20870468 20870821 28943        2 uc009veu.1    chr9
>> 3      chr9          -  20867758 20867840 28943        3 uc009veu.1    chr9
>> 4      chr9          -  20867338 20867431 28943        4 uc009veu.1    chr9
>> 5      chr9          -  20866837 20867161 28943        5 uc009veu.1    chr9
>>   TXSTRAND  TXSTART    TXEND
>> 1        - 20866837 20872369
>> 2        - 20866837 20872369
>> 3        - 20866837 20872369
>> 4        - 20866837 20872369
>> 5        - 20866837 20872369
>>
>>> cols(Mus.musculus)
>> [1] "GOID" "TERM" "ONTOLOGY" "DEFINITION" "ENTREZID"
>> [6] "PFAM" "IPI" "PROSITE" "ACCNUM" "ALIAS"
>> [11] "CHR" "CHRLOC" "CHRLOCEND" "ENZYME" "PATH"
>> [16] "PMID" "REFSEQ" "SYMBOL" "UNIGENE" "ENSEMBL"
>> [21] "ENSEMBLPROT" "ENSEMBLTRANS" "GENENAME" "UNIPROT" "GO"
>> [26] "EVIDENCE" "GOALL" "EVIDENCEALL" "ONTOLOGYALL" "MGI"
>> [31] "CDSID" "CDSNAME" "CDSCHROM" "CDSSTRAND" "CDSSTART"
>> [36] "CDSEND" "EXONID" "EXONNAME" "EXONCHROM" "EXONSTRAND"
>> [41] "EXONSTART" "EXONEND" "GENEID" "TXID" "EXONRANK"
>> [46] "TXNAME" "TXCHROM" "TXSTRAND" "TXSTART" "TXEND"
>>
>>
>>> select(Mus.musculus,keys="uc009veu.1", cols=c("SYMBOL"), keytype="TXNAME")
>> Error in .testIfKeysAreOfProposedKeytype(x, keys, keytype) :
>> None of the keys entered are valid keys for the keytype specified.
>>
>> thank you very much,
>> ido
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> University of Washington
> Environmental and Occupational Health Sciences
> 4225 Roosevelt Way NE, # 100
> Seattle WA 98105-6099
>