[Bioc-sig-seq] questions about GenomicFeatures
Martin Morgan
mtmorgan at fhcrc.org
Thu Jun 30 08:13:44 CEST 2011
On 06/29/2011 04:18 PM, Kunbin Qu wrote:
> Hi,
>
> When using GenomicFeatures, how can I get gene symbols for the transcripts built from hg19? I ran the following commands and expected to see names like "TP53", "MMP11", "UBE2E", etc., but names() returned only numbers (which don't add much value).
>
> -Kunbin
>
>
>> hg19kg<-makeTranscriptDbFromUCSC(genome="hg19", tablename="knownGene")
>> GR<-transcripts(hg19kg, vals<-list(tx_chrom="chr1", tx_strand="+"))
>> GRList<-transcriptsBy(hg19kg, by="gene")
>> names(GRList)[1:20]
> [1] "1" "10" "100" "1000" "10000" "100008586"
> [7] "100008587" "100009676" "10001" "10002" "10003" "100033413"
> [13] "100033414" "100033415" "100033416" "100033417" "100033420" "100033422"
> [19] "100033423" "100033424"
Hi Kunbin
These are ENTREZ gene ids, and you're after (the much more ambiguous)
SYMBOL identifiers. Use
nms <- names(GRList)[1:20]
library(org.Hs.eg.db)
map <- org.Hs.egSYMBOL
toTable(map[nms])
or maybe mget(nms, map, ifnotfound=NA) and post-process the resulting list.
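The "processing" step is left open above. Here is a minimal base-R sketch of one way to get one symbol per Entrez ID in the original order. It assumes the lookups have already been run. The hard-coded `tbl` and `lst` below are hypothetical stand-ins for the shapes of `toTable()` and `mget()` output, not real query results, so the snippet runs without org.Hs.eg.db.

```r
# Hypothetical stand-ins (illustrative values, not real query output):
#   toTable(map[nms])               -> data.frame with gene_id and symbol
#   mget(nms, map, ifnotfound = NA) -> named list of character vectors
nms <- c("1", "10", "no_such_id")

tbl <- data.frame(gene_id = c("10", "1"),
                  symbol  = c("NAT2", "A1BG"),
                  stringsAsFactors = FALSE)

lst <- list("1" = "A1BG", "10" = "NAT2", "no_such_id" = NA_character_)

# Assumption: the table may lack rows for some IDs and its row order
# need not follow nms, so re-align it with match(); missing IDs give NA.
sym_from_table <- tbl$symbol[match(nms, tbl$gene_id)]
names(sym_from_table) <- nms

# The list has one entry per key: collapse each entry to one string,
# joining multiple symbols with ";" and keeping NA when there are none.
sym_from_list <- vapply(lst[nms], function(s) {
  s <- s[!is.na(s)]
  if (length(s) == 0L) NA_character_ else paste(s, collapse = ";")
}, character(1))

print(sym_from_table)
print(sym_from_list)
```

Either result is a character vector named by Entrez ID, so it can be used directly for labelling, e.g. indexing it with `names(GRList)`.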
Martin
>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tcltk stats graphics grDevices utils datasets methods
> [8] base
>
> other attached packages:
> [1] GenomicFeatures_1.0.4 DEGseq_2.0.0 samr_1.28
> [4] impute_1.24.0 ShortRead_1.6.2 Rsamtools_1.0.1
> [7] lattice_0.19-11 Biostrings_2.16.7 GenomicRanges_1.0.1
> [10] IRanges_1.6.8 qvalue_1.22.0
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.8.0 biomaRt_2.4.0 BSgenome_1.17.1 DBI_0.2-5
> [5] grid_2.11.0 hwriter_1.3 RCurl_1.4-3 RSQLite_0.9-2
> [9] rtracklayer_1.8.1 tools_2.11.0 XML_3.1-1
>>
>
>
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> Bioc-sig-sequencing at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793