[Bioc-sig-seq] questions about GenomicFeatures

Martin Morgan mtmorgan at fhcrc.org
Thu Jun 30 08:13:44 CEST 2011


On 06/29/2011 04:18 PM, Kunbin Qu wrote:
> Hi,
>
> When I tried to use GenomicFeatures, how could I get the gene symbol from the transcripts built from hg19? I had the following commands, and I expected to see names like "TP53", "MMP11", "UBE2E" etc, but instead, I only had the numbers (which did not add much value) when I used the names().
>
> -Kunbin
>
>
>> hg19kg<-makeTranscriptDbFromUCSC(genome="hg19", tablename="knownGene")
>> GR<-transcripts(hg19kg, vals=list(tx_chrom="chr1", tx_strand="+"))
>> GRList<-transcriptsBy(hg19kg, by="gene")
>> names(GRList)[1:20]
>   [1] "1"         "10"        "100"       "1000"      "10000"     "100008586"
>   [7] "100008587" "100009676" "10001"     "10002"     "10003"     "100033413"
> [13] "100033414" "100033415" "100033416" "100033417" "100033420" "100033422"
> [19] "100033423" "100033424"

Hi Kunbin

These are ENTREZ gene ids, and you're after (the much more ambiguous) 
SYMBOL identifiers. Use

   nms <- names(GRList)[1:20]    # the ids are the names of the list
   library(org.Hs.eg.db)
   map <- org.Hs.egSYMBOL
   toTable(map[nms])

or maybe mget(nms, map, ifnotfound=NA), followed by some processing of 
the resulting list.
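
The processing of the mget() result might look something like the sketch below. It is only an illustration, not a definitive recipe: first_symbol() is a hypothetical helper name, and the "hits" list is a hand-built stand-in for what mget() returns (one element per id, NA where nothing was found), with placeholder symbols rather than real ones, so that the sketch runs without the annotation package.

```r
# Hypothetical helper: collapse an mget()-style list into a named
# character vector, keeping the first symbol for each id.
first_symbol <- function(lst) {
  vapply(lst, function(x) as.character(x[1]), character(1))
}

# Hand-built stand-in for the result of
#   mget(nms, org.Hs.egSYMBOL, ifnotfound = NA)
# The symbols here are placeholders, not real annotation.
hits <- list("1" = "SYM_A", "10" = "SYM_B", "999999999" = NA)
symbols <- first_symbol(hits)

# With org.Hs.eg.db installed, the same helper applies to real ids.
if (requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  library(org.Hs.eg.db)
  real <- first_symbol(mget(c("1", "10"), org.Hs.egSYMBOL, ifnotfound = NA))
}
```

The result is a character vector named by ENTREZ id, with NA for ids that have no symbol, so it could be used to relabel the list, e.g. names(GRList)[1:20] <- symbols, once any NA entries have been dealt with.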

Martin

>> sessionInfo()
> R version 2.11.0 (2010-04-22)
> x86_64-unknown-linux-gnu
>
> locale:
>   [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>   [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>   [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>   [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>   [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] tcltk     stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
>   [1] GenomicFeatures_1.0.4 DEGseq_2.0.0          samr_1.28
>   [4] impute_1.24.0         ShortRead_1.6.2       Rsamtools_1.0.1
>   [7] lattice_0.19-11       Biostrings_2.16.7     GenomicRanges_1.0.1
> [10] IRanges_1.6.8         qvalue_1.22.0
>
> loaded via a namespace (and not attached):
>   [1] Biobase_2.8.0     biomaRt_2.4.0     BSgenome_1.17.1   DBI_0.2-5
>   [5] grid_2.11.0       hwriter_1.3       RCurl_1.4-3       RSQLite_0.9-2
>   [9] rtracklayer_1.8.1 tools_2.11.0      XML_3.1-1
>>
>
>
>


-- 
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793


