[BioC] Queries about getting annotation post-Limma analysis
James W. MacDonald
jmacdon at uw.edu
Tue May 14 21:28:10 CEST 2013
Hi Paul (and Jeremy),
On 5/14/2013 3:11 PM, Paul Shannon wrote:
> Hi Jeremy,
>
>> how can I map the Affy ID which is found in the results from topTable to an ENSEMBL and an ENTREZ gene ID
> The bioc annotation package "hugene10stprobeset.db" and the "select" interface should provide all you need.
Jeremy is using the Human Exon ST array, and summarizing at the
transcript level. So he needs the (non-existent)
huex10sttranscriptcluster.db package to do what you suggest.
I am building such a beast as we speak, but this array has almost 340K
probesets when summarized at the transcript level, so this is going
sloooowly.
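Whichever package ends up supplying the mapping, attaching it to the
topTable output is a plain merge on the ID column. A minimal base-R
sketch, assuming the topTable result has an ID column (as in Jeremy's
top100$ID) and the annotation comes back as a data.frame keyed by
PROBEID (as in Paul's select() output below). The `top` and `annot`
objects and the logFC values are made-up stand-ins, and the IDs are
borrowed from the hugene example below, not from the exon array:

```r
# Toy stand-in for limma::topTable() output; logFC values are invented
# for illustration only.
top <- data.frame(ID = c("8165444", "7970230", "7894624"),
                  logFC = c(1.8, -1.2, 0.9),
                  stringsAsFactors = FALSE)

# Toy stand-in for the data.frame returned by select(); IDs with no
# annotation come back as NA.
annot <- data.frame(PROBEID = c("8165444", "7970230", "7894624"),
                    ENTREZID = c("29952", "23263", NA),
                    ENSEMBL = c("ENSG00000176978", "ENSG00000126217", NA),
                    SYMBOL = c("DPP7", "MCF2L", NA),
                    stringsAsFactors = FALSE)

# Left join: keep every row of the topTable result, annotated or not.
annotated <- merge(top, annot, by.x = "ID", by.y = "PROBEID",
                   all.x = TRUE, sort = FALSE)

# Count how many IDs received no gene annotation.
n.missing <- sum(is.na(annotated$ENTREZID))
```

Since select() can return more than one row per key, a merge like this
may yield more rows than the original topTable, which is worth checking
before passing the list on to GSEA.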
Best,
Jim
>
>
> biocLite("hugene10stprobeset.db")
> library(hugene10stprobeset.db)
>
> # what kinds of data (what columns) are stored in this annotation package?
> keytypes(hugene10stprobeset.db)
>
> # do a quick survey of each column
> for(key in keytypes(hugene10stprobeset.db)){
> print(paste("---", key))
> print(head(keys(hugene10stprobeset.db, keytype=key)))
> }
>
> # get a random sample of probe ids to use for testing
> sample.probe.ids<- sample(keys(hugene10stprobeset.db,keytype="PROBEID"), size=10)
>
> # look these up using the select command. a data.frame is returned
> select(hugene10stprobeset.db, keys=sample.probe.ids, columns=c("ENTREZID", "ENSEMBL", "SYMBOL"), keytype="PROBEID")
>    PROBEID ENTREZID         ENSEMBL SYMBOL
> 1  8165444    29952 ENSG00000176978   DPP7
> 2  7970230    23263 ENSG00000126217  MCF2L
> 3  8045081     2840 ENSG00000144230  GPR17
> 4  7989809    54878 ENSG00000074603   DPP8
> 5  8015557    23415 ENSG00000089558  KCNH4
> 6  7930787     5406 ENSG00000175535  PNLIP
> 7  7984142     9960 ENSG00000140455   USP3
> 8  7894624     <NA>            <NA>   <NA>
> 9  8170511    79057 ENSG00000130032  PRRG3
> 10 8105007    55100 ENSG00000082068  WDR70
>
>
> - Paul
>
>
>
> On May 14, 2013, at 8:42 AM, Jeremy Ng wrote:
>
>> Dear all,
>>
>> Following the RMA normalization of data from the Affy Human Exon 1.0 ST array
>> using the oligo package (at the transcript level using target="core"), I
>> then conducted a limma analysis.
>>
>> The topTable function in limma then retrieves the top genes (in my
>> case, because I am interested in subsequently doing GSEA analysis, I set
>> number=100) which are differentially expressed.
>>
>> The question I have is how can I map the Affy ID which is found in the
>> results from topTable to an ENSEMBL and an ENTREZ gene ID. Intuitively,
>> biomaRt comes to mind, and I did a biomaRt query for the list of top 100
>> genes which I had gotten, but I get only 14 hgnc symbols. I'd like to think
>> that it's due to a lack of annotations, but I highly doubt it (14 in a list
>> of 100 seems too few to me).
>>
>> My code for biomaRt is as follows:
>> mart<- useMart("ensembl", "hsapiens_gene_ensembl")
>>
>> hgnc<- getBM(attributes=c("hgnc_symbol",
>> "ensembl_gene_id"),values=top100$ID, filters="affy_huex_1_0_st_v2",
>> mart=mart)
>>
>> I was wondering 2 things:
>> 1. Is there any plausible explanation for why the query only returns 14 IDs;
>> and
>> 2. Are there other ways that I can use to fetch annotations from a
>> post-Limma analysis?
>>
>> My session info is as follows:
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> Thanks for any advice!
>>
>> Jeremy
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099