[BioC] Queries about getting annotation post-Limma analysis

Tue May 14 21:11:02 CEST 2013

Hi Jeremy,

> how can I map the Affy ID which is found in the results from topTable to an ENSEMBL and an ENTREZ gene ID

The Bioconductor annotation package "hugene10stprobeset.db" and the "select" interface should provide all you need.

     biocLite("hugene10stprobeset.db")
     library(hugene10stprobeset.db)

     # what kinds of data (what columns) are stored in this annotation package?
     keytypes(hugene10stprobeset.db)

     # do a quick survey of each column
     for(key in keytypes(hugene10stprobeset.db)){
         print(paste("---", key))
         print(head(keys(hugene10stprobeset.db, keytype=key)))
     }

     # get a random sample of probe ids to use for testing
     sample.probe.ids <- sample(keys(hugene10stprobeset.db, keytype="PROBEID"), size=10)

     # look these up using the select command. a data.frame is returned
     # (newer AnnotationDbi releases name this argument "columns"; older ones used "cols")
     select(hugene10stprobeset.db, keys=sample.probe.ids, columns=c("ENTREZID", "ENSEMBL", "SYMBOL"))
        PROBEID ENTREZID         ENSEMBL SYMBOL
     1  8165444    29952 ENSG00000176978   DPP7
     2  7970230    23263 ENSG00000126217  MCF2L
     3  8045081     2840 ENSG00000144230  GPR17
     4  7989809    54878 ENSG00000074603   DPP8
     5  8015557    23415 ENSG00000089558  KCNH4
     6  7930787     5406 ENSG00000175535  PNLIP
     7  7984142     9960 ENSG00000140455   USP3
     8  7894624     <NA>            <NA>   <NA>
     9  8170511    79057 ENSG00000130032  PRRG3
     10 8105007    55100 ENSG00000082068  WDR70
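
To attach annotations like these to the topTable results, something along the following lines may work. This is only a minimal sketch in base R, assuming the probe ids sit in a column named ID (as in top100$ID in the question). The data frame "top" is a hypothetical stand-in for the real topTable output, and its logFC values are made-up placeholders; the annotation rows are copied from the select output above.

```r
# Annotation table shaped like the select() output above; rows are copied
# from that output (probe 7894624 has no annotation).
annot <- data.frame(
  PROBEID  = c("8165444", "7970230", "8045081", "7894624"),
  ENTREZID = c("29952", "23263", "2840", NA),
  ENSEMBL  = c("ENSG00000176978", "ENSG00000126217", "ENSG00000144230", NA),
  SYMBOL   = c("DPP7", "MCF2L", "GPR17", NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for topTable() output: the logFC values are
# placeholders, not real results.
top <- data.frame(
  ID    = c("8045081", "7894624", "8165444"),
  logFC = c(1.2, -0.8, 0.5),
  stringsAsFactors = FALSE
)

# A probe can map to several genes, so select() may return more than one
# row per key; keep the first row per probe so the join stays one-to-one.
annot.unique <- annot[!duplicated(annot$PROBEID), ]

# match() preserves the ranking order of the topTable rows;
# probes without an annotation row would simply get NA.
idx <- match(top$ID, annot.unique$PROBEID)
top.annotated <- cbind(top, annot.unique[idx, c("ENTREZID", "ENSEMBL", "SYMBOL")])
rownames(top.annotated) <- NULL
print(top.annotated)
```

Using match() rather than merge() keeps the rows in their original topTable order; keeping only the first mapping per probe is a simplification, and you may prefer to collapse multiple mappings into one string instead.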

 - Paul

On May 14, 2013, at 8:42 AM, Jeremy Ng wrote:

> Dear all,
> 
> Following the RMA normalization of data from Affy Human Exon ST1.0 array
> using the package Oligo (at the transcript level using target="core"), I
> then conducted a limma analysis.
> 
> The topTable function in limma then retrieves the top genes (in my
> case, because I am interested in subsequently doing GSEA analysis, I set
> number=100) which are differentially expressed.
> 
> The question I have is how can I map the Affy ID which is found in the
> results from topTable to an ENSEMBL and an ENTREZ gene ID. Intuitively,
> biomaRt comes to mind, and I did a biomaRt query for the list of top 100
> genes which I had gotten, but I get only 14 hgnc symbols. I'd like to think
> that it's due to a lack of annotations, but I highly doubt it (14 in a list
> of 100 seems too few to me).
> 
> My code for biomaRt is as follows:
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
> 
> hgnc <- getBM(attributes=c("hgnc_symbol",
> "ensembl_gene_id"),values=top100$ID, filters="affy_huex_1_0_st_v2",
> mart=mart)
> 
> I was wondering 2 things:
> 1. Is there any plausible explanation for why the query only returns 14 IDs;
> and
> 2. Are there other ways that I can use to fetch annotations from a
> post-Limma analysis?
> 
> My session info is as follows:
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
> 
> locale:
> [1] C
> 
> Thanks for any advice!
> 
> Jeremy