[BioC] Queries about getting annotation post-Limma analysis
Paul Shannon
pshannon at fhcrc.org
Tue May 14 21:11:02 CEST 2013
Hi Jeremy,
> how can I map the Affy ID which is found in the results from topTable to an ENSEMBL and an ENTREZ gene ID
The Bioconductor annotation package "hugene10stprobeset.db" and the "select" interface should provide all you need.
# biocLite() is defined by sourcing the Bioconductor installer script
source("http://bioconductor.org/biocLite.R")
biocLite("hugene10stprobeset.db")
library(hugene10stprobeset.db)
# what kinds of data (what columns) are stored in this annotation package?
keytypes(hugene10stprobeset.db)
# do a quick survey of each column
for (key in keytypes(hugene10stprobeset.db)) {
    print(paste("---", key))
    print(head(keys(hugene10stprobeset.db, keytype=key)))
}
# get a random sample of probe ids to use for testing
sample.probe.ids <- sample(keys(hugene10stprobeset.db, keytype="PROBEID"), size=10)
# look these up using the select command; a data.frame is returned
# (the "cols" argument is named "columns" in later AnnotationDbi releases)
select(hugene10stprobeset.db, keys=sample.probe.ids, keytype="PROBEID",
       cols=c("ENTREZID", "ENSEMBL", "SYMBOL"))
   PROBEID ENTREZID         ENSEMBL SYMBOL
1  8165444    29952 ENSG00000176978   DPP7
2  7970230    23263 ENSG00000126217  MCF2L
3  8045081     2840 ENSG00000144230  GPR17
4  7989809    54878 ENSG00000074603   DPP8
5  8015557    23415 ENSG00000089558  KCNH4
6  7930787     5406 ENSG00000175535  PNLIP
7  7984142     9960 ENSG00000140455   USP3
8  7894624     <NA>            <NA>   <NA>
9  8170511    79057 ENSG00000130032  PRRG3
10 8105007    55100 ENSG00000082068  WDR70
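
To attach these annotations to the limma results, one option is a left join on the probe ID. The following is only a rough sketch in base R: the annotation rows are copied from the select() output above, while "top100" here is a hypothetical stand-in for your topTable() result (the column name "ID" is taken from your biomaRt call, and the logFC values are placeholders, not real results).

```r
# Annotation rows copied from the select() output above (a subset, for illustration).
annot <- data.frame(
  PROBEID  = c("8165444", "7970230", "7894624"),
  ENTREZID = c("29952", "23263", NA),
  ENSEMBL  = c("ENSG00000176978", "ENSG00000126217", NA),
  SYMBOL   = c("DPP7", "MCF2L", NA),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the topTable() result; logFC values are placeholders.
top100 <- data.frame(
  ID    = c("8165444", "7970230", "7894624"),
  logFC = c(1.2, -0.8, 0.5),
  stringsAsFactors = FALSE
)

# Left join: keep every topTable row, even when no annotation exists for the probe.
annotated <- merge(top100, annot, by.x = "ID", by.y = "PROBEID", all.x = TRUE)

# Probes without an Entrez ID cannot be placed in a gene set, so list them separately.
unmapped <- annotated$ID[is.na(annotated$ENTREZID)]

print(annotated)
print(unmapped)
```

Note that select() can return more than one row for a probe that maps to several genes, so the joined table may end up with more rows than the topTable result; you may want to decide how to handle such duplicates before passing the list on to GSEA.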
- Paul
On May 14, 2013, at 8:42 AM, Jeremy Ng wrote:
> Dear all,
>
> Following RMA normalization of data from the Affymetrix Human Exon 1.0 ST
> array using the oligo package (at the transcript level, using
> target="core"), I then conducted a limma analysis.
>
> The topTable function in limma then retrieves the top genes (in my
> case, because I am interested in subsequently doing a GSEA analysis, I set
> number=100) that are differentially expressed.
>
> The question I have is how can I map the Affy ID which is found in the
> results from topTable to an ENSEMBL and an ENTREZ gene ID. Intuitively,
> biomaRt comes to mind, and I did a biomaRt query for the list of top 100
> genes which I had gotten, but I get only 14 HGNC symbols. I'd like to think
> that it's due to a lack of annotations, but I highly doubt it (14 in a list
> of 100 seems too few to me).
>
> My code for biomaRt is as follows:
> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
>
> hgnc <- getBM(attributes=c("hgnc_symbol",
> "ensembl_gene_id"),values=top100$ID, filters="affy_huex_1_0_st_v2",
> mart=mart)
>
> I was wondering 2 things:
> 1. Is there any plausible explanation to why the query only returns 14 IDs;
> and
> 2. Are there other ways that I can use to fetch annotations from a
> post-Limma analysis?
>
> My session info is as follows:
> R version 3.0.0 (2013-04-03)
> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>
> locale:
> [1] C
>
> Thanks for any advice!
>
> Jeremy