[BioC] Queries about getting annotation post-Limma analysis

Tue May 14 21:28:10 CEST 2013

Hi Paul (and Jeremy),

On 5/14/2013 3:11 PM, Paul Shannon wrote:
> Hi Jeremy,
>
>> how can I map the Affy ID which is found in the results from topTable to an ENSEMBL and an ENTREZ gene ID
> The bioc annotation package "hugene10stprobeset.db" and the "select" interface should provide all you need.

Jeremy is using the Human Exon ST array, and summarizing at the 
transcript level. So he needs the (non-existent) 
huex10sttranscriptcluster.db package to do what you suggest.
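Whichever annotation package ends up being used, the data.frame that select() returns can have several rows for one ID. Attaching it to the topTable output with merge() can therefore duplicate or reorder rows. Below is a minimal base-R sketch. It assumes a topTable-style data.frame with an ID column and a select()-style data.frame with PROBEID and SYMBOL columns. The inputs are hypothetical toy data: the logFC values and the second symbol "GENE_X" are placeholders. Collapsing multiple symbols with ";" is just one possible choice.

```r
# Hypothetical topTable-style result: only the ID column matters here;
# the logFC values are placeholders, not real results.
tt <- data.frame(ID = c("7894624", "8165444", "7970230"),
                 logFC = c(0.5, 1.2, -0.8),
                 stringsAsFactors = FALSE)

# Hypothetical select()-style annotation. DPP7, MCF2L and the NA row follow
# Paul's output; "GENE_X" is a made-up second mapping showing a one-to-many case.
ann <- data.frame(PROBEID = c("8165444", "7970230", "7970230", "7894624"),
                  SYMBOL  = c("DPP7", "MCF2L", "GENE_X", NA),
                  stringsAsFactors = FALSE)

# Collapse all non-missing symbols for each probe ID into one string
# (e.g. "MCF2L;GENE_X"); IDs with no symbol at all become NA.
collapsed <- tapply(ann$SYMBOL, ann$PROBEID, function(s) {
  s <- unique(s[!is.na(s)])
  if (length(s) == 0) NA_character_ else paste(s, collapse = ";")
})

# Look up each topTable ID by name, so rows keep their order and are not duplicated.
tt$SYMBOL <- as.vector(collapsed[match(tt$ID, names(collapsed))])
print(tt)
```

Collapsing keeps exactly one row per probe ID, which suits a ranked list. If each mapping should get its own row instead, merge(tt, ann, by.x = "ID", by.y = "PROBEID", all.x = TRUE) is simpler. Note that it sorts and can repeat rows.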

I am building such a beast as we speak, but this array has almost 340K 
probesets when summarized at the transcript level, so this is going 
sloooowly.
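Until then, whatever mapping source is used (biomaRt or an annotation package), count how many of the top IDs actually received an annotation before concluding that a list is genuinely sparse. Below is a minimal base-R sketch. It uses only the PROBEID and ENTREZID columns from the select() output in Paul's message. Because select() can return more than one row per ID, the count is taken per ID rather than per row.

```r
# PROBEID and ENTREZID columns copied from the select() output in Paul's message.
ann <- data.frame(
  PROBEID  = c("8165444", "7970230", "8045081", "7989809", "8015557",
               "7930787", "7984142", "7894624", "8170511", "8105007"),
  ENTREZID = c("29952", "23263", "2840", "54878", "23415",
               "5406", "9960", NA, "79057", "55100"),
  stringsAsFactors = FALSE
)

# An ID counts as annotated if at least one of its rows has a non-missing
# Entrez ID; tapply() groups rows by PROBEID so each ID is counted once.
has.entrez <- tapply(!is.na(ann$ENTREZID), ann$PROBEID, any)
coverage <- mean(has.entrez)
cat(sprintf("%d of %d IDs annotated (%.0f%%)\n",
            sum(has.entrez), length(has.entrez), 100 * coverage))
# prints "9 of 10 IDs annotated (90%)"
```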

Best,

Jim

>
>
>       biocLite("hugene10stprobeset.db")
>       library(hugene10stprobeset.db)
>
>          # what kinds of data (what columns) are stored in this annotation package?
>       keytypes(hugene10stprobeset.db)
>
>          # do a quick survey of each column
>       for(key in keytypes(hugene10stprobeset.db)){
>           print(paste("---", key))
>           print(head(keys(hugene10stprobeset.db, keytype=key)))
>           }
>
>           # get a random sample of probe ids to use for testing
>       sample.probe.ids<- sample(keys(hugene10stprobeset.db,keytype="PROBEID"), size=10)
>
>           # look these up using the select command. a data.frame is returned
>       select(hugene10stprobeset.db, keys=sample.probe.ids, columns=c("ENTREZID", "ENSEMBL", "SYMBOL"))
>          PROBEID ENTREZID         ENSEMBL SYMBOL
>       1  8165444    29952 ENSG00000176978   DPP7
>       2  7970230    23263 ENSG00000126217  MCF2L
>       3  8045081     2840 ENSG00000144230  GPR17
>       4  7989809    54878 ENSG00000074603   DPP8
>       5  8015557    23415 ENSG00000089558  KCNH4
>       6  7930787     5406 ENSG00000175535  PNLIP
>       7  7984142     9960 ENSG00000140455   USP3
>       8  7894624     <NA>            <NA>   <NA>
>       9  8170511    79057 ENSG00000130032  PRRG3
>       10 8105007    55100 ENSG00000082068  WDR70
>
>
>   - Paul
>
>
>
> On May 14, 2013, at 8:42 AM, Jeremy Ng wrote:
>
>> Dear all,
>>
>> Following RMA normalization of data from the Affymetrix Human Exon 1.0 ST
>> array using the oligo package (at the transcript level, with target="core"),
>> I then conducted a limma analysis.
>>
>> The topTable function in limma then retrieves the top genes that are
>> differentially expressed (in my case, because I am interested in
>> subsequently doing GSEA, I set number=100).
>>
>> The question I have is how I can map the Affy IDs found in the topTable
>> results to ENSEMBL and ENTREZ gene IDs. Intuitively, biomaRt comes to
>> mind, and I ran a biomaRt query for the list of top 100 genes I had
>> gotten, but I got back only 14 HGNC symbols. I'd like to think that this
>> is due to a lack of annotations, but I doubt it (14 out of 100 seems too
>> few to me).
>>
>> My code for biomaRt is as follows:
>> library(biomaRt)
>> mart <- useMart("ensembl", "hsapiens_gene_ensembl")
>>
>> hgnc <- getBM(attributes=c("hgnc_symbol", "ensembl_gene_id"),
>>               filters="affy_huex_1_0_st_v2", values=top100$ID,
>>               mart=mart)
>>
>> I was wondering about two things:
>> 1. Is there a plausible explanation for why the query returns only 14 IDs?
>> 2. Are there other ways to fetch annotations after a limma analysis?
>>
>> My session info is as follows:
>> R version 3.0.0 (2013-04-03)
>> Platform: x86_64-apple-darwin10.8.0 (64-bit)
>>
>> locale:
>> [1] C
>>
>> Thanks for any advice!
>>
>> Jeremy
>>
>>
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at r-project.org
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099