[BioC] affy - Annotation of Affymetrix PrimeView chip

Wed Oct 23 17:38:42 CEST 2013

Hi Adam,

On Wednesday, October 23, 2013 10:44:34 AM, Adam Olejnik wrote:
> Dear all
> I am new to analyzing microarrays with R.
> I am working on dataset GSE41960.
> As far as I know the RMA method summarises probes into genes, however when I
> run limma and use topTable I get probe IDs instead of gene names.
> Where in the pipeline should I merge the probes?

As you note, RMA summarizes probes, though into probesets rather than 
genes. Probesets are intended to interrogate transcripts, which are 
not the same thing as genes. However, most people end up collapsing 
transcripts back to gene IDs, so the distinction may not matter here.

But to answer your question, you summarize when you run rma(). I am 
assuming you are doing something like

library(affy)
dat <- ReadAffy()
eset <- rma(dat)

In which case the ExpressionSet object called 'eset' now contains the 
summarized data, and limma knows what to do with it.

So if you fit a model and end up with an MArrayLM object

fit <- lmFit(eset, design)
fit2 <- contrasts.fit(fit, contrast)
fit2 <- eBayes(fit2)
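
The 'design' and 'contrast' objects above are not defined in this 
message. As a minimal sketch, assuming a hypothetical two-group 
experiment with three control and three treated arrays (the group 
labels and sample order are invented here, not taken from GSE41960), 
they could be built in base R like this:

```r
# Hypothetical sample layout: three control and three treated arrays,
# listed in the same order as the columns of 'eset'.
group <- factor(c("control", "control", "control",
                  "treated", "treated", "treated"),
                levels = c("control", "treated"))

# One column per group, no intercept, so contrasts are easy to write down.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Contrast 'treated - control' as a one-column matrix. With limma loaded,
# makeContrasts(treated - control, levels = design) gives an equivalent
# contrast.
contrast <- matrix(c(-1, 1), ncol = 1,
                   dimnames = list(colnames(design), "treated_vs_control"))
```

The rows of 'design' have to follow the column order of 'eset', so it 
is worth checking the group labels against sampleNames(eset) before 
fitting.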

You can then annotate these data using a primeview.db package. But note 
that there isn't such a thing on the BioC website. It doesn't really 
matter, as it is simple to create using the AnnotationForge package. 
All you need to do is go to the Affy website, and get the primeview 
annotation csv file

http://www.affymetrix.com/Auth/analysis/downloads/na32/ivt/PrimeView.na32.annot.csv.zip

Install the AnnotationForge (which brings in AnnotationDbi) and 
human.db0 packages, and then do what I recommend here:

https://stat.ethz.ch/pipermail/bioconductor/attachments/20130711/7e4d77fb/attachment.pl

You can then do

install.packages("primeview.db", type="source", repos=NULL)

and then

library(primeview.db)
gns <- select(primeview.db, featureNames(eset), c("SYMBOL","GENENAME"))

and if select() does not report any 1:many mappings you can do

fit2$genes <- gns

otherwise you can be super naive and just take the first mapping 
available

fit2$genes <- gns[!duplicated(gns[,1]),]

and then topTable(fit2, coef=1) will have annotated genes in it.
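
To make the 1:many case concrete, here is a small base R sketch. The 
data.frame is a toy stand-in for what select() returns, and the 
probeset IDs and gene symbols are invented for illustration. It shows 
the "first mapping" approach next to one possible alternative that 
keeps every symbol by pasting them together (the collapse() helper is 
hypothetical, not part of any package):

```r
# Toy stand-in for the data.frame returned by select(); the probeset IDs
# and gene symbols are invented for illustration.
gns <- data.frame(
  PROBEID = c("ps_1", "ps_2", "ps_2", "ps_3"),
  SYMBOL  = c("GENE_A", "GENE_B", "GENE_C", NA),
  stringsAsFactors = FALSE
)
ids <- c("ps_1", "ps_2", "ps_3")  # stands in for featureNames(eset)

# Naive option: keep the first mapping for each probeset.
first <- gns[!duplicated(gns$PROBEID), ]

# Alternative: keep every symbol, pasted into one string per probeset.
collapse <- function(x) {
  x <- unique(x[!is.na(x)])
  if (length(x) == 0) NA_character_ else paste(x, collapse = "; ")
}
all_syms <- vapply(split(gns$SYMBOL, gns$PROBEID), collapse, character(1))
collapsed <- data.frame(PROBEID = ids, SYMBOL = unname(all_syms[ids]),
                        stringsAsFactors = FALSE)

# Either result must line up row for row with the fitted object.
stopifnot(identical(first$PROBEID, ids), identical(collapsed$PROBEID, ids))
```

Whichever version you keep, it needs one row per probeset, in the same 
order as the rows of the fit, before it is assigned to fit2$genes.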

>
> The second question is about the CDF file. I know these are files used for
> description and mapping of the probes. But I have not figured out how to use
> them with affy and limma.

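This happens automatically. The affy package looks up the CDF package 
that matches the chip type recorded in your CEL files and loads it when 
you run rma().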

Best,

Jim

>
> I am familiar with the user guides for affy and limma.
>
> Many thanks in advance

--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099