[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays

Tue Mar 5 23:22:44 CET 2013

Hi Kamila,

On 3/5/2013 4:45 PM, Naxerova, Kamila wrote:
> Dear all,
>
> I am analyzing a set of Affymetrix Mouse Gene 2.0 ST arrays. I am quite familiar with 3'-biased chips, but this is my first time looking at data from WT arrays. I have a few general questions -- any advice would be appreciated to speed up my learning process.
>
> 1) I have already read on this mailing list that the good old affy package does not work well with WT arrays (can anybody point me to any literature on why that is?). So I have installed the oligo and xps packages -- what are the advantages/disadvantages for each? Any opinions on which one is the right "starter kit"?

The affy package was never intended to work with these arrays - it was 
designed specifically for the 3' biased arrays, which had pre-defined 
probesets and didn't share probes between probesets. In addition, 
the makecdfenv package is designed to build CDF environments from old-style 
CDF files, and Affy has never released a CDF for these new chips that 
they are willing to support in any meaningful way.

There were some changes made to the affy package in order to accommodate 
the fact that probes could be shared between probesets, and it is 
possible to use functions in affxparser to re-create conventional CDF 
packages using the newer pgf and clf files. So hypothetically you could 
still use the affy package (and hypothetically you could still use an 
Apple IIe for all your computing needs, but that's crazy, so let's move on).

I don't think you will find much difference between oligo and xps, other 
than the fact that xps requires the additional installation of ROOT. You 
might play around with both and see which suits you better.

I should throw in my obligatory cautionary statement about summarizing 
Gene ST data at the probeset (exon) level rather than at the transcript 
cluster level. If you look at the number of probes per probeset, a huge 
number have fewer than 4 probes, so each summary rests on very few 
measurements. So hypothetically you can do this, but I wouldn't.
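To see the issue for yourself, you can tabulate probes per probeset and the fraction with fewer than 4 probes. The base-R sketch below does that on a made-up probe-to-probeset mapping; for real data you would have to pull that mapping from the platform design (pgf file or pd package), which is an assumption not shown here.

```r
# Made-up probe-to-probeset mapping, for illustration only; real data would
# come from the array design (pgf file or pd.* package), not shown here.
probe_map <- data.frame(
  probe_id    = 1:12,
  probeset_id = c("ps1", "ps1", "ps1", "ps1", "ps1",
                  "ps2", "ps2",
                  "ps3", "ps3", "ps3",
                  "ps4", "ps4"),
  stringsAsFactors = FALSE
)

# Number of probes in each probeset
probes_per_set <- table(probe_map$probeset_id)

# Fraction of probesets summarized from fewer than 4 probes
frac_small <- mean(probes_per_set < 4)
```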

>
> 2) I see with some dread that there seems to be no annotation package for the 2.0 array yet. I have never built my own... any quick bullet points on how I would go about doing that for a WT array?

No dread should be required. All you need to do is get the 
transcript-level annotation file from Affy 
(http://www.affymetrix.com/Auth/analysis/downloads/na33/wtgene/MoGene-2_0-st-v1.na33.mm10.transcript.csv.zip), 
unzip it, and install the AnnotationForge, mouse.db0, and org.Mm.eg.db 
packages. Then something like

library(AnnotationForge)
library(mouse.db0)      # mouse base data used to build the package
library(org.Mm.eg.db)
## fileName is the *unzipped* transcript-level CSV from Affymetrix
makeDBPackage("MOUSECHIP_DB",
              affy = TRUE,
              prefix = "mogene20sttranscriptcluster",
              fileName = "MoGene-2_0-st-v1.na33.mm10.transcript.csv",
              outputDir = ".",
              version = "2.11.1",
              manufacturer = "Affymetrix",
              chipName = "Mouse Gene 2.0 ST Array",
              manufacturerUrl = "http://www.affymetrix.com",
              author = "Kamila Naxerova",
              maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")

should do the trick. You can then install directly from within R by

install.packages("mogene20sttranscriptcluster.db", repos=NULL, type="source")

For more details, see the SQLForge vignette:
http://bioconductor.org/packages/2.11/bioc/vignettes/AnnotationForge/inst/doc/SQLForge.pdf 

>
> 3) It seems that RMA is also used for normalization of WT arrays, so that part I am comfortable with. But are there any differences in preprocessing between 3' and WT arrays that I should watch out for?

Not really. I don't use xps, so I cannot say for certain how you do things 
with that package, but with oligo it's as simple as

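library(oligo)
## read all CEL files in the working directory
abatch <- read.celfiles(list.celfiles())
## background correct, quantile normalize, and summarize
## (transcript cluster level by default for Gene ST arrays)
eset <- rma(abatch)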

This normalizes and summarizes at the transcript level. Note, however, 
that the annotation for the resulting ExpressionSet will be the 
pd.mogene.2.0.st.v1 package, which holds the probe design rather than 
gene annotation, so if you use annotation(eset) in any further calls to 
do gene annotation, it won't work. You need to first do

annotation(eset) <- "mogene20sttranscriptcluster.db"
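For anyone coming from the 3' arrays, the summarization step inside RMA is the same idea on Gene ST arrays: a median polish fit to the log2 probe-by-array matrix of each probeset (here, each transcript cluster). The sketch below uses stats::medpolish on a made-up matrix purely to illustrate that step; it is not oligo's implementation, and it skips the background correction and quantile normalization that rma() also performs.

```r
# Made-up log2 probe-level intensities for ONE transcript cluster:
# rows = probes, columns = arrays (illustrative values only)
y <- matrix(c(7,  8,  9,
              6,  7,  8,
              9, 10, 11,
              5,  6,  7),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

# Median polish fits y[i, j] = overall + probe effect[i] + array effect[j] + residual
fit <- medpolish(y, trace.iter = FALSE)

# RMA-style expression value for this transcript cluster on each array
expr <- fit$overall + fit$col
print(expr)
```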

One further note: the intronic controls (especially) have an irritating 
habit of popping up in lists of differentially expressed genes. This is 
IMO likely due to mRNA that has not been fully processed to excise the 
introns, but regardless, these probesets tend to have no annotation at 
all, so they are not useful without extra work to figure out what they 
are supposed to be measuring. My usual MO is to just summarily excise 
them after, e.g., the eBayes() step of an analysis using limma. If you are 
interested, there is a function in the affycoretools package called 
getMainProbes() that will do this for you.
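If you want to see the idea rather than call getMainProbes(), here is a base-R sketch that keeps only main-design probesets in a topTable()-style result. Both the results and the category labels ("main", "intron_control") are hypothetical placeholders; the real category assignments come from the platform design, and getMainProbes() handles that lookup for you.

```r
# Hypothetical topTable()-style results (made-up IDs and values)
tt <- data.frame(
  probeset_id = c("17200001", "17200002", "17200003", "17200004"),
  logFC       = c(2.1, -1.8, 1.5, 3.0),
  adj.P.Val   = c(0.001, 0.004, 0.010, 0.020),
  stringsAsFactors = FALSE
)

# Hypothetical probeset category lookup, named by probeset ID
category <- c("17200001" = "main",
              "17200002" = "intron_control",
              "17200003" = "main",
              "17200004" = "intron_control")

# Keep main-design probesets only; IDs missing from the lookup give NA,
# which %in% treats as "not main", so they are dropped too
keep <- category[tt$probeset_id] %in% "main"
tt_main <- tt[keep, ]
print(tt_main)
```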

Best,

Jim

>
> Thanks so much!
> Kamila
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099