[BioC] Analysis of Affymetrix Mouse Gene 2.0 ST arrays
James W. MacDonald
jmacdon at uw.edu
Tue Mar 5 23:22:44 CET 2013
Hi Kamila,
On 3/5/2013 4:45 PM, Naxerova, Kamila wrote:
> Dear all,
>
> I am analyzing a set of Affymetrix Mouse Gene 2.0 ST arrays. I am quite familiar with 3'-biased chips, but this is my first time looking at data from WT arrays. I have a few general questions -- any advice would be appreciated to speed up my learning process.
>
> 1) I have already read on this mailing list that the good old affy package does not work well with WT arrays (can anybody point me to any literature on why that is?). So I have installed the oligo and xps packages -- what are the advantages/disadvantages for each? Any opinions on which one is the right "starter kit"?
The affy package was never intended to work with these arrays. It was
designed specifically for the 3'-biased arrays, which had predefined
probesets and did not share probes between probesets. In addition,
the makecdfenv package is designed to build CDF packages from old-style
CDF files, and Affy has never released a CDF for these new chips that
they are willing to support in any meaningful way.
Some changes were made to the affy package to accommodate the fact that
probes can be shared between probesets, and it is possible to use
functions in affxparser to re-create conventional CDF packages from the
newer pgf and clf files. So hypothetically you could still use the affy
package (and hypothetically you could still use an Apple IIe for all your
computing needs, but that's crazy, so let's move on).
I don't think you will find much difference between oligo and xps, other
than the fact that xps also requires you to install ROOT (the CERN
data-analysis framework). You might try both and see which suits you better.
I should throw in my obligatory cautionary statement about summarizing
Gene ST data at the probeset (as opposed to the transcript) level. If
you look at the number of probes per probeset, a huge number have fewer
than four probes, which leaves very little information behind each
summarized value. So hypothetically you can do this, but I wouldn't.
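If you want to check the probes-per-probeset distribution yourself, it comes down to a simple tabulation. Below is a minimal base-R sketch; the toy `probe_map` data frame is made up, and in practice you would have to extract the real probe-to-probeset mapping from the pd.mogene.2.0.st.v1 package yourself (that extraction step is assumed, not shown).

```r
# Hypothetical toy mapping: one row per probe, plus the probeset it belongs to.
# Real data would come from the pd.mogene.2.0.st.v1 package (extraction not shown).
probe_map <- data.frame(
  probe_id    = 1:10,
  probeset_id = c("ps1", "ps1", "ps1", "ps1", "ps1",
                  "ps2", "ps2",
                  "ps3", "ps3", "ps3")
)

# Count how many probes fall in each probeset.
probes_per_set <- table(probe_map$probeset_id)

# Names of the probesets with fewer than 4 probes.
small_sets <- names(probes_per_set)[probes_per_set < 4]

# Fraction of all probesets that are that small.
frac_small <- mean(probes_per_set < 4)

print(probes_per_set)
print(small_sets)
print(frac_small)
```

On real Gene ST data, a large `frac_small` is exactly the situation that makes probeset-level summaries shaky.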
>
> 2) I see with some dread that there seems to be no annotation package for the 2.0 array yet. I have never built my own... any quick bullet points on how I would go about doing that for a WT array?
No dread required. All you need is the transcript-level annotation file
from Affy
(http://www.affymetrix.com/Auth/analysis/downloads/na33/wtgene/MoGene-2_0-st-v1.na33.mm10.transcript.csv.zip),
unzipped, plus the AnnotationForge, mouse.db0, and org.Mm.eg.db packages.
Then something like
library(AnnotationForge)
library(mouse.db0)
library(org.Mm.eg.db)
makeDBPackage("MOUSECHIP_DB",
affy=TRUE,
prefix="mogene20sttranscriptcluster",
fileName="MoGene-2_0-st-v1.na33.mm10.transcript.csv",
outputDir = ".",
version="2.11.1",
manufacturer = "Affymetrix",
chipName = "Mouse Gene 2.0 ST Array",
manufacturerUrl = "http://www.affymetrix.com",
author = "Kamila Naxerova",
maintainer = "Kamila Naxerova <naxerova at fas.harvard.edu>")
should do the trick. You can then install the resulting package directly from within R with
install.packages("mogene20sttranscriptcluster.db", repos=NULL,
type="source")
For more details, see
http://bioconductor.org/packages/2.11/bioc/vignettes/AnnotationForge/inst/doc/SQLForge.pdf
>
> 3) It seems that RMA is also used for normalization of WT arrays, so that part I am comfortable with. But are there any differences in preprocessing between 3' and WT arrays that I should watch out for?
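Not really. I don't use xps, so I can't say for certain how you do things
with that package, but with oligo it's as simple as
library(oligo)
# read all CEL files in the working directory
abatch <- read.celfiles(list.celfiles())
# background-correct, quantile-normalize, and summarize
eset <- rma(abatch)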
This normalizes and summarizes at the transcript level. Note, however,
that the annotation for the resulting ExpressionSet will be the
pd.mogene.2.0.st.v1 package, so if you use annotation(eset) in any
further calls to do gene annotation, it won't work. You need to
first do
annotation(eset) <- "mogene20sttranscriptcluster.db"
One further note: the intronic controls, especially, have an irritating
habit of popping up in lists of differentially expressed genes. IMO this
is likely due to mRNA that has not been fully processed to excise the
introns, but regardless, these probesets tend to have no annotation at
all, so they are not useful without extra work to figure out what they
are supposed to be measuring. My usual MO is to summarily excise them
after, e.g., the eBayes() step of a limma analysis. If you are
interested, the affycoretools package has a function called
getMainProbes() that will do this for you.
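getMainProbes() is the route to use in practice. Purely to illustrate the idea, here is a hand-rolled base-R sketch (not how getMainProbes() works internally) that drops non-main probesets from a limma topTable()-style result. The transcript cluster IDs, the `categories` lookup table, the helper `keep_main()`, and the assumption that the Affymetrix transcript CSV labels main probesets with a category value of "main" are all hypothetical here, so check them against your own file.

```r
# Toy stand-in for a limma topTable() result; row names are
# made-up transcript cluster IDs.
tt <- data.frame(logFC     = c(2.1, -1.8, 1.5, 0.9),
                 adj.P.Val = c(0.001, 0.002, 0.01, 0.04),
                 row.names = c("17200001", "17200002", "17200003", "17200004"))

# Hypothetical lookup of probeset category, assumed to mirror a
# "category" column in the Affymetrix transcript CSV.
categories <- data.frame(
  id       = c("17200001", "17200002", "17200003", "17200004"),
  category = c("main", "normgene->intron", "main", "control->affx"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows whose ID is labeled "main".
keep_main <- function(tt, categories) {
  main_ids <- categories$id[categories$category == "main"]
  tt[rownames(tt) %in% main_ids, , drop = FALSE]
}

tt_main <- keep_main(tt, categories)
print(tt_main)
```

Filtering after eBayes() rather than before leaves the variance moderation computed on the full set of probesets, which matches the order described above.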
Best,
Jim
>
> Thanks so much!
> Kamila
--
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099