[BioC] CDF file for Mouse Gene 1.1 ST Array
Guido.Hooiveld at wur.nl
Thu Dec 2 23:48:01 CET 2010
Indeed, from experience I also know that you cannot use the unofficial, Affymetrix-provided CDF of the cartridge Gene ST array (v1.0) to analyze the plate Gene ST array (v1.1).
Although according to Affymetrix the content that is probed is identical between the two arrays, the dimensions of the arrays and the number of probes on them are NOT identical. The mouse Gene ST cartridge (v1.0) array is square (1050 cols x 1050 rows), whereas the mouse Gene ST plate (v1.1) array is non-square (990 cols x 1190 rows). You can easily check this yourself using the readCelHeader function from the library 'affxparser' on a sample v1.0 and v1.1 array (you can obtain these directly from Affymetrix).
See below for condensed output:
> GeneSTv1.0 <- readCelHeader("MouseTP_Brain_01_mGENE.CEL")
> GeneSTv1.1 <- readCelHeader("MouseBrain_1.CEL")
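The comparison can be made explicit by pulling the layout fields out of the two headers. This is only a sketch: it assumes the list returned by readCelHeader has elements named chiptype, cols, rows and total, the helper name compare_cel_headers is made up for illustration, and the mock headers simply reuse the dimensions quoted above (the v1.0 chip type string is assumed) rather than values read from real files.

```r
# Hypothetical helper: tabulate the layout fields of two CEL headers.
# Assumes each header is a list with elements 'chiptype', 'cols', 'rows'
# and 'total', as returned by affxparser::readCelHeader().
compare_cel_headers <- function(h1, h2, labels = c("v1.0", "v1.1")) {
  out <- data.frame(
    chiptype = c(h1$chiptype, h2$chiptype),
    cols     = c(h1$cols, h2$cols),
    rows     = c(h1$rows, h2$rows),
    total    = c(h1$total, h2$total),
    row.names = labels,
    stringsAsFactors = FALSE
  )
  out$square <- out$cols == out$rows  # TRUE only for a square layout
  out
}

# Mock headers built from the dimensions given in the text (not real file data).
mock10 <- list(chiptype = "MoGene-1_0-st-v1", cols = 1050L, rows = 1050L,
               total = 1050L * 1050L)
mock11 <- list(chiptype = "MoGene-1_1-st-v1", cols = 990L, rows = 1190L,
               total = 990L * 1190L)
cmp <- compare_cel_headers(mock10, mock11)
print(cmp)

# With real files: runs only if affxparser and both CEL files are available.
cel10 <- "MouseTP_Brain_01_mGENE.CEL"
cel11 <- "MouseBrain_1.CEL"
if (requireNamespace("affxparser", quietly = TRUE) &&
    file.exists(cel10) && file.exists(cel11)) {
  print(compare_cel_headers(affxparser::readCelHeader(cel10),
                            affxparser::readCelHeader(cel11)))
}
```

If the two rows differ in cols, rows or total, a CDF built for one layout cannot index the cells of the other.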
In this respect I think it *may* also be of interest to mention that within the 'aroma.affymetrix' project a function called 'PdInfo2Cdf' is made available, which allows you to generate an experimental CDF file from a pdInfo package and an auxiliary CEL file for the same chip type. In other words, with this function you can create a CDF file from a platform design package and a CEL file. The pd.xxx packages are built with pdInfoBuilder for use with oligo and are made from files directly obtained from Affymetrix; you will need pd.mogene.1.1.st.v1, which is already available from BioC. You can find the function here:
Entry page (section: "Creating CDF file from R package built from pdInfoBuilder")
Please note that the PdInfo2Cdf function was originally written to create an experimental CDF for an EXON array, so you will have to modify it in two places: the query command, so that it retrieves the so-called meta-probesets (= transcripts) instead of the exon-based probesets, and the two lines that speed up the splitting.
Therefore you have to replace this line:
ff <- dbGetQuery(db(pd), "select * from pmfeature");
by this line:
ff <- dbGetQuery(db(pd), "SELECT fid, meta_fsetid, atom, y, x FROM pmfeature INNER JOIN core_mps USING(fsetid)") #gives meta-probesets/transcripts
and replace these two lines:
ffs <- split(ff, substr(ff$fsetid,1,4))
ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), recursive=FALSE)
by these two lines:
ffs <- split(ff, substr(ff$meta_fsetid,1,4)) #original fsetid is replaced by meta_fsetid
ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$meta_fsetid)), recursive=FALSE) #original fsetid is replaced by meta_fsetid
FYI: if you do not modify the function at these two places you will get a CDF for the mouse Gene ST array v1.1 consisting of 241576 units (probesets, i.e. exons); querying for meta-probesets/transcripts results in a CDF consisting of 35556 units.
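To see what the two modifications do, here is a small base-R sketch on made-up data. The two data frames are toy stand-ins for the pmfeature and core_mps tables (all IDs and coordinates are invented), and merge() is used as a base-R analogue of the INNER JOIN ... USING(fsetid) query, so this does not touch the real pd.mogene.1.1.st.v1 database.

```r
# Toy stand-ins for the 'pmfeature' and 'core_mps' tables (invented values).
pmfeature <- data.frame(
  fid    = 1:6,
  fsetid = c(101L, 101L, 102L, 103L, 103L, 104L),  # exon-level probesets
  atom   = 1:6,
  x      = c(5L, 6L, 7L, 8L, 9L, 10L),
  y      = c(1L, 1L, 2L, 2L, 3L, 3L)
)
core_mps <- data.frame(
  meta_fsetid = c(10338001L, 10338001L, 10344614L),  # transcript-level IDs
  fsetid      = c(101L, 102L, 103L)                  # fsetid 104 has no mapping
)

# Base-R analogue of:
#   SELECT fid, meta_fsetid, atom, y, x
#   FROM pmfeature INNER JOIN core_mps USING(fsetid)
# Probes whose fsetid is absent from core_mps are dropped by the inner join.
ff <- merge(pmfeature, core_mps, by = "fsetid")[
  , c("fid", "meta_fsetid", "atom", "y", "x")]

# Two-stage split, as in the modified function: first into coarse buckets
# by the first four characters of the ID, then by the full meta_fsetid.
ffs <- split(ff, substr(ff$meta_fsetid, 1, 4))
ffs <- unlist(lapply(ffs, FUN = function(u) split(u, u$meta_fsetid)),
              recursive = FALSE)

length(ffs)        # number of units (meta-probesets) in the toy data
sapply(ffs, nrow)  # number of probes per unit
```

In the toy data, the two exon-level probesets 101 and 102 collapse into one transcript-level unit, which is the same effect that reduces the unit count on the real array.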
You can then convert the CDF file into a CDF environment to be used in BioC using the library makecdfenv. This CDF environment is used with e.g. the library affy.
!! Another very important note: due to an issue with how the x and y coordinates of a CDF file were interpreted by the libraries affy, affyio, affyPLM and makecdfenv, you will have to use the development versions of these libraries, i.e. version 1.29.1 (or higher) for affy, 1.19.2 for affyio, 1.29.1 for makecdfenv, and 1.27.2 for affyPLM, in order to have everything running fine!
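A quick way to check this before running anything is sketched below. The helper name check_min_versions is made up for illustration, and the sketch assumes that all four version numbers above are meant as minimums.

```r
# Minimum versions taken from the note above (assumed to be lower bounds).
required <- c(affy = "1.29.1", affyio = "1.19.2",
              makecdfenv = "1.29.1", affyPLM = "1.27.2")

# Hypothetical helper: compare installed versions against the minimums.
# 'installed' may be supplied as a named character vector (e.g. for testing);
# by default it is looked up from the local library without loading packages.
check_min_versions <- function(required, installed = NULL) {
  pkgs <- names(required)
  if (is.null(installed)) {
    installed <- vapply(pkgs, function(p) {
      if (nzchar(system.file(package = p))) {
        as.character(utils::packageVersion(p))
      } else {
        NA_character_  # package not installed
      }
    }, character(1))
  }
  ok <- vapply(pkgs, function(p) {
    !is.na(installed[[p]]) &&
      package_version(installed[[p]]) >= package_version(required[[p]])
  }, logical(1))
  data.frame(package = pkgs,
             required = unname(required),
             installed = unname(installed[pkgs]),
             ok = unname(ok),
             stringsAsFactors = FALSE)
}

print(check_min_versions(required))
```

Any row with ok = FALSE points to a package that is missing or older than the version listed above.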
! Also, I would like to emphasize what has been said many times before: packages such as oligo and xps have been specifically designed to handle the Gene ST arrays (incl. annotation info), and these are therefore the preferred packages for analyzing these arrays in BioC! Only use this experimental CDF if you really have to!
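For comparison, the preferred oligo route needs no CDF at all. The following is a minimal sketch, assuming oligo and pd.mogene.1.1.st.v1 are installed; the wrapper name rma_with_oligo and the directory argument are made up for illustration, and target = "core" is used to request transcript-level summaries.

```r
# Hypothetical wrapper around the oligo workflow for Gene ST arrays.
# Assumes 'oligo' and the matching pd.xxx package are installed.
rma_with_oligo <- function(cel_dir = ".") {
  cel_files <- list.files(cel_dir, pattern = "\\.CEL$",
                          ignore.case = TRUE, full.names = TRUE)
  if (length(cel_files) == 0L) {
    stop("no CEL files found in ", cel_dir)
  }
  if (!requireNamespace("oligo", quietly = TRUE)) {
    stop("package 'oligo' is not installed")
  }
  raw <- oligo::read.celfiles(cel_files)  # picks the pd.xxx package by chip type
  oligo::rma(raw, target = "core")        # transcript-level RMA summaries
}

# Example call (not run here; needs real CEL files):
# eset <- rma_with_oligo("path/to/cel/files")
```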
Hope this is useful,
Everything summarized in some lines of code:
# Create the pdInfo CDF file utilizing the (modified) function PdInfo2Cdf
# Now convert the pdInfo CDF file into a CDF environment utilizing the "makecdfenv" method.
make.cdf.package(file="pdmogene11stv1.cdf", packagename = "mogene11stv1cdf", author="Your Name", maintainer="Your Name <y.name at domain.org>", version="2.1.0", species="Mus_musculus")
# A directory "mogene11stv1cdf" has been created that contains the affy CDF environment. Convert it to an R source package and install it.
system("R CMD build --force mogene11stv1cdf")
system("R CMD INSTALL ./mogene11stv1cdf_2.1.0.tar.gz")
# Now load the three mouse brain Gene ST v1.1 CEL-files for RMA normalization
x <- ReadAffy()
size of arrays=990x1190 features (17 kb)
cdf=MoGene-1_1-st-v1 (35556 affyids)
number of samples=3
number of genes=35556
x.rma <- rma(x)
ExpressionSet (storageMode: lockedEnvironment)
assayData: 35556 features, 3 samples
element names: exprs
sampleNames: MouseBrain_1.CEL MouseBrain_2.CEL MouseBrain_3.CEL
sampleNames: MouseBrain_1.CEL MouseBrain_2.CEL MouseBrain_3.CEL
experimentData: use 'experimentData(object)'
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)
 LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
 LC_PAPER=en_US.UTF-8 LC_NAME=C
 LC_ADDRESS=C LC_TELEPHONE=C
 LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
 stats graphics grDevices utils datasets methods base
other attached packages:
 mogene11stv1cdf_2.1.0 affy_1.29.1
 makecdfenv_1.29.1 pd.mogene.1.1.st.v1_3.0.2
 pdInfoBuilder_1.14.1 oligo_1.14.0
 oligoClasses_1.12.1 RSQLite_0.9-4
 DBI_0.2-5 Biobase_2.10.0
loaded via a namespace (and not attached):
 affyio_1.19.2 Biostrings_2.18.0 IRanges_1.8.3
 preprocessCore_1.12.0 splines_2.12.0 tools_2.12.0
Guido Hooiveld, PhD
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition
Biotechnion, Bomenweg 2
NL-6703 HD Wageningen
tel: (+)31 317 485788
fax: (+)31 317 483342
email: guido.hooiveld at wur.nl
> -----Original Message-----
> From: bioconductor-bounces at r-project.org
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of
> Pascal Gellert
> Sent: Thursday, December 02, 2010 21:39
> To: Lucia Peixoto
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] CDF file for Mouse Gene 1.1 ST Array
> Hey Lucia,
> I just want to note, that if you want to stick to the
> Affymetrix Power Tools, you can use PGF, CLF and MPS files
> rather than a CDF. I normalize my gene arrays this way.
> These files are officially supported by Affymetrix and
> available on their web site.
> On 12/02/2010 08:11 PM, Lucia Peixoto wrote:
> > Hi,
> > thanks for all the help. I used the Affy Power Tools to get a
> > quick RMA normalization. I tried the mouse 1.0 CDF, but affy didn't
> > like it. I will work on a new pipeline using oligo.
> > Any advice on how to do array quality metrics coming from oligo?
> > thanks again for all the help
> > Lucia
> > On Thu, Dec 2, 2010 at 6:11 AM, Pascal Gellert
> > <pascal.gellert at mpi-bn.mpg.de <mailto:pascal.gellert at mpi-bn.mpg.de>>
> > wrote:
> > Hi Lucia,
> > as far as I know, the Mouse Gene 1.1 ST Array Plates are identical
> > to the Mouse Gene 1.0 ST Cartridge Arrays, meaning that you could use
> > the 1.0 CDF file for the 1.1 Plate!? And these can be normalized
> > with oligo, as Mark suggested.
> > Pascal
> > On 01/-10/-28163 08:59 PM, Lucia Peixoto wrote:
> > Hi All,
> > I just got all my data back from my experiment and I can't
> > find the cdf file
> > for the Mouse Gene 1.1 ST Array anywhere
> > anyone knows how to get it or do I have to build it myself?
> > thanks for your help
> > Lucia