[BioC] CDF file for Mouse Gene 1.1 ST Array

Thu Dec 2 23:48:01 CET 2010

Hi,

Indeed, from experience I also know that you cannot use the unofficial, Affymetrix-provided CDF of the cartridge Gene ST array (v1.0) for the analysis of the plate Gene ST (v1.1) array.
Although according to Affymetrix the probed content is identical between the two arrays, the dimensions of the arrays and the number of probes on them are NOT identical. The mouse Gene ST cartridge (v1.0) array is square (1050 cols x 1050 rows), whereas the mouse Gene ST plate (v1.1) array is non-square (990 cols x 1190 rows). You can easily check this yourself using the readCelHeader function from the package 'affxparser' on a sample v1.0 and v1.1 array (you can obtain these directly from Affymetrix).

See below for condensed output:
> library(affxparser)
> GeneSTv1.0 <- readCelHeader("MouseTP_Brain_01_mGENE.CEL")
> GeneSTv1.0
$filename
[1] "./MouseTP_Brain_01_mGENE.CEL"

$version
[1] 1

$cols
[1] 1050

$rows
[1] 1050

$total
[1] 1102500
<<SNIP>>

> GeneSTv1.1 <- readCelHeader("MouseBrain_1.CEL")
> GeneSTv1.1
$filename
[1] "./MouseBrain_1.CEL"

$version
[1] 1

$cols
[1] 990

$rows
[1] 1190

$total
[1] 1178100
<<SNIP>>
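
As a small sanity check on the two headers above, here is a sketch in base R only. The numbers are copied by hand from the output above, and the helper name same_layout is made up for illustration (it is not part of affxparser):

```r
# Header values copied by hand from the readCelHeader() output above
v10 <- list(cols = 1050L, rows = 1050L, total = 1102500L)  # cartridge, v1.0
v11 <- list(cols = 990L,  rows = 1190L, total = 1178100L)  # plate, v1.1

# Hypothetical helper: TRUE only if both headers describe the same grid
same_layout <- function(a, b) {
  a$cols == b$cols && a$rows == b$rows
}

# 'total' should simply be cols * rows for each array
stopifnot(v10$cols * v10$rows == v10$total)
stopifnot(v11$cols * v11$rows == v11$total)

same_layout(v10, v11)   # FALSE: a v1.0 CDF cannot describe the v1.1 grid
v11$total - v10$total   # 75600 more cells on the plate array
```

Because a CDF addresses probes by their x/y position on the grid, a mismatch in cols/rows is enough to rule out reusing the v1.0 CDF.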

In this respect I think it *may* also be of interest to mention that the 'aroma.affymetrix' project provides a function called 'PdInfo2Cdf', which generates an experimental CDF file from a pdInfo (platform design) package and an auxiliary CEL file for the same chip type. The pd.xxx packages are built with pdInfoBuilder for use with oligo and are made from files obtained directly from Affymetrix; you will need pd.mogene.1.1.st.v1, which is already available from BioC. You can find the function here:
Direct link:
http://bioinf.wehi.edu.au/folders/mrobinson/aroma/PdInfo2Cdf.R
Entry page (section: "Creating CDF file from R package built from pdInfoBuilder")
http://www.aroma-project.org/node/41
Please note that the PdInfo2Cdf function was originally written to create an experimental CDF for an EXON array, so you will have to modify the query command to retrieve the so-called meta-probesets (= transcripts) instead of the exon-based probesets, and also adapt two additional lines that speed up the splitting.
Therefore you have to replace this line:
ff <- dbGetQuery(db(pd), "select * from pmfeature");
by this line:
ff <- dbGetQuery(db(pd), "SELECT fid, meta_fsetid, atom, y, x FROM pmfeature INNER JOIN core_mps USING(fsetid)") #gives meta-probesets/transcripts

AND also
replace these two lines:
  ffs <- split(ff, substr(ff$fsetid,1,4))
  ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$fsetid)), recursive=FALSE)
by these two lines:
  ffs <- split(ff, substr(ff$meta_fsetid,1,4)) #original fsetid is replaced by meta_fsetid 
  ffs <- unlist( lapply(ffs, FUN=function(u) split(u,u$meta_fsetid)), recursive=FALSE) #original fsetid is replaced by meta_fsetid 
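
To see what the modified two-step split produces, here is a toy sketch in base R. The data frame and its IDs are invented for illustration (they are not real pmfeature rows); only the two split lines are taken from the modification above:

```r
# Toy stand-in for the query result; all values are made up
ff <- data.frame(
  fid         = 1:6,
  meta_fsetid = c(10338001L, 10338001L, 10338002L,
                  10344614L, 10344614L, 10344614L),
  atom        = 1:6,
  y           = c(5L, 6L, 7L, 8L, 9L, 10L),
  x           = c(1L, 2L, 3L, 4L, 5L, 6L)
)

# Step 1: coarse split on the first four characters of the ID
ffs <- split(ff, substr(ff$meta_fsetid, 1, 4))
# Step 2: split each coarse group per meta-probeset, then flatten one level
ffs <- unlist(lapply(ffs, FUN = function(u) split(u, u$meta_fsetid)),
              recursive = FALSE)

length(ffs)                   # 3: one list element per meta-probeset
vapply(ffs, nrow, integer(1)) # 2, 1 and 3 probes respectively
```

Each element of ffs is a small data frame holding the probes of one meta-probeset, which is what ends up as one unit in the CDF. As far as I understand it, the coarse first split is only there to keep the second split fast.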

FYI: if you do not modify the function at these two places you will get a CDF for the mouse Gene ST array v1.1 consisting of 241576 units (probesets, i.e. exons); querying for meta-probesets/transcripts results in a CDF consisting of 35556 units.
You can then convert the CDF file into a CDF environment to be used in BioC using the library makecdfenv. This CDF environment is used with e.g. the library affy.
!! Another very important note: due to an issue with how the x and y coordinates of a CDF file were interpreted by the libraries affy, affyio, affyPLM and makecdfenv, you will have to use the development versions of these libraries, i.e. version 1.29.1 (or higher) for affy, 1.19.2 for affyio, 1.29.1 for makecdfenv, and 1.27.2 for affyPLM, in order to have everything running fine!
! Also, I would like to emphasize what has been said many times before: packages such as oligo and xps have been specifically designed to handle the Gene ST arrays (incl. annotation info), and these are therefore the preferred packages for analyzing these arrays in BioC! Only use this experimental CDF if you really have to!

Hope this is useful,
Guido

Everything summarized in some lines of code:
# Create the pdInfo CDF file utilizing the (modified) function PdInfo2Cdf
source("PdInfo2Cdf.R")
PdInfo2Cdf("pd.mogene.1.1.st.v1", "MouseBrain_1.CEL");

# Now convert the pdInfo CDF file into a CDF environment using the "makecdfenv" package.
library(makecdfenv)
make.cdf.package(file="pdmogene11stv1.cdf", packagename = "mogene11stv1cdf", author="Your Name", maintainer="Your Name <y.name at domain.org>", version="2.1.0", species="Mus_musculus")

# A directory "mogene11stv1cdf" has been created that contains the affy CDF environment. Convert it to an R source package and install it.
system("R CMD build --force mogene11stv1cdf")
system("R CMD INSTALL ./mogene11stv1cdf_2.1.0.tar.gz")

# Now load the three mouse brain Gene ST v1.1 CEL-files for RMA normalization
library(affy)
x <- ReadAffy()
> x
AffyBatch object
size of arrays=990x1190 features (17 kb)
cdf=MoGene-1_1-st-v1 (35556 affyids)
number of samples=3
number of genes=35556
annotation=mogene11stv1
notes

x.rma <- rma(x)
> x.rma
ExpressionSet (storageMode: lockedEnvironment)
assayData: 35556 features, 3 samples 
  element names: exprs 
protocolData
  sampleNames: MouseBrain_1.CEL MouseBrain_2.CEL MouseBrain_3.CEL
  varLabels: ScanDate
  varMetadata: labelDescription
phenoData
  sampleNames: MouseBrain_1.CEL MouseBrain_2.CEL MouseBrain_3.CEL
  varLabels: sample
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation: mogene11stv1 
>
> sessionInfo()
R version 2.12.0 (2010-10-15)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] mogene11stv1cdf_2.1.0     affy_1.29.1              
 [3] makecdfenv_1.29.1         pd.mogene.1.1.st.v1_3.0.2
 [5] pdInfoBuilder_1.14.1      oligo_1.14.0             
 [7] oligoClasses_1.12.1       RSQLite_0.9-4            
 [9] DBI_0.2-5                 Biobase_2.10.0           
[11] affxparser_1.22.0        

loaded via a namespace (and not attached):
[1] affyio_1.19.2         Biostrings_2.18.0     IRanges_1.8.3        
[4] preprocessCore_1.12.0 splines_2.12.0        tools_2.12.0         
> 

------------------------------------------------ 
Guido Hooiveld, PhD 
Nutrition, Metabolism & Genomics Group
Division of Human Nutrition 
Wageningen University 
Biotechnion, Bomenweg 2 
NL-6703 HD Wageningen 
the Netherlands 
tel: (+)31 317 485788 
fax: (+)31 317 483342 
internet:   http://nutrigene.4t.com
email:      guido.hooiveld at wur.nl

> -----Original Message-----
> From: bioconductor-bounces at r-project.org 
> [mailto:bioconductor-bounces at r-project.org] On Behalf Of 
> Pascal Gellert
> Sent: Thursday, December 02, 2010 21:39
> To: Lucia Peixoto
> Cc: bioconductor at stat.math.ethz.ch
> Subject: Re: [BioC] CDF file for Mouse Gene 1.1 ST Array
> 
> Hey Lucia,
> 
> I just want to note, that if you want to stick to the 
> Affymetrix Power Tools, you can use PGF, CLF and MPS files 
> rather than a CDF. I normalize my gene arrays this way. 
> These files are officially supported by Affymetrix and 
> available on their web site.
> 
> Pascal
> 
> 
> On 12/02/2010 08:11 PM, Lucia Peixoto wrote:
> > Hi,
> > thanks for all the help. I used the Affy Power Tools to get
> > normalization quickly using RMA. I tried the mouse 1.0 CDF but affy
> > didn't like it. I will work on a new pipeline using oligo.
> >
> > Any advice on how to do array quality metrics coming from oligo?
> >
> > thanks again for all the help
> >
> > Lucia
> >
> >
> > On Thu, Dec 2, 2010 at 6:11 AM, Pascal Gellert 
> > <pascal.gellert at mpi-bn.mpg.de>
> > wrote:
> >
> >     Hi Lucia,
> >
> >     as far as I know, the Mouse Gene 1.1 ST Array Plates are identical
> >     to Mouse Gene 1.0 ST Cartridge Arrays, meaning that you could use
> >     the 1.0 CDF file for the 1.1 Plate!? And these can be normalized
> >     with oligo as Mark suggested.
> >
> >
> >     Pascal
> >
> >
> >
> >
> >
> >     On 01/-10/-28163 08:59 PM, Lucia Peixoto wrote:
> >
> >         Hi All,
> >
> >         I just got all my data back from my experiment and I can't
> >         find the cdf file
> >         for the  Mouse Gene 1.1 ST Array anywhere
> >         anyone knows how to get it or do I have to build it myself?
> >         thanks for your help
> >
> >         Lucia
> >