[BioC] filtered Exon Arrays: Core vs Extended Dataset

Wed May 6 22:55:25 CEST 2009

Hi,
I have used Limma with both the core (~17,000) and extended (~120,000)
Affymetrix exon array datasets. Would you expect transcripts that are
significant in the core dataset also to be significant in the extended
dataset?

I have found that ~88% of the significant transcripts from the core
dataset are not among the significant transcripts from the extended
dataset. Furthermore, 86% (1352/1575) of the significant core transcripts
are present in the filtered extended dataset (the input to Limma) but are
not significant there.
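For anyone wanting to reproduce these overlap counts, here is a minimal base-R sketch. The helper name overlap_summary() and the inputs sig_core, sig_ext and filt_ext are hypothetical, and it assumes all three are character vectors of the same transcript-cluster IDs (core IDs being a subset of extended IDs).

```r
# Hypothetical helper: summarise how the significant lists overlap.
# Assumes all inputs are character vectors of the same transcript-cluster IDs.
overlap_summary <- function(sig_core, sig_ext, filt_ext) {
  both <- intersect(sig_core, sig_ext)
  # core hits that survived filtering of the extended set (so were tested
  # in the extended analysis) but were not called significant there
  tested_not_sig <- setdiff(intersect(sig_core, filt_ext), sig_ext)
  c(core = length(sig_core),
    extended = length(sig_ext),
    intersection = length(both),
    core_only = length(setdiff(sig_core, sig_ext)),
    tested_not_sig = length(tested_not_sig),
    pct_tested_not_sig = round(100 * length(tested_not_sig) / length(sig_core), 1))
}

# Toy example with made-up IDs
s <- overlap_summary(sig_core = c("a", "b", "c", "d"),
                     sig_ext  = c("a", "e"),
                     filt_ext = c("a", "b", "c", "e", "f"))
s
```

With the posted counts, 1352 of 1575 core hits in the tested-but-not-significant group corresponds to 85.8%.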

                                           Core          Extended    Intersection
Significant (Limma, adj. p-value 0.05)     1575          1142        225
Core hits in filtered extended, not sig.   1352 (86%)
Transcripts in dataset                     17,939        112,213
Transcripts after filtering                17,939        61,717

Filtering was performed on the standard deviation, using the following
code:

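library(genefilter)                 # assumed source of rowSds()
rs <- rowSds(GL.un)                 # per-transcript standard deviation
lambda <- 0.45                      # drop the 45% least variable rows
filtered <- GL.un[rs > quantile(rs, lambda, na.rm = TRUE), ]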
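To check how many rows this rule keeps, the sketch below re-implements the same quantile-based SD filter in base R on simulated data. The function name sd_filter() is hypothetical, apply(x, 1, sd) stands in for rowSds(), and the extra !is.na() guard is an addition not present in the original code.

```r
# Base-R sketch of the quantile-based SD filter (apply() stands in for rowSds()).
sd_filter <- function(x, lambda = 0.45) {
  rs <- apply(x, 1, sd, na.rm = TRUE)           # per-row standard deviation
  cutoff <- quantile(rs, lambda, na.rm = TRUE)  # lambda-quantile of the SDs
  x[!is.na(rs) & rs > cutoff, , drop = FALSE]   # keep rows strictly above it
}

set.seed(1)
m <- matrix(rnorm(1000 * 6), nrow = 1000)   # simulated 1000 transcripts x 6 arrays
f <- sd_filter(m, lambda = 0.45)
nrow(f) / nrow(m)   # 0.55: the 550 most variable rows are kept
```

Under R's default quantile() (type 7) and assuming no ties or missing SDs, the rule keeps n - floor(0.45 * (n - 1)) - 1 of n rows. For the 112,213 extended rows that is 61,717, matching the filtered count in the table; for the 17,939 core rows it would be 9,866, which suggests the core data were not passed through this filter.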

Do you have any suggestions about what might cause this discrepancy?

Lana Schaffer
Biostatistics/Informatics
The Scripps Research Institute
DNA Array Core Facility
La Jolla, CA 92037
(858) 784-2263
(858) 784-2994
schaffer at scripps.edu