[BioC] filtered Exon Arrays: Core vs Extended Dataset

Thu May 7 02:39:11 CEST 2009

Hi Lana.

I can offer my view for what you are seeing.

Some of the ~120,000 transcript clusters in the extended set are also
represented in the core set, just with more probesets included in them.
You might say the extended set is a superset of the core set (I'm
assuming that when you say extended, you really mean core+extended).
Because the extended set includes probesets based on lower-confidence
annotation (e.g. EST-only evidence), these extra probes will be
measuring background at a higher rate.

So, would a differentially expressed (DE) core transcript also be DE in
the extended set?  Some of the time.  But a lot of the time the extra
probes included in the extended version of that transcript will measure
non-existent ESTs (i.e. background) and dilute the ability to detect DE.
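
To illustrate the dilution argument, here is a minimal simulation sketch in base R. Everything in it is an assumption made up for illustration: the probe counts, the intensity levels, the size of the true change, and the use of a plain mean across probes as the transcript-level summary (a real pipeline would typically use something more robust, so the effect there may be smaller or different).

```r
# Hypothetical simulation: all numbers below are invented for illustration.
set.seed(1)
n_per_group <- 4    # arrays per condition
n_core      <- 10   # probes backed by high-confidence annotation
n_extra     <- 30   # extra probes assumed to measure background only
shift       <- 1    # true log2 change seen by the core probes
group <- rep(c("A", "B"), each = n_per_group)

# Probes x arrays matrices of log2 intensities.
core <- matrix(rnorm(n_core * 2 * n_per_group, mean = 8, sd = 0.3),
               nrow = n_core)
core[, group == "B"] <- core[, group == "B"] + shift
extra <- matrix(rnorm(n_extra * 2 * n_per_group, mean = 5, sd = 0.3),
                nrow = n_extra)

# Summarise each array by the plain mean across probes
# (an assumed stand-in for a real transcript-level summary).
core_summary     <- colMeans(core)
extended_summary <- colMeans(rbind(core, extra))

# Estimated log2 change between the two groups.
effect <- function(x) mean(x[group == "B"]) - mean(x[group == "A"])
effect(core_summary)      # close to the true shift
effect(extended_summary)  # shrunk towards zero by the background probes
```

With a mean summary, the expected change for the extended cluster is the true shift scaled by the fraction of probes that actually respond, which is why the same transcript can fall below a significance cutoff once the extra probes are added.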

Of course, I could be wrong.  You might verify this for yourself by  
looking at the probe-level data for a transcript that is very DE in  
the core set and not DE in the extended data ...
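
A rough sketch of what such a probe-level check might look like is below. The `probes` matrix and the `is_core` flag are hypothetical and simulated here so that the code runs on its own; with real data they would have to come from the probe-level intensities and the probeset annotation for the transcript in question.

```r
# Hypothetical probe-level data for one transcript cluster (simulated).
set.seed(2)
group   <- rep(c("A", "B"), each = 4)
is_core <- rep(c(TRUE, FALSE), times = c(10, 30))  # assumed annotation flag
probes  <- matrix(rnorm(40 * 8, mean = 5, sd = 0.3), nrow = 40)

# Assumption: core probes sit above background and carry the true change.
probes[is_core, ] <- probes[is_core, ] + 3
probes[is_core, group == "B"] <- probes[is_core, group == "B"] + 1

# Per-probe log2 fold change (group B minus group A).
lfc <- rowMeans(probes[, group == "B"]) - rowMeans(probes[, group == "A"])

# Median fold change by annotation level.
by_level <- tapply(lfc, ifelse(is_core, "core", "extended-only"), median)
by_level
```

If the extended-only probes for a real transcript show fold changes scattered around zero while the core probes agree with each other, that would be consistent with the background explanation above.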

Cheers,
Mark

On 07/05/2009, at 6:55 AM, Lana Schaffer wrote:

> Hi,
> I have used Limma with both the core (~17,000) and extended (~120,000)
> Affymetrix datasets.  Do you think that significant transcripts in the
> core dataset would also be found to be significant in the extended
> dataset?
>
>
> I have found that ~88% of the significantly expressed transcripts from
> the core dataset are not found among the significantly expressed
> transcripts from the extended dataset.
> Furthermore, 86% (1352/1575) of those significant core transcripts are
> found in the filtered extended dataset (input to Limma), but are not
> found to be significant there.
>
>
>                                     Core         Extended   Intersection
> Limma: adj.pvalue=0.05              1575         1142       225
> overlap extended filtered dataset   1352 (86%)
> datasets                            17,939       112,213
> filtered datasets                   17,939       61,717
>
>
> Filtering was performed by standard deviation according to the
> following code.
>
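> # rowSds() is provided by e.g. the genefilter or matrixStats packages
> rs <- rowSds(GL.un)          # per-row standard deviation across arrays
> lambda <- 0.45               # drop the 45% of rows with the lowest SD
> filtered <- GL.un[ rs > quantile(rs, lambda, na.rm = TRUE), ]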
>
> What are your suggestions for this discrepancy?
>
> Lana Schaffer
> Biostatistics/Informatics
> The Scripps Research Institute
> DNA Array Core Facility
> La Jolla, CA 92037
> (858) 784-2263
> (858) 784-2994
> schaffer at scripps.edu

------------------------------
Mark Robinson
Epigenetics Laboratory, Garvan
Bioinformatics Division, WEHI
e: m.robinson at garvan.org.au
e: mrobinson at wehi.edu.au
p: +61 (0)3 9345 2628
f: +61 (0)3 9347 0852