[BioC] Filtering before differential expression analysis of microarrays - New paper out

Steve Lianoglou mailinglist.honeypot at gmail.com
Wed Jan 14 18:59:53 CET 2009


Thanks, Jim!

Multiplicity as in multiple testing makes sense. I wasn't sure whether he
was referring to probes appearing in multiple places, or to something
within or across arrays (I was trying to work out how that might be
relevant here).

Cheers,
-steve

On Jan 14, 2009, at 12:50 PM, James W. MacDonald wrote:

> Hi Steve,
>
> The question wasn't really asked of me, but Gordon is likely in bed  
> right now ;-D
>
> Steve Lianoglou wrote:
>> Hi Gordon,
>> As someone who has been dealing more and more with raw data, I  
>> always appreciate detailed answers from the masters, such as the  
>> one you just wrote. Even after reading several of the published  
>> articles regarding these normalization practices, I always find  
>> these less formal emails quite helpful.
>> That said, one point you mention isn't exactly clear to me, and I'm  
>> wondering if you could elaborate just a bit here:
>>> Filtering non-expressed probes tends not to be emphasised on this  
>>> list because users of this list are often sophisticated enough to  
>>> use variance stabilizing normalization methods such as rma, vsn,  
>>> normexp or vst.  This means that low-expression filtering is done  
>>> more for multiplicity issues than for variance stabilization, and  
>>> therefore often doesn't make a huge difference.  When using  
>>> earlier normalization methods such as MAS for Affy or local  
>>> background correction for two-color arrays, expression-filtering  
>>> is absolutely essential, because the normalized expression values  
>>> are so unstable at low intensity levels.
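A low-expression filter of the kind Gordon describes can be sketched in a few lines. This is a minimal Python sketch. It is not Gordon's procedure or any Bioconductor function. It assumes the data are already normalized on the log2 scale (e.g. by rma). The cutoff of 5.0, the 2-array minimum, the function name, and the toy matrix are all hypothetical values chosen only for illustration.

```python
def filter_low_expression(expr, threshold, min_arrays):
    """Return indices of probes (rows) whose log2 expression is above
    `threshold` on at least `min_arrays` arrays (columns).

    Hypothetical helper for illustration; assumes `expr` is a list of
    rows of already-normalized log2 intensities."""
    keep = []
    for i, row in enumerate(expr):
        n_above = sum(1 for value in row if value > threshold)
        if n_above >= min_arrays:
            keep.append(i)
    return keep


# Toy log2-scale matrix: 4 probes x 4 arrays (made-up values).
expr = [
    [3.1, 2.8, 3.4, 2.9],  # probe 0: low on every array
    [8.2, 7.9, 8.5, 8.1],  # probe 1: clearly expressed
    [3.0, 6.9, 7.2, 3.3],  # probe 2: expressed on two arrays only
    [5.0, 4.8, 5.1, 4.9],  # probe 3: hovering around the cutoff
]

print(filter_low_expression(expr, threshold=5.0, min_arrays=2))
```

Requiring a minimum number of arrays rather than a high average keeps probes such as probe 2, which may be expressed in only one group. The filter never looks at which group each array belongs to.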
>> When you say "... low-expression filtering is done more for  
>> multiplicity issues than for variance stabilization", what exactly  
>> do you mean by "multiplicity issues"?
>
> By multiplicity issues Gordon was referring to the multiple  
> comparisons problem. A p-value is the probability of seeing a result
> at least as extreme as the observed one if there really is no
> difference. If we reject the null hypothesis whenever p < 0.05 (an
> alpha level of 0.05), we accept a 5% chance of a type 1 error, in
> which we say there is a difference when in fact there isn't (a false
> positive).
>
> For one test this isn't a problem, but as you make more and more  
> tests simultaneously, you expect to see more and more false  
> positives (e.g., if you do 20 tests at an alpha of 0.05, and there  
> are really no differences for any of the tests, you still expect  
> about one of them to appear significant even though none are).
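The arithmetic behind the 20-test example is easy to check. This is a minimal sketch that assumes every null hypothesis is true. The second function also assumes the tests are independent, which expression data rarely are, so treat it as a rough guide only.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when all n_tests nulls are true."""
    return n_tests * alpha


def prob_at_least_one(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 20, 1000, 20000):
    print(n, expected_false_positives(n, 0.05),
          round(prob_at_least_one(n, 0.05), 3))
```

With 20 tests at alpha = 0.05, this gives 1 expected false positive and roughly a 64% chance of at least one. With 20,000 probes, the expected count is 1,000.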
>
> There are lots of ways to adjust for multiple comparisons, but one
> of the best things you can do is not make so many comparisons in the
> first place, by filtering out probes that are unlikely to be
> informative (for example, those with low expression on all arrays)
> before testing.
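To see why filtering helps, the sketch below applies two standard adjustments, Bonferroni and Benjamini-Hochberg, to the same five small p-values. It runs them once with 995 extra tests and once without. The p-values are made-up toy numbers, and the setup is idealized. It assumes the hypothetical filter removes only probes with no real difference and is chosen without looking at the group labels.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: p * m, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up FDR adjustment)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping the
    # adjusted values monotone: min over ranks >= r of p * m / rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def n_significant(adjusted, alpha=0.05):
    """Count adjusted p-values below alpha."""
    return sum(1 for q in adjusted if q < alpha)


# Toy data: 5 probes that pass a hypothetical expression filter, plus
# 995 low-expression probes with evenly spread, null-like p-values.
kept = [1e-5, 4e-5, 2e-4, 1e-3, 0.03]
removed = [(k + 0.5) / 995 for k in range(995)]

for label, pvals in (("no filter", kept + removed), ("filtered", kept)):
    print(label, len(pvals),
          n_significant(bonferroni(pvals)),
          n_significant(benjamini_hochberg(pvals)))
```

Under these assumptions, Bonferroni declares 2 of the 5 probes significant before filtering and 4 after. Benjamini-Hochberg goes from 2 to 5. The p-values themselves are unchanged; only the number of tests being corrected for has shrunk.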
>
> Best,
>
> Jim
>> Thanks,
>> -steve
>> -- 
>> Steve Lianoglou
>> Graduate Student: Physiology, Biophysics and Systems Biology
>> Weill Medical College of Cornell University
>> http://cbio.mskcc.org/~lianos
>> _______________________________________________
>> Bioconductor mailing list
>> Bioconductor at stat.math.ethz.ch
>> https://stat.ethz.ch/mailman/listinfo/bioconductor
>> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
>
> -- 
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-5646
> 734-936-8662

--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University

http://cbio.mskcc.org/~lianos


