[BioC] Filtering before differential expression analysis of microarrays - New paper out

Wed Jan 14 18:50:54 CET 2009

Hi Steve,

The question wasn't really asked of me, but Gordon is likely in bed 
right now ;-D

Steve Lianoglou wrote:
> Hi Gordon,
> 
> As someone who has been dealing more and more with raw data, I always 
> appreciate detailed answers from the masters, such as the one you just 
> wrote. Even after reading several of the published articles regarding 
> these normalization practices, I always find these less formal emails 
> quite helpful.
> 
> That said, one point you mention isn't exactly clear to me, and I'm 
> wondering if you could elaborate just a bit here:
> 
>> Filtering non-expressed probes tends not to be emphasised on this list 
>> because users of this list are often sophisticated enough to use 
>> variance stabilizing normalization methods such as rma, vsn, normexp 
>> or vst.  This means that low-expression filtering is done more for 
>> multiplicity issues than for variance stabilization, and therefore 
>> often doesn't make a huge difference.  When using earlier 
>> normalization methods such as MAS for Affy or local background 
>> correction for two-color arrays, expression-filtering is absolutely 
>> essential, because the normalized expression values are so unstable at 
>> low intensity levels.
> 
> 
> When you say "... low-expression filtering is done more for multiplicity 
> issues than for variance stabilization", what exactly do you mean by 
> "multiplicity issues"?

By multiplicity issues Gordon was referring to the multiple comparisons 
problem. When we reject the null hypothesis at an alpha level of 0.05, 
we accept a 5% chance of a type 1 error, in which we say there is a 
difference when in fact there isn't (a false positive). The p-value 
itself is the probability of seeing a result at least as extreme as the 
one observed if there really is no difference.

For one test this isn't a problem, but as you run more and more tests 
simultaneously, you expect to see more and more false positives (e.g., 
if you do 20 tests at an alpha of 0.05, and there are really no 
differences for any of the tests, you still expect about one of them to 
appear significant even though none are).
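
To put numbers on the 20-test example, here is a minimal sketch in 
Python. The function names are made up for illustration, and the second 
calculation assumes the tests are independent, which is rarely exactly 
true for probes on an array.

```python
def expected_false_positives(n_tests, alpha):
    # Expected number of rejections when every null hypothesis is true.
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    # Chance of one or more false positives, ASSUMING independent tests.
    return 1.0 - (1.0 - alpha) ** n_tests


n_tests, alpha = 20, 0.05
print(expected_false_positives(n_tests, alpha))          # about 1
print(prob_at_least_one_false_positive(n_tests, alpha))  # about 0.64
```

So with 20 truly null tests you expect about one false positive, and 
under the independence assumption there is roughly a 64% chance of 
seeing at least one.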

There are lots of ways to adjust for multiple comparisons, but one of 
the best things you can do is not make so many comparisons in the first 
place, by filtering out data based on one or more criteria.
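
As a rough illustration of why fewer comparisons helps, the sketch below 
uses a Bonferroni adjustment, picked only because it is the simplest of 
the many adjustment methods. The probe counts and the p-value are 
invented for the example, and it assumes the filter was chosen without 
looking at the test results.

```python
def bonferroni_adjust(p_value, n_tests):
    # Bonferroni: multiply the raw p-value by the number of tests, cap at 1.
    return min(1.0, p_value * n_tests)


raw_p = 1e-5            # hypothetical raw p-value for one probe
n_all_probes = 20000    # hypothetical: every probe on the array is tested
n_after_filter = 4000   # hypothetical: probes left after filtering

before = bonferroni_adjust(raw_p, n_all_probes)
after = bonferroni_adjust(raw_p, n_after_filter)

print(before)  # about 0.2, not significant at 0.05
print(after)   # about 0.04, significant at 0.05
```

The raw p-value is identical in both cases; only the number of tests 
being corrected for has changed.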

Best,

Jim
> 
> Thanks,
> -steve
> 
> -- 
> Steve Lianoglou
> Graduate Student: Physiology, Biophysics and Systems Biology
> Weill Medical College of Cornell University
> 
> http://cbio.mskcc.org/~lianos
> 

-- 
James W. MacDonald, M.S.
Biostatistician
Hildebrandt Lab
8220D MSRB III
1150 W. Medical Center Drive
Ann Arbor MI 48109-5646
734-936-8662