[BioC] Filtering before differential expression analysis of microarrays - New paper out
Steve Lianoglou
mailinglist.honeypot at gmail.com
Wed Jan 14 18:59:53 CET 2009
Thanks, Jim!
Multiplicity as in multiple testing makes sense. I wasn't sure whether
he was referring to probes appearing in multiple places, within arrays
or across arrays (which I was trying to parse into how that might be
relevant here).
Cheers,
-steve
On Jan 14, 2009, at 12:50 PM, James W. MacDonald wrote:
> Hi Steve,
>
> The question wasn't really asked of me, but Gordon is likely in bed
> right now ;-D
>
> Steve Lianoglou wrote:
>> Hi Gordon,
>> As someone who has been dealing more and more with raw data, I
>> always appreciate detailed answers from the masters, such as the
>> one you just wrote. Even after reading several of the published
>> articles regarding these normalization practices, I always find
>> these less formal emails quite helpful.
>> That said, one point you mention isn't exactly clear to me, and I'm
>> wondering if you could elaborate just a bit here:
>>> Filtering non-expressed probes tends not to be emphasised on this
>>> list because users of this list are often sophisticated enough to
>>> use variance stabilizing normalization methods such as rma, vsn,
>>> normexp or vst. This means that low-expression filtering is done
>>> more for multiplicity issues than for variance stabilization, and
>>> therefore often doesn't make a huge difference. When using
>>> earlier normalization methods such as MAS for Affy or local
>>> background correction for two-color arrays, expression-filtering
>>> is absolutely essential, because the normalized expression values
>>> are so unstable at low intensity levels.
>> When you say "... low-expression filtering is done more for
>> multiplicity issues than for variance stabilization", what exactly
>> do you mean by "multiplicity issues"?
>
> By multiplicity issues Gordon was referring to the multiple
> comparisons problem. A type 1 error is one in which we say there is
> a difference when in fact there isn't (a false positive). If we
> reject the null hypothesis whenever the p-value falls below an alpha
> level of 0.05, we are in essence accepting a 5% chance of a false
> positive for each test where there is truly no difference.
>
> For one test this isn't a problem, but as you make more and more
> tests simultaneously, you expect to see more and more false
> positives (e.g., if you do 20 tests at an alpha of 0.05, and there
> are really no differences for any of the tests, you still expect
> about one of them to appear significant even though none are).
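
To make the arithmetic in the paragraph above concrete, here is a minimal Python sketch. It is not from the original thread and the function names are hypothetical. It assumes every null hypothesis is true and, for the second quantity, that the tests are independent, which is a simplification for probes on the same array.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    alpha = 0.05
    # 20 matches the example in the email; the larger counts are
    # illustrative stand-ins for the number of probes on an array.
    for n_tests in (1, 20, 1000, 20000):
        expected = expected_false_positives(n_tests, alpha)
        at_least_one = prob_at_least_one_false_positive(n_tests, alpha)
        print(f"{n_tests:>6} tests: expected false positives = {expected:.1f}, "
              f"P(at least one) = {at_least_one:.3f}")
```

With 20 tests the expected count is 20 x 0.05 = 1, which is the "about one" mentioned above, and both quantities grow quickly as the number of tests rises.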
>
> There are lots of ways to adjust for multiple comparisons, but one
> of the best things you can do is not make so many comparisons in the
> first place, by filtering out data based on one or more criteria.
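
As a rough illustration of why fewer comparisons can help, the sketch below adjusts the same p-values with and without a filter. Several things here are assumptions rather than anything stated in the thread: the Benjamini-Hochberg adjustment is just one of the "lots of ways" to correct for multiplicity, the p-values and the `expressed` flags are made-up toy data, and the filter is assumed to have been chosen without looking at the test results.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all larger ranks.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def count_significant(pvalues, cutoff=0.05):
    """Number of tests whose adjusted p-value falls below the cutoff."""
    return sum(p < cutoff for p in bh_adjust(pvalues))


# Toy data (hypothetical): one raw p-value per probe, plus a flag saying
# whether the probe passed an expression filter.
raw_p = [0.001, 0.004, 0.012, 0.03, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9]
expressed = [True, True, True, True, True, False, False, False, False, False]

# Keep only the probes that passed the filter before adjusting.
filtered_p = [p for p, keep in zip(raw_p, expressed) if keep]

n_unfiltered = count_significant(raw_p)
n_filtered = count_significant(filtered_p)

if __name__ == "__main__":
    print("significant without filtering:", n_unfiltered)
    print("significant with filtering:   ", n_filtered)
```

In this toy case the filtered set is adjusted for 5 tests rather than 10, so a borderline probe gets a smaller adjusted p-value. In R the same adjustment is available as `p.adjust(p, method = "BH")`.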
>
> Best,
>
> Jim
>> Thanks,
>> -steve
>> --
>> Steve Lianoglou
>> Graduate Student: Physiology, Biophysics and Systems Biology
>> Weill Medical College of Cornell University
>> http://cbio.mskcc.org/~lianos
>
> --
> James W. MacDonald, M.S.
> Biostatistician
> Hildebrandt Lab
> 8220D MSRB III
> 1150 W. Medical Center Drive
> Ann Arbor MI 48109-5646
> 734-936-8662
--
Steve Lianoglou
Graduate Student: Physiology, Biophysics and Systems Biology
Weill Medical College of Cornell University
http://cbio.mskcc.org/~lianos