[BioC] Understanding limma, fdr and topTable
aaron.j.mackey at gsk.com
Tue Jul 8 17:17:02 CEST 2008
> I would add that removing those genes that are unchanged in any sample
> will also help reduce the multiplicity problem. Regardless of the
> expression level, those genes that never change expression are
> uninteresting by default, so e.g., if beta-actin is highly expressed at
> the same level in all samples we don't really care to test for
> differential expression for that gene since it apparently is not
> differentially expressed.
This doesn't make sense. How can I choose to filter out "unchanged"
probesets without fitting a model of some sort and making a probabilistic
decision for each probeset about whether it is "unchanged" or not? Every
probeset (save those below the detection limit) will exhibit variance
(though the variance may be below the precision of the instrument to
measure), right? You're not suggesting that there are some probesets with
zero variance?
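To make the point concrete, here is a minimal Python sketch. The function `looks_unchanged`, the simulated intensities and the cutoffs are all hypothetical and do not come from limma or any Bioconductor package. A label-aware "unchanged" filter computes a between-group statistic and compares it with a cutoff, which is exactly the shape of a hypothesis test. The simulated probesets also never show exactly zero sample variance.

```python
import random
import statistics


def looks_unchanged(group_a, group_b, cutoff):
    """Hypothetical filter: call a probeset 'unchanged' when the absolute
    difference between group means is below an arbitrary cutoff.
    This is a test statistic compared with a critical value, i.e. a test."""
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return abs(diff) < cutoff


random.seed(1)
# 1000 simulated probesets, 3 arrays per group, no true differences.
probesets = [([random.gauss(8.0, 0.2) for _ in range(3)],
              [random.gauss(8.0, 0.2) for _ in range(3)])
             for _ in range(1000)]

# Every probeset has positive sample variance; none is exactly "unchanged".
min_var = min(statistics.variance(a + b) for a, b in probesets)
print(f"smallest per-probeset variance: {min_var:.2e}")

# How many probesets survive depends entirely on the chosen cutoff.
for cutoff in (0.05, 0.1, 0.2):
    kept = sum(not looks_unchanged(a, b, cutoff) for a, b in probesets)
    print(f"cutoff {cutoff}: {kept} probesets passed on to formal testing")
```

Whatever cutoff is picked, the filter has already made a per-probeset accept/reject decision before any formal test is run.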
It seems to me that this approach leads to a false reduction in the
multiplicity problem, as you've just moved the hypothesis testing into
a separate "phase" of the analysis. It would also mess up pooled
variance estimates such as those used in eBayes-based methods (e.g.
limma).
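A rough sketch of the pooled-variance concern, assuming the limma-style moderated variance s̃²_g = (d0·s0² + d_g·s_g²)/(d0 + d_g). For simplicity the prior variance s0² is estimated here as the plain mean of the per-gene variances, and d0 is fixed at a hypothetical value of 4. limma itself estimates both from the data (see `squeezeVar`), so treat this as a crude stand-in. Dropping the lowest-variance genes before estimating the prior inflates s0², and with it the moderated variance of every gene that remains.

```python
import random
import statistics


def moderated_variance(s2_gene, d_gene, s2_prior, d_prior):
    """limma-style moderated variance: a weighted average of the gene's own
    sample variance and a prior variance shared across all genes."""
    return (d_prior * s2_prior + d_gene * s2_gene) / (d_prior + d_gene)


random.seed(2)
D_GENE = 4      # residual df per gene, e.g. 6 arrays minus 2 group means
D_PRIOR = 4.0   # hypothetical prior df; limma estimates this from the data
SIGMA2 = 0.04   # common true variance for all simulated genes

# Per-gene sample variances drawn as scaled chi-square(D_GENE) / D_GENE.
s2 = [SIGMA2 * sum(random.gauss(0, 1) ** 2 for _ in range(D_GENE)) / D_GENE
      for _ in range(2000)]

# Hypothetical "unchanged" filter: drop the lowest-variance quarter of genes.
cutoff = statistics.quantiles(s2, n=4)[0]
kept = [v for v in s2 if v > cutoff]

# Crude prior estimates: mean variance with and without the filter.
prior_full = statistics.mean(s2)
prior_filtered = statistics.mean(kept)

example = kept[0]
print(f"prior s0^2, all genes:      {prior_full:.4f}")
print(f"prior s0^2, after filter:   {prior_filtered:.4f}")
print("moderated variance of one kept gene:",
      round(moderated_variance(example, D_GENE, prior_full, D_PRIOR), 4), "vs",
      round(moderated_variance(example, D_GENE, prior_filtered, D_PRIOR), 4))
```

Because the filter removes only small variances, the filtered prior is necessarily larger than the unfiltered one. Every surviving gene is then shrunk toward a value that no longer reflects the whole array.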
So, while I might be willing to filter out known "dead" probesets (that I
never see above detection threshold over many hundreds of assays), I'm in
the camp that holds the statistics are corrupted if you filter without
regard to the filter's effect on multiplicity corrections.
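A minimal sketch of why filtering interacts with the multiplicity correction. This is the Benjamini-Hochberg step-up adjustment, which limma's `topTable` applies by default (`adjust.method = "BH"`) and which should agree with R's `p.adjust(method = "BH")`. The p-values are made up for illustration. Removing even one probeset lowers m, the number of tests, and so shrinks the adjusted values of every probeset that survives the filter.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.2]
print(bh_adjust(pvals))      # all four probesets tested (m = 4)
print(bh_adjust(pvals[:3]))  # the 0.2 probeset filtered out first (m = 3)
```

With all four probesets the smallest adjusted value is 0.01 × 4 = 0.04. After the filter it is 0.01 × 3 = 0.03. That gain is only legitimate if the filter did not peek at the same information the test uses.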
As an aside, it should be possible to fit some of the models using
truncated/censored distributions (wherein the statistical model gets to
know that there were X number of probesets with values < threshold, but
doesn't pretend that those values are real). That's an idea for the model
developers to ponder ...
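One way to sketch the censored idea is a Tobit-style left-censored normal likelihood. This is an illustration of the general technique, not an existing Bioconductor model, and the intensities, threshold and parameters below are hypothetical. Measured values contribute their density. Each probeset known only to fall below the detection threshold contributes the probability P(X < threshold), so the model knows how many such values there were without treating them as real measurements.

```python
import math


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def normal_logcdf(x, mu, sigma):
    """Log of P(X < x) for X ~ N(mu, sigma^2), via Phi(z) = erfc(-z/sqrt(2))/2.
    Adequate for moderate z; extreme tails would need a dedicated routine."""
    z = (x - mu) / sigma
    return math.log(0.5 * math.erfc(-z / math.sqrt(2)))


def censored_loglik(observed, n_below, threshold, mu, sigma):
    """Left-censored normal log-likelihood: observed values add their log
    density; each of the n_below censored values adds log P(X < threshold)."""
    ll = sum(normal_logpdf(x, mu, sigma) for x in observed)
    ll += n_below * normal_logcdf(threshold, mu, sigma)
    return ll


# Three measured log-intensities plus two values below a threshold of 5.0.
print(censored_loglik([6.1, 6.8, 7.3], n_below=2, threshold=5.0,
                      mu=6.0, sigma=1.0))
```

Maximizing this over `mu` and `sigma` (for example, with a generic optimizer) would give estimates that account for the censored values instead of discarding or imputing them.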
-Aaron