[BioC] Understanding limma, fdr and topTable
James MacDonald
jmacdon at med.umich.edu
Wed Jul 9 08:17:27 CEST 2008
aaron.j.mackey at gsk.com wrote:
>> I would add that removing those genes that are unchanged in any sample
>> will also help reduce the multiplicity problem. Regardless of the
>> expression level, those genes that never change expression are
>> uninteresting by default, so e.g., if beta-actin is highly expressed at
>> the same level in all samples we don't really care to test for
>> differential expression for that gene since it apparently is not
>> differentially expressed.
>
> This doesn't make sense. How can I choose to filter out "unchanged"
> probesets without fitting a model of some sort and making a probabilistic
> decision for each probeset about whether it is "unchanged" or not? Every
> probeset (save those below the detection limit) will exhibit variance
> (though the variance may be below the precision of the instrument to
> measure), right? You're not suggesting that there are some probesets with
> zero variance?
I don't really understand your point here. First, I never suggested
fitting a model of any kind to select unchanged probesets, unless
computing the variance is some kind of newfangled model fitting that I
don't understand.
In addition, are you really claiming that a probeset that is 'below the
detection limit' (whatever that means) will _not_ have any variance? I
would say that doesn't make any sense. All expression values will
exhibit some level of variance regardless of whether you might think
they are 'below the detection limit'.
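To make the point concrete: a nonspecific variance filter only needs each probeset's sample variance across all arrays, computed without looking at which group each array belongs to. The following is a minimal sketch, assuming log-scale values stored in a plain dict. The function name `variance_filter`, the `keep_fraction` cutoff and the toy values are hypothetical and only for illustration. In R this job would more likely be done with the genefilter package (e.g. `varFilter` or `nsFilter`).

```python
import statistics

def variance_filter(expr, keep_fraction=0.5):
    """Return IDs of the most variable probesets, ignoring sample groups.

    expr: dict mapping probeset ID -> list of log-scale expression values,
          one per array, over ALL arrays (group labels are never used).
    keep_fraction: hypothetical cutoff, the fraction of probesets to keep.
    """
    # Per-probeset sample variance across every array; no model is fitted
    # and no comparison between groups is made.
    variances = {pid: statistics.variance(vals) for pid, vals in expr.items()}
    n_keep = max(1, round(len(variances) * keep_fraction))
    # Rank from most to least variable and keep the top n_keep.
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]

# Made-up values: "actin" is high but flat, "geneA" actually moves.
expr = {
    "actin": [12.1, 12.0, 12.2, 12.1],
    "geneA": [5.0, 8.1, 4.9, 8.3],
    "geneB": [3.0, 3.4, 2.8, 3.9],
    "geneC": [7.0, 7.1, 6.9, 7.0],
}
print(variance_filter(expr, keep_fraction=0.5))  # ['geneA', 'geneB']
```

The highly expressed but flat "actin" probeset is dropped even though its intensity is the highest, which is the beta-actin case from the quoted text.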
>
> It seems to me that this approach leads to a false/erroneous reduction in
> the multiplicity problem, as you've just moved the hypothesis testing into
> a separate "phase" of the analysis. And it also would mess up pooled
> variance estimates such as those used in eBayes-based methods (e.g.
> limma).
So yes, if I had actually advocated fitting a model, you would be
correct. However, simply excluding probesets that have a low variance
will not affect the hypothesis testing itself. It could affect the
computation of the pooled variance estimates if you remove too many
probesets, since the pooled variance might then increase.
But the same can be said for any filtering method. If you remove a lot
of low-intensity probesets (say, all those with an absent call), you
could very well be removing probesets with a higher variance, and that
would distort the estimate of the pooled variance as well.
As with all statistics there are tradeoffs and assumptions that are
being made regardless of what you do.
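Both filters discussed above move a pooled variance estimate, just in opposite directions. The following is a minimal sketch, assuming a deliberately naive pooled variance (the plain mean of per-probeset sample variances, all with equal degrees of freedom). limma's eBayes does something more elaborate: it estimates a prior variance and prior degrees of freedom from the distribution of the sample variances. So this only illustrates the direction of the effect. The helper names, cutoffs and toy values are hypothetical.

```python
import statistics

def pooled_variance(expr):
    # Naive stand-in for a pooled estimate: mean of per-probeset variances.
    return statistics.mean(statistics.variance(v) for v in expr.values())

def drop_low_variance(expr, cutoff):
    # Keep probesets whose across-array variance exceeds a hypothetical cutoff.
    return {k: v for k, v in expr.items() if statistics.variance(v) > cutoff}

def drop_low_intensity(expr, cutoff):
    # Keep probesets whose mean log-expression exceeds a hypothetical cutoff
    # (a crude proxy for removing "absent" probesets).
    return {k: v for k, v in expr.items() if statistics.mean(v) > cutoff}

# Made-up values.
expr = {
    "stable_high": [12.0, 12.1, 11.9, 12.0],  # bright, almost no variance
    "changing":    [6.0, 6.2, 8.0, 8.2],      # moderate level, real change
    "noisy_low":   [2.0, 4.0, 1.5, 3.5],      # dim and noisy
}
print(round(pooled_variance(expr), 3))                            # 0.923
print(round(pooled_variance(drop_low_variance(expr, 0.1)), 3))    # 1.382
print(round(pooled_variance(drop_low_intensity(expr, 5.0)), 3))   # 0.677
```

Removing low-variance probesets pushes the pooled estimate up, while removing dim, noisy probesets pulls it down.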
>
> So, while I might be willing to filter out known "dead" probesets (that I
> never see above detection threshold over many hundreds of assays), I'm in
> the camp that the statistics are corrupt if you filter without regard to
> its effect on multiplicity corrections.
I don't really know what you mean by 'detection limit'. Has someone
published something somewhere that says a probeset with an expression
value below X means the mRNA for that gene has not been detected?
I am not sure how the filtering step will affect multiplicity
corrections. If one were to use the two-stage modeling procedure that you
seem to think I am advocating, then of course the p-values themselves
would be questionable, as assumptions would have been violated. But I
don't see where multiplicity correction comes into the equation.
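The arithmetic side of the multiplicity question is easy to see: in a Benjamini-Hochberg adjustment (the "BH" method in R's `p.adjust`, and limma's default in topTable), the number of tests m enters directly. The following is a minimal sketch of the BH step-up adjustment. The p-values are made up, and the sketch only shows how a smaller m changes the adjusted values. It says nothing about whether a particular filter keeps the adjustment valid, which is the substance of the disagreement above.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, scaling each by
    # m / rank and enforcing monotonicity with a running minimum (capped at 1).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up p-values for six probesets, then for three of them alone
# (as if a label-independent filter had left only those three).
all_p = [0.001, 0.01, 0.02, 0.5, 0.8, 0.9]
print([round(a, 4) for a in bh_adjust(all_p)])      # [0.006, 0.03, 0.04, 0.75, 0.9, 0.9]
print([round(a, 4) for a in bh_adjust(all_p[:3])])  # [0.003, 0.015, 0.02]
```

The same three raw p-values receive smaller adjusted values when m drops from 6 to 3, which is the whole attraction of filtering before testing.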
But personally I am not that much of a purist about multiplicity anyway.
I have been known to select probesets based on adjusted p-value and a
fold change criterion as well, which completely invalidates the meaning
of the adjusted p-values.
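For reference, the combined selection described above amounts to a two-condition filter. limma's topTable exposes the same pair of cutoffs through its `p.value` and `lfc` arguments. The following is a minimal sketch, assuming results arrive as (ID, log2 fold change, adjusted p-value) tuples. The function name, default cutoffs and toy rows are hypothetical.

```python
def select_probesets(results, adj_p_cutoff=0.05, lfc_cutoff=1.0):
    """Keep probesets passing BOTH an adjusted p-value and a |log2 FC| cutoff.

    results: list of (probeset_id, log2_fold_change, adjusted_p) tuples.
    The default cutoffs are hypothetical, chosen only for illustration.
    """
    return [pid for pid, lfc, adj_p in results
            if adj_p <= adj_p_cutoff and abs(lfc) >= lfc_cutoff]

# Made-up rows: "b" is significant but the change is small, "d" is large
# but not significant; only "a" and "c" pass both criteria.
rows = [("a", 2.0, 0.01), ("b", 0.3, 0.001), ("c", -1.5, 0.04), ("d", 3.0, 0.2)]
print(select_probesets(rows))  # ['a', 'c']
```

Because the fold-change condition is applied after the FDR adjustment, the adjusted p-values of the surviving probesets no longer describe the false discovery rate of the final list, which is the caveat stated above.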
Best,
Jim
>
> As an aside, it should be possible to fit some of the models using
> truncated/censored distributions (wherein the statistical model gets to
> know that there were X number of probesets with values < threshold, but
> doesn't pretend that those values are real). That's an idea for the model
> developers to ponder ...
>
> -Aaron
>
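The aside above describes what is usually called a censored (Tobit-style) likelihood. Observed values contribute their density. Each value known only to lie below the threshold contributes the probability mass below that threshold. The following is a minimal sketch for a single probeset across arrays. It assumes a normal model on the log scale and uses a crude grid search in place of a real optimizer. The function names, the normality assumption, the grid and the toy values are all illustrative assumptions, not anything provided by limma.

```python
import math

def norm_logpdf(x, mu, sigma):
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def norm_logcdf(x, mu, sigma):
    # log P(X <= x); erfc keeps the lower tail accurate, with a guard for underflow.
    p = 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2.0)))
    return math.log(p) if p > 0.0 else float("-inf")

def censored_loglik(observed, n_censored, threshold, mu, sigma):
    # Observed values contribute their density; each censored value contributes
    # the probability of falling below the threshold (its exact value is unknown).
    ll = sum(norm_logpdf(x, mu, sigma) for x in observed)
    if n_censored:
        ll += n_censored * norm_logcdf(threshold, mu, sigma)
    return ll

def fit_censored_normal(observed, n_censored, threshold, step=0.05):
    """Crude grid-search MLE of (mu, sigma); needs at least one observed value."""
    lo = min(min(observed), threshold) - 3.0
    hi = max(observed) + 1.0
    mus = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    sigmas = [step * j for j in range(1, int(3.0 / step) + 1)]
    best = None
    for mu in mus:
        for sigma in sigmas:
            ll = censored_loglik(observed, n_censored, threshold, mu, sigma)
            if best is None or ll > best[0]:
                best = (ll, mu, sigma)
    return best[1], best[2]

# Made-up example: four arrays measured, two more below a threshold of 4.0.
obs = [5.0, 5.5, 6.0, 4.5]
print(fit_censored_normal(obs, 0, 4.0))  # ignores the two censored arrays
print(fit_censored_normal(obs, 2, 4.0))  # mean shifts down to account for them
```

Simply dropping the sub-threshold arrays would overstate the mean. Plugging in the threshold as if it were a real value would make up data. The censored term avoids both.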
--
James W. MacDonald, MS
Biostatistician
UMCCC cDNA and Affymetrix Core
University of Michigan
1500 E Medical Center Drive
7410 CCGC
Ann Arbor MI 48109
734-647-5623