[BioC] Multiple test question in microarray - FDR

Sat Dec 13 21:11:03 CET 2008

On Sat, Dec 13, 2008 at 12:36 PM, Wayne Xu <wxu at msi.umn.edu> wrote:
> Hello,
> I am not sure this is the right place to ask this question, but it is about
> microarray data analysis:
>
> In a two-group t-test, the multiple-test q-values depend on the total
> number of genes in the test. If I filter the gene list first, for example,
> by using only those genes that have a 1.2-fold change for the t-test and
> the multiple-test correction, this gene list is much smaller than the total
> gene list, and the multiple-test q-values are then much smaller.
>
> Do you think the above is a correct approach? People who do not filter this
> way may think that statistical power is lost otherwise. But how much power
> is lost, and how can the power be calculated in this case?

No, you cannot filter based on fold change between the two groups.
However, you can filter based on variance or some other measure that
does not depend on the two groups being compared.  Any filter that
"knows" the two groups will lead to a biased test.  Remember that
filtering removes genes from consideration in all further analysis.
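
As a minimal sketch of a filter that does not "know" the groups, the
Python below ranks genes by their variance across all samples and keeps
the top fraction.  The function name variance_filter, the keep_fraction
parameter, and the toy data are hypothetical and only for illustration;
this is not code from any Bioconductor package.

```python
import statistics


def variance_filter(expr, keep_fraction=0.5):
    """Return the names of the genes with the highest overall variance.

    expr maps gene name -> list of expression values across ALL samples.
    Group labels are never consulted, so the filter is blind to the
    comparison that will be tested afterwards.
    """
    variances = {gene: statistics.variance(vals) for gene, vals in expr.items()}
    n_keep = max(1, int(len(expr) * keep_fraction))
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_keep]


# Hypothetical toy data: 4 genes measured on 6 samples.
# Which sample belongs to which group is deliberately not part of the input.
expr = {
    "geneA": [5.0, 5.1, 4.9, 5.0, 5.1, 5.0],
    "geneB": [2.0, 6.0, 3.0, 7.0, 2.5, 6.5],
    "geneC": [8.0, 8.2, 7.9, 8.1, 8.0, 8.1],
    "geneD": [1.0, 4.0, 1.5, 4.5, 1.2, 4.2],
}

kept = variance_filter(expr, keep_fraction=0.5)
print(kept)
```

Only the genes in kept would then go on to the t-test and the
multiple-testing correction.  The cutoff (here one half) should be
chosen before looking at any test results.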

For further details, there are MANY discussions of this topic in the
mailing list.

> When people report multiple-test q-values, they usually do not mention how
> many genes were used in the multiple test. You can get different q-values
> (even with the same method, e.g. the Benjamini-Hochberg or Holm adjustment)
> in the same dataset. How can it make sense for the same genes to have
> different q-values?

A good manuscript should describe in detail the preprocessing and
filtering steps, the statistical tests used, and the methods for
correcting for multiple testing.  You are correct that many papers do
not do so.
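
To illustrate why the number of genes tested needs to be reported, here
is a small sketch of the Benjamini-Hochberg adjustment in plain Python,
applied to the same three p-values once on their own and once as part
of a longer list.  The function name bh_adjust and all p-values are
made up for illustration.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values: the first three genes are the same in both runs.
small_set = [0.001, 0.01, 0.03]
large_set = small_set + [0.2, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]

adj_small = bh_adjust(small_set)
adj_large = bh_adjust(large_set)

for p, a_small, a_large in zip(small_set, adj_small, adj_large):
    print(p, a_small, a_large)
```

The same raw p-values get larger adjusted values when more tests are
included, which is why a reported q-value cannot be interpreted without
the size of the tested gene list.  The smaller values from the short
list do not make a group-dependent filter valid.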

As for different q-values in the same dataset using different methods,
it is important to note that one should not do an analysis, get a
result, and then, based on that result, go back and redo the analysis
with different parameters to get a "better" result.  It is very
important that each step of an analysis (preprocessing, filtering,
testing, multiple-testing correction) be justifiable independent of
the other steps in order for the results to be interpretable.

Sean