A simultaneous lower bound for the number of true discoveries in multiple testing under general dependence

Nicolai Meinshausen

Abstract

It is common practice nowadays in diverse fields, notably genetics, to test many variables simultaneously for association with a response variable of interest. The goal is often to filter out a few promising candidate hypotheses that deserve further investigation.

The question arises of how many hypotheses should be rejected and examined in more detail. We adopt a utility-based view, where each rejection carries a certain cost (as further investigation is laborious), while every rejected true non-null hypothesis, a so-called true discovery, is associated with a gain. A utility function U(r) measures both the costs and the gains from making r rejections. Ideally, one would like to choose the number r of rejections so as to maximize the utility function.

The missing quantity (for direct optimization of the utility) is the number of true discoveries as a function of the number of rejections. We propose a lower bound for this function, which holds with high probability simultaneously for all possible numbers of rejections. The bound is valid under arbitrary and unknown dependence between the test statistics. The number of rejections can then be chosen so as to maximize the guaranteed lower bound for the utility. However, the lower bound is of interest independently of the utility-based approach. It allows one to choose suitable rejection regions in an exploratory fashion, while at the same time retaining strong control over the number of false discoveries. Furthermore, a one-sided confidence interval for the total number of true non-null hypotheses is obtained.
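
The selection step described above can be illustrated with a minimal sketch. It does not compute the proposed bound; the bound is taken as a given input. The linear utility (a fixed gain per true discovery minus a fixed cost per rejection), the function name choose_rejections, and the numbers in the example are all assumptions made for illustration and are not taken from the report.

```python
from typing import Sequence, Tuple


def choose_rejections(
    lower_bound: Sequence[int], gain: float, cost: float
) -> Tuple[int, float]:
    """Pick the number of rejections r that maximizes a guaranteed utility.

    lower_bound[r] is assumed to be a simultaneous lower bound on the number
    of true discoveries when r hypotheses are rejected (lower_bound[0] == 0).
    The utility is ASSUMED linear: gain * true_discoveries - cost * r.
    """
    best_r, best_utility = 0, 0.0  # rejecting nothing has utility 0
    for r, guaranteed in enumerate(lower_bound):
        utility = gain * guaranteed - cost * r
        if utility > best_utility:  # strict: ties keep the smaller r
            best_r, best_utility = r, utility
    return best_r, best_utility


# Made-up bound for r = 0..9 rejections (illustrative only, not real data).
bound = [0, 0, 1, 2, 3, 3, 4, 4, 4, 4]
best_r, best_utility = choose_rejections(bound, gain=3.0, cost=1.0)
print(best_r, best_utility)
```

Because the bound is assumed to hold for all r at once, the utility guarantee remains valid for the chosen r even though that r is selected after looking at the data.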

Encouraging numerical results from simulation studies are shown and the method is applied successfully to microarray gene expression experiments.



Go back to the Research Reports from Seminar für Statistik.