[BioC] help with multiple testing

efthimiosm efthimiosm at bii.a-star.edu.sg
Mon Jun 25 13:15:23 CEST 2012


Hi all,

My name is Mike and I am a post-doctoral fellow in Bioinformatics. I 
have a question about p-value adjustment for multiple testing and would 
appreciate some advice.

I have 8,256 gene pairs, formed from all possible pairwise combinations 
of 129 genes. For each pair A-B (A different from B) four values are 
recorded: the number of tumors found in both A and B (TT), in A only 
(TF), in B only (FT), and in neither A nor B (FF). The data therefore 
take the form of 2x2 contingency tables, e.g.

Gene 1    Gene 2    TT    TF    FT    FF
g1        g2         5     1     1    27
g1        g3         4     1     1    28
g2        g3         4     2     0    28
...

Note that each gene is paired with all the others and thus appears 128 
times in this list. I want to find which of the 8,256 gene pairs (tests) 
show a significant association between rows (in A, not in A) and columns 
(in B, not in B), using Fisher's exact test or Barnard's test. I then 
have to adjust the p-values for multiple testing.
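
A minimal Python sketch of the per-pair test, for illustration only: it 
computes a two-sided Fisher's exact p-value directly from the 
hypergeometric distribution, using the three example rows above. The 
function name fisher_exact_two_sided and the variable names are 
hypothetical, and Barnard's test is not covered here.

```python
from math import comb

def fisher_exact_two_sided(tt, tf, ft, ff):
    """Two-sided Fisher's exact p-value for the 2x2 table [[tt, tf], [ft, ff]]."""
    row1 = tt + tf            # tumors in A
    col1 = tt + ft            # tumors in B
    n = tt + tf + ft + ff     # all tumors
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k tumors in both A and B,
        # with the row and column totals held fixed.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo = max(0, row1 + col1 - n)
    hi = min(row1, col1)
    p_obs = prob(tt)
    # Sum over all tables that are no more likely than the observed one
    # (small relative tolerance guards against floating-point ties).
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)

# Example rows from the table above: (gene A, gene B, TT, TF, FT, FF)
rows = [
    ("g1", "g2", 5, 1, 1, 27),
    ("g1", "g3", 4, 1, 1, 28),
    ("g2", "g3", 4, 2, 0, 28),
]
raw_p = {(a, b): fisher_exact_two_sided(tt, tf, ft, ff)
         for a, b, tt, tf, ft, ff in rows}

n_pairs = comb(129, 2)   # number of unordered pairs of 129 genes
```

Here n_pairs evaluates to 8256, which matches the number of tests 
described above.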

At the 5% level I find approximately 500 significant gene pairs, but all 
the p-value adjustment procedures I tried (for independent tests: BH and 
q-value; for dependent tests: BY, adaptiveBH and BlaRoq from the package 
"multtest") produce adjusted p-values > 0.3. I think the problem is the 
highly dependent nature of the data: about 50% of the genes have very 
few mutations, which gives high p-values for every pair they form, and 
this dramatically affects the adjustment procedure.
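
For reference, the BH step-up adjustment can be sketched in a few lines 
of Python. This is a generic illustration with made-up toy p-values, not 
the multtest or q-value implementation; the function name bh_adjust is 
hypothetical. The sketch also computes how many of the 8,256 tests would 
fall below 0.05 by chance alone under the idealized assumption that all 
null hypotheses are true and the null p-values are uniform, which gives 
a rough baseline for the roughly 500 pairs observed.

```python
from math import comb

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

n_tests = comb(129, 2)                 # 8256 gene pairs
expected_by_chance = 0.05 * n_tests    # about 413 under the assumptions above

toy_p = [0.01, 0.04, 0.03, 0.005]      # made-up values, for illustration only
toy_adjusted = bh_adjust(toy_p)
```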

Is there a better method for adjusting the p-values?

Would it be an appropriate solution to create multiple lists of gene 
pairs, in which each gene is represented only once, and then estimate 
q-values on each list (giving multiple q-values for each pair)?
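
One possible way to build such lists, sketched in Python under the 
assumption of an odd number of genes (129 here): a round-robin split of 
all pairs into 129 lists of 64 pairs, where no gene appears twice within 
a list. The function name disjoint_pair_lists and the placeholder gene 
names are hypothetical. The sketch only shows the construction; it says 
nothing about whether q-values estimated on such lists are valid.

```python
import random

def disjoint_pair_lists(genes):
    """Split all pairs of an odd number of genes into lists with no repeated gene."""
    n = len(genes)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of genes")
    lists = []
    for r in range(n):
        pairs = []
        for i in range(n):
            j = (r - i) % n    # partner of gene i in list r
            if i < j:          # keep each pair once; i == j means gene i sits out
                pairs.append((genes[i], genes[j]))
        lists.append(pairs)
    return lists

genes = [f"g{k}" for k in range(1, 130)]   # 129 placeholder gene names
lists = disjoint_pair_lists(genes)

# Shuffling the gene order first gives a different split of the same pairs,
# so repeating this would place each pair in several different lists.
rng = random.Random(0)
shuffled = genes[:]
rng.shuffle(shuffled)
other_lists = disjoint_pair_lists(shuffled)
```

Every one of the 8,256 pairs lands in exactly one list per split, so 
repeating the split with different gene orders is what would yield 
multiple q-values for each pair.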


Thank you,
Mike


