[R] multiple hypothesis testing
Neil Shephard
nshephard at gmail.com
Tue Mar 17 12:47:20 CET 2009
Vijaykumar Muley wrote:
>
> Dear all,
>
> I am Vijaykumar Muley, working as a senior research fellow. By training I
> am a computational biologist without a strong knowledge of statistics. I
> have done some analysis, which is explained as follows.
>
> I have 10340 (X) binary vectors, all of the same length (N=845), which I
> will call "gene profiles". For example:
>
>      v1  v2  v3  v4  ...  vN
> a     1   0   1   0  ...   1
> b     0   0   1   0  ...   0
> c     1   0   1   1  ...   1
> d     0   1   1   1  ...   1
> e     0   0   1   1  ...   1
> ...
> (up to 10340 rows)
>
>
> Then I have some other binary profiles of the same length (N=845), which
> I will call "expression profiles":
>
>      v1  v2  v3  v4  ...  vN
> f1    1   0   1   0  ...   1
> f2    0   0   1   0  ...   0
> f3    1   0   1   1  ...   1
>
>
> Now I am comparing profile f1 with all X profiles using the hypergeometric
> distribution function. For each of the 10340 gene profiles, this gives me
> a p-value: the probability of observing the similarity between profile f1
> and that gene profile by random chance alone.
>
> For example:
>
> # pair       p-value
> f1,a         1e-20
> f1,b         0.01
> ...
> f1,10340     0.05
>
> I am doing the same thing with f2 and f3.
>
> If we arrange this output in a more readable format, it looks like this:
>
>      f1      f2      f3
> a    1e-20   0.01    0.10
> b    0.01    1e-9    0.02
> c    1e-3    0.1     0.30
> d    0.03    0.07    1e-5
> e    1e-1    0.01    1e-9
> ...
> (up to 10340 rows)
>
>
> I hope everyone understood what type of output I am getting.
>
> Now I want to perform a multiple hypothesis comparison (p-value
> adjustment) on this data, so that I get the statistically significant
> associations between the various "expression profiles" and "gene
> profiles" at a specific alpha level.
>
> The most conservative method for p-value adjustment is Bonferroni, and
> there are many others that are less conservative. I don't care which
> method I use, but the problem is this:
>
> which number of tests should I use to correct (adjust) the p-values?
>
> So in the case of the Bonferroni correction, should I multiply each
> p-value by 10340 or, as I have compared 3 expression profiles against
> 10340 gene profiles, should I multiply each p-value by 3*10340?
>
> I am asking this for my own understanding. What I want to do is this:
>
> From the above gene p-value table, I want to calculate the false positive
> rate (as a percentage) at each p-value cutoff from 0.0001 to 0.05, so
> that I can choose a good cutoff as the significance level (alpha) and
> exclude the gene profiles that are only weakly associated with all
> expression profiles. (If I am correct, to do this I need to use other
> p-value correction methods, either simulation-based, resampling, or
> Benjamini and Hochberg (B&H).)
>
> Can anyone please advise me about p-value adjustment (correction)? I
> mean, statistically or technically, which number should I use for the
> correction, 10340 or 3*10340, given that I have three features to
> associate with the same set of 10340 genes? Or, if I am wrong, can anyone
> tell me which protocol I should follow to get a fair number of
> significant associations between genes and expression profiles?
>
> I am using the package "multtest" for p-value adjustment, but I do not
> understand whether, for the correction, I should supply the p-values for
> each expression profile separately or all of the p-values together,
> i.e. 3*10340.
>
> I have gone through many tutorials and articles on multiple hypothesis
> testing, but I still do not understand exactly how it works.
>
> Please give me some pointers. Some of you may be actively working on
> p-value adjustment / multiple hypothesis testing, so I hope for some
> suggestions.
>
> I will be grateful for your kind help.
>
> sincerely,
>
>
Please do NOT reply to a digest when posting to the list; you should start a
new thread (or at the very least delete the digest to which you are replying
from your email).
You may be interested in the False Discovery Rate (FDR) methods proposed by
Benjamini & Hochberg[1] and various related work/papers/software[2][3].
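
As a rough illustration only (a sketch, not a definitive recipe), base R's
p.adjust() can apply both the Bonferroni and the Benjamini-Hochberg
adjustments. The matrix `pvals` below is simulated stand-in data with the
same shape as the table described above, not real results, and the planted
signal is purely hypothetical. Whether the family of tests is all 3 * 10340
comparisons or the 10340 comparisons for each expression profile depends on
the question being asked, so both variants are shown.

```r
# Simulated stand-in for the real table: 10340 gene profiles (rows)
# by 3 expression profiles (columns). All values here are made up.
set.seed(1)
n_genes <- 10340
pvals <- matrix(runif(n_genes * 3), nrow = n_genes,
                dimnames = list(NULL, c("f1", "f2", "f3")))
# Plant a few strong associations so that something survives correction.
pvals[1:50, "f1"] <- 1e-20

# Option 1: treat all 3 * 10340 tests as one family. p.adjust() works on
# a vector, so flatten the matrix, adjust, and restore the shape.
adj_bonf <- matrix(p.adjust(as.vector(pvals), method = "bonferroni"),
                   nrow = nrow(pvals), dimnames = dimnames(pvals))
adj_bh   <- matrix(p.adjust(as.vector(pvals), method = "BH"),
                   nrow = nrow(pvals), dimnames = dimnames(pvals))

# Option 2: adjust each expression profile separately (family of 10340).
adj_bh_by_col <- apply(pvals, 2, p.adjust, method = "BH")

# Number of associations per expression profile with BH-adjusted p < 0.05.
n_sig <- colSums(adj_bh < 0.05)
print(n_sig)
```

With the pooled adjustment, the Bonferroni multiplier is the total number of
p-values passed in (here 3 * 10340); with the per-column adjustment it would
be 10340 for each profile.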
Neil
[1] Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J. R. Statist. Soc. B
57:289-300
[2] http://genomics.princeton.edu/storeylab/qvalue/