[BioC] BH vs BY in p.adjust
Wolfgang Huber
huber at ebi.ac.uk
Sat Jul 29 12:31:48 CEST 2006
Hi Caroline,
> hi wolfgang,
> well.... not at all uniform:
That is good. The distribution that you see is expected to be a mixture
of a uniform component (from the non-differentially expressed genes) and
a component concentrated near p=0 (from the differentially expressed
genes). The power of your test (which depends on, e.g., the sample size)
determines how close to 0 the p-values of the differentially expressed
genes actually get.
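
To illustrate the mixture picture, here is a minimal sketch on simulated
p-values (not on your fit2 object). The gene counts and the Beta shape
used for the differentially expressed component are assumptions chosen
only to give a histogram of roughly the same flavour as yours.

```r
# Simulated p-values: uniform nulls plus a component piled up near 0.
set.seed(1)
n_null <- 3000                      # assumed number of non-DE genes
n_alt  <- 1000                      # assumed number of DE genes

p_null <- runif(n_null)             # p-values are uniform under the null
p_alt  <- rbeta(n_alt, 0.3, 4)      # assumed shape, concentrated near p = 0
p      <- c(p_null, p_alt)

# 20 bins of width 0.05, as in the histogram quoted below
h <- hist(p, breaks = seq(0, 1, by = 0.05), plot = FALSE)
cbind(mid = h$mids, count = h$counts)
```

The first bins should be clearly over-represented and the upper half of
the histogram roughly flat.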
>> x <- hist(fit2$p.value[,1], breaks=30, col="orange",
>>           main="distribution of raw p-values", labels=TRUE)
>
>> cbind(x$mids,x$counts)
> [,1] [,2]
> [1,] 0.025 1891
> [2,] 0.075 270
> [3,] 0.125 209
> [4,] 0.175 123
> [5,] 0.225 100
> [6,] 0.275 101
> [7,] 0.325 79
> [8,] 0.375 79
> [9,] 0.425 85
> [10,] 0.475 57 .....
>
> but from here on, the distribution is uniform (around 50 in every bin until
> p-val=1). so there are a lot of differential probesets in this contrast.
> but between 519 and 1032 as estimated from BY and BH adjustments with 1%
> FDR, there's quite a difference.... or can i estimate it directly from
> this histogram ..... subtracting the baseline gives me 2439 probesets,
> almost 70% of the whole set:
>
>> baseline <- mean(x$counts[11:20])
>> sum(x$counts-baseline)
> [1] 2439
>
> how safe is this ?
This is a good estimate of the number of differentially expressed genes,
provided that
- your p-values are indeed uniformly distributed for those genes that
  fall under the null hypothesis, and
- your test has reasonable power to detect the alternatives.
Of course, it is more difficult to decide which genes they are.
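
As a sanity check of the baseline-subtraction idea, one can try it on
simulated p-values where the truth is known. This is only a sketch: the
numbers of null and non-null genes and the Beta shape are assumptions,
and the 20-bin layout mirrors your histogram.

```r
# Baseline subtraction on simulated data with a known number of DE genes.
set.seed(42)
n_null <- 2000                      # assumed number of non-DE genes
n_alt  <- 1500                      # assumed number of DE genes
p <- c(runif(n_null),               # uniform nulls
       rbeta(n_alt, 0.2, 5))        # assumed well-powered alternatives

h <- hist(p, breaks = seq(0, 1, by = 0.05), plot = FALSE)

# Height of the flat part, taken from the bins covering p > 0.5
baseline <- mean(h$counts[11:20])

# Everything above the baseline is attributed to DE genes
n_de_hat <- sum(h$counts - baseline)
c(true = n_alt, estimated = n_de_hat)
```

If the test were underpowered, some non-null p-values would land in the
upper bins, the baseline would be too high, and the estimate would come
out too low.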
> by the way, in cases that it's not uniformly distributed, from the range
> values of the over-represented bins on the histogram, can we not get an
> idea of the effect size associated with the differential probesets
> responsible for this non-uniformity ?
> or the other way around, if i happened to know that there were differential
> probesets but all of only moderate effect size, i might expect a bulge at
> moderate p-values, while lower ones could well instead be uniformly
> distributed, right?
In principle yes, but that would mean that your test is underpowered.
Also, the p-value is (generally) the result of two things: effect size
and sample size. So the position of a bulge in the histogram does not by
itself tell you the effect size.
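
A small sketch of how effect size and sample size act together, using
power.t.test from the stats package. The assumptions here are mine: an
ordinary two-sample t-test with unit standard deviation and a 1%
significance level, which is not the same as the moderated statistic
behind fit2$p.value, and the grid values are arbitrary.

```r
# Power of a two-sample t-test over a grid of sample sizes and effect sizes.
grid <- expand.grid(n     = c(3, 6, 12),     # assumed samples per group
                    delta = c(0.5, 1, 2))    # assumed effect sizes (in SDs)

grid$power <- mapply(
  function(n, delta) {
    power.t.test(n = n, delta = delta, sd = 1, sig.level = 0.01)$power
  },
  grid$n, grid$delta
)
grid
```

A moderate effect with many samples and a large effect with few samples
can give similar power, hence similar p-value distributions.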
> but then if that were the case, could it also be that if all differential
> probesets had similar p-values, say 0.2, they could more easily be
> discovered than the same number associated to a lower but wider range of
> p-values, only because they would add significance to each other?
This seems like a very artificial scenario; it is unlikely to occur in
practice, because random variation would spread the p-values out.
> this doesn't quite sound right if it's true that the adjustment procedure
> preserves the rank that the genes have from the p-value.
>
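On the rank question, a quick check with p.adjust on simulated p-values
(a sketch; the toy mixture below is an assumption, not your data)
suggests that both adjustments keep the ordering of the raw p-values,
and that BY is BH multiplied by the constant sum(1/(1:m)) and capped at
1. That constant is why BY always calls fewer genes than BH at the same
FDR.

```r
# BH and BY adjustments on a toy mixture of p-values.
set.seed(7)
p <- c(runif(900),                  # assumed nulls
       rbeta(100, 0.3, 6))          # assumed alternatives near 0

bh <- p.adjust(p, method = "BH")
by <- p.adjust(p, method = "BY")

# Adjusted values are non-decreasing when sorted by raw p-value
o <- order(p)
rank_kept <- all(diff(bh[o]) >= 0) && all(diff(by[o]) >= 0)

# BY equals BH times the harmonic sum over m tests, capped at 1
c_m <- sum(1 / seq_along(p))
same_up_to_factor <- isTRUE(all.equal(by, pmin(1, c_m * bh)))

# Number of calls at 1% FDR under each method
n_called <- c(BH = sum(bh <= 0.01), BY = sum(by <= 0.01))
list(rank_kept = rank_kept,
     same_up_to_factor = same_up_to_factor,
     n_called = n_called)
```
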
Best wishes
------------------------------------------------------------------
Wolfgang Huber EBI/EMBL Cambridge UK http://www.ebi.ac.uk/huber