[BioC] p.adjust BH generates duplicate values

Fri Sep 16 01:48:06 CEST 2011

Dear David,

You have described the first step of the BH algorithm.  However there is a 
second step which ensures that the adjusted p-values are monotonic in the 
original p-values.  It is this second step that sometimes causes a series 
of genes to get the same adjusted p-value.  This occurs whenever the 
first-step adjusted p-value for a less significant gene is lower than that 
for a more significant gene.

Best wishes
Gordon

> Date: Tue, 13 Sep 2011 16:09:45 -0700
> From: David Young <davidelyoung at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] p.adjust BH generates duplicate values
>
> Hi all,
>
> I was doing an RMA->limma (ebayes) analysis of an affymetrix mouse 430a
> experiment and noticed that while the p-values listed in toptable were all
> different, the adjusted p-values (adjust="BH") contained duplicate values.
> I don't think this is incorrect necessarily, but I was wondering why a
> different alpha wasn't generated for each gene.  From what I understand, the
> BH method gets the adjusted p-value (alpha) from [P_k*n*c(n) ] / k < alpha,
> where n = total number of genes (tests), P_k = p-value at kth gene (genes
> ordered from low to high p-value), and k = number of genes with p-value less
> than or equal to P_k.  I'm not entirely sure how the c(n) (dependence
> correction) part works, but it seems like a unique adjusted p-value (alpha)
> could be generated for each gene.  Instead I get:
>
>
>> top<-topTable(efit, adjust="BH", n=nrow(exprs(rmadata)))
>> write.table(top, "output.xls", sep="\t")
>
> from output.xls...
> ID                   adj.P.Val           P.Value
> Mm.277921    0.039259664     3.17E-06
> Mm.272646    0.050424143     9.93E-06
> Mm.148886    0.050424143     1.64E-05
> Mm.235998    0.050424143     2.02E-05
> Mm.4598       0.050424143     2.04E-05
> Mm.10728     0.101013086     4.89E-05
> Mm.162744   0.106930684     6.34E-05
> Mm.247564   0.106930684     6.91E-05
> Mm.269384   0.115716969     8.62E-05
> Mm.212428   0.115716969     9.34E-05
> Mm.457989   0.118548889     0.000126578
> Mm.154662   0.118548889     0.000128005
> Mm.21005     0.118548889     0.000133975
> Mm.5109       0.149489879     0.000196053
> Mm.207432   0.149489879     0.00020444
>
> Does anyone know why several probesets have the same adjusted p value even
> though the regular p value is different for each gene?  I'm 90% sure this is
> just my ignorance about the BH method, but I'll be very thankful to anyone
> who can point me in the right direction.  Thanks in advance,
>
> Dave Young
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] limma_3.8.3              mouse430a2mmugcdf_14.1.0 simpleaffy_2.28.0
>   gcrma_2.24.1
> [5] genefilter_1.34.0        affy_1.30.0              Biobase_2.12.2
>
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0         annotate_1.30.1       AnnotationDbi_1.14.1
> Biostrings_2.20.3
> [5] DBI_0.2-5             IRanges_1.10.6        preprocessCore_1.14.0
> RSQLite_0.9-4
> [9] splines_2.13.1        survival_2.36-9       tools_2.13.1
> xtable_1.5-6

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}