[BioC] p.adjust BH generates duplicate values
Gordon K Smyth
smyth at wehi.EDU.AU
Fri Sep 16 01:48:06 CEST 2011
Dear David,
You have described the first step of the BH algorithm. However there is a
second step which ensures that the adjusted p-values are monotonic in the
original p-values. It is this second step that sometimes causes a series
of genes to get the same adjusted p-value. This occurs whenever the
first-step adjusted p-value for a less significant gene is lower than that
for a more significant gene.
Best wishes
Gordon
> Date: Tue, 13 Sep 2011 16:09:45 -0700
> From: David Young <davidelyoung at gmail.com>
> To: bioconductor at r-project.org
> Subject: [BioC] p.adjust BH generates duplicate values
>
> Hi all,
>
> I was doing an RMA->limma (ebayes) analysis of an affymetrix mouse 430a
> experiment and noticed that while the p-values listed in toptable were all
> different, the adjusted p-values (adjust="BH") contained duplicate values.
> I don't think this is incorrect necessarily, but I was wondering why a
> different alpha wasn't generated for each gene. From what I understand, the
> BH method gets the adjusted p-value (alpha) from [P_k*n*c(n) ] / k < alpha,
> where n = total number of genes (tests), P_k = p-value at kth gene (genes
> ordered from low to high p-value), and k = number of genes with p-value less
> than or equal to P_k. I'm not entirely sure how the c(n) (dependence
> correction) part works, but it seems like a unique adjusted p-value (alpha)
> could be generated for each gene. Instead I get:
>
>
>> top<-topTable(efit, adjust="BH", n=nrow(exprs(rmadata)))
>> write.table(top, "output.xls", sep="\t")
>
> from output.xls...
> ID adj.P.Val P.Value
> Mm.277921 0.039259664 3.17E-06
> Mm.272646 0.050424143 9.93E-06
> Mm.148886 0.050424143 1.64E-05
> Mm.235998 0.050424143 2.02E-05
> Mm.4598 0.050424143 2.04E-05
> Mm.10728 0.101013086 4.89E-05
> Mm.162744 0.106930684 6.34E-05
> Mm.247564 0.106930684 6.91E-05
> Mm.269384 0.115716969 8.62E-05
> Mm.212428 0.115716969 9.34E-05
> Mm.457989 0.118548889 0.000126578
> Mm.154662 0.118548889 0.000128005
> Mm.21005 0.118548889 0.000133975
> Mm.5109 0.149489879 0.000196053
> Mm.207432 0.149489879 0.00020444
>
> Does anyone know why several probesets have the same adjusted p value even
> though the regular p value is different for each gene? I'm 90% sure this is
> just my ignorance about the BH method, but I'll be very thankful to anyone
> who can point me in the right direction. Thanks in advance,
>
> Dave Young
>
>> sessionInfo()
> R version 2.13.1 (2011-07-08)
> Platform: i386-pc-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
> States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
>
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] limma_3.8.3 mouse430a2mmugcdf_14.1.0 simpleaffy_2.28.0
> gcrma_2.24.1
> [5] genefilter_1.34.0 affy_1.30.0 Biobase_2.12.2
>
>
> loaded via a namespace (and not attached):
> [1] affyio_1.20.0 annotate_1.30.1 AnnotationDbi_1.14.1
> Biostrings_2.20.3
> [5] DBI_0.2-5 IRanges_1.10.6 preprocessCore_1.14.0
> RSQLite_0.9-4
> [9] splines_2.13.1 survival_2.36-9 tools_2.13.1
> xtable_1.5-6
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list