[BioC] Drastic change in edgeR p-value calculation?
Gordon K Smyth
smyth at wehi.EDU.AU
Thu Dec 20 01:45:10 CET 2012
Dear Lee Brooks,
The current version will behave almost the same as the older version if
you set the prior.df to the old default value:
estimateTagwiseDisp(y, prior.df=20)
It seems that you have some genes with large fold changes but also large
variability between replicates. With a large prior.df, the dispersions
for these genes are forced down towards the global trend, and this is
apparently enough to allow them to be statistically significant. With a
smaller prior.df, the tagwise dispersions remain closer to individual
dispersion values, so the genes are penalized more for their large
variability, and they become non-significant.
Best wishes
Gordon
> Date: Tue, 18 Dec 2012 13:35:11 -0500
> From: "Lionel (Lee) Brooks 3rd" <Lionel.Brooks.GR at Dartmouth.edu>
> To: bioconductor at r-project.org
> Subject: [BioC] Drastic change in edgeR p-value calculation?
>
> Hello Gentlepeople,
>
> I have been using the edgeR package to identify differentially abundant
> tags in an RNA-seq experiment.
>
> I was happy when the edgeR package version 2.4.6 called 41
> differentially abundant tags.
>
> However, now I am unhappy because the same analysis with edgeR package
> version 3.0.2 does not call any differentially abundant tags.
>
> I've compared the output of the two versions and I can see that both
> versions calculate the same fold changes - however - the p-value
> calculation is different.
>
> I feel that the optimal solution for me is to determine the command
> arguments that I can use to emulate the previous default behavior.
>
> Can anyone help me figure that out?
> I would be extremely grateful for some guidance.
> This way, I could keep my bioconductor packages up to date.
>
> If you think you can help then please see below, for the code and
> session Info.
>
> Thank you for your time.
>
> sincerely,
> Lee Brooks
>
> ----
> HERE IS THE CODE PART
>
> Here is the procedure that I have been using,
> where x is the data frame object containing my data:
> y <- DGEList(counts=x,group=group)
> y <- calcNormFactors(y)
> y <- estimateCommonDisp(y)
> y <- estimateTagwiseDisp(y)
> et <- exactTest(y)
> topTags(et)
> de <- decideTestsDGE(et, p=0.05, adjust="BH")
> detags <- et[as.logical(de)]
> detags <- topTags(detags, n = length(detags))
>
> ----
> HERE IS AN EXAMPLE OUTPUT FROM EACH VERSION:
>
> Tables Tables
>
> logFC logCPM PValue FDR
> fooTag_edgeR_2.4.6 5.12318 12.6577 2.9E-30 8.9E-26
> fooTag_edgeR_3.0.2 5.12318 12.6612 0.00717 1
>
>
> -----
> HERE IS THE sessionInfo() PART...
>
> THE CONFIGURATION THAT WORKED FOR ME:
>
> R version 2.14.1 (2011-12-22)
> Platform: i686-pc-linux-gnu (32-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_2.4.6 limma_3.10.3
>
> THE CONFIGURATION THAT DOES NOT WORK FOR ME:
>
> R version 2.15.2 (2012-10-26)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.utf8 LC_NUMERIC=C
> [3] LC_TIME=en_US.utf8 LC_COLLATE=en_US.utf8
> [5] LC_MONETARY=en_US.utf8 LC_MESSAGES=en_US.utf8
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] edgeR_3.0.2 limma_3.14.1
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list