[BioC] edgeR: dispersion estimation
Gordon K Smyth
smyth at wehi.EDU.AU
Wed May 14 01:47:18 CEST 2014
Dear Yanzhu,
Thanks for providing your data. edgeR will run quickly and correctly on
your data if you use the classic dispersion estimation functions instead
of the GLM functions. The classic dispersion estimators are correct for
your design matrix, because you are including all interactions in your
factorial model. You can run glmFit and glmLRT using the classic
dispersion estimates if you wish, or you can do exactTests between the
groups. My code on your data is given below.
The GLM functions fail because you aren't doing enough filtering of genes
with low counts. Your data has 96 different groups. Keeping genes with
just one count means that 95 out of the 96 groups have zero counts for all
cases and the other has just one count. No proper estimate of the
dispersion is possible for these genes, and it seems that the GLM Cox-Reid
code doesn't fully cope with this.
Best wishes
Gordon
> x <- read.delim("newPHTSeqRE_05092014.txt",row.names="gene")
> x2 <- x[rowSums(x)>0,]
> logCPM <- cpm(x2,log=TRUE,prior.count=5)
> mds <- plotMDS(logCPM,labels=1:ncol(x2),gene.selection="common")
> targets <- readTargets("newPHTSeqCondRE_05092014.txt")
> Rep <- factor(targets$Rep)
> Line <- factor(targets$Line)
> Sex <- factor(targets$Sex)
> Group <- paste(Line,Sex,Rep,sep=".")
> y <- DGEList(counts=x2,group=Group)
> y <- calcNormFactors(y,method="TMM")
> y <- estimateCommonDisp(y)
> y$common.disp
[1] 0.1308894
> y <- estimateTagwiseDisp(y)
> summary(y$tagwise.disp)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.002045 0.056520 0.163700 0.335500 0.389100 6.376000
On Fri, 9 May 2014, Gordon K Smyth wrote:
> Dear Yanzhu,
>
> Please send me your data offlist so that we can trouble-shoot.
>
> Best wishes
> Gordon
>
>> Date: Wed, 7 May 2014 11:29:40 -0700 (PDT)
>> From: "Yanzhu [guest]" <guest at bioconductor.org>
>> To: bioconductor at r-project.org, mlinyzh at gmail.com
>> Subject: [BioC] edgeR: dispersion estimation
>>
>> Dear Gordon,
>>
>> Sorry for replying late, I am sick and take some days off. The following
>> are my answers to your questions:
>>
>> I have checked they are all integers, they are read counts generated by
>> HT-Seq (v0.5.3p9). When I type your command I get zero as below:
>>
>> max(y$counts-floor(y$counts))
>> [1] 0
>>
>>
>> I have also posted the session information as below. Both the R and edgeR
>> package are the most recent versions. However, the results are still the
>> same as before.
>>
>> Are there other possible reasons that would cause such issue?
>>
>>
>> Thank you very much for your help! I greatly appreciate it.
>>
>>
>>
>> Yanzhu
>>
>>
>> ########################################
>> Dear Yanzhu,
>>
>> I find it hard to believe that your data are all integers, because there
>> is something seriously wrong, and we have only ever seen the problem you
>> report when there are fractional counts.
>>
>> How have you checked that they are all integers? How were the counts
>> created? What do you see if you type:
>>
>> max(y$counts-floor(y$counts))
>>
>> It also isn't clear that you have installed the current version of edgeR.
>> Have you installed R 3.1.0? Can you please give the sessionInfo() output?
>> The current version of edgeR is 3.6.1, whereas you seem to be using
>> version 3.2.4, which is two Bioconductor releases out of date.
>>
>> Gordon
>>
>>
>>
>> -- output of sessionInfo():
>>
>>> sessionInfo()
>> R version 3.1.0 (2014-04-10)
>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United
>> States.1252 LC_MONETARY=English_United States.1252
>> [4] LC_NUMERIC=C LC_TIME=English_United
>> States.1252
>>
>> attached base packages:
>> [1] parallel stats graphics grDevices utils datasets methods
>> base
>>
>> other attached packages:
>> [1] DESeq_1.16.0 lattice_0.20-29 locfit_1.5-9.1
>> Biobase_2.24.0 BiocGenerics_0.10.0
>> [6] edgeR_3.6.1 limma_3.20.1 BiocInstaller_1.14.2
>>
>> loaded via a namespace (and not attached):
>> [1] annotate_1.42.0 AnnotationDbi_1.26.0 DBI_0.2-7
>> genefilter_1.46.0 geneplotter_1.42.0
>> [6] GenomeInfoDb_1.0.2 grid_3.1.0 IRanges_1.22.6
>> RColorBrewer_1.0-5 RSQLite_0.11.4
>> [11] splines_3.1.0 stats4_3.1.0 survival_2.37-7
>> tools_3.1.0 XML_3.98-1.1
>> [16] xtable_1.7-3
>
______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}
More information about the Bioconductor
mailing list