[BioC] EdgeR: dispersion estimation

Gordon K Smyth smyth at wehi.EDU.AU
Fri Apr 25 04:39:50 CEST 2014


Dear Yanzhu,

My guess is that some of your "count data" are not integers.  For example, 
are they perhaps expected counts from RSEM?  In the edgeR version that you 
are using, the GLM dispersion estimation functions do not work correctly 
for non-integer data.  (They weren't intended to.)
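
A quick way to test this guess is sketched below in base R. The matrix `counts` and its values are made up for illustration; substitute your own count table (for example `as.matrix(T)`).

```r
# Hypothetical toy matrix standing in for the real count table
counts <- matrix(c(10, 0, 3.5, 7, 12.25, 4), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# TRUE wherever a value differs from its rounded version
non_integer <- counts != round(counts)

# Is anything non-integer, how many entries, and which genes are affected?
any_non_integer <- any(non_integer)
n_non_integer   <- sum(non_integer)
genes_affected  <- rownames(counts)[rowSums(non_integer) > 0]

print(any_non_integer)
print(n_non_integer)
print(genes_affected)
```

If `any_non_integer` is TRUE for your own table, the values are probably estimated or expected counts rather than raw read counts.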

Please update your copies of R and edgeR to the latest versions. 
Bioconductor 2.14 was released a couple of weeks ago.  All edgeR functions 
now permit non-integer "counts".
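
To confirm which versions are actually in use after updating, something like the following should work. It is only a sketch, and the `requireNamespace` guard is there so that it also runs on a machine where edgeR is not installed.

```r
# Version of the running R session
r_version <- getRversion()

# Installed edgeR version, or NA if edgeR is not installed
edger_version <- if (requireNamespace("edgeR", quietly = TRUE)) {
  as.character(utils::packageVersion("edgeR"))
} else {
  NA_character_
}

cat("R:    ", as.character(r_version), "\n")
cat("edgeR:", edger_version, "\n")
```

The session info quoted below reports R 3.0.1 and edgeR 3.2.4, so both numbers should be higher after the update.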

Also check that your data are counts and not RPKM or similar.  The counts 
should sum to the total sequence depth for each sample.
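
As a rough sketch of that check, compare the column sums with a per-sample read total obtained independently. Both `counts` and `expected_depth` below are made-up illustrations, not your data.

```r
# Hypothetical toy counts: 3 genes, 2 samples
counts <- matrix(c(120, 30, 850, 95, 40, 865), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), c("s1", "s2")))

# Hypothetical per-sample read totals, e.g. taken from the counting tool's summary
expected_depth <- c(s1 = 1000, s2 = 1000)

# Library size implied by the table: the sum of each column
lib_size <- colSums(counts)

# TRUE for each sample whose column sum matches the known total
depth_matches <- lib_size == expected_depth[names(lib_size)]

print(lib_size)
print(depth_matches)
```

Column sums far from the read totals, especially together with non-integer values, suggest the table holds RPKM or another normalized quantity rather than counts.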

Best wishes
Gordon

> Date: Wed, 23 Apr 2014 07:58:30 -0700 (PDT)
> From: "Yanzhu [guest]" <guest at bioconductor.org>
> To: bioconductor at r-project.org, mlinyzh at gmail.com
> Subject: [BioC] EdgeR: dispersion estimation
>
>
> Dear community,
>
> I am using edgeR to analyse the data from my RNA-seq project (as mentioned in my previous posts about multi-factor analysis of an RNA-Seq project), and I have run into an issue with dispersion estimation.
> I first used estimateGLMCommonDisp and then estimateGLMTagwiseDisp to estimate the dispersion. However, I got 3.999943 for y$common.dispersion and 0.0624991 for every element of y$tagwise.dispersion (all of the tagwise dispersions are the same). Shouldn't the tagwise dispersions differ from one another?
>
> The following is the code I used:
> ##Read in count data
> T<-data.frame(HTSeqRE)
>
> ##Factors:
> Design<-data.frame(HTSeqCondRE[,2:4])
> Rep<-as.factor(Design$Rep)
> Line<-as.factor(Design$Line)
> Sex<-as.factor(Design$Sex)
> design<-model.matrix(~Line+Rep+Sex+Line:Rep+Line:Sex+Rep:Sex+Line:Sex:Rep)
>
> group<-paste(Design$Line,Design$Sex,Design$Rep,sep=".")
> y<-DGEList(counts=T,group=group)
>
>
> y<-calcNormFactors(y,method="TMM")
>
> y<-estimateGLMCommonDisp(y,design)
> y<-estimateGLMTagwiseDisp(y,design)
>
> y$common.dispersion
> [1] 3.999943
>
> y$tagwise.dispersion
> [1] 0.0624991 0.0624991 0.0624991 0.0624991 0.0624991
> 13474 more elements ...
>
>
> Yanzhu
>
> -- output of sessionInfo():
>
>> sessionInfo()
> R version 3.0.1 (2013-05-16)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] parallel  stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] DESeq_1.12.1       lattice_0.20-27    locfit_1.5-9.1     Biobase_2.20.1     BiocGenerics_0.6.0 edgeR_3.2.4        limma_3.16.8
>
> loaded via a namespace (and not attached):
> [1] annotate_1.38.0      AnnotationDbi_1.22.6 DBI_0.2-7            genefilter_1.42.0    geneplotter_1.38.0   grid_3.0.1           IRanges_1.18.4
> [8] RColorBrewer_1.0-5   RSQLite_0.11.4       splines_3.0.1        stats4_3.0.1         survival_2.37-4      tools_3.0.1          XML_3.98-1.1
> [15] xtable_1.7-3
>
> --
> Sent via the guest posting facility at bioconductor.org.
