[BioC] how edgeR control outliers?

Gordon K Smyth smyth at wehi.EDU.AU
Thu Mar 1 23:50:10 CET 2012

Dear Yuan,

The deviance is a standard quantity in generalized linear model theory, 
analogous to the residual sum of squares in ANOVA.  It is usually treated 
as chi-square distributed, although this approximation can be rough in some 
cases, for example when the counts are small.  See for example:


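For the negative binomial (NB) model that edgeR fits, a gene's deviance is a sum of unit deviances over the samples. The R sketch below computes the textbook NB deviance for observed counts y, fitted means mu and dispersion phi (assumed > 0, with Var(y) = mu + phi*mu^2). The helper name nbDeviance() is hypothetical and not part of edgeR. edgeR computes the deviance stored by glmFit() internally, and numerical details may differ. The sketch only shows which quantity is being summed.

```r
# Hypothetical helper (not an edgeR function): textbook NB deviance.
# y = observed counts, mu = fitted means, phi = NB dispersion (> 0),
# so that Var(y) = mu + phi * mu^2.
nbDeviance <- function(y, mu, phi) {
  # y * log(y / mu) is defined as 0 when y == 0
  ylogy <- ifelse(y > 0, y * log(y / mu), 0)
  # unit deviance for each sample
  unit <- 2 * (ylogy - (y + 1 / phi) * log((1 + phi * y) / (1 + phi * mu)))
  # the gene's deviance is the sum over samples
  sum(unit)
}
```

The result is zero when the fitted means equal the counts. As phi approaches zero, it approaches the Poisson deviance 2 * sum(y*log(y/mu) - (y - mu)).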
Yes, when I said to test for outliers using the gof() function in


I meant that outliers are the genes with large gof statistics.  The 
calculation of p-values to test for outliers is already done for you by 
the gof() function.
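Here is a base-R sketch of that p-value step. Each gene's residual deviance is referred to a chi-square distribution on the residual degrees of freedom (number of samples minus number of model coefficients). Small upper-tail p-values then mark candidate outliers. The function name devianceOutliers(), the Holm adjustment and the 0.05 cutoff are assumptions for illustration. See ?gof for what edgeR actually does.

```r
# Hypothetical helper (not an edgeR function).
# deviance = per-gene residual deviances, df = residual degrees of freedom.
devianceOutliers <- function(deviance, df, alpha = 0.05) {
  # upper-tail chi-square p-value for each gene
  pval <- pchisq(deviance, df = df, lower.tail = FALSE)
  # multiplicity correction across genes (Holm is an assumed choice here)
  padj <- p.adjust(pval, method = "holm")
  list(pvalues = pval, outlier = padj < alpha)
}
```

For example, devianceOutliers(c(2, 3, 40), df = 4) flags only the third gene. A deviance of 40 on 4 degrees of freedom is far out in the chi-square tail.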

Figure 2 of the following article provides some plots of gof() statistics:


The plots are made by

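  library(edgeR)   # edgeR attaches limma, which provides zscoreGamma()
  g <- gof(fit)    # fit is the output of glmFit()
  z <- zscoreGamma(g$gof.statistics, shape=g$df/2, scale=2)
  qqnorm(z); qqline(z)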
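
A chi-square distribution on df degrees of freedom is a Gamma distribution with shape = df/2 and scale = 2, which explains the zscoreGamma() arguments above. The transform maps each statistic to the standard-normal quantile with the same tail probability, so a well-fitting model gives z-scores that lie along the line in a normal QQ plot. The base-R sketch below is an assumption-labelled equivalent. The name chisqToZ() is hypothetical, and limma's own implementation may differ in numerical detail.

```r
# Hypothetical helper: convert a chi-square(df) statistic to a z-score
# with the same upper-tail probability. Log-scale probabilities keep
# extreme statistics (e.g. strong outliers) from underflowing to 0.
chisqToZ <- function(x, df) {
  logp <- pchisq(x, df = df, lower.tail = FALSE, log.p = TRUE)
  qnorm(logp, lower.tail = FALSE, log.p = TRUE)
}
```

For instance, chisqToZ(qchisq(0.975, df = 4), df = 4) returns approximately 1.96. Outlying genes show up as points well above the line.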

Another very useful diagnostic is to plot the tagwise dispersion against 
abundance.  Genes containing outliers may show up as having unusually large 
dispersions.  The developmental version of edgeR provides a function 
plotBCV() to do this.
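
Why outliers inflate the dispersion can be seen with a deliberately crude estimator. The sketch below uses a method-of-moments NB dispersion rather than the likelihood-based, moderated estimates that edgeR uses. It also assumes equal library sizes, and the name momentDispersion() is hypothetical. The biological coefficient of variation (BCV) shown by plotBCV() is the square root of the dispersion.

```r
# Hypothetical helper, NOT edgeR's estimator: method-of-moments NB dispersion
# for one gene, assuming equal library sizes.
# Var(y) = mu + phi * mu^2  =>  phi = (Var - mean) / mean^2, truncated at 0.
momentDispersion <- function(y) {
  m <- mean(y)
  v <- var(y)
  max(0, (v - m) / m^2)
}
```

For six replicate counts near 100, such as c(100, 105, 95, 102, 98, 100), the variance is below the Poisson level and the estimate is truncated to 0. Replacing the last count with 400 raises it to about 0.66, i.e. a BCV of roughly 0.81.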

Best wishes

> Date: Wed, 29 Feb 2012 20:09:06 -0800
> From: Yuan Tian <ytianidyll at ucla.edu>
> To: Bioconductor mailing list <bioconductor at r-project.org>
> Subject: [BioC] how edgeR control outliers?
> Dear all,
> I'm currently using edgeR to detect differentially expressed genes 
> from an RNA-seq dataset, and I'm also using the gof() function to test 
> for potential outliers. I have two questions regarding the outlier 
> detection, and would like to have your suggestions.
> 1) How is an outlier defined? Is it a gene that has a deviance 
> larger than a threshold? How is the deviance contained in the glmFit 
> output calculated?
> 2) The gof() function assumes that the deviance follows a 
> chi-squared distribution. What is the statistical basis for this 
> assumption?
> Thanks!
> Yuan
