[BioC] GLMs in DESeq (convergence problems)

Gordon K Smyth smyth at wehi.EDU.AU
Fri Mar 11 00:35:18 CET 2011


Hi Seanna,

As Simon says, just increasing the number of iterations may well not solve 
your problem.  DESeq just makes transcript-wise calls to the glm() 
function, and the glm() function doesn't always converge for RNA-Seq data, 
as was pointed out in a number of posts on the Bioconductor mailing list 
late last year.  A less important annoyance is that transcript-wise calls 
to glm() are slow.

The more complete solution is to implement new glm code that modifies the 
glm algorithm to prevent divergence.  This has been done in the glm 
functions in the edgeR package.  The edgeR glm code is purpose designed 
for RNA-Seq data, and has the added advantage of being very much faster.

It is still possible to have convergence problems even with the new edgeR 
code, but we've found that they are dramatically less common.

Best wishes
Gordon

--------------------------------------------
Professor Gordon K Smyth,
NHMRC Senior Research Fellow,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth


> Date: Wed, 09 Mar 2011 13:39:14 +0100
> From: Simon Anders <anders at embl.de>
> To: Seanna McTaggart <smctagga at staffmail.ed.ac.uk>,
> 	"bioconductor at r-project.org" <bioconductor at r-project.org>
> Subject: Re: [BioC] GLMs in DESeq
>
> Hi Seanna
>
> On 03/08/2011 05:13 PM, Seanna McTaggart wrote:
>> I am using DESeq to determine DE of an RNA-Seq project that has
>> multiple genotypes exposed to different treatments. I would like to
>> use the GLM functionality to partition the variance in the count data
>> between genotype and treatment. However, when I follow the
>> suggestions of Simon (Bioconductor Digest, Vol 96, Issue 9), neither
>> model (fit0 or fit1) is reaching convergence, and I was wondering if
>> it was possible for me to increase the number of iterations to see if
>> this would help out. Otherwise I would appreciate any advice on how
>> to proceed.
>
> I've just added a new argument to the function 'nbinomTestGLM', called
> 'glmControl', which is a list of GLM control parameters as described in
> the 'glm.control' help page.
>
> So, you could try something like
>
> fit0 <- nbinomTestGLM( cds, count ~ whatever,
>    glmControl = list( maxit=75 ) )
>
> This should increase the maximum number of iterations from 25 to 75.
>
> However, it is well possible that this does not help much. If your model
> fails to converge for any gene, it might be a deeper problem, and if it
> fails to converge only for a few genes, these might be somehow
> "stubborn" and won't converge at all, no matter how many iterations you
> wait. If necessary, ask again with more details.
>
>   Simon

______________________________________________________________________
The information in this email is confidential and intend...{{dropped:4}}



More information about the Bioconductor mailing list