[BioC] edgeR warnings and then error

john herbert arraystruggles at gmail.com
Tue Nov 13 23:02:43 CET 2012


Dear Gordon,
That is brilliant you tried the data and altered edgeR so quick, thank
you. I had read in the manual and elsewhere that methods like edgeR
and DESeq best performs with raw counts but was tryint it as was
suggested in the eXpress manual.

Thanks about pointnig out quality issues with the data. Accordingly, I
get better results if I leave out the third batch of samples, which
removes those you mention.

Kind regards,

John.





On Tue, Nov 13, 2012 at 6:40 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
> Dear John,
>
> Thanks for supplying example data.  I have committed a fix an hour ago to
> edgeR 3.0.4.  Using the fixed package, the code I give below runs on my
> laptop using your data in a few seconds.
>
> Two additional comments.  First, we warn in the edgeR User's Guide (Section
> 2.5.6) against using estimated counts from cufflinks or eXpress.
>
> Secondly, your data seems to have some serious quality issues.  An MDS plot
> plotMDS(predFC(y)) shows that samples con3, con4 and case3 are massively
> different from the other samples.  The batch effect will not correct for
> this.  The data shows very large dispersion values.  So I don't think you
> will get good results from the code below, even though it runs now without
> errors.
>
> Best wishes
> Gordon
>
>> library(edgeR)
>> x <- read.delim("eXpress_Effective_Counts_Example.txt",row.names=1)
>> group <-
>
> factor(c("c","c","c","c","case","case","case","case"),levels=c("c","case"))
>>
>> batch <- factor(c(1,2,3,3,1,2,3,3))
>> design <- model.matrix(~batch+group)
>> keep <- rowSums(cpm(x)>0.5) >= 3
>> y <- DGEList(counts=x[keep,],group=group)
>
> Calculating library sizes from column totals.
>>
>> y <- calcNormFactors(y)
>> y <- estimateGLMCommonDisp(y)
>> y <- estimateGLMTrendedDisp(y)
>
> Loading required package: splines
>>
>> y <- estimateGLMTagwiseDisp(y)
>>
>> sessionInfo()
>
> R version 2.15.2 (2012-10-26)
> Platform: i386-w64-mingw32/i386 (32-bit)
>
> locale:
> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
> [5] LC_TIME=English_Australia.1252
>
>
> attached base packages:
> [1] splines   stats     graphics  grDevices utils     datasets  methods
> [8] base
>
> other attached packages:
> [1] edgeR_3.0.4  limma_4.15.5
>
>
>
>
> On Mon, 12 Nov 2012, john herbert wrote:
>
>> Dear Gordon,
>> Thanks for your help. I tried the latest version of edgeR but
>> unfortunately there is still an issue.
>>
>> First, I don't have any problems if I use simple count assignments
>> extracted from BAM alignment files. However, I am coming across the
>> problem when using the "Effective counts" values as provided by
>> running the eXpress software (this uses statistics to assign
>> transcript counts onto a de-novo assembled transcriptome).
>>
>> In the manual it recommends I use "Effective counts" from the output
>> of eXpress as input to Bioconductor programs like edgeR and DESeq.
>> Manual is here;
>> http://bio.math.berkeley.edu/eXpress/tutorial.html#output
>>
>> I have tried edgeR with simple counts and this works fine but my
>> problem is hanging in the
>>
>> y4 <- estimateGLMTrendedDisp(y4,design) step with eXpress counts (even
>> with the upgrade).
>>
>> I attach the eXpress assigned effective counts in case you/anyone
>> wants to replicate the error. Please download from;
>>
>>
>> https://docs.google.com/file/d/0B9IUGsKecS4GTS01WW95WGhiWW8/edit?authkey=CNSv7PIB&authkey=CNSv7PIB
>>
>> In the file, the first 4 columns are control and the last four case.
>>
>> group <- factor(c("c","c","c","c","case","case","case","case"),
>> levels=c("c","case"))
>> batch <- factor(c(1,2,3,3,1,2,3,3))
>> design <- model.matrix(~batch+group)
>>
>> Thank you for any advice to help solve this.
>>
>> Kind regards,
>>
>> John.
>>
>> ps. sorry if this appears twice as first time it said large files need
>> moderator of approval for the mailing list.
>>
>>
>>
>> On Sat, Nov 10, 2012 at 6:59 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>
>>> Dear John,
>>>
>>> I'm happy to work with you to solve any programs, but first you need to
>>> be
>>> using the current Bioconductor release.  Bioconductor 2.11 was released
>>> on 3
>>> October 2012, see
>>>
>>>   http://www.bioconductor.org/news/bioc_2_11_release/
>>>
>>> and the current version of edgeR is 3.0.2.  There are substantial
>>> differences between the current version and the version you are using.
>>> The
>>> type of divergence that is being reported in the warnings is now
>>> impossible.
>>>
>>> R has just changed from six monthly releases to annual releases, and
>>> Bioconductor seems to have taken most users by surprise by releasing a
>>> new
>>> version of Bioconductor while the version of R remains the same.
>>>
>>> Best wishes
>>> Gordon
>>>
>>> --------------- original message -----------------
>>> [BioC] edgeR warnings and then error
>>> john herbert arraystruggles at gmail.com
>>> Fri Nov 9 17:22:15 CET 2012
>>>
>>> Hello Bioconductors,
>>>
>>> For some data, the code below works great and I find differentially
>>> expressed genes. However, I have added some biological replicates to
>>> the data, which caused warnings and then an error.
>>>
>>> The warnings come from the estimateGLMTagwiseDisp function, then an
>>> error appears in the glmLRT function.
>>>
>>> I tried the development version of R and edgeR but this now is still
>>> running after several days.
>>>
>>> In summary,  glmLRT works fine if no warnings are produced by
>>> estimateGLMTagwiseDisp as below but with warnings, it breaks down.
>>>
>>> There could be a work-around but not seen this yet.
>>>
>>> Thank you in advance of any tips.
>>>
>>> Kind regards,
>>>
>>> John..
>>>
>>>> library(edgeR)
>>>>
>>>> group <-
>>>
>>>
>>>
>>> factor(c("c","c","c","c","sp","sp","sp","sp","lp","lp","lp","lp","ss","ss","ss","ss","ls","ls","ls","ls","ag","ag","ag","ag"),
>>> levels=c("c","sp","lp","ss","ls","ag"))
>>>>
>>>>
>>>> batch <- factor(c(1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3))
>>>> design <- model.matrix(~batch+group)
>>>>
>>>> counts <- read.table("Final_L1_L2_L3_mine_simple_all_reads")
>>>>
>>>> x4=counts
>>>>
>>>> y4 <- DGEList(counts=x4,group=group)
>>>
>>>
>>> Calculating library sizes from column totals.
>>>>
>>>>
>>>>
>>>> y4 <- estimateGLMCommonDisp(y4,design)
>>>>
>>>> y4 <- estimateGLMTrendedDisp(y4,design)
>>>>
>>>> y4 <- estimateGLMTagwiseDisp(y4,design)
>>>
>>>
>>> There were 24 warnings (use warnings() to see them)
>>>>
>>>>
>>>>
>>>> fit4 <- glmFit(y4, design)
>>>>
>>>> lrt4 <- glmLRT(fit4, coef=3)
>>>
>>>
>>> Error in array(x, c(length(x), 1L), if (!is.null(names(x)))
>>> list(names(x), :
>>>   dims [product 0] do not match the length of object [13]
>>>>
>>>>
>>>> warnings()
>>>
>>>
>>> Warning messages:
>>> 1: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 2: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 3: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 4: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 5: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>> exceeded
>>> 6: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>> exceeded
>>> 7: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 8: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 9: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>> exceeded
>>> 10: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 11: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 12: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 13: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 14: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>> iterations exceeded
>>> 15: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 16: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 17: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 18: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>> iterations exceeded
>>> 19: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 20: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 21: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 22: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>> iterations exceeded
>>> 23: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>> 24: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>> iterations exceeded
>>>
>>>> sessionInfo()
>>>
>>>
>>> R version 2.15.2 (2012-10-26)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>> Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_United Kingdom.1252
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets
>>> methods   base
>>>
>>> other attached packages:
>>>  [1] pasilla_0.2.13     DEXSeq_1.2.1       DESeq_1.8.3
>>> locfit_1.5-8       edgeR_2.6.12       plyr_1.7.1
>>> genefilter_1.38.0  Biobase_2.16.0
>>>  [9] BiocGenerics_0.2.0 limma_3.12.3
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0
>>> DBI_0.2-5            geneplotter_1.34.0   grid_2.15.2
>>> hwriter_1.3
>>>  [8] IRanges_1.14.2       lattice_0.20-10      RColorBrewer_1.0-5
>>> RCurl_1.91-1.1       RSQLite_0.11.1       statmod_1.4.16
>>> stats4_2.15.2
>>> [15] stringr_0.6          survival_2.36-14     tools_2.15.2
>>> XML_3.9-4.1          xtable_1.7-0
>
>
> ______________________________________________________________________
> The information in this email is confidential and inte...{{dropped:6}}



More information about the Bioconductor mailing list