[BioC] edgeR warnings and then error

john herbert arraystruggles at gmail.com
Wed Nov 14 18:37:29 CET 2012


Dear Gordon,
I think I found it by checking out the URL;
https://hedgehog.fhcrc.org/bioconductor/branches/RELEASE_2_11/madman/Rpacks/edgeR

Thanks,

Kind regards,

John.

On Wed, Nov 14, 2012 at 4:58 PM, john herbert <arraystruggles at gmail.com> wrote:
> Dear Gordon,
> I want to try your fix but am not sure where to find it?
>
> Does this just appear on the Bioconductor website after some time? I
> see 3.0.3 on the website and if I checkout of svn I get  edgeR_3.1.2.
>
> I notice there is a -r flag with svn but it syntax errors with -r
> 3.0.4. Tried a few things but no luck.
>
> Any help appreciated.
> Thanks,
>
> John.
>
> On Tue, Nov 13, 2012 at 10:02 PM, john herbert <arraystruggles at gmail.com> wrote:
>> Dear Gordon,
>> That is brilliant you tried the data and altered edgeR so quick, thank
>> you. I had read in the manual and elsewhere that methods like edgeR
>> and DESeq best performs with raw counts but was tryint it as was
>> suggested in the eXpress manual.
>>
>> Thanks about pointnig out quality issues with the data. Accordingly, I
>> get better results if I leave out the third batch of samples, which
>> removes those you mention.
>>
>> Kind regards,
>>
>> John.
>>
>>
>>
>>
>>
>> On Tue, Nov 13, 2012 at 6:40 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>> Dear John,
>>>
>>> Thanks for supplying example data.  I have committed a fix an hour ago to
>>> edgeR 3.0.4.  Using the fixed package, the code I give below runs on my
>>> laptop using your data in a few seconds.
>>>
>>> Two additional comments.  First, we warn in the edgeR User's Guide (Section
>>> 2.5.6) against using estimated counts from cufflinks or eXpress.
>>>
>>> Secondly, your data seems to have some serious quality issues.  An MDS plot
>>> plotMDS(predFC(y)) shows that samples con3, con4 and case3 are massively
>>> different from the other samples.  The batch effect will not correct for
>>> this.  The data shows very large dispersion values.  So I don't think you
>>> will get good results from the code below, even though it runs now without
>>> errors.
>>>
>>> Best wishes
>>> Gordon
>>>
>>>> library(edgeR)
>>>> x <- read.delim("eXpress_Effective_Counts_Example.txt",row.names=1)
>>>> group <-
>>>
>>> factor(c("c","c","c","c","case","case","case","case"),levels=c("c","case"))
>>>>
>>>> batch <- factor(c(1,2,3,3,1,2,3,3))
>>>> design <- model.matrix(~batch+group)
>>>> keep <- rowSums(cpm(x)>0.5) >= 3
>>>> y <- DGEList(counts=x[keep,],group=group)
>>>
>>> Calculating library sizes from column totals.
>>>>
>>>> y <- calcNormFactors(y)
>>>> y <- estimateGLMCommonDisp(y)
>>>> y <- estimateGLMTrendedDisp(y)
>>>
>>> Loading required package: splines
>>>>
>>>> y <- estimateGLMTagwiseDisp(y)
>>>>
>>>> sessionInfo()
>>>
>>> R version 2.15.2 (2012-10-26)
>>> Platform: i386-w64-mingw32/i386 (32-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
>>> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>>> [5] LC_TIME=English_Australia.1252
>>>
>>>
>>> attached base packages:
>>> [1] splines   stats     graphics  grDevices utils     datasets  methods
>>> [8] base
>>>
>>> other attached packages:
>>> [1] edgeR_3.0.4  limma_4.15.5
>>>
>>>
>>>
>>>
>>> On Mon, 12 Nov 2012, john herbert wrote:
>>>
>>>> Dear Gordon,
>>>> Thanks for your help. I tried the latest version of edgeR but
>>>> unfortunately there is still an issue.
>>>>
>>>> First, I don't have any problems if I use simple count assignments
>>>> extracted from BAM alignment files. However, I am coming across the
>>>> problem when using the "Effective counts" values as provided by
>>>> running the eXpress software (this uses statistics to assign
>>>> transcript counts onto a de-novo assembled transcriptome).
>>>>
>>>> In the manual it recommends I use "Effective counts" from the output
>>>> of eXpress as input to Bioconductor programs like edgeR and DESeq.
>>>> Manual is here;
>>>> http://bio.math.berkeley.edu/eXpress/tutorial.html#output
>>>>
>>>> I have tried edgeR with simple counts and this works fine but my
>>>> problem is hanging in the
>>>>
>>>> y4 <- estimateGLMTrendedDisp(y4,design) step with eXpress counts (even
>>>> with the upgrade).
>>>>
>>>> I attach the eXpress assigned effective counts in case you/anyone
>>>> wants to replicate the error. Please download from;
>>>>
>>>>
>>>> https://docs.google.com/file/d/0B9IUGsKecS4GTS01WW95WGhiWW8/edit?authkey=CNSv7PIB&authkey=CNSv7PIB
>>>>
>>>> In the file, the first 4 columns are control and the last four case.
>>>>
>>>> group <- factor(c("c","c","c","c","case","case","case","case"),
>>>> levels=c("c","case"))
>>>> batch <- factor(c(1,2,3,3,1,2,3,3))
>>>> design <- model.matrix(~batch+group)
>>>>
>>>> Thank you for any advice to help solve this.
>>>>
>>>> Kind regards,
>>>>
>>>> John.
>>>>
>>>> ps. sorry if this appears twice as first time it said large files need
>>>> moderator of approval for the mailing list.
>>>>
>>>>
>>>>
>>>> On Sat, Nov 10, 2012 at 6:59 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>>
>>>>> Dear John,
>>>>>
>>>>> I'm happy to work with you to solve any programs, but first you need to
>>>>> be
>>>>> using the current Bioconductor release.  Bioconductor 2.11 was released
>>>>> on 3
>>>>> October 2012, see
>>>>>
>>>>>   http://www.bioconductor.org/news/bioc_2_11_release/
>>>>>
>>>>> and the current version of edgeR is 3.0.2.  There are substantial
>>>>> differences between the current version and the version you are using.
>>>>> The
>>>>> type of divergence that is being reported in the warnings is now
>>>>> impossible.
>>>>>
>>>>> R has just changed from six monthly releases to annual releases, and
>>>>> Bioconductor seems to have taken most users by surprise by releasing a
>>>>> new
>>>>> version of Bioconductor while the version of R remains the same.
>>>>>
>>>>> Best wishes
>>>>> Gordon
>>>>>
>>>>> --------------- original message -----------------
>>>>> [BioC] edgeR warnings and then error
>>>>> john herbert arraystruggles at gmail.com
>>>>> Fri Nov 9 17:22:15 CET 2012
>>>>>
>>>>> Hello Bioconductors,
>>>>>
>>>>> For some data, the code below works great and I find differentially
>>>>> expressed genes. However, I have added some biological replicates to
>>>>> the data, which caused warnings and then an error.
>>>>>
>>>>> The warnings come from the estimateGLMTagwiseDisp function, then an
>>>>> error appears in the glmLRT function.
>>>>>
>>>>> I tried the development version of R and edgeR but this now is still
>>>>> running after several days.
>>>>>
>>>>> In summary,  glmLRT works fine if no warnings are produced by
>>>>> estimateGLMTagwiseDisp as below but with warnings, it breaks down.
>>>>>
>>>>> There could be a work-around but not seen this yet.
>>>>>
>>>>> Thank you in advance of any tips.
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> John..
>>>>>
>>>>>> library(edgeR)
>>>>>>
>>>>>> group <-
>>>>>
>>>>>
>>>>>
>>>>> factor(c("c","c","c","c","sp","sp","sp","sp","lp","lp","lp","lp","ss","ss","ss","ss","ls","ls","ls","ls","ag","ag","ag","ag"),
>>>>> levels=c("c","sp","lp","ss","ls","ag"))
>>>>>>
>>>>>>
>>>>>> batch <- factor(c(1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3))
>>>>>> design <- model.matrix(~batch+group)
>>>>>>
>>>>>> counts <- read.table("Final_L1_L2_L3_mine_simple_all_reads")
>>>>>>
>>>>>> x4=counts
>>>>>>
>>>>>> y4 <- DGEList(counts=x4,group=group)
>>>>>
>>>>>
>>>>> Calculating library sizes from column totals.
>>>>>>
>>>>>>
>>>>>>
>>>>>> y4 <- estimateGLMCommonDisp(y4,design)
>>>>>>
>>>>>> y4 <- estimateGLMTrendedDisp(y4,design)
>>>>>>
>>>>>> y4 <- estimateGLMTagwiseDisp(y4,design)
>>>>>
>>>>>
>>>>> There were 24 warnings (use warnings() to see them)
>>>>>>
>>>>>>
>>>>>>
>>>>>> fit4 <- glmFit(y4, design)
>>>>>>
>>>>>> lrt4 <- glmLRT(fit4, coef=3)
>>>>>
>>>>>
>>>>> Error in array(x, c(length(x), 1L), if (!is.null(names(x)))
>>>>> list(names(x), :
>>>>>   dims [product 0] do not match the length of object [13]
>>>>>>
>>>>>>
>>>>>> warnings()
>>>>>
>>>>>
>>>>> Warning messages:
>>>>> 1: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 2: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 3: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 4: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 5: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>>> exceeded
>>>>> 6: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>>> exceeded
>>>>> 7: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 8: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 9: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>>> exceeded
>>>>> 10: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 11: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 12: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 13: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 14: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>>> iterations exceeded
>>>>> 15: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 16: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 17: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 18: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>>> iterations exceeded
>>>>> 19: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 20: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 21: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 22: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>>> iterations exceeded
>>>>> 23: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>>> 24: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>>> iterations exceeded
>>>>>
>>>>>> sessionInfo()
>>>>>
>>>>>
>>>>> R version 2.15.2 (2012-10-26)
>>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>>
>>>>> locale:
>>>>> [1] LC_COLLATE=English_United Kingdom.1252  LC_CTYPE=English_United
>>>>> Kingdom.1252    LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>>>> [5] LC_TIME=English_United Kingdom.1252
>>>>>
>>>>> attached base packages:
>>>>> [1] splines   stats     graphics  grDevices utils     datasets
>>>>> methods   base
>>>>>
>>>>> other attached packages:
>>>>>  [1] pasilla_0.2.13     DEXSeq_1.2.1       DESeq_1.8.3
>>>>> locfit_1.5-8       edgeR_2.6.12       plyr_1.7.1
>>>>> genefilter_1.38.0  Biobase_2.16.0
>>>>>  [9] BiocGenerics_0.2.0 limma_3.12.3
>>>>>
>>>>> loaded via a namespace (and not attached):
>>>>>  [1] annotate_1.34.0      AnnotationDbi_1.18.0 biomaRt_2.12.0
>>>>> DBI_0.2-5            geneplotter_1.34.0   grid_2.15.2
>>>>> hwriter_1.3
>>>>>  [8] IRanges_1.14.2       lattice_0.20-10      RColorBrewer_1.0-5
>>>>> RCurl_1.91-1.1       RSQLite_0.11.1       statmod_1.4.16
>>>>> stats4_2.15.2
>>>>> [15] stringr_0.6          survival_2.36-14     tools_2.15.2
>>>>> XML_3.9-4.1          xtable_1.7-0
>>>
>>>
>>> ______________________________________________________________________
>>> The information in this email is confidential and intended solely for the
>>> addressee.
>>> You must not disclose, forward, print or use it without the permission of
>>> the sender.
>>> ______________________________________________________________________



More information about the Bioconductor mailing list