[BioC] edgeR warnings and then error
john herbert
arraystruggles at gmail.com
Wed Nov 14 17:58:38 CET 2012
Dear Gordon,
I want to try your fix but am not sure where to find it?
Does this just appear on the Bioconductor website after some time? I
see 3.0.3 on the website and if I checkout of svn I get edgeR_3.1.2.
I notice there is a -r flag with svn but it syntax errors with -r
3.0.4. Tried a few things but no luck.
Any help appreciated.
Thanks,
John.
On Tue, Nov 13, 2012 at 10:02 PM, john herbert <arraystruggles at gmail.com> wrote:
> Dear Gordon,
> That is brilliant you tried the data and altered edgeR so quick, thank
> you. I had read in the manual and elsewhere that methods like edgeR
> and DESeq best performs with raw counts but was tryint it as was
> suggested in the eXpress manual.
>
> Thanks about pointnig out quality issues with the data. Accordingly, I
> get better results if I leave out the third batch of samples, which
> removes those you mention.
>
> Kind regards,
>
> John.
>
>
>
>
>
> On Tue, Nov 13, 2012 at 6:40 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>> Dear John,
>>
>> Thanks for supplying example data. I have committed a fix an hour ago to
>> edgeR 3.0.4. Using the fixed package, the code I give below runs on my
>> laptop using your data in a few seconds.
>>
>> Two additional comments. First, we warn in the edgeR User's Guide (Section
>> 2.5.6) against using estimated counts from cufflinks or eXpress.
>>
>> Secondly, your data seems to have some serious quality issues. An MDS plot
>> plotMDS(predFC(y)) shows that samples con3, con4 and case3 are massively
>> different from the other samples. The batch effect will not correct for
>> this. The data shows very large dispersion values. So I don't think you
>> will get good results from the code below, even though it runs now without
>> errors.
>>
>> Best wishes
>> Gordon
>>
>>> library(edgeR)
>>> x <- read.delim("eXpress_Effective_Counts_Example.txt",row.names=1)
>>> group <-
>>
>> factor(c("c","c","c","c","case","case","case","case"),levels=c("c","case"))
>>>
>>> batch <- factor(c(1,2,3,3,1,2,3,3))
>>> design <- model.matrix(~batch+group)
>>> keep <- rowSums(cpm(x)>0.5) >= 3
>>> y <- DGEList(counts=x[keep,],group=group)
>>
>> Calculating library sizes from column totals.
>>>
>>> y <- calcNormFactors(y)
>>> y <- estimateGLMCommonDisp(y)
>>> y <- estimateGLMTrendedDisp(y)
>>
>> Loading required package: splines
>>>
>>> y <- estimateGLMTagwiseDisp(y)
>>>
>>> sessionInfo()
>>
>> R version 2.15.2 (2012-10-26)
>> Platform: i386-w64-mingw32/i386 (32-bit)
>>
>> locale:
>> [1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252
>> [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
>> [5] LC_TIME=English_Australia.1252
>>
>>
>> attached base packages:
>> [1] splines stats graphics grDevices utils datasets methods
>> [8] base
>>
>> other attached packages:
>> [1] edgeR_3.0.4 limma_4.15.5
>>
>>
>>
>>
>> On Mon, 12 Nov 2012, john herbert wrote:
>>
>>> Dear Gordon,
>>> Thanks for your help. I tried the latest version of edgeR but
>>> unfortunately there is still an issue.
>>>
>>> First, I don't have any problems if I use simple count assignments
>>> extracted from BAM alignment files. However, I am coming across the
>>> problem when using the "Effective counts" values as provided by
>>> running the eXpress software (this uses statistics to assign
>>> transcript counts onto a de-novo assembled transcriptome).
>>>
>>> In the manual it recommends I use "Effective counts" from the output
>>> of eXpress as input to Bioconductor programs like edgeR and DESeq.
>>> Manual is here;
>>> http://bio.math.berkeley.edu/eXpress/tutorial.html#output
>>>
>>> I have tried edgeR with simple counts and this works fine but my
>>> problem is hanging in the
>>>
>>> y4 <- estimateGLMTrendedDisp(y4,design) step with eXpress counts (even
>>> with the upgrade).
>>>
>>> I attach the eXpress assigned effective counts in case you/anyone
>>> wants to replicate the error. Please download from;
>>>
>>>
>>> https://docs.google.com/file/d/0B9IUGsKecS4GTS01WW95WGhiWW8/edit?authkey=CNSv7PIB&authkey=CNSv7PIB
>>>
>>> In the file, the first 4 columns are control and the last four case.
>>>
>>> group <- factor(c("c","c","c","c","case","case","case","case"),
>>> levels=c("c","case"))
>>> batch <- factor(c(1,2,3,3,1,2,3,3))
>>> design <- model.matrix(~batch+group)
>>>
>>> Thank you for any advice to help solve this.
>>>
>>> Kind regards,
>>>
>>> John.
>>>
>>> ps. sorry if this appears twice as first time it said large files need
>>> moderator of approval for the mailing list.
>>>
>>>
>>>
>>> On Sat, Nov 10, 2012 at 6:59 AM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>>>>
>>>> Dear John,
>>>>
>>>> I'm happy to work with you to solve any programs, but first you need to
>>>> be
>>>> using the current Bioconductor release. Bioconductor 2.11 was released
>>>> on 3
>>>> October 2012, see
>>>>
>>>> http://www.bioconductor.org/news/bioc_2_11_release/
>>>>
>>>> and the current version of edgeR is 3.0.2. There are substantial
>>>> differences between the current version and the version you are using.
>>>> The
>>>> type of divergence that is being reported in the warnings is now
>>>> impossible.
>>>>
>>>> R has just changed from six monthly releases to annual releases, and
>>>> Bioconductor seems to have taken most users by surprise by releasing a
>>>> new
>>>> version of Bioconductor while the version of R remains the same.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>> --------------- original message -----------------
>>>> [BioC] edgeR warnings and then error
>>>> john herbert arraystruggles at gmail.com
>>>> Fri Nov 9 17:22:15 CET 2012
>>>>
>>>> Hello Bioconductors,
>>>>
>>>> For some data, the code below works great and I find differentially
>>>> expressed genes. However, I have added some biological replicates to
>>>> the data, which caused warnings and then an error.
>>>>
>>>> The warnings come from the estimateGLMTagwiseDisp function, then an
>>>> error appears in the glmLRT function.
>>>>
>>>> I tried the development version of R and edgeR but this now is still
>>>> running after several days.
>>>>
>>>> In summary, glmLRT works fine if no warnings are produced by
>>>> estimateGLMTagwiseDisp as below but with warnings, it breaks down.
>>>>
>>>> There could be a work-around but not seen this yet.
>>>>
>>>> Thank you in advance of any tips.
>>>>
>>>> Kind regards,
>>>>
>>>> John..
>>>>
>>>>> library(edgeR)
>>>>>
>>>>> group <-
>>>>
>>>>
>>>>
>>>> factor(c("c","c","c","c","sp","sp","sp","sp","lp","lp","lp","lp","ss","ss","ss","ss","ls","ls","ls","ls","ag","ag","ag","ag"),
>>>> levels=c("c","sp","lp","ss","ls","ag"))
>>>>>
>>>>>
>>>>> batch <- factor(c(1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3,1,2,3,3))
>>>>> design <- model.matrix(~batch+group)
>>>>>
>>>>> counts <- read.table("Final_L1_L2_L3_mine_simple_all_reads")
>>>>>
>>>>> x4=counts
>>>>>
>>>>> y4 <- DGEList(counts=x4,group=group)
>>>>
>>>>
>>>> Calculating library sizes from column totals.
>>>>>
>>>>>
>>>>>
>>>>> y4 <- estimateGLMCommonDisp(y4,design)
>>>>>
>>>>> y4 <- estimateGLMTrendedDisp(y4,design)
>>>>>
>>>>> y4 <- estimateGLMTagwiseDisp(y4,design)
>>>>
>>>>
>>>> There were 24 warnings (use warnings() to see them)
>>>>>
>>>>>
>>>>>
>>>>> fit4 <- glmFit(y4, design)
>>>>>
>>>>> lrt4 <- glmLRT(fit4, coef=3)
>>>>
>>>>
>>>> Error in array(x, c(length(x), 1L), if (!is.null(names(x)))
>>>> list(names(x), :
>>>> dims [product 0] do not match the length of object [13]
>>>>>
>>>>>
>>>>> warnings()
>>>>
>>>>
>>>> Warning messages:
>>>> 1: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 2: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 3: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 4: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 5: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>> exceeded
>>>> 6: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>> exceeded
>>>> 7: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 8: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 9: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max iterations
>>>> exceeded
>>>> 10: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 11: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 12: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 13: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 14: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>> iterations exceeded
>>>> 15: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 16: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 17: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 18: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>> iterations exceeded
>>>> 19: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 20: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 21: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 22: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>> iterations exceeded
>>>> 23: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : Divergence
>>>> 24: In maximizeInterpolant(spline.pts, apl.smooth[j, ]) : max
>>>> iterations exceeded
>>>>
>>>>> sessionInfo()
>>>>
>>>>
>>>> R version 2.15.2 (2012-10-26)
>>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>>
>>>> locale:
>>>> [1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United
>>>> Kingdom.1252 LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
>>>> [5] LC_TIME=English_United Kingdom.1252
>>>>
>>>> attached base packages:
>>>> [1] splines stats graphics grDevices utils datasets
>>>> methods base
>>>>
>>>> other attached packages:
>>>> [1] pasilla_0.2.13 DEXSeq_1.2.1 DESeq_1.8.3
>>>> locfit_1.5-8 edgeR_2.6.12 plyr_1.7.1
>>>> genefilter_1.38.0 Biobase_2.16.0
>>>> [9] BiocGenerics_0.2.0 limma_3.12.3
>>>>
>>>> loaded via a namespace (and not attached):
>>>> [1] annotate_1.34.0 AnnotationDbi_1.18.0 biomaRt_2.12.0
>>>> DBI_0.2-5 geneplotter_1.34.0 grid_2.15.2
>>>> hwriter_1.3
>>>> [8] IRanges_1.14.2 lattice_0.20-10 RColorBrewer_1.0-5
>>>> RCurl_1.91-1.1 RSQLite_0.11.1 statmod_1.4.16
>>>> stats4_2.15.2
>>>> [15] stringr_0.6 survival_2.36-14 tools_2.15.2
>>>> XML_3.9-4.1 xtable_1.7-0
>>
>>
>> ______________________________________________________________________
>> The information in this email is confidential and intended solely for the
>> addressee.
>> You must not disclose, forward, print or use it without the permission of
>> the sender.
>> ______________________________________________________________________
More information about the Bioconductor
mailing list