[BioC] Incorrect result in edgeR.calculateCommonDispersion

Thu Jun 5 03:30:22 CEST 2014

Hi Jacob,

Yes, I see the issue.  The edgeR routines assume that y$samples$group
doesn't have superfluous factor levels.  The culprit is this line of your
script, which subsets the factor but keeps the now-empty level "B":

  groups <- groups[sel_cols]

If you change this to

  groups <- factor(groups[sel_cols])

all will be well.
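
To see why, here is a minimal base-R sketch (no edgeR needed) of how
subsetting a factor behaves. The names `kept` and `fixed` are only for
illustration and are not part of the original script:

```r
# Subsetting a factor keeps every original level, even those that no
# longer occur in the data.
groups   <- factor(c("A", "A", "B", "B", "C", "C"))
sel_cols <- c(1, 2, 5, 6)

kept <- groups[sel_cols]
print(levels(kept))    # "A" "B" "C"  (level "B" is now empty)
print(table(kept))     # "B" is reported with a count of 0

# Re-creating the factor rebuilds the levels from the values present.
fixed <- factor(groups[sel_cols])
print(levels(fixed))   # "A" "C"

# droplevels() is another base-R way to discard the unused levels.
print(levels(droplevels(kept)))   # "A" "C"
```

Either form should do here; the point is simply that the group factor
passed to DGEList() has no empty levels.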

Best wishes
Gordon

On Wed, 4 Jun 2014, Jacob Silterra wrote:

> Hi Gordon,
>
> Thanks for the info. My apologies for being unclear: I meant the functions
> estimateCommonDisp (and estimateTagwiseDisp) in the package edgeR. I guess
> the attachment didn't go through, so I've pasted it below.
>
> -Jacob
>
> R script:
> library(edgeR)
>
>
> groups <- factor(c("A", "A", "B", "B", "C", "C"))
> rows <- 10
> cols <- 6
> counts <- matrix( rnorm(rows*cols,mean=100,sd=20), nrow=rows, ncol=cols)
> counts <- round(counts)
>
> #Everything runs smoothly
> y <- DGEList(counts=counts,group=groups)
> y <- calcNormFactors(y)
> y <- estimateCommonDisp(y)
> print(y$common.disp)
> #[1] 0.0310142
>
> #Take out samples from group "B", estimating the dispersion fails
> sel_cols <- c(1,2,5,6)
> counts <- counts[,sel_cols]
> groups <- groups[sel_cols]
> y <- DGEList(counts=counts,group=groups)
> y <- calcNormFactors(y)
> y <- estimateCommonDisp(y)
> print(y$common.disp)
> #[1] 99.99477
> print(warnings())
>
>
> On Wed, Jun 4, 2014 at 8:21 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Dear Jacob,
>>
>> There is no function called edgeR.calculateCommonDispersion in the edgeR
>> package.
>>
>> There also wasn't any attachment with your posting.
>>
>> If you subset a DGEList in such a way that a group is removed entirely,
>> you can prevent any problems by resetting the levels of the group factor:
>>
>>  dge$samples$group <- factor(dge$samples$group)
>>
>> Best wishes
>> Gordon
>>
>>
>>
>> ----------- original message ------------
>> Jacob Silterra jacob at broadinstitute.org
>> Wed Jun 4 19:45:50 CEST 2014
>>
>>
>> Hello all,
>>
>> I've encountered an issue with edgeR when it calculates dispersion, and
>> there aren't any samples for a given group. I believe it happens with both
>> tagwise and common dispersion; same idea. Basically splitIntoGroups will
>> return an empty matrix for that group, which messes up the dispersion
>> calculation. I think it would be better to ignore groups that have no data
>> associated with them. Example attached. This might seem unnecessary, but I
>> have a situation where I read in a matrix with samples of different classes
>> and then remove some groups entirely.
>>
>> Thanks,
>> --
>> Jacob Silterra
>> Associate Computational Biologist
>> Broad Institute
