[Bioc-sig-seq] "pooled" dispersion estimation in edgeR

Tue Jul 19 06:54:27 CEST 2011

Hi Sean,

Sorry, the code I gave works as is with the devel version of edgeR.  With the
official release version you have to set:

   design <- matrix(1,ncol(y),1)

and pass it as the design argument to estimateGLMCommonDisp() to get the
same effect.
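
A minimal sketch of that workaround on simulated data, assuming edgeR is installed. The counts, the offset matrix and the 8-sample layout are invented for illustration and are not Sean's data; the object names countMat and totals are simply borrowed from his message:

```r
library(edgeR)

set.seed(1)

# Hypothetical data: 200 tags, 8 samples with no replicates.
ntags <- 200
nsamples <- 8
countMat <- matrix(rnbinom(ntags * nsamples, mu = 50, size = 10),
                   nrow = ntags, ncol = nsamples)

# Hypothetical tag- and sample-specific "totals": the column sums,
# perturbed per tag so that every entry of the offset matrix differs.
totals <- matrix(colSums(countMat), nrow = ntags, ncol = nsamples, byrow = TRUE)
totals <- totals * matrix(runif(ntags * nsamples, 0.8, 1.2), ntags, nsamples)

y <- DGEList(countMat)
y$offset <- log(totals)   # offsets are on the log scale, same dimensions as the counts

# A single column of ones treats all samples as one group, i.e. pooled.
design <- matrix(1, ncol(y), 1)
y <- estimateGLMCommonDisp(y, design)

# The pooled common dispersion estimate is stored in the DGEList.
y$common.dispersion
```

Passing the column of ones explicitly should behave the same whether or not the installed version supplies a default design.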

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Mon, 18 Jul 2011, Sean Ruddy wrote:

> Hi Gordon,
>
> I wasn't able to get your suggestion to work. estimateGLMCommonDisp() seems
> to want explicit values for the design. If I leave the design argument empty
> I get the error,
>
> Error in as.matrix(design) :
>  argument "design" is missing, with no default
>
> I have release 2.8 installed. My code is
>
> y <- DGEList( countMat )
> y$offset <- log( totals )
> y <- estimateGLMCommonDisp( y , offset = y$offset )
>
> Sorry if I'm missing something obvious.
>
> Thanks,
> Sean
>
>
> On Fri, Jul 15, 2011 at 7:26 PM, Gordon K Smyth <smyth at wehi.edu.au> wrote:
>
>> Hi Sean,
>>
>> On Fri, 15 Jul 2011, Sean Ruddy wrote:
>>
>>> Hi Gordon,
>>>
>>> Thanks for the response. One of my data sets has 8 conditions and no
>>> replicates and so I wanted to emulate DESeq's way of pooling the samples and
>>> also use an offset matrix. I was hoping to avoid doing it manually so that I
>>> don't mess it up. I could do this all in edgeR and pool the samples but I'm
>>> not sure how well this would work under edgeR vs. DESeq.
>>>
>>
>> edgeR has a very flexible interface, so there was no need to explicitly
>> introduce a "pooled" method.  Instead, this sort of thing can be handled by
>> the usual functions in the usual way.  Suppose you have a data object y,
>> which includes an offset matrix:
>>
>>   y$offset <- your matrix
>>
>> Then you can estimate the "pooled" dispersion simply by:
>>
>>   y <- estimateGLMCommonDisp(y)
>>
>> The fact that you don't supply a design matrix means that the samples are
>> automatically treated as one group, i.e., pooled.  You can estimate
>> trended or tagwise dispersions in the same way.  Then
>>
>>   fit <- glmFit(y,design)  etc
>>
>> will do any analysis you want using dispersions estimated when the samples
>> were pooled.
>>
>> I and the other edgeR authors are anxious to get feedback, so write again
>> if this doesn't turn out to be clear.
>>
>>> I am curious though what sounds off to you in my previous email. I don't
>>> feel entirely comfortable doing this manually but hopefully it's just
>>> because I left out some details. I was trying to follow the DESeq method and
>>> the only difference I saw was in the size factor calculations which I
>>> changed for my own needs by using the offset values for each tag and sample.
>>>
>>
>> Even if you could estimate the variances yourself, I don't see any manual
>> way that you could perform valid statistical tests, while correctly
>> accounting for the offsets.  The whole negative binomial methodology
>> requires genuine counts rather than adjusted counts.  So handling the
>> offsets needs to be built-in.
>>
>> Best wishes
>> Gordon
>>
>>> I appreciate the help!
>>>
>>> Best,
>>> Sean
>>>
>>> On Fri, Jul 15, 2011 at 12:02 AM, Gordon K Smyth <smyth at wehi.edu.au>
>>> wrote:
>>>
>>>> Hi Sean,
>>>>
>>>> I'm curious to know why not use edgeR, since edgeR does what you want and
>>>> DESeq doesn't?
>>>>
>>>> I might be wrong, but the manual analysis that you describe doesn't sound
>>>> right.
>>>>
>>>> Best wishes
>>>> Gordon
>>>>
>>>>> Date: Thu, 14 Jul 2011 12:54:49 -0700
>>>>> From: Sean Ruddy <sruddy17 at gmail.com>
>>>>> To: bioc-sig-sequencing at r-project.org
>>>>> Subject: [Bioc-sig-seq] Supplying own variance functions and adjusted
>>>>>       counts to a DESeq dataset
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have an RNA-Seq count data set that requires separate offset values for
>>>>> each tag and sample. DESeq does not appear to take a matrix of offset values
>>>>> (unlike edgeR) in any of its functions so I've carried out the analysis
>>>>> manually, i.e., calculating a size factor for each tag of each sample,
>>>>> adjusting the counts, then proceeding to calculate means and variances of
>>>>> the adjusted counts, and finally fitting a curve for each condition to the
>>>>> mean-var plot using locfit().
>>>>>
>>>>> Essentially, I'd like to put these variance functions (or at least all
>>>>> the predicted variances) and adjusted counts inside a DESeq object so that I
>>>>> can take advantage of the other functions DESeq offers: tests, plots, etc.
>>>>>
>>>>> Thanks for the help!
>>>>>
>>>>> Sean
