[BioC] Limitations in edgeR?

Fri Apr 4 02:53:10 CEST 2014

Hi Gordon, 

Then I think perhaps this is a formatting issue.

I've now been told that normalizing the dataset outside of edgeR is a big no-no, but because we're working with piRNAs and miRNAs, we want to normalize based on sequencing depth alone. When I cut down the original dataset, it's because I'm eliminating what we've determined to be pseudocounts. Thus we're rerunning the analysis on about half of the matched piRNAs.
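
To make "sequencing depth alone" concrete, here is a minimal base-R sketch of what I mean. The toy counts, the `keep` vector, and every variable name are made up for illustration and are not our real data. It only shows that if the library sizes are taken from the full table before rows are dropped, the depth-scaled values of the retained piRNAs do not change.

```r
# Hypothetical toy counts: 4 piRNAs (rows) x 2 libraries (columns).
counts <- matrix(c(500, 300, 150, 50,
                   1000, 600, 300, 100),
                 nrow = 4,
                 dimnames = list(paste0("piRNA", 1:4), c("6C", "6H")))

# Sequencing depth of each library, taken from the full table before filtering.
lib.size <- colSums(counts)

# Depth-only scaling: counts per million, with no composition adjustment.
cpm.full <- t(t(counts) / lib.size) * 1e6

# Drop the rows to be excluded (here, arbitrarily, the last two).
keep <- c(TRUE, TRUE, FALSE, FALSE)
counts.sub <- counts[keep, , drop = FALSE]

# Reusing the original depths leaves the retained rows' values unchanged.
cpm.sub <- t(t(counts.sub) / lib.size) * 1e6
```

As far as I understand it, the equivalent inside edgeR would be to pass raw counts, leave the normalization factors at 1, and subset the DGEList while keeping the original library sizes, but I have not confirmed that against the documentation.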

Eleanor 

On Apr 3, 2014, at 4:45 PM, Gordon K Smyth wrote:

> Dear Eleanor,
> 
> Well, a couple of comments.
> 
> First, edgeR does not have a limitation on the number of genes it can run on.
> 
> I suggest that you upgrade to the most recent version of edgeR, which I suspect you do not have, and run
> 
>  y <- estimateDisp(y,design)
> 
> Second, given that you have already analyzed the full set of piRNAs successfully, why in the world would you need to rerun the analysis on just half of them?  This does seem like a self-inflicted problem.
> 
> Gordon
> 
>> Date: Wed, 2 Apr 2014 09:58:23 -0700
>> From: Eleanor Su <eleanorjinsu at gmail.com>
>> To: Steve Lianoglou <lianoglou.steve at gene.com>
>> Cc: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: Re: [BioC] Limitations in edgeR?
>> 
>> Hi Steve,
>> 
>> I'm running the same analysis on both datasets (the larger and the
>> smaller). When I rerun the analysis on the smaller dataset (which actually
>> IS half of the identities from the larger data set), I come across an error
>> message when estimating glm trended dispersion. Here are the commands I'm
>> using:
>> 
>>> rawdata<-read.delim("piRNAtotalcount>10.txt", check.names=FALSE, stringsAsFactors=FALSE)
>>> y <- DGEList(counts=rawdata[,2:11], genes=rawdata[,1])
>>> Family<-factor(c(6,6,9,9,11,11,26,26,28,28))
>>> Treatment<-factor(c("C","H","C","H","C","H","C","H","C","H"))
>>> data.frame(Sample=colnames(y),Family,Treatment)
>>  Sample Family Treatment
>> 1      6C      6         C
>> 2      6H      6         H
>> 3      9C      9         C
>> 4      9H      9         H
>> 5     11C     11         C
>> 6     11H     11         H
>> 7     26C     26         C
>> 8     26H     26         H
>> 9     28C     28         C
>> 10    28H     28         H
>>> design<-model.matrix(~Family+Treatment)
>>> rownames(design)<-colnames(y)
>>> y<-estimateGLMTrendedDisp(y,design)
>> Error in optim(par0, fun, y = y.nonzero[i, ], design = design, offset =
>> offset.nonzero[i,  :
>>     function cannot be evaluated at initial parameters
>> 
>> I only encounter this error when running the smaller dataset.
>> 
>> Best,
>> Eleanor
>> 
>> 
>> 
>> On Wed, Apr 2, 2014 at 9:49 AM, Steve Lianoglou <lianoglou.steve at gene.com> wrote:
>> 
>>> Hi Eleanor,
>>> 
>>> On Tue, Apr 1, 2014 at 11:09 AM, Eleanor Su <eleanorjinsu at gmail.com>
>>> wrote:
>>>> Hi All,
>>>> 
>>>> I'm currently trying to analyze differential expression of piRNAs in some
>>>> small data sets but am coming across issues that I didn't before when I
>>>> analyzed with a larger data set. The larger data set contained 324 piRNA
>>>> identities while the smaller data set contained half as many piRNA
>>>> identities. Is there a minimum number of gene identities required in order
>>>> to analyze differential expression in edgeR?
>>> 
>>> It's hard to help without knowing what the issues are that you are
>>> running into, so ... what's going wrong?
>>> 
>>> One way you could explore this question yourself is to use the larger
>>> (324 piRNA) dataset that "went well" and simply take half of the data
>>> from it and rerun the same analysis on the smaller set. Do you get
>>> different results?
>>> 
>>> While you're playing with that idea, please provide a follow up email
>>> with more specific details about what the issues are that you are
>>> running into with your new (smaller) dataset.
>>> 
>>> HTH,
>>> -steve
>>> 
>>> --
>>> Steve Lianoglou
>>> Computational Biologist
>>> Genentech
> 