[BioC] edgeR on microRNA data

Sun Oct 2 01:52:28 CEST 2011

Dear Helena,

How large are the common and tagwise dispersions for your data?

Best wishes
Gordon

---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth

On Sat, 1 Oct 2011, Gordon K Smyth wrote:

> Dear Helena,
>
> Compared with mRNA-Seq, you have an unusually small number of transcripts but 
> a relatively large number of biological replicates.  This suggests that you 
> should use a relatively small value for prior.n but a relatively large value 
> for prop.used.  I am concerned that you have decreased prop.used from its 
> default value of 0.3.  I would tend to increase this rather than decrease it.
>
> On the other hand, you have increased prior.n from its default value, which 
> for your data would be a little over 0.5.  Is this simply because it gave 
> better looking results?  Anyway, increasing prior.n does not result in 
> overfitting.  The risk with larger prior.n is simply that it may start to 
> return differentially expressed miRs that are increased or decreased in only 
> a few of the samples, rather than consistently for all samples in a group.
>
> Your experience with prior.n is unintuitive to me.  Generally speaking, 
> choosing prior.n small means that each miR gets to set its own dispersion, so 
> that miRs with large variance will not appear in the topTags list.  When you 
> say "variance outliers", do you mean large or small variance?
>
> Since your minimum group sample size is 10, I would have required miRs to 
> satisfy your cpm requirement in >= 10 samples rather than 5.
>
> Best wishes
> Gordon
>
>> Date: Thu, 29 Sep 2011 05:25:14 +0000
>> From: Helena Persson <helena.persson at ki.se>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] edgeR on microRNA data
>> 
>> Hi,
>>
>> I would be grateful for some input on using edgeR for small RNA sequence 
>> data. I have been testing edgeR on a set of miRNA data (3 groups with n=10, 
>> 15 and 15). After removing genes that are not expressed at >= 0.2 cpm in >= 
>> 5 samples I have ~600 rows left. I tried calculating the tagwise dispersion 
>> estimate with:
>> 
>> cds1 <- estimateTagwiseDisp(cds1, prior.n=2, trend=TRUE, prop.used=0.1, 
>> grid=FALSE)
>> 
>> Increasing the prior to e.g. 10 gives more differentially expressed genes 
>> that do not look bad. Decreasing the prior to 0 leaves me with extremely 
>> few differentially expressed genes that are mainly variance outliers. I 
>> guess that miRNA data is likely to behave differently from mRNA data since 
>> there are so few genes (but still a very large dynamic range). Is it 
>> possible that I am over-fitting the estimate? Would you recommend changing 
>> any other parameters?
>> 
>> Best regards,
>> Helena
>> _________________________________
>> 
>> Helena Persson, PhD
>> 
>> Karolinska Institutet
>> Dept of Biosciences and Nutrition
>> Hälsovägen 7-9
>> SE-141 83 Huddinge
>> Sweden
>> 
>> Helena.Persson at ki.se
>> 
>> tel. +46-(0)8-52481058
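
The filtering suggestion in Gordon's reply (require the cpm threshold in at least as many samples as the smallest group) can be sketched in base R as follows. This is only an illustration under stated assumptions: the count matrix is simulated, the object names (`counts`, `group`, `min.cpm`, `min.samples`) are hypothetical, and cpm is computed from raw column totals rather than normalized library sizes. edgeR's own cpm() function could probably replace the manual calculation.

```r
# Simulated stand-in for a miRNA count matrix (hypothetical data, not the
# poster's): 1000 miRs, three groups of 10, 15 and 15 samples.
set.seed(1)
group  <- factor(rep(c("A", "B", "C"), times = c(10, 15, 15)))
n.mir  <- 1000
n.samp <- length(group)

# Per-miR mean abundances spanning a wide dynamic range.
mu <- 2^runif(n.mir, min = -6, max = 10)

# mu is recycled down each column, so row i always uses mu[i].
counts <- matrix(rnbinom(n.mir * n.samp, mu = mu, size = 2),
                 nrow = n.mir,
                 dimnames = list(paste0("miR", seq_len(n.mir)), NULL))

# Counts per million, using raw column totals as library sizes.
lib.size <- colSums(counts)
cpm <- t(t(counts) / lib.size) * 1e6

# Thresholds: 0.2 cpm, in at least as many samples as the smallest group.
min.cpm     <- 0.2
min.samples <- min(table(group))

# Keep miRs that reach the cpm threshold in enough samples.
keep     <- rowSums(cpm >= min.cpm) >= min.samples
filtered <- counts[keep, , drop = FALSE]
```

Tying `min.samples` to the smallest group size means a miR expressed in only one group can still pass the filter, while miRs detected in just a handful of samples are dropped.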
