[BioC] edgeR on microRNA data
Gordon K Smyth
smyth at wehi.EDU.AU
Sun Oct 2 01:52:28 CEST 2011
Dear Helena,
How large are the common and tagwise dispersions for your data?
Best wishes
Gordon
---------------------------------------------
Professor Gordon K Smyth,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
smyth at wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
On Sat, 1 Oct 2011, Gordon K Smyth wrote:
> Dear Helena,
>
> Compared with mRNA-Seq, you have an unusually small number of transcripts but
> a relatively large number of biological replicates. This suggests that you
> should use a relatively small value for prior.n but a relatively large value
> for prop.used. I am concerned that you have decreased prop.used from its
> default value of 0.3. I would tend to increase this rather than decrease it.
>
> On the other hand, you have increased prior.n from its default value, which
> for your data would be a little over 0.5. Is this simply because it gave
> better looking results? Anyway, increasing prior.n does not result in
> overfitting. The risk with larger prior.n is simply that it may start to
> return differentially expressed miRs that are increased or decreased in only
> a few of the samples, rather than consistently for all samples in a group.
>
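The "little over 0.5" figure above can be reproduced with a short calculation. This is only a sketch: it assumes the default prior.n is a prior degrees of freedom of 20 divided by the residual degrees of freedom (number of samples minus number of groups). That rule and the variable names below are assumptions, not something stated in this thread, so check the help page of your installed edgeR version.

```r
# Group sizes taken from the original post: n = 10, 15 and 15.
group_sizes <- c(10, 15, 15)

n_samples   <- sum(group_sizes)      # 40 libraries in total
n_groups    <- length(group_sizes)   # 3 groups
residual_df <- n_samples - n_groups  # 37 residual degrees of freedom

# ASSUMPTION: default prior.n = prior.df / residual.df with prior.df = 20.
prior_df <- 20
prior_n  <- prior_df / residual_df

print(round(prior_n, 2))  # 0.54
```

Under that assumption, prior.n=2 in the original call corresponds to a prior several times heavier than the default.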
> Your experience with prior.n is unintuitive to me. Generally speaking,
> choosing prior.n small means that each miR gets to set its own dispersion, so
> that miRs with large variance will not appear in the topTags list. When you
> say "variance outliers", do you mean large or small variance?
>
> Since your minimum group sample size is 10, I would have required miRs to
> satisfy your cpm requirement in >= 10 samples rather than 5.
>
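The filtering rule suggested above (keep a miR only if it reaches the cpm cutoff in at least as many samples as the smallest group) might be sketched in base R as follows. The toy count matrix, the miR names and the variable names are all made up for illustration; only the 0.2 cpm cutoff and the group sizes come from this thread.

```r
# Hypothetical toy data: 4 miRs (rows) by 40 samples (columns).
n_samples <- 40
group <- factor(rep(c("g1", "g2", "g3"), times = c(10, 15, 15)))
counts <- rbind(
  miR_a = rep(100, n_samples),                     # expressed everywhere
  miR_b = c(rep(100, 5),  rep(0, n_samples - 5)),  # expressed in 5 samples
  miR_c = c(rep(100, 10), rep(0, n_samples - 10)), # expressed in 10 samples
  miR_d = rep(0, n_samples)                        # never expressed
)

# Counts per million: divide each column by its library size.
lib_size <- colSums(counts)
cpm_mat  <- sweep(counts, 2, lib_size, "/") * 1e6

# Require >= 0.2 cpm in at least min(group size) samples, i.e. 10 here.
min_group <- min(table(group))
keep      <- rowSums(cpm_mat >= 0.2) >= min_group
filtered  <- counts[keep, , drop = FALSE]

print(rownames(filtered))  # "miR_a" "miR_c"
```

With a threshold of 5 samples miR_b would also be kept; tying the threshold to the smallest group size drops miRs that are expressed in only part of one group.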
> Best wishes
> Gordon
>
>> Date: Thu, 29 Sep 2011 05:25:14 +0000
>> From: Helena Persson <helena.persson at ki.se>
>> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
>> Subject: [BioC] edgeR on microRNA data
>>
>> Hi,
>>
>> I would be grateful for some input on using edgeR for small RNA sequence
>> data. I have been testing edgeR on a set of miRNA data (3 groups with n=10,
>> 15 and 15). After removing genes that are not expressed at >= 0.2 cpm in >=
>> 5 samples I have ~600 rows left. I tried calculating the tagwise dispersion
>> estimate with:
>>
>> cds1 <- estimateTagwiseDisp(cds1, prior.n=2, trend=TRUE, prop.used=0.1,
>> grid=FALSE)
>>
>> Increasing the prior to e.g. 10 gives more differentially expressed genes
>> that do not look bad. Decreasing the prior to 0 leaves me with extremely
>> few differentially expressed genes that are mainly variance outliers. I
>> guess that miRNA data is likely to behave differently from mRNA data since
>> there are so few genes (but still a very large dynamic range). Is it
>> possible that I am over-fitting the estimate? Would you recommend changing
>> any other parameters?
>>
>> Best regards,
>> Helena
>> _________________________________
>>
>> Helena Persson, PhD
>>
>> Karolinska Institutet
>> Dept of Biosciences and Nutrition
>> Hälsovägen 7-9
>> SE-141 83 Huddinge
>> Sweden
>>
>> Helena.Persson at ki.se
>>
>> tel. +46-(0)8-52481058