[BioC] edgeR on microRNA data
Gordon K Smyth
smyth at wehi.EDU.AU
Sat Oct 1 10:09:53 CEST 2011
Dear Helena,
Compared with mRNA-Seq, you have an unusually small number of transcripts
but a relatively large number of biological replicates. This suggests
that you should use a relatively small value for prior.n but a relatively
large value for prop.used. I am concerned that you have decreased
prop.used from its default value of 0.3. I would tend to increase this rather
than decrease it.
On the other hand, you have increased prior.n from its default value,
which for your data would be a little over 0.5. Is this simply because it
gave better looking results? Anyway, increasing prior.n does not result
in overfitting. The risk with larger prior.n is simply that it may start
to return differentially expressed miRs that are increased or decreased in
only a few of the samples, rather than consistently for all samples in a
group.
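As a rough check on the "little over 0.5" figure, here is a sketch of the arithmetic. It assumes the default prior.n is chosen to give about 20 prior degrees of freedom, divided by the residual degrees of freedom (number of samples minus number of groups). The value 20 and the variable names are assumptions made for illustration, so check the help page for your installed edgeR version.

```r
# Group sizes from the original post: 3 groups with n = 10, 15 and 15.
group.sizes <- c(10, 15, 15)

# ASSUMPTION: the default prior corresponds to about 20 prior degrees of freedom.
prior.df <- 20

# Residual degrees of freedom = number of samples - number of groups.
residual.df <- sum(group.sizes) - length(group.sizes)

# Implied default prior.n under the assumption above.
prior.n <- prior.df / residual.df
print(round(prior.n, 2))
```

Under that assumption the implied default is 20/37, so prior.n=2 is already several times larger than the default, and prior.n=10 much more so.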
Your experience with prior.n is unintuitive to me. Generally speaking,
choosing prior.n small means that each miR gets to set its own dispersion,
so that miRs with large variance will not appear in the topTags list. When
you say "variance outliers", do you mean large or small variance?
Since your minimum group sample size is 10, I would have required miRs to
satisfy your cpm requirement in >= 10 samples rather than 5.
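A minimal sketch of that filter in base R follows. The count matrix and the names (counts, min.cpm, min.samples) are made up for illustration; with real data you would start from your own count table, and edgeR's cpm() function can be used in place of the manual calculation.

```r
# Toy count matrix: 3 miRs (rows) x 40 libraries (columns).
# Values are invented; real libraries would be far larger.
counts <- rbind(
  miR_a = rep(100, 40),               # present in all 40 samples
  miR_b = c(rep(5, 6),  rep(0, 34)),  # present in only 6 samples
  miR_c = c(rep(5, 12), rep(0, 28))   # present in 12 samples
)

min.cpm     <- 0.2  # cpm threshold from the original post
min.samples <- 10   # size of the smallest group

# Counts per million: divide each column by its library size.
lib.size <- colSums(counts)
cpm.mat  <- t(t(counts) / lib.size) * 1e6

# Keep miRs reaching the cpm threshold in at least min.samples libraries.
keep     <- rowSums(cpm.mat >= min.cpm) >= min.samples
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```

With min.samples set to 5, miR_b would also be kept even though it is detected in fewer samples than the smallest group contains.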
Best wishes
Gordon
> Date: Thu, 29 Sep 2011 05:25:14 +0000
> From: Helena Persson <helena.persson at ki.se>
> To: "bioconductor at stat.math.ethz.ch" <bioconductor at stat.math.ethz.ch>
> Subject: [BioC] edgeR on microRNA data
>
> Hi,
> I would be grateful for some input on using edgeR for small RNA sequence
> data. I have been testing edgeR on a set of miRNA data (3 groups with
> n=10, 15 and 15). After removing genes that are not expressed at >= 0.2
> cpm in >= 5 samples I have ~600 rows left. I tried calculating the
> tagwise dispersion estimate with:
>
> cds1 <- estimateTagwiseDisp(cds1, prior.n=2, trend=TRUE, prop.used=0.1,
> grid=FALSE)
>
> Increasing the prior to e.g. 10 gives more differentially expressed
> genes that do not look bad. Decreasing the prior to 0 leaves me with
> extremely few differentially expressed genes that are mainly variance
> outliers. I guess that miRNA data is likely to behave differently from
> mRNA data since there are so few genes (but still a very large dynamic
> range). Is it possible that I am over-fitting the estimate? Would you
> recommend changing any other parameters?
>
> Best regards,
> Helena
> _________________________________
>
> Helena Persson, PhD
>
> Karolinska Institutet
> Dept of Biosciences and Nutrition
> Hälsovägen 7-9
> SE-141 83 Huddinge
> Sweden
>
> Helena.Persson at ki.se
>
> tel. +46-(0)8-52481058