[BioC] DESeq and number of replicates required for RNA-Seq

Naomi Altman naomi at stat.psu.edu
Tue Jun 15 04:02:47 CEST 2010


Hi Michael,
I was working this out for a lecture and here is what I found:

If there is enough expression for the Normal approximation to hold 
then here is a rule of thumb.

Suppose that the total number of reads is identical for all samples 
and that there is NO biological variation.  If Yi is the number of 
reads for a gene in sample i, then
Poisson variation alone leads to log(Yi) approx normal with variance 
1/4.  (This is what the DESeq vignette calls "shot" variance.)

Using the formula for a 2-sample t-test, you see that to detect 
2-fold differences (log2(2) = 1) with 95% power at alpha = .05 you need 
n > 32 var/(log2 fold)^2, which is approximately 8 biological reps 
per treatment.
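
A minimal sketch of that arithmetic in Python, assuming the usual 
normal-approximation sample-size formula 
n = 2 (z_{alpha/2} + z_power)^2 var / delta^2 and taking the variance 
of 1/4 quoted above as given.  The function name reps_per_group is 
purely illustrative.  The constant 32 corresponds to rounding both 
normal quantiles to 2, since 2 * (2 + 2)^2 = 32; the exact quantiles 
for 95% power give a somewhat smaller constant (about 26).

```python
from math import ceil
from statistics import NormalDist


def reps_per_group(var, log2_fc, alpha=0.05, power=0.95):
    """Normal-approximation replicates per group for a two-sample comparison.

    var     -- per-sample variance of the transformed counts (assumed known)
    log2_fc -- difference to detect, on the same scale as the variance
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # quantile for a two-sided test
    z_power = z.inv_cdf(power)            # quantile for the target power
    return 2 * (z_alpha + z_power) ** 2 * var / log2_fc ** 2


shot_var = 0.25   # "shot" variance quoted in the message, taken as given
log2_fc = 1.0     # a two-fold change

exact = reps_per_group(shot_var, log2_fc)       # exact normal quantiles
rule_of_thumb = 32 * shot_var / log2_fc ** 2    # both quantiles rounded to 2

print(f"exact quantiles: n > {exact:.2f}, i.e. {ceil(exact)} reps per treatment")
print(f"rule of thumb:   n > {rule_of_thumb:.0f}")
```

Either way the answer is in the range of 7 to 8 replicates per 
treatment, before any biological variation is added.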

However, that is for NO biological variation.  (Have a look at the 
example in the DESeq vignette!)  And it assumes alpha = .05 (but we are 
going to use a much smaller alpha due to the multiple comparisons 
adjustment).
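
To get a feel for how much these two caveats matter, the same 
normal-approximation formula can be re-evaluated with a stricter alpha 
and with extra variance added.  This is only a sketch: the biological 
variance (0.25) and the Bonferroni-style alpha for 20,000 genes are 
made-up illustrative values, not estimates from the DESeq vignette, and 
treating the total variance as shot variance plus biological variance 
is a simplifying assumption.

```python
from statistics import NormalDist


def reps_per_group(var, log2_fc, alpha, power=0.95):
    """Normal-approximation replicates per group (two-sided test)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return 2 * z_sum ** 2 * var / log2_fc ** 2


shot_var = 0.25    # value quoted in the message, taken as given
bio_var = 0.25     # hypothetical biological variance (assumption)
n_genes = 20_000   # hypothetical number of genes tested (assumption)

# (total variance, per-test alpha) for each scenario
scenarios = {
    "alpha=0.05, shot only": (shot_var, 0.05),
    "alpha=0.05, shot + bio": (shot_var + bio_var, 0.05),
    "Bonferroni alpha, shot only": (shot_var, 0.05 / n_genes),
    "Bonferroni alpha, shot + bio": (shot_var + bio_var, 0.05 / n_genes),
}

# Replicates per treatment needed to detect a two-fold change (log2 FC = 1)
results = {
    name: reps_per_group(var, 1.0, alpha)
    for name, (var, alpha) in scenarios.items()
}

for name, n in results.items():
    print(f"{name:30s} n > {n:.1f}")
```

Because the formula is linear in the variance, doubling the variance 
doubles the required replicates, while the stricter alpha enters 
through the larger normal quantile.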

--Naomi


At 12:57 PM 6/14/2010, michael watson (IAH-C) wrote:
>Hi Naomi
>
>Thanks for the reply.
>
>The issue isn't necessarily low-expressing genes, but perhaps 
>high-expressing genes with a small(ish) fold change.  DESeq seems to 
>report as significant only differences with high fold changes.
>
>Contrast this to limma for microarrays, where small fold changes can 
>be reported as significant.
>
>For whatever reason, the transcriptomic community has become 
>fixated on "two-fold" as some kind of standard cut-off.  Now, I'm 
>not fixated on that, but the example in DESeq reports 428 
>significant genes with an estimated fold change at FDR 5%; however, 
>NONE of these are in the range -2 : 2.  The minimum positive logFC 
>is 2.18 (4.5 fold up-regulation), and the maximum negative logFC is 
>-2.49 (5.65 fold down-regulation).
>
>So what I am concerned about is finding genes, either highly or 
>lowly expressed, that are differing by a small fold change - say two-fold.
>
>Thanks
>Mick
>________________________________________
>From: Naomi Altman [naomi at stat.psu.edu]
>Sent: 14 June 2010 17:42
>To: michael watson (IAH-C); bioconductor at stat.math.ethz.ch
>Subject: Re: [BioC] DESeq and number of replicates required for RNA-Seq
>
>The issue is a mix of expression level and sample size.  For count
>data, the power is higher when the expression is higher.  Also, the
>p-values are discrete - the lower the total read count, the fewer
>values are possible, which messes up the FDR estimation.
>
>Of course, understanding the problem does not necessarily suggest a
>solution.  But sample sizes will need to be large (or you need to
>sequence very deeply) if you want to detect differential expression
>in low expressing genes.
>
>--Naomi
>
>At 09:45 AM 6/14/2010, michael watson (IAH-C) wrote:
> >Hi
> >
> >This follows on slightly from my experimental design thread.
> >
> >Having worked through the vignette for DESeq, it seems to work
> >well.  However, for the TagSeqExample.tab data set, when using an
> >FDR cut off of 0.05, what we see is that we only find differential
> >expression for large fold changes - an average of log2 fold change
> >of 5 for up-regulated, and log2 fold change of -5 for
> >down-regulated.  There are very few significant results that even go
> >as far down as 2 or -2 - which is still a 4-fold change.
> >
> >So, the question is, how many replicates must we have to get more
> >sensitive results?  Say, down to a log2FC of 1 (two-fold up- or
> >down-regulated)?
> >
> >I can calculate this by using DESeq's own estimates of variance to
> >approximate replicates for T and N in the example data, and keep
> >going until my significant results start to hit a logFC of 1, but I
> >wanted to know if anyone else had done this yet?
> >
> >Thanks
> >Mick
> >
> >_______________________________________________
> >Bioconductor mailing list
> >Bioconductor at stat.math.ethz.ch
> >https://stat.ethz.ch/mailman/listinfo/bioconductor
> >Search the archives:
> >http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>Naomi S. Altman                                814-865-3791 (voice)
>Associate Professor
>Dept. of Statistics                              814-863-7114 (fax)
>Penn State University                         814-865-1348 (Statistics)
>University Park, PA 16802-2111
>